BBM467 - Data Intensive Applications (Spring 2020)


Lecturer: Dr. Fuat Akal

Lectures: Wednesdays 09:00-12:00 @D8

Practicum (BBM469): Fridays 14:00-16:00 @D8

Assistants: Alaettin Uçan, Ahmet Alkılınç

Office Hours: Open door policy


Announcements:

Announcements will be made only through the Piazza page. See the very bottom of this page.


Course Description:

The objective of this course is to teach students the fundamentals of big data management and analytics. Students will gain experience with key technologies, platforms, tools, and systems used by big data scientists and engineers. Keywords: Big Data, Data Science, Distributed Computing, Cluster Computing, Scalable Machine Learning, Cloud Computing and Virtualization, Data and Ethics.


Prerequisites:

Prior to attending this course, participants MUST have a basic understanding of computer systems, databases, distributed computing, and machine learning. Knowledge of Python programming would be useful.


Course Work:

Lectures will be conducted in English in the classroom. Course materials will be in English. Attendance is mandatory and will be rewarded in grading. Be aware that this course requires hard work.


Lab Work (BBM469):

Although BBM469 is an independent course, it goes hand in hand with BBM467. It is strongly recommended that you enroll in BBM469 only if you are enrolled in BBM467. There will be three assignments (0% + 25% + 25%) and a Data Science Capstone Project, DSCP (50%). Students may work alone or in groups of at most two. Assignments and projects must be delivered by their deadlines. Late deliveries will be penalized by 10 points per day for at most three days.
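The lab grading arithmetic above can be illustrated with a small Python sketch. It is only an illustration, not an official grade calculator: the function and constant names are hypothetical, scores are assumed to be on a 0-100 scale, and the handling of deliveries that are more than three days late (scored as zero here) is an assumption, since this page does not state it.

```python
# Hypothetical helper names; this page defines only the weights and the penalty rate.
PENALTY_PER_DAY = 10  # points deducted per late day
MAX_LATE_DAYS = 3     # the penalty applies for at most three days

# Weights of Assignments 1-3 and the DSCP within the BBM469 lab grade.
WEIGHTS = {
    "assignment1": 0.00,
    "assignment2": 0.25,
    "assignment3": 0.25,
    "dscp": 0.50,
}


def late_penalized_score(raw_score: float, days_late: int) -> float:
    """Return a score (assumed 0-100) after applying the late penalty."""
    if days_late <= 0:
        return float(raw_score)
    if days_late > MAX_LATE_DAYS:
        # ASSUMPTION: deliveries later than three days are not accepted.
        # The page does not say this explicitly.
        return 0.0
    # Never let the penalty push a score below zero.
    return max(0.0, raw_score - PENALTY_PER_DAY * days_late)


def lab_grade(scores: dict) -> float:
    """Weighted lab grade; components missing from `scores` count as 0."""
    return sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)


if __name__ == "__main__":
    scores = {
        "assignment2": late_penalized_score(90, days_late=1),  # 90 - 10 = 80
        "assignment3": 60,
        "dscp": 90,
    }
    print(lab_grade(scores))  # 0.25*80 + 0.25*60 + 0.50*90 = 80.0
```

Under these assumptions, a 90-point Assignment 2 delivered one day late counts as 80 points, and Assignment 1 does not affect the result because its weight is 0%.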


Grading:

The lecturer strongly believes that any student can pass the class with a good grade as long as they put in enough effort. There will be one written midterm examination (40%) and one final examination (60%).


Attendance:

Attendance is NOT mandatory due to the pandemic.


Schedule:

The schedule is tentative for the moment.


| Week # | Date | Title | Slides | Reading | BBM469 Lab Work (deadlines are on Fridays unless stated explicitly) |
|---|---|---|---|---|---|
| 1 | 26.02 | What is Data Science? Introduction to Big Data? | pdf | | |
| 2 | 04.03 | Data Science Methodology | pdf | | |
| 3 | 11.03 | Python for Data Science | pdf | | Lab Session: Open Source tools for Data Science; Assignment 1 (out): Python Exercises (Take home, no delivery) |
| 4 | 18.03 | No Lecture due to Corona Break | | | |
| 5 | 25.03 | No Lecture due to Corona Break | | | |
| 6 | 01.04 | Data Analysis with Python | pdf | | Quiz (Coverage: Python for Data Science and Assignment 1); Deadline for Building DSCP Groups |
| 7 | 08.04 | Data Visualization with Python | pdf | | |
| 8 | 15.04 | Machine Learning with Python | pdf | | Submission of DSCP Proposals (Deadline: 22.04, Midnight); Assignment 2 (out): Clustering and Classification with Python (Deadline: 01.05, Midnight) |
| 9 | 22.04 | Foundations for Big Data Systems | pdf | | |
| 10 | 29.04 | Scalable Machine Learning with Spark | | | Assignment 3 (out): Machine Learning with Spark (Deadline: 15.05, Midnight) |
| 11 | 06.05 | Stream Processing | pdf | | |
| 12 | 13.05 | Semi-structured Data | pdf | | |
| 13 | 20.05 | NoSQL | pdf | | DSCP Final Deliveries (Deadline: 27.05, Midnight) |
| 14 | 27.05 | Data and Ethics | pdf | | |


Course Material:

I do not follow a specific textbook, but here are a few books that I may refer to.


Communication:

The course webpage will be kept up to date throughout the semester. All course-related communication will be carried out through Piazza.

Anonymous Feedback Forms:

Please use Fuat's Anonymous Feedback Form if you have something to tell me in private while staying anonymous. Do not forget that this form is not for informing on your friends!