BBM467 - Data Intensive Applications (Fall 2022)


Lecturer: Dr. Fuat Akal

Lectures: Mondays, @09:15, Seminar Hall

Office Hours: By Appt. Only

Practicum (BBM469): Mondays, @16:00-18:00, D9

TAs: Tuğba Gürgen Erdoğan




Course Description


The objective of this course is to teach students the fundamentals of big data management and analytics. Students will gain experience with key technologies, platforms, tools, and systems used by big data scientists and engineers. Keywords: Big Data, Data Science, Distributed Computing, Cluster Computing, Scalable Machine Learning, Cloud Computing and Virtualization, Data and Ethics.


Prerequisites:


Before taking this course, participants should have a basic understanding of computer systems, databases, and machine learning, as well as Python programming and SQL knowledge. If you lack these skills and still want to enroll, you must be willing to invest a lot of time to catch up.


Course Work and Grading


Beware that this course requires HARD WORK. Please consider a past student's testimony before you enroll: "... aside from this course I have no other course which I consider 'hard'. ... Nobody should be forced to give all of their free time to a single course, and nobody should be forced to give all of their free thoughts to a single course."


Lectures will be conducted in English in the classroom. Video recordings from past years can be found on YouTube. Course materials will be in English, and there will be reading materials as well. Students are responsible for the course materials, topics covered in the practicum, assignments, quizzes, and readings. Full attendance is recommended. No points will be awarded for attendance, although it may be taken into account for bonus points when final grades are assigned.


The lecturer strongly believes that any student can pass the class with a good grade as long as she/he tries hard enough. There will be one written midterm examination (15%), a Blog Post Project (10%), a Data Science Capstone Project (35%), and one final examination (40%). Note also that a student must score at least 40% on the final exam to pass the class.
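To make the grading arithmetic concrete, here is a minimal sketch of how the weights above combine with the final-exam minimum. It assumes every component is scored out of 100 and that the 40% rule applies to the final exam score itself; the names WEIGHTS, weighted_grade, and meets_final_requirement are hypothetical, and this is not an official grading tool.

```python
# Hypothetical illustration of the grading breakdown; not the official tool.
# Assumption: every component is scored on a 0-100 scale.
WEIGHTS = {
    "midterm": 0.15,
    "blog_post_project": 0.10,
    "capstone_project": 0.35,
    "final": 0.40,
}
FINAL_EXAM_MINIMUM = 40.0  # assumed: at least 40 out of 100 on the final exam


def weighted_grade(scores: dict[str, float]) -> float:
    """Return the weighted course grade (0-100) from per-component scores."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def meets_final_requirement(scores: dict[str, float]) -> bool:
    """Return True if the final exam score reaches the required minimum."""
    return scores["final"] >= FINAL_EXAM_MINIMUM


scores = {"midterm": 70, "blog_post_project": 90, "capstone_project": 85, "final": 35}
print(round(weighted_grade(scores), 2))
print(meets_final_requirement(scores))
```

In this example the weighted grade is 0.15 × 70 + 0.10 × 90 + 0.35 × 85 + 0.40 × 35 = 63.25, yet the final exam score of 35 falls below the 40% minimum, so the student would not pass despite a reasonable overall average.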


You will be working on two different projects (the Blog Post Project and the Data Science Capstone Project). You may work alone or in groups of at most two.


Hard work always leads to great reward. We published two conference papers as outcomes of student projects in the Fall 2021 semester.


I welcome students who are interested in publishing their work. Writing a paper, submitting it, waiting for a decision, and receiving the acceptance letter is an amazing experience. It can also strengthen your resume.

Here are more incentives. At the end of the semester, if you deliver publishable work



Lab Work (BBM469):


Although BBM469 is an independent course, it goes hand in hand with BBM467. It is strongly recommended that you enroll in BBM469 only if you are also enrolled in BBM467.

There will be five assignments, and each must be completed individually. Assignments #1, #2, and #4 will be on DataCamp, where you will complete online courses and quizzes. Assignments #3 and #5 will be on Google Cloud Skills Boost, consisting of online courses and hands-on labs. The lecturer will provide Google Cloud credits for the labs.

Assignments and projects must be delivered by their deadlines. Late submissions will not be accepted.


Schedule


Deadlines fall on lab days unless stated otherwise. DSCP: Data Science Capstone Project; BPP: Blog Post Project.

| Week | Date | Title | Slides | Reading | BBM467 Practicum | BBM469 Lab Work |
|------|------|-------|--------|---------|------------------|-----------------|
| 1 | 03.10 | What is Data Science? Introduction to Big Data | pdf | | | |
| 2 | 10.10 | Data Science Methodology | pdf | | | |
| 3 | 17.10 | Data Analysis with Python | pdf | | | Assignment 1 (out): Introduction to Data Science in Python, Python Data Science Toolbox (Part 1 and 2) |
| 4 | 24.10 | Data Visualization with Python | pdf | | Establishing DSCP and BPP Groups (deadline: 23/10/2022, 23:59) | |
| 5 | 31.10 | Machine Learning with Python | pdf | | | Assignment 2 (out): Supervised Learning with scikit-learn, Unsupervised Learning in Python; Assignment 1 (due) |
| 6 | 07.11 | Foundations for Big Data Systems, Clusters, Hadoop | pdf | | | |
| 7 | 14.11 | Cloud Computing, Virtualization, Containerization | pdf | | Submission of DSCP and BPP Proposals (deadline: 13/11/2022, 23:59) | Assignment 3 (out): Google Cloud Fundamentals: Core Infrastructure; Assignment 2 (due) |
| 8 | 21.11 | Midterm | pdf | | | |
| 9 | 28.11 | SQL for Data Science | pdf | | | Assignment 4 (out): Exploratory Data Analysis in SQL, Data-Driven Decision Making in SQL |
| 10 | 05.12 | NoSQL | pdf | | | Assignment 3 (due) |
| 11 | 12.12 | Scalable Machine Learning with Spark | pdf | | | Assignment 5 (out): Google Cloud Big Data and Machine Learning Fundamentals |
| 12 | 19.12 | Blockchain | pdf | | | Assignment 4 (due) |
| 13 | 26.12 | Data and Ethics | | | DSCP Final Deliveries (deadline: 30/12/2022, 23:59) | |
| 14 | 02.01 | Review for Final Examination | pdf | | BPP Final Deliveries (deadline: 06/01/2023, 23:59) | Assignment 5 (due) |

Course Material:


I do not follow a specific textbook, but here are a few books I can refer you to.

Communication:


The course webpage will be kept up-to-date throughout the semester. All course related communications will be carried out through Piazza.


Anonymous Feedback Forms:


Please use Fuat's Anonymous Feedback Form if you want to tell me something privately while staying anonymous. Do not forget that this form is not for informing on your friends!