Data Engineering & Data Science in 30 hours

I have been commissioned to prepare a thorough Big Data class to fit in 30 hours of teaching. The goal is to introduce practical Data Engineering and Data Science to technical personnel (corporate or academic). The class is very technical and hands-on. Most subjects are introduced through examples that students can experiment with.

Prerequisites: participants should have a technical background, be reasonably fluent in general programming and operating systems, and have some exposure to the Linux shell, databases, and SQL. Part of lectures 1 through 8 serves to refresh these basic concepts. A good working knowledge of basic Python is also assumed and will be needed in the final lectures. For instance, in parallel with my lectures, my students are currently taking a 16-hour Python class to gain fluency with lists, dictionaries, enumeration, list comprehensions, and other basic language concepts. Their Python training will be complete in two months, just in time for the Python-based lectures 17 through 29.

Because the content is quite broad and somewhat experimental, I welcome comments, which can be emailed to or left as a comment below. My main dilemma is subject selection: we have only 30 hours, and the material is vast. Obviously, I can only provide starting points. I decided to limit the material so that every class leaves time for hands-on exercises. As a result, the course is limited to about 11 leading topics, knowing that many important technologies will be skipped. Is the selection right? I will be grateful for any comments from experienced teachers and practitioners.

The Big Data in 30 hours syllabus and material.
