Tutorial Notes on Web-Scale Information Analytics

Preamble

Foreword

This site archives the tutorial notes I wrote for the course CUHK-ENGG4030 in spring 2014. They cover a diverse range of topics in data mining, from systems to algorithms. The notes are not meant to be a comprehensive collection on data science or big data: I had only 45 minutes per week for 12 weeks. They are better read as highlights of the whole game: from systems to algorithms; from developing from first principles to using packages; from using a package to modifying a package; from the high-level Python interface to the low-level C++ interface; from data collection, pre-processing, and cleansing to data mining and visualization; from synthesized data to real data; from open-ended exploration to data mining challenges; from tiny data to big data ...

Merely reading the tutorial notes is not very useful; you need to work through the exercises to get the most out of them. The notes are carefully designed to be reproducible. Although our class uses Microsoft Azure and an internal cluster of our department, the exercises can be done in any general Linux environment. You should have little trouble practicing on your own laptop, on Amazon Web Services, or on Google Compute Engine. You are also encouraged to visit the course website for lecture notes and homework.

Because the course website will be rebuilt for the next offering, I made this archive so that the notes remain permanently and publicly available. All materials are released under the CC4 license. If you find any problem, feel free to post a comment or file an issue on GitHub.

Old Preamble from the Course

This is Tutorial 0. Materials for basic skill training will be listed here.

These are not hard prerequisites for this course. You will learn (some of) them gradually during a series of hands-on experiments, but getting familiar with them in advance will help.

Daily working environments:

Computing platforms:

Since this is the first offering of this course, the materials are under active construction. You are very welcome to contribute in any way: suggesting materials, proposing experiment ideas, correcting errors, ...
