Pds-fall-2013

Welcome to Practical Data Science.

Data is the new oil. Data is a new class of economic asset. Those were the conclusions of the reports issued by the World Economic Forum at Davos in January 2011 and January 2012. Research published in 2011 by MIT economists shows that companies adopting “data-driven decision-making” achieved significant productivity gains over other firms. In industry, the hottest job these days is that of the data scientist. Data scientists combine technical and statistical skills, analytical thinking, and business acumen. One complaint about data scientists trained in computer science departments is that they’re “just technical”: they understand algorithms well but lack important skills in problem formulation, evaluation, and analysis generally. Those trained in business schools, on the other hand, tend to have underdeveloped technical skills. This course will cover all of these aspects of being a data scientist.

This class is an introduction to the practice of data science. The student will leave the class with a broad set of practical data analytic skills, developed by building real analytic applications on real data. These skills include accessing and transferring data, applying various analytical frameworks, applying methods from machine learning and data mining, conducting large-scale rigorous evaluations with business goals in mind, and understanding, visualizing, and presenting results. The student will have experience processing “big data,” the latest buzz concept in a field awash with buzz. Specifically, the student will be able to analyze data that are too big to fit in the computer’s memory and that therefore thwart many standard analytical tools. The student will also have experience with unstructured data, for example processing text for applications such as “sentiment analysis” of user-generated content on the web.
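To give a flavor of what analyzing data too big for memory can look like, here is a minimal Python sketch that computes a mean by reading one value at a time instead of loading everything at once. It is an illustration only, not taken from the course materials: the function name `streaming_mean` and the one-number-per-line input format are assumptions made for this example.

```python
import io


def streaming_mean(lines):
    """Return the mean of one number per line, reading a line at a time.

    `lines` can be any iterable of strings, such as an open file handle,
    so the full dataset never has to be held in memory.
    Returns None if there are no values.
    """
    count = 0
    total = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        total += float(line)
        count += 1
    return total / count if count else None


# Small in-memory stand-in for a large file (hypothetical data).
sample = io.StringIO("1\n2\n3\n4\n")
print(streaming_mean(sample))  # prints 2.5
```

Only a running total and a count are kept, so memory use stays constant no matter how many lines the input has. Passing an open file handle in place of the `io.StringIO` stand-in works the same way.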

Syllabus

Course Objectives

Course Blog

Course Project

Course Homework

Setting up your Data Science Environment

Python & Programming

Unix Command Line

Data and its Representations

Machine Learning & Data Mining

Regular Expressions and Text Processing