Companies and other organizations collect massive amounts of data, creating new opportunities for data scientists but also raising interesting challenges in extracting meaningful and actionable knowledge from that data. Creating efficient and impactful data science processes is not easy: forming analysis questions is hard, data is messy, the volume and dimensionality of data are massive, and closing the loop in business and research operations is tough. This course aims to provide a comprehensive set of tools for extracting knowledge from data: forming analysis questions and measures; manipulating, extracting, and labeling data; analyzing data efficiently; and reporting and visualizing conclusions. It focuses on the unique challenges that arise from the practical aspects of the field, relying on business and research case studies to highlight the full data science process.
Prerequisites: CS 5785 or equivalent, and experience programming in Python; or permission of the instructor.
Mondays and Wednesdays, 4:45 PM to 6:00 PM, 131 Bloomberg Center, Cornell Tech
Class number: 12791
Links: CMS for homework submission, wild-data-science.slack.com for discussions.