Lecture: Mondays from 11am-12:40pm; Lab: Thursdays from 5:20pm-6:10pm
Instructor: Julia Stoyanovich, Assistant Professor of Data Science, Computer Science and Engineering
The first wave of data science focused on accuracy and efficiency – on what we can do with data. The second wave focuses on responsibility – on what we should and shouldn’t do. Irresponsible use of data science can cause harm on an unprecedented scale. Algorithmic changes in search engines can sway elections and incite violence; irreproducible results can influence global economic policy; models based on biased data can legitimize and amplify racist policies in the criminal justice system; algorithmic hiring practices can silently and scalably violate equal opportunity laws, exposing companies to lawsuits and reinforcing the feedback loops that lead to a lack of diversity. Therefore, as we develop and deploy data science methods, we are compelled to think about the effects these methods have on individuals, on population groups, and on society at large.
Responsible Data Science is a technical course that tackles the issues of ethics, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection. The course is developed and taught by Julia Stoyanovich, Assistant Professor at the Center for Data Science and at the Tandon School of Engineering, and member of the NYC Automated Decision Systems Task Force.
Prerequisites: Introduction to Data Science, Introduction to Computer Science, or relevant courses
This weekly schedule is tentative and subject to change.
|Jan 28||Introduction and background
Topics: Aspects of responsibility in data science through recent examples.
|Feb 4||The data science lifecycle, data profiling
Topics: Overview of the data science lifecycle. Data profiling and validation. Is my dataset biased? Documenting data transformations. Normalization and standardization.
|Feb 11||Data sharing, anonymity and privacy
Topics: Overview of responsible data sharing. Anonymization techniques; the limits of anonymization. Harms beyond re-identification.
|Feb 25||Anonymity and privacy continued
Topics: Differential privacy; privacy-preserving synthetic data generation; exploring the privacy / utility trade-off.
|Mar 4||Data cleaning
Topics: Qualitative and quantitative error detection. Missing attribute values and imputation. Outlier detection; duplicate detection. Documenting data cleaning transformations.
|Mar 11||Midterm exam
|Mar 25||Algorithmic fairness
Topics: A taxonomy of fairness definitions; individual and group fairness. The importance of a socio-technical perspective: stakeholders and trade-offs.
|Apr 1||Algorithmic fairness continued
Topics: Impossibility results; causal definitions; fairness beyond classification.
Topics: Background on diversity in information retrieval, recommender systems and crowdsourcing; diversity models and algorithms; diversity vs. fairness; trade-offs between diversity and utility.
Topics: Auditing black-box models; explainable machine learning; software testing.
|Apr 22||Transparency continued
Topics: Online price discrimination, transparency in online ad delivery.
Topics: Transparency and accountability. Legal frameworks: GDPR and the right to explanation; NYC ADS transparency law. From auditing to interpretability.