Lecture: Mondays from 11am-12:40pm; Lab: Thursdays from 5:20pm-6:10pm
Location: 60 5th Avenue, Room 110
Instructor: Julia Stoyanovich, Assistant Professor of Data Science, Computer Science and Engineering.
Office hours Mondays 1:30-3pm or by appointment, at 60 5th Avenue, Room 605.
Section Leader: Udita Gupta. Office hours Thursdays 4-5pm at 60 5th Avenue, Room 663.
The first wave of data science focused on accuracy and efficiency – on what we can do with data. The second wave focuses on responsibility – on what we should and shouldn’t do. Irresponsible use of data science can cause harm on an unprecedented scale. Algorithmic changes in search engines can sway elections and incite violence; irreproducible results can influence global economic policy; models based on biased data can legitimize and amplify racist policies in the criminal justice system; algorithmic hiring practices can silently and scalably violate equal opportunity laws, exposing companies to lawsuits and reinforcing the feedback loops that lead to a lack of diversity. Therefore, as we develop and deploy data science methods, we are compelled to think about the effects these methods have on individuals, on population groups, and on society at large.
Responsible Data Science is a technical course that tackles the issues of ethics, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection. The course is developed and taught by Julia Stoyanovich, Assistant Professor at the Center for Data Science and at the Tandon School of Engineering, and member of the NYC Automated Decision Systems Task Force.
Prerequisites: Introduction to Data Science, Introduction to Computer Science, or similar courses.
This weekly schedule is tentative and is subject to change.
|Jan 28||Lecture: Introduction and background
Topics: Course outline, aspects of responsibility in data science through recent examples.
“Bias in Computer Systems”, Friedman and Nissenbaum (1996) ACM DL
“Machine Bias”, Angwin, Larson, Mattu, Kirchner (2016) ProPublica
“Data, Responsibly”, Abiteboul and Stoyanovich (2015) ACM SIGMOD blog
|Jan 31||Lab: ProPublica’s Machine Bias||jupyter notebook|
|Feb 4||Lecture: Fairness
Topics: A taxonomy of fairness definitions; individual and group fairness. The importance of a socio-technical perspective: stakeholders and trade-offs.
“Big Data’s Disparate Impact”, Barocas and Selbst (2016) pdf
“Fairness through awareness”, Dwork, Hardt, Pitassi, Reingold, Zemel (2012) ACM DL
“On the (im)possibility of fairness”, Friedler, Scheidegger, Venkatasubramanian (2016) arXiv
|Feb 7||Lab: IBM’s AI Fairness 360 toolkit
“Data preprocessing techniques for classification without discrimination”, Kamiran and Calders (2012) pdf
|jupyter notebook slides|
|Feb 11||Lecture: Fairness
Topics: Impossibility results; causal definitions; fairness beyond classification.
“Fair prediction with disparate impact: A study of bias in recidivism prediction instruments”, Chouldechova (2017) arXiv
“Inherent Trade-Offs in the Fair Determination of Risk Scores”, Kleinberg, Mullainathan, Raghavan (2017) pdf
“Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions”, Mitchell, Potash, Barocas (2018) arXiv
|Feb 14||Lab: IBM’s AI Fairness 360 toolkit
“Certifying and removing disparate impact”, Feldman, Friedler, Moeller, Scheidegger, Venkatasubramanian (2015) pdf
|jupyter notebook slides||HW1 assigned|
|Feb 18||No class, university holiday|
|Feb 21||Lab: Fairness and causality||slides|
|Feb 25||Lecture: Anonymity and privacy, guest lecture by Daniela Hochfellner
Topics: Overview of responsible data sharing. Anonymization techniques; the limits of anonymization. Harms beyond re-identification.
“The Belmont Report” (1979) pdf
“Critical questions for Big Data”, danah boyd and Kate Crawford (2012) pdf
|Feb 28||Lab: Anonymity and privacy||jupyter notebook jupyter notebook brute force slides|
|Mar 4||Lecture: no class, snow day|
|Mar 7||Lab: Anonymity and privacy (see Mar 11 materials)|
|Mar 11||Lecture: Anonymity and privacy
Topics: Differential privacy; privacy-preserving synthetic data generation; exploring the privacy / utility trade-off.
“A firm foundation for private data analysis”, Dwork (2011) ACM DL
“Can a set of equations keep U.S. census data private?”, Mervis (2019) Science
|Mar 14||Lab: Data Synthesizer
“DataSynthesizer: Privacy-Preserving Synthetic Datasets”, Ping, Stoyanovich, Howe (2017) ACM DL
|jupyter notebook slides||HW2 assigned|
|Mar 18||No class, university holiday|
|Mar 21||No class, university holiday|
|Mar 25||Lecture: Profiling and particularity, guest lecture by Solon Barocas
Topics: Profiling and particularity
“On individual risk”, Dawid (2017) pdf
“We Are All Different: Statistical Discrimination and the Right to Be Treated as an Individual”, Lippert-Rasmussen (2011) pdf
|Mar 28||Lab: Data profiling||jupyter notebook slides||HW2 due|
|Apr 1||Lecture: Data profiling
Topics: Overview of the data science lifecycle. Data profiling and validation.
“Profiling relational data: a survey”, Abedjan, Golab, Naumann (2015) pdf
“To predict and serve?”, Lum and Isaac (2016) pdf
|Apr 4||Lab: Data profiling|
|Apr 8||Lecture: Transparency
Topics: Auditing black-box models; explainable machine learning.
“Why should I trust you? Explaining the predictions of any classifier”, Ribeiro, Singh, Guestrin (2016) pdf
“Algorithmic transparency via quantitative input influence: theory and experiments with learning systems”, Datta, Sen, Zick (2016) pdf
|Apr 11||Lab: LIME||jupyter notebook||HW3 due
|Apr 15||Lecture: Transparency
Topics: Discrimination in online ad delivery. Interpretability.
“Automated Experiments on Ad Privacy Settings”, Datta, Tschantz, Datta (2015) pdf
“Discrimination through optimization: How Facebook’s ad delivery can lead to skewed outcomes”, Ali, Sapiezynski, Bogen, Korolova, Mislove, Rieke (2019) pdf
“Facebook has been charged with housing discrimination by the US government”, Russell Brandom for The Verge, Mar 28, 2019 read online
|Apr 18||Lab: Final review|
|Apr 22||Lecture: Final exam (in class)|
|Apr 25||Lab: Nutritional labels||HW4 due; project assigned|
|Apr 29||Lecture: Data cleaning, guest lecture by Sebastian Schelter
Topics: Overview of data cleaning.
Reading: “Quantitative Data Cleaning for Large Databases”, Joe Hellerstein (2008) pdf
|May 2||Lab: Data cleaning|
|May 6||Lecture: Legal frameworks, codes of ethics, and personal responsibility.|
|May 9||Lab: TBD|
|May 13||Lecture: Project presentations||Project report due|