DS-GA 3001.009: Special Topics in Data Science:
Responsible Data Science

New York University, Center for Data Science, Spring 2019

Lecture: Mondays from 11am-12:40pm; Lab: Thursdays from 5:20pm-6:10pm

Location: 60 5th Avenue, Room 110

Instructor: Julia Stoyanovich, Assistant Professor of Data Science, Computer Science and Engineering.
Office hours Mondays 1:30-3pm or by appointment, at 60 5th Avenue, Room 605.

Section Leader: Udita Gupta. Office hours Thursdays 4-5pm at 60 5th Avenue, Room 663.

Syllabus: pdf

Course Description:

The first wave of data science focused on accuracy and efficiency – on what we can do with data. The second wave focuses on responsibility – on what we should and shouldn’t do. Irresponsible use of data science can cause harm on an unprecedented scale. Algorithmic changes in search engines can sway elections and incite violence; irreproducible results can influence global economic policy; models based on biased data can legitimize and amplify racist policies in the criminal justice system; algorithmic hiring practices can silently and scalably violate equal opportunity laws, exposing companies to lawsuits and reinforcing the feedback loops that lead to lack of diversity. Therefore, as we develop and deploy data science methods, we are compelled to think about the effects these methods have on individuals, on population groups, and on society at large.

Responsible Data Science is a technical course that tackles the issues of ethics, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection. The course is developed and taught by Julia Stoyanovich, Assistant Professor at the Center for Data Science and at the Tandon School of Engineering, and member of the NYC Automated Decision Systems Task Force.

Prerequisites: Introduction to Data Science, Introduction to Computer Science, or similar courses.

Background Reading (required)

Barocas and Selbst (2016) “Big Data’s Disparate Impact” pdf
White House Report on Big Data (2014) “Big Data: Seizing Opportunities, Preserving Values” pdf
Brauneis and Goodman (2017) “Algorithmic Transparency for the Smart City” pdf
Kroll et al. (2017) “Accountable Algorithms” pdf

Background Reading (optional)

Matthew Salganik “Bit by Bit: Social Research in the Digital Age” (read online)
Cathy O’Neil “Weapons of Math Destruction”
Frank Pasquale “The Black Box Society”
Virginia Eubanks “Automating Inequality”

Schedule

This weekly schedule is tentative and is subject to change.

Date	Topic	Materials	Assignments
Jan 28	Lecture: Introduction and background Topics: Course outline, aspects of responsibility in data science through recent examples. Reading: “Bias in Computer Systems”, Friedman and Nissenbaum (1996) ACM DL “Machine Bias”, Angwin, Larson, Mattu, Kirchner (2016) ProPublica “Data, Responsibly”, Abiteboul and Stoyanovich (2015) ACM SIGMOD blog	slides
Jan 31	Lab: ProPublica’s Machine Bias	jupyter notebook
Feb 4	Lecture: Fairness Topics: A taxonomy of fairness definitions; individual and group fairness. The importance of a socio-technical perspective: stakeholders and trade-offs. Reading: “Big Data’s Disparate Impact”, Barocas and Selbst (2016) pdf “Fairness through awareness”, Dwork, Hardt, Pitassi, Reingold, Zemel (2012) ACM DL “On the (im)possibility of fairness”, Friedler, Scheidegger, Venkatasubramanian (2016) arXiv	slides
Feb 7	Lab: IBM’s AI Fairness 360 toolkit Reading: “Data preprocessing techniques for classification without discrimination”, Kamiran and Calders (2012) pdf	jupyter notebook slides
Feb 11	Lecture: Fairness Topics: Impossibility results; causal definitions; fairness beyond classification. Reading: “Fair prediction with disparate impact: A study of bias in recidivism prediction instruments”, Chouldechova (2017) arXiv “Inherent Trade-Offs in the Fair Determination of Risk Scores”, Kleinberg, Mullainathan, Raghavan (2017) pdf “Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions”, Mitchell, Potash, Barocas (2018) arXiv	slides
Feb 14	Lab: IBM’s AI Fairness 360 toolkit Reading: “Certifying and removing disparate impact”, M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian (2015) pdf	jupyter notebook slides	HW1 assigned
Feb 18	No class, university holiday
Feb 21	Lab: Fairness and causality	slides
Feb 25	Lecture: Anonymity and privacy, guest lecture by Daniela Hochfellner Topics: Overview of responsible data sharing. Anonymization techniques; the limits of anonymization. Harms beyond re-identification. Reading: “The Belmont Report” (1979) pdf “Critical questions for Big Data”, danah boyd and Kate Crawford (2012) pdf	slides	HW1 due
Feb 28	Lab: Anonymity and privacy	jupyter notebook jupyter notebook brute force slides
Mar 4	Lecture: no class, snow day
Mar 7	Lab: Anonymity and privacy (see Mar 11 materials)
Mar 11	Lecture: Anonymity and privacy Topics: Differential privacy; privacy-preserving synthetic data generation; exploring the privacy / utility trade-off. Reading: “A firm foundation for private data analysis”, C. Dwork (2011) ACM DL “Can a set of equations keep U.S. census data private?”, J. Mervis (2019) Science	slides
Mar 14	Lab: Data Synthesizer Reading: “DataSynthesizer: Privacy-Preserving Synthetic Datasets”, Ping, Stoyanovich, Howe (2017) ACM DL	jupyter notebook slides	HW2 assigned
Mar 18	No class, university holiday
Mar 21	No class, university holiday
Mar 25	Lecture: Profiling and particularity, guest lecture by Solon Barocas Topics: Profiling and particularity Reading: “On individual risk”, Dawid (2017) pdf “We Are All Different: Statistical Discrimination and the Right to Be Treated as an Individual”, Lippert-Rasmussen (2011) pdf	slides
Mar 28	Lab: Data profiling	jupyter notebook slides	HW2 due
Apr 1	Lecture: Data profiling Topics: Overview of the data science lifecycle. Data profiling and validation. Reading: “Profiling relational data: a survey”, Abedjan, Golab, Naumann (2015) pdf “To predict and serve?”, Lum and Isaac (2016) pdf	slides	HW3 assigned
Apr 4	Lab: Data profiling
Apr 8	Lecture: Transparency Topics: Auditing black-box models; explainable machine learning. Reading: “Why should I trust you? Explaining the predictions of any classifier”, Ribeiro, Singh, Guestrin (2016) pdf “Algorithmic transparency via quantitative input influence: theory and experiments with learning systems”, Datta, Sen, Zick (2016) pdf	slides
Apr 11	Lab: LIME	jupyter notebook	HW3 due HW4 assigned
Apr 15	Lecture: Transparency Topics: Discrimination in online ad delivery. Interpretability. Reading: “Automated Experiments on Ad Privacy Settings”, Datta, Tschantz, Datta (2015) pdf “Discrimination through optimization: How Facebook’s ad delivery can lead to skewed outcomes”, Ali, Sapiezynski, Bogen, Korolova, Mislove, Rieke (2019) pdf “Facebook has been charged with housing discrimination by the US government”, Russell Brandom for The Verge, Mar 28, 2019 read online	slides
Apr 18	Lab: Final review
Apr 22	Lecture: Final exam (in class)
Apr 25	Lab: Nutritional labels	jupyter notebook slides	HW4 due Project assigned
Apr 29	Lecture: Data cleaning, guest lecture by Sebastian Schelter Topics: Overview of data cleaning. Reading: “Quantitative Data Cleaning for Large Databases”, Joe Hellerstein (2008) pdf	slides
May 2	Lab: Data cleaning	jupyter notebook
May 6	Lecture: Legal frameworks, codes of ethics, and personal responsibility. Reading: “The Belmont Report” (1979) pdf “The Menlo Report” (2012) pdf “Chapter 6: Ethics. Bit by Bit: Social Research in the Digital Age”, Matthew Salganik (2017) online	slides
May 9	Lab: Talk by Rashida Richardson Topics: Civil rights, predictive policing, and criminal justice. Reading: “Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice”, Richardson, Schultz, Crawford (2019) online. Location: CDS 7th Floor open area, 4-5:30pm
May 13	Lecture: Project presentations		Project report due

For additional information about our work, see Data, Responsibly
