Lecture: Mondays from 11am-12:40pm; Lab: Mondays from 3:30pm-4:20pm
Location: 60 5th Avenue, Room 110
Instructor: Julia Stoyanovich, Assistant Professor of Data Science, Computer Science and Engineering.
Office hours Mondays 1:30-3pm or by appointment, at 60 5th Avenue, Room 703.
Section Leader: Brina Seidel. Office hours Thursdays 3:30-4:30pm or by appointment, at 60 5th Avenue, Room 660.
Grader: Prasanthi Gurumurthy. Office hours Wednesdays, 10:30-11:30am or by appointment, at 60 5th Avenue, Room 665.
The first wave of data science focused on accuracy and efficiency – on what we can do with data. The second wave focuses on responsibility – on what we should and shouldn’t do. Irresponsible use of data science can cause harm on an unprecedented scale. Algorithmic changes in search engines can sway elections and incite violence; irreproducible results can influence global economic policy; models based on biased data can legitimize and amplify racist policies in the criminal justice system; algorithmic hiring practices can silently and scalably violate equal opportunity laws, exposing companies to lawsuits and reinforcing the feedback loops that lead to lack of diversity. Therefore, as we develop and deploy data science methods, we are compelled to think about the effects these methods have on individuals, population groups, and society at large.
Responsible Data Science is a technical course that tackles the issues of ethics, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection. The course is developed and taught by Julia Stoyanovich, Assistant Professor at the Center for Data Science and at the Tandon School of Engineering, and member of the NYC Automated Decision Systems Task Force.
Prerequisites: Introduction to Data Science, Introduction to Computer Science, or similar courses.
Lab Materials: Labs will be conducted using Jupyter Hub. Students should use their NYU NetID to log in, and click the “Assignments” tab to find the material for each week. After lab, links to the notebook for each class will be included on this page.
This weekly schedule is tentative and subject to change.
|Jan 27||Lecture: Introduction and background. Algorithmic fairness.
Topics: Course outline, aspects of responsibility in data science through recent examples. Fairness in classification. The importance of a socio-technical perspective: stakeholders and trade-offs.
“Bias in Computer Systems”, Friedman and Nissenbaum (1996) ACM DL
“Machine Bias”, Angwin, Larson, Mattu, Kirchner (2016) ProPublica
“Data, Responsibly”, Abiteboul and Stoyanovich (2015) ACM SIGMOD blog
“Fairness through awareness”, Dwork, Hardt, Pitassi, Reingold, Zemel (2012) ACM DL
“On the (im)possibility of fairness”, Friedler, Scheidegger, Venkatasubramanian (2016) arXiv
|Jan 27||Lab: Intro to Jupyter Hub, ProPublica’s Machine Bias||notebook|
|Feb 3||Lecture: Algorithmic fairness continued.
Topics: Fairness in risk assessment. Fairness in ranking.
“Fair prediction with disparate impact: A study of bias in recidivism prediction instruments”, Chouldechova (2017) arXiv
“Inherent Trade-Offs in the Fair Determination of Risk Scores”, J. Kleinberg, S. Mullainathan, M. Raghavan (2017) pdf
“Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions”, Mitchell, Potash, Barocas (2018) arXiv
“Dissecting racial bias in an algorithm used to manage the health of populations”, Obermeyer, Powers, Vogel, Mullainathan (2019) Science
|Feb 3||Lab: IBM’s AI Fairness 360 toolkit
“AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias”, R. Bellamy et al. (2018) pdf
“Data preprocessing techniques for classification without discrimination”, F. Kamiran and T. Calders (2012) pdf
“Certifying and removing disparate impact”, M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian (2015) pdf
|Feb 10||Lecture: Data cleaning
Topics: Overview of data cleaning
“Profiling relational data: a survey”, Abedjan, Golab, Naumann (2015) pdf
“Quantitative data cleaning for large databases”, Hellerstein (2008) pdf
|Feb 10||Lab: IBM’s AI Fairness 360 toolkit
“FairPrep: Promoting Data to a First-Class Citizen in Studies on Fairness-Enhancing Interventions”, S. Schelter, Y. He, J. Khilnani, and J. Stoyanovich (2019) pdf
|Feb 17||No class, university holiday|
|Feb 24||Lecture (part 1): Fairness and causality
Topics: Counterfactual fairness
“The long road to fairer algorithms”, M. Kusner, J. Loftus (2020) Nature
“Counterfactual fairness”, M. Kusner, J. Loftus, C. Russell, R. Silva (2017) pdf
Lecture (part 2): Data profiling
Topics: Types of data profiling tasks, overview of the relational model
|slides(1) slides(2)||HW1 due
|Feb 24||Lab: Data profiling and data cleaning
course project discussion
|Mar 2||Lecture (part 1): Data profiling continued
Topics: Discovering uniques, frequent itemset and association rule mining
Lecture (part 2): Anonymity and privacy
Topics: Overview of responsible data sharing. Anonymization techniques; the limits of anonymization. Harms beyond re-identification.
“The Belmont Report” (1979) pdf
“Critical questions for Big Data”, danah boyd and Kate Crawford (2012) pdf
|Mar 2||Lab: Data profiling and data cleaning||notebook|
|Mar 9||Lecture: Anonymity and privacy
Topics: Differential privacy; privacy-preserving synthetic data generation; exploring the privacy / utility trade-off.
“A firm foundation for private data analysis”, C. Dwork (2011) ACM DL
“Can a set of equations keep U.S. census data private?”, J. Mervis (2019) Science
|slides||project proposal due|
|Mar 9||Lab: DataSynthesizer
“DataSynthesizer: Privacy-Preserving Synthetic Datasets”, Ping, Stoyanovich, Howe (2017) ACM DL
|Mar 16||No class, university holiday|
|Mar 23||Lecture: Ethical frameworks
Reading: “The Belmont Report” (1979) pdf
“The Menlo Report” (2012) pdf
“Chapter 6: Ethics. Bit by Bit: Social Research in the Digital Age”, Matthew Salganik (2017) online
|Mar 23||Lab: Ethical frameworks|
|Mar 30||Lecture: Transparency
Topics: Auditing black-box models; explainable machine learning.
“Why should I trust you? Explaining the predictions of any classifier”, Ribeiro, Singh, Guestrin (2016) pdf
“Algorithmic transparency via quantitative input influence: theory and experiments with learning systems”, Datta, Sen, Zick (2016) pdf
“A unified approach to interpreting model predictions”, Lundberg and Lee (2017) pdf
|Mar 30||Lab: LIME||notebook|
|Apr 6||Lecture: Transparency
Topics: Discrimination in online ad delivery.
“Automated Experiments on Ad Privacy Settings”, Datta, Tschantz, Datta (2015) pdf
“Discrimination through optimization: How Facebook’s ad delivery can lead to skewed outcomes”, Ali, Sapiezynski, Bogen, Korolova, Mislove, Rieke (2019) pdf
“Facebook has been charged with housing discrimination by the US government”, Russell Brandom for The Verge, Mar 28, 2019 read online
project report draft due
|Apr 6||Lab: Transparency|
|Apr 13||Lecture: Interpretability
|Apr 13||Lab: Interpretability|
|Apr 20||Lecture: RDS in practice: Guest lecture by Robert Cheetham, President and CEO of Azavea
Topics: Selecting your
|Apr 20||Lab: Final exam review|
|Apr 27||Lecture: Legal and regulatory frameworks
Topics: Data protection, algorithmic impact assessment, regulating Automated Decision Systems (ADS) and AI
|Final exam assigned (take-home)|
|Apr 27||Lab: Course project|
|May 4||Lecture: TBD
|May 4||Lab: TBD|
|May 11||Lecture: Project presentations||project report due|
|May 11||Lab: Project presentations|