Lecture: Mondays from 11am-12:40pm; Lab: Mondays from 3:30pm-4:20pm
Location: 60 5th Avenue, Room 110
Instructor: Julia Stoyanovich, Assistant Professor of Data Science, Computer Science and Engineering.
Office hours Mondays 2-3pm or by appointment, online.
Section Leader: Brina Seidel. Office hours Thursdays 3:30-4:30pm or by appointment, online.
Grader: Prasanthi Gurumurthy. Office hours Wednesdays, 10:30-11:30am or by appointment, online.
Syllabus: pdf
Course Description:
The first wave of data science focused on accuracy and efficiency – on what we can do with data. The second wave focuses on responsibility – on what we should and shouldn’t do. Irresponsible use of data science can cause harm on an unprecedented scale. Algorithmic changes in search engines can sway elections and incite violence; irreproducible results can influence global economic policy; models based on biased data can legitimize and amplify racist policies in the criminal justice system; algorithmic hiring practices can silently and scalably violate equal opportunity laws, exposing companies to lawsuits and reinforcing the feedback loops that lead to lack of diversity. Therefore, as we develop and deploy data science methods, we are compelled to think about the effects these methods have on individuals, population groups, and on society at large.
Responsible Data Science is a technical course that tackles the issues of ethics, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection. The course is developed and taught by Julia Stoyanovich, Assistant Professor at the Center for Data Science and at the Tandon School of Engineering, and member of the NYC Automated Decision Systems Task Force.
Prerequisites: Introduction to Data Science, Introduction to Computer Science, or similar courses.
Lab Materials: Labs will be conducted using Jupyter Hub. Students should use their NYU NetID to log in, and click the “Assignments” tab to find the material for each week. After lab, links to the notebook for each class will be included on this page.
This weekly schedule is tentative and is subject to change.
Date | Topic | Materials | Assignments
---|---|---|---
Jan 27 | Lecture: Introduction and background. Algorithmic fairness. Topics: Course outline, aspects of responsibility in data science through recent examples. Fairness in classification. The importance of a socio-technical perspective: stakeholders and trade-offs. Reading: “Bias in Computer Systems”, Friedman and Nissenbaum (1996) ACM DL; “Machine Bias”, Angwin, Larson, Mattu, Kirchner (2016) ProPublica; “Data, Responsibly”, Abiteboul and Stoyanovich (2015) ACM SIGMOD blog; “Fairness through awareness”, Dwork, Hardt, Pitassi, Reingold, Zemel (2012) ACM DL; “On the (im)possibility of fairness”, Friedler, Scheidegger, Venkatasubramanian (2016) arXiv | slides
Jan 27 | Lab: Intro to Jupyter Hub, ProPublica’s Machine Bias | notebook
Feb 3 | Lecture: Algorithmic fairness continued. Topics: Fairness in risk assessment. Fairness in ranking. Reading: “Fair prediction with disparate impact: A study of bias in recidivism prediction instruments”, Chouldechova (2017) arXiv; “Inherent Trade-Offs in the Fair Determination of Risk Scores”, J. Kleinberg, S. Mullainathan, M. Raghavan (2017) pdf; “Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions”, Mitchell, Potash, Barocas (2018) arXiv; “Dissecting racial bias in an algorithm used to manage the health of populations”, Obermeyer, Powers, Vogel, Mullainathan (2019) Science | slides
Feb 3 | Lab: IBM’s AI Fairness 360 toolkit (see the AIF360 sketch after the schedule). Reading: “AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias”, R. Bellamy et al. (2018) pdf; “Data preprocessing techniques for classification without discrimination”, F. Kamiran and T. Calders (2012) pdf; “Certifying and removing disparate impact”, M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian (2015) pdf | notebook
Feb 10 | Lecture: Data cleaning. Topics: Overview of data cleaning. Reading: “Profiling relational data: a survey”, Abedjan, Golab, Naumann (2015) pdf; “Quantitative data cleaning for large databases”, Hellerstein (2008) pdf | slides
Feb 10 | Lab: IBM’s AI Fairness 360 toolkit. Reading: “FairPrep: Promoting Data to a First-Class Citizen in Studies on Fairness-Enhancing Interventions”, S. Schelter, Y. He, J. Khilnani, and J. Stoyanovich (2019) pdf | notebook | HW1 assigned
Feb 17 | No class, university holiday
Feb 24 | Lecture (part 1): Fairness and causality. Topics: Counterfactual fairness. Reading: “The long road to fairer algorithms”, M. Kusner, J. Loftus (2020) Nature; “Counterfactual fairness”, M. Kusner, J. Loftus, C. Russell, R. Silva (2017) pdf. Lecture (part 2): Data profiling. Topics: Types of data profiling tasks, overview of the relational model | slides(1) slides(2) | HW1 due
Feb 24 | Lab: Data profiling and data cleaning; course project discussion | notebook | project assigned
Mar 2 | Lecture (part 1): Data profiling continued. Topics: Discovering uniques, frequent itemset and association rule mining. Lecture (part 2): Anonymity and privacy. Topics: Overview of responsible data sharing. Anonymization techniques; the limits of anonymization. Harms beyond re-identification. Reading: “The Belmont Report” (1979) pdf; “Critical questions for Big Data”, danah boyd and Kate Crawford (2012) pdf | slides(1) slides(2)
Mar 2 | Lab: Data profiling and data cleaning | notebook
Mar 9 | Lecture: Anonymity and privacy. Topics: Differential privacy; privacy-preserving synthetic data generation; exploring the privacy / utility trade-off. Reading: “A firm foundation for private data analysis”, C. Dwork (2011) ACM DL; “Can a set of equations keep U.S. census data private?”, J. Mervis (2019) Science | slides | project proposal due
Mar 9 | Lab: DataSynthesizer (see the DataSynthesizer sketch after the schedule). Reading: “DataSynthesizer: Privacy-Preserving Synthetic Datasets”, Ping, Stoyanovich, Howe (2017) ACM DL | notebook | HW2 assigned
Mar 16 | No class, university holiday
Mar 23 | Lecture: Ethical frameworks. Reading: “The Belmont Report” (1979) pdf; “The Menlo Report” (2012) pdf; “Chapter 6: Ethics. Bit by Bit: Social Research in the Digital Age”, Matthew Salganik (2017) online | slides
Mar 23 | Lab: Ethical frameworks
Mar 30 | Lecture: Transparency. Topics: Auditing black-box models; explainable machine learning. Reading: “Why should I trust you? Explaining the predictions of any classifier”, Ribeiro, Singh, Guestrin (2016) pdf; “Algorithmic transparency via quantitative input influence: theory and experiments with learning systems”, Datta, Sen, Zick (2016) pdf; “A unified approach to interpreting model predictions”, Lundberg and Lee (2017) pdf | slides | HW2 due
Mar 30 | Lab: LIME (see the LIME and SHAP sketch after the schedule) | notebook
Apr 6 | Lecture: Transparency. Topics: Discrimination in online ad delivery. Reading: “Automated Experiments on Ad Privacy Settings”, Datta, Tschantz, Datta (2015) pdf | slides | HW3 assigned; project report draft due (extended to Apr 8)
Apr 6 | Lab: SHAP (see the LIME and SHAP sketch after the schedule) | notebook
Apr 13 | Lecture: Transparency. Topics: Discrimination in online ad delivery, continued. Reading: “Discrimination through optimization: How Facebook’s ad delivery can lead to skewed outcomes”, Ali, Sapiezynski, Bogen, Korolova, Mislove, Rieke (2019) pdf; “Facebook has been charged with housing discrimination by the US government”, Russell Brandom for The Verge, Mar 28, 2019, read online | slides
Apr 13 | Lab: Course project discussion: working through an example of an ADS | notebook
Apr 20 | Lecture: RDS in practice: guest lecture by Robert Cheetham, President and CEO of Azavea. Topics: Project selection. Reading: “How Azavea selects projects”, Robert Cheetham (2019) link; “HunchLab: Under the hood” (2015) link; “Why we sold HunchLab” (2019) link | slides | HW3 due
Apr 20 | Lab: Final exam review | see slides on NYU Classes
Apr 27 | Lecture: Interpretability. Topics: What is interpretability? Reading: “The Intuitive Appeal of Explainable Machines”, A. Selbst and S. Barocas (2018) SSRN; “Nutritional Labels for Data and Models”, J. Stoyanovich and B. Howe (2019) pdf; “The Imperative of Interpretable Machines”, J. Stoyanovich, J. Van Bavel, T. West (2020) link | slides | Final exam assigned (take-home)
Apr 27 | Lab: Course project discussion: working through an example of an ADS | notebook
May 4 | Lecture: Legal and regulatory frameworks. Topics: Data protection, algorithmic impact assessment, regulating Automated Decision Systems (ADS) and AI. Reading: GDPR link; Canadian Directive on Automated Decision-Making link; NYC ADS Task Force Report pdf; “Disparate Impact in Big Data Policing”, A. Selbst (2017) SSRN; “Ensuring a Future that Advances Equity in Algorithmic Employment Decisions”, J. Yang (2019) pdf | slides
May 4 | Lab: Slack, course project discussion
May 11 | Lecture: Project presentations | | project report due
May 11 | Lab: Project presentations |
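
Lab code sketches:

The Feb 3 lab works with IBM’s AI Fairness 360 toolkit. The sketch below is a minimal illustration of the kind of workflow it supports (computing group fairness metrics on a dataset and applying a pre-processing intervention); the toy data, column names, and group definitions are invented for illustration and are not the lab’s actual materials.

```python
# Minimal AIF360 sketch: measure group fairness, then reweigh the training data.
# Assumes `pip install aif360 pandas`; the toy data and column names are illustrative.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Toy data: `sex` is the protected attribute (1 = privileged group), `hired` is the label.
df = pd.DataFrame({
    "sex":   [1, 1, 1, 1, 0, 0, 0, 0],
    "score": [0.9, 0.8, 0.4, 0.7, 0.9, 0.3, 0.6, 0.5],
    "hired": [1, 1, 0, 1, 1, 0, 0, 0],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["hired"],
    protected_attribute_names=["sex"],
    favorable_label=1,
    unfavorable_label=0,
)

privileged = [{"sex": 1}]
unprivileged = [{"sex": 0}]

# Dataset-level fairness metrics before any intervention.
metric = BinaryLabelDatasetMetric(
    dataset, unprivileged_groups=unprivileged, privileged_groups=privileged
)
print("Disparate impact:", metric.disparate_impact())
print("Statistical parity difference:", metric.statistical_parity_difference())

# Pre-processing intervention: reweigh examples to balance group/label combinations.
rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
dataset_transf = rw.fit_transform(dataset)
print("Instance weights after reweighing:", dataset_transf.instance_weights)
```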
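The Mar 9 lab works with DataSynthesizer (Ping, Stoyanovich, Howe 2017). The sketch below shows its correlated-attribute mode under differential privacy, assuming the `DataSynthesizer` package is installed; the file names, `epsilon`, `k`, and dataset size are placeholders rather than the lab’s actual settings.

```python
# Rough DataSynthesizer sketch: describe a dataset under differential privacy,
# then generate a synthetic dataset from the learned description.
# Assumes `pip install DataSynthesizer`; file names and parameter values are placeholders.
from DataSynthesizer.DataDescriber import DataDescriber
from DataSynthesizer.DataGenerator import DataGenerator

input_csv = "input_data.csv"           # placeholder path to the real data
description_file = "description.json"  # where the (noisy) data summary is stored
synthetic_csv = "synthetic_data.csv"   # where the generated data is written

# Learn a differentially private description of the data: a Bayesian network of
# degree k over the attributes, with noise added to the estimated statistics.
describer = DataDescriber(category_threshold=10)
describer.describe_dataset_in_correlated_attribute_mode(
    dataset_file=input_csv,
    epsilon=1.0,   # privacy budget; smaller means more noise and more privacy
    k=2,           # maximum number of parents per node in the Bayesian network
)
describer.save_dataset_description_to_file(description_file)

# Sample a synthetic dataset of the desired size from the saved description.
generator = DataGenerator()
generator.generate_dataset_in_correlated_attribute_mode(1000, description_file)
generator.save_synthetic_data(synthetic_csv)
```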
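The Mar 30 and Apr 6 labs use LIME and SHAP to explain the predictions of black-box classifiers. The sketch below, which assumes the `lime`, `shap`, and `scikit-learn` packages, trains a stand-in model on a built-in dataset and asks each tool for a local explanation of a single prediction; it is illustrative only, not the lab notebook.

```python
# Minimal LIME / SHAP sketch: explain individual predictions of a black-box model.
# Assumes `pip install lime shap scikit-learn`; the dataset and model are stand-ins.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer
import shap

data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# LIME: fit a sparse linear model around one instance to get local feature weights.
lime_explainer = LimeTabularExplainer(
    X, feature_names=list(data.feature_names), class_names=list(data.target_names)
)
lime_exp = lime_explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(lime_exp.as_list())

# SHAP: Shapley-value attributions for the same instance from a tree explainer.
shap_explainer = shap.TreeExplainer(model)
shap_values = shap_explainer.shap_values(X[:1])
print(shap_values)
```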