0.5.0 - 2024-06-02¶
📝 SIGMOD Demo Paper¶
- Virny demonstration paper was accepted and published at SIGMOD! 🎉🥳 Explore our work in the ACM digital library.
@inproceedings{herasymuk2024responsible,
title={Responsible Model Selection with Virny and VirnyView},
author={Herasymuk, Denys and Arif Khan, Falaah and Stoyanovich, Julia},
booktitle={Companion of the 2024 International Conference on Management of Data},
pages={488--491},
year={2024}
}
📖 Glossary¶
- Our documentation was extended with a new "Glossary" section that provides:
- Explanation of our approach for measuring model stability and uncertainty
- Detailed description of the overall and disparity metrics computed by Virny
⚙️ More Features for Experimental Studies¶
-
Metric computation interfaces were extended with a new optional parameter -
with_predict_proba
. If set toFalse
, Virny computes metrics only based on prediction labels, NOT prediction probabilities. Specifically, it can be useful in the case when a model cannot provide probabilities for its predictions. Default:True
. -
Metric computation interfaces were enabled to measure the computation runtime in minutes for each model. It can be particularly useful for a large experimental studies with thousands of pipelines to estimate their runtimes and to benchmark models between each other.
🗃 New Benchmark Fair-ML Datasets¶
-
DiabetesDataset2019. A data loader for the Diabetes 2019 dataset that contains sensitive attributes among feature columns.
- Target: Binary classify whether a person has a diabetes disease or not.
- Source and broad description: https://www.kaggle.com/datasets/tigganeha4/diabetes-dataset-2019/data.
-
GermanCreditDataset. A data loader for the German Credit dataset that contains sensitive attributes among feature columns.
- Target: Binary classify people described by a set of attributes as good or bad credit risks.
- Source: https://github.com/tailequy/fairness_dataset/blob/main/experiments/data/german_data_credit.csv.
- General description and analysis: https://arxiv.org/pdf/2110.00530.pdf (Section 3.1.3).
- Broad description: https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data.
-
BankMarketingDataset. A data loader for the Bank Marketing dataset that contains sensitive attributes among feature columns.
- Target: The data is related with direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict if the client will subscribe a term deposit.
- Source: https://github.com/tailequy/fairness_dataset/blob/main/experiments/data/bank-full.csv.
- General description and analysis: https://arxiv.org/pdf/2110.00530.pdf (Section 3.1.5).
- Broad description: https://archive.ics.uci.edu/dataset/222/bank+marketing.
-
CardiovascularDiseaseDataset. A data loader for the Cardiovascular Disease dataset that contains sensitive attributes among feature columns.
- Target: Binary classify whether a person has a cardiovascular disease or not.
- Source and broad description: https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset.