0.5.0 - 2024-06-02¶

PyPI
GitHub

📝 SIGMOD Demo Paper¶

Virny demonstration paper was accepted and published at SIGMOD! 🎉🥳 Explore our work in the ACM digital library.

@inproceedings{herasymuk2024responsible,
  title={Responsible Model Selection with Virny and VirnyView},
  author={Herasymuk, Denys and Arif Khan, Falaah and Stoyanovich, Julia},
  booktitle={Companion of the 2024 International Conference on Management of Data},
  pages={488--491},
  year={2024}
}

📖 Glossary¶

Our documentation was extended with a new "Glossary" section that provides:
- Explanation of our approach for measuring model stability and uncertainty
- Detailed description of the overall and disparity metrics computed by Virny

⚙️ More Features for Experimental Studies¶

Metric computation interfaces were extended with a new optional parameter - with_predict_proba. If set to False, Virny computes metrics only based on prediction labels, NOT prediction probabilities. Specifically, it can be useful in the case when a model cannot provide probabilities for its predictions. Default: True.
Metric computation interfaces were enabled to measure the computation runtime in minutes for each model. It can be particularly useful for a large experimental studies with thousands of pipelines to estimate their runtimes and to benchmark models between each other.

🗃 New Benchmark Fair-ML Datasets¶

DiabetesDataset2019. A data loader for the Diabetes 2019 dataset that contains sensitive attributes among feature columns.
- Target: Binary classify whether a person has a diabetes disease or not.
- Source and broad description: https://www.kaggle.com/datasets/tigganeha4/diabetes-dataset-2019/data.
GermanCreditDataset. A data loader for the German Credit dataset that contains sensitive attributes among feature columns.
- Target: Binary classify people described by a set of attributes as good or bad credit risks.
- Source: https://github.com/tailequy/fairness_dataset/blob/main/experiments/data/german_data_credit.csv.
- General description and analysis: https://arxiv.org/pdf/2110.00530.pdf (Section 3.1.3).
- Broad description: https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data.
BankMarketingDataset. A data loader for the Bank Marketing dataset that contains sensitive attributes among feature columns.
- Target: The data is related with direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict if the client will subscribe a term deposit.
- Source: https://github.com/tailequy/fairness_dataset/blob/main/experiments/data/bank-full.csv.
- General description and analysis: https://arxiv.org/pdf/2110.00530.pdf (Section 3.1.5).
- Broad description: https://archive.ics.uci.edu/dataset/222/bank+marketing.
CardiovascularDiseaseDataset. A data loader for the Cardiovascular Disease dataset that contains sensitive attributes among feature columns.
- Target: Binary classify whether a person has a cardiovascular disease or not.
- Source and broad description: https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset.