
Multiple Models Interface With Inprocessor

In this example, we are going to audit one inprocessor from AIF360 for stability and fairness, visualize metrics, and create an analysis report. For that, we will use the compute_metrics_with_config interface, which can compute metrics for multiple models. Thus, we will need to perform the following steps:

  • Initialize input variables

  • Compute subgroup metrics

  • Make group metrics composition

  • Create metrics visualizations and an analysis report

Import dependencies

import os
from pprint import pprint
from datetime import datetime, timezone

from sklearn.linear_model import LogisticRegression

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler

from virny.utils.custom_initializers import create_config_obj, read_model_metric_dfs
from virny.user_interfaces.multiple_models_api import compute_metrics_with_config
from virny.preprocessing.basic_preprocessing import preprocess_dataset
from virny.custom_classes.metrics_visualizer import MetricsVisualizer
from virny.custom_classes.metrics_composer import MetricsComposer

Initialize Input Variables

Based on the library flow, we need to create 3 input objects for a user interface:

  • A config yaml, which is a file with configuration parameters for different user interfaces for metrics computation.

  • A dataset class, which is a wrapper around the user's raw dataset that includes its descriptive attributes like a target column, numerical columns, categorical columns, etc. This class must be inherited from the BaseDataset class, which was created for user convenience.

  • Finally, a models config, which is a Python dictionary where keys are model names and values are initialized models for analysis. This dictionary helps conduct audits for different analysis modes and analyze different types of models.

DATASET_SPLIT_SEED = 42
MODELS_TUNING_SEED = 42
TEST_SET_FRACTION = 0.2

Create a config object

The compute_metrics_with_config interface requires that your yaml file include the following parameters:

  • dataset_name: str, a name of your dataset; it will be used to name files with metrics.

  • bootstrap_fraction: float, the fraction of the train set, in the range [0.0, 1.0], used to fit models during bootstrap (usually more than 0.5).

  • random_state: int, a seed to control the randomness of the whole model evaluation pipeline.

  • computation_mode: str, 'default' or 'error_analysis'. Name of the computation mode. While the default computation mode measures metrics for sex_priv and sex_dis, the error_analysis mode measures metrics for (sex_priv, sex_priv_correct, sex_priv_incorrect) and (sex_dis, sex_dis_correct, sex_dis_incorrect). Therefore, a user can analyze how certain a model is about its incorrect predictions.

  • n_estimators: int, the number of estimators for bootstrap to compute subgroup stability metrics.

  • sensitive_attributes_dct: dict, a dictionary where keys are sensitive attribute names (including intersectional attributes), and values are disadvantaged values for these attributes. Intersectional attributes must include '&' between sensitive attributes. You do not need to specify disadvantaged values for intersectional groups since they will be derived from disadvantaged values in sensitive_attributes_dct for each separate sensitive attribute in this intersectional pair.

ROOT_DIR = os.path.join('docs', 'examples')
config_yaml_path = os.path.join(ROOT_DIR, 'experiment_config.yaml')
config_yaml_content = """
dataset_name: Law_School
bootstrap_fraction: 0.8
random_state: 42
computation_mode: error_analysis
n_estimators: 30  # In practice, use more than 100 estimators; 30 is used here only to keep this example fast
sensitive_attributes_dct: {'male': '0', 'race': 'Non-White', 'male&race': None}
"""

with open(config_yaml_path, 'w', encoding='utf-8') as f:
    f.write(config_yaml_content)
config = create_config_obj(config_yaml_path=config_yaml_path)
SAVE_RESULTS_DIR_PATH = os.path.join(ROOT_DIR, 'results', f'{config.dataset_name}_Metrics_{datetime.now(timezone.utc).strftime("%Y%m%d__%H%M%S")}')
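
To double-check what was parsed from the yaml file, you can print the loaded config attributes directly. This is an optional sanity check; it assumes the attribute names mirror the yaml keys above, as config.dataset_name and config.sensitive_attributes_dct do elsewhere in this example.

# Optional check: the config object exposes the yaml keys as attributes.
pprint({
    'dataset_name': config.dataset_name,
    'n_estimators': config.n_estimators,
    'sensitive_attributes_dct': config.sensitive_attributes_dct,
})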

Preprocess the dataset and create a BaseFlowDataset class

from virny.datasets import LawSchoolDataset

data_loader = LawSchoolDataset()
data_loader.X_data[data_loader.X_data.columns[:5]].head()
   decile1b  decile3  lsat  ugpa  zfygpa
0      10.0     10.0  44.0   3.5    1.33
1       5.0      4.0  29.0   3.5   -0.11
2       8.0      7.0  37.0   3.4    0.63
3       8.0      7.0  43.0   3.3    0.67
4       3.0      2.0  41.0   3.3   -0.67
column_transformer = ColumnTransformer(transformers=[
    ('categorical_features', OneHotEncoder(handle_unknown='ignore', sparse_output=False), data_loader.categorical_columns),
    ('numerical_features', StandardScaler(), data_loader.numerical_columns),
])
# Create a binary race column for in-processing since aif360 inprocessors use a sensitive attribute during their learning.
data_loader.X_data['race_binary'] = data_loader.X_data['race'].apply(lambda x: 1 if x == 'White' else 0)

base_flow_dataset = preprocess_dataset(data_loader=data_loader,
                                       column_transformer=column_transformer,
                                       sensitive_attributes_dct=config.sensitive_attributes_dct,
                                       test_set_fraction=TEST_SET_FRACTION,
                                       dataset_split_seed=DATASET_SPLIT_SEED)
base_flow_dataset.X_train_val['race_binary'] = data_loader.X_data.loc[base_flow_dataset.X_train_val.index, 'race_binary']
base_flow_dataset.X_test['race_binary'] = data_loader.X_data.loc[base_flow_dataset.X_test.index, 'race_binary']
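
As a small optional check (not part of the original flow), we can confirm that the race_binary column was carried over to both splits before fitting the inprocessor.

# Optional check: the in-processing intervention below relies on the race_binary column
# being present in both the train/val and test splits.
assert 'race_binary' in base_flow_dataset.X_train_val.columns
assert 'race_binary' in base_flow_dataset.X_test.columns
print(base_flow_dataset.X_train_val.shape, base_flow_dataset.X_test.shape)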

Define an inprocessor and create a wrapper for it

To use inprocessors from AIF360 together with Virny, we need to create a wrapper so that the inprocessor can be used as a base model in models_config.

A wrapper should include the following methods:

  • fit(self, X, y): fits an inprocessor based on X and y pandas dataframes. Returns self.

  • predict_proba(self, X): predicts using the fitted inprocessor based on the X features pandas dataframe. Returns probabilities for the ZERO class. These probabilities will be used by Virny in the downstream metric computation.

  • predict(self, X): predicts using the fitted inprocessor based on the X features pandas dataframe. Returns labels for each sample.

  • __copy__(self) and __deepcopy__(self, memo): methods that will be used during copy.copy(inprocessor_wrapper) and copy.deepcopy(inprocessor_wrapper). Return a new instance of the inprocessor's wrapper.

  • get_params(self): returns the parameters of the wrapper, for alignment with the sklearn API.

import copy
import numpy as np

from aif360.algorithms.inprocessing import ExponentiatedGradientReduction
from virny.custom_classes.base_inprocessing_wrapper import BaseInprocessingWrapper
from virny.utils.postprocessing_intervention_utils import construct_binary_label_dataset_from_df


class ExpGradientReductionWrapper(BaseInprocessingWrapper):
    """
    A wrapper for fair inprocessors from aif360. The wrapper aligns fit, predict, and predict_proba methods
    to be compatible with sklearn models.

    Parameters
    ----------
    estimator
        An initialized sklearn estimator to be used as the base classifier inside ExponentiatedGradientReduction.
    sensitive_attr_for_intervention
        A sensitive attribute name to use in the fairness in-processing intervention.

    """

    def __init__(self, estimator, sensitive_attr_for_intervention):
        self.sensitive_attr_for_intervention = sensitive_attr_for_intervention
        self.estimator = estimator
        self.inprocessor = ExponentiatedGradientReduction(estimator=self.estimator,
                                                          constraints='DemographicParity',
                                                          drop_prot_attr=True)

    def __copy__(self):
        new_estimator = copy.copy(self.estimator)
        return ExpGradientReductionWrapper(estimator=new_estimator,
                                           sensitive_attr_for_intervention=self.sensitive_attr_for_intervention)

    def __deepcopy__(self, memo):
        new_estimator = copy.deepcopy(self.estimator)
        return ExpGradientReductionWrapper(estimator=new_estimator,
                                           sensitive_attr_for_intervention=self.sensitive_attr_for_intervention)

    def get_params(self):
        return {'sensitive_attr_for_intervention': self.sensitive_attr_for_intervention}

    def set_params(self, random_state):
        self.estimator.set_params(random_state=random_state)
        self.inprocessor = ExponentiatedGradientReduction(estimator=self.estimator,
                                                          constraints='DemographicParity',
                                                          drop_prot_attr=True)

    def fit(self, X, y):
        train_binary_dataset = construct_binary_label_dataset_from_df(X_sample=X,
                                                                      y_sample=y,
                                                                      target_column='target',
                                                                      sensitive_attribute=self.sensitive_attr_for_intervention)
        # Fit an inprocessor
        self.inprocessor.fit(train_binary_dataset)
        return self

    def predict_proba(self, X):
        y_empty = np.zeros(X.shape[0])
        test_binary_dataset = construct_binary_label_dataset_from_df(X_sample=X,
                                                                     y_sample=y_empty,
                                                                     target_column='target',
                                                                     sensitive_attribute=self.sensitive_attr_for_intervention)
        test_dataset_pred = self.inprocessor.predict(test_binary_dataset)
        # Clip scores to 1.0 since ExponentiatedGradientReduction can return probabilities slightly higher than 1.0.
        # This can cause Infinity values for entropy.
        test_dataset_pred.scores[test_dataset_pred.scores > 1.0] = 1.0
        # Return 1 - test_dataset_pred.scores since scores are probabilities for label 1, not for label 0
        return 1 - test_dataset_pred.scores

    def predict(self, X):
        y_empty = np.zeros(shape=X.shape[0])
        test_binary_dataset = construct_binary_label_dataset_from_df(X_sample=X,
                                                                     y_sample=y_empty,
                                                                     target_column='target',
                                                                     sensitive_attribute=self.sensitive_attr_for_intervention)
        test_dataset_pred = self.inprocessor.predict(test_binary_dataset)
        return test_dataset_pred.labels


# Define a name of a sensitive attribute for the in-processing intervention.
# Note that in the above wrapper, 1 is used as a favorable label, and 0 is used as an unfavorable label.
sensitive_attr_for_intervention = 'race_binary'

# Define an estimator
estimator = LogisticRegression(solver='lbfgs', max_iter=1000)
models_config = {
    'ExponentiatedGradientReduction': ExpGradientReductionWrapper(estimator=estimator,
                                                                  sensitive_attr_for_intervention=sensitive_attr_for_intervention)
}
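
Since the wrapper will be copied during the audit (see the __copy__ and __deepcopy__ requirements above), an optional sanity check is to copy it and inspect its parameters before running the metrics computation.

# Optional sanity check: the wrapper should survive deep copying and expose its parameters.
wrapper = models_config['ExponentiatedGradientReduction']
wrapper_copy = copy.deepcopy(wrapper)
print(type(wrapper_copy).__name__)  # ExpGradientReductionWrapper
print(wrapper.get_params())         # {'sensitive_attr_for_intervention': 'race_binary'}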

Subgroup Metrics Computation

After the variables are input to a user interface, the interface uses subgroup analyzers to compute different sets of metrics for each privileged and disadvantaged subgroup. For now, our library supports a Subgroup Variance Analyzer and a Subgroup Error Analyzer, but it is easily extensible to other analyzers. When the variance and error analyzers complete metrics computation, their metrics are combined, returned in a matrix format, and stored in a file if a save path is defined.

metrics_dct = compute_metrics_with_config(dataset=base_flow_dataset,
                                          config=config,
                                          models_config=models_config,
                                          save_results_dir_path=SAVE_RESULTS_DIR_PATH,
                                          notebook_logs_stdout=True)

Look at several columns in the top rows of the computed metrics

sample_model_metrics_df = metrics_dct[list(models_config.keys())[0]]
sample_model_metrics_df[sample_model_metrics_df.columns[:6]].head(20)
Metric overall male_priv male_priv_correct male_priv_incorrect male_dis
0 Aleatoric_Uncertainty 0.005905 0.004883 0.003296 0.021364 0.007256
1 IQR 0.010355 0.009922 0.008073 0.029115 0.010926
2 Mean_Prediction 0.024633 0.021842 0.015440 0.088320 0.028322
3 Overall_Uncertainty 0.020169 0.018285 0.012946 0.073729 0.022659
4 Statistical_Bias 0.098458 0.089847 0.004210 0.979146 0.109838
5 Std 0.009615 0.008868 0.006229 0.036276 0.010603
6 Epistemic_Uncertainty 0.014264 0.013402 0.009650 0.052365 0.015403
7 Label_Stability 0.989087 0.989696 0.992222 0.963462 0.988281
8 Jitter 0.008198 0.007553 0.005556 0.028294 0.009050
9 TPR 0.990612 0.991163 1.000000 0.000000 0.989861
10 TNR 0.141204 0.133028 1.000000 0.000000 0.149533
11 PPV 0.908711 0.918534 1.000000 0.000000 0.895642
12 FNR 0.009388 0.008837 0.000000 1.000000 0.010139
13 FPR 0.858796 0.866972 0.000000 1.000000 0.850467
14 Accuracy 0.902404 0.912162 1.000000 0.000000 0.889509
15 F1 0.947895 0.953468 1.000000 0.000000 0.940397
16 Selection-Rate 0.976923 0.979730 0.986574 0.908654 0.973214
17 Sample_Size 4160.000000 2368.000000 2160.000000 208.000000 1792.000000
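
The returned object is a plain pandas dataframe with one row per metric, so individual metrics can be selected with regular filtering. The snippet below is illustrative only and uses the column names shown above.

# Illustrative only: pull out a single metric row from the subgroup metrics dataframe.
accuracy_row = sample_model_metrics_df[sample_model_metrics_df['Metric'] == 'Accuracy']
print(accuracy_row[['Metric', 'overall', 'male_priv', 'male_dis']])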

Group Metrics Composition

The Metrics Composer is responsible for the second stage of the model audit. Currently, it computes our custom group fairness and stability metrics, but extending it to new group metrics is straightforward. We noticed that more and more group metrics have appeared during the last decade, yet most of them are based on the same subgroup metrics. Hence, separating subgroup and group metric computation allows one to experiment with different combinations of subgroup metrics and to avoid recomputing subgroup metrics for a new set of group metrics.

models_metrics_dct = read_model_metric_dfs(SAVE_RESULTS_DIR_PATH, model_names=list(models_config.keys()))
metrics_composer = MetricsComposer(models_metrics_dct, config.sensitive_attributes_dct)

Compute composed metrics

models_composed_metrics_df = metrics_composer.compose_metrics()
models_composed_metrics_df
Metric male race male&race Model_Name
0 Accuracy_Difference -0.022653 -0.178877 -0.157307 ExponentiatedGradientReduction
1 Aleatoric_Uncertainty_Difference 0.002373 0.018372 0.021097 ExponentiatedGradientReduction
2 Aleatoric_Uncertainty_Ratio 1.485922 6.916304 5.985519 ExponentiatedGradientReduction
3 Epistemic_Uncertainty_Difference 0.002001 0.009870 0.014769 ExponentiatedGradientReduction
4 Epistemic_Uncertainty_Ratio 1.149317 1.773535 2.128039 ExponentiatedGradientReduction
5 Equalized_Odds_FNR 0.001302 0.001559 0.003110 ExponentiatedGradientReduction
6 Equalized_Odds_FPR -0.016505 0.076428 0.045638 ExponentiatedGradientReduction
7 IQR_Difference 0.001005 0.010219 0.012572 ExponentiatedGradientReduction
8 Jitter_Difference 0.001498 0.009698 0.013220 ExponentiatedGradientReduction
9 Label_Stability_Ratio 0.998571 0.988954 0.984495 ExponentiatedGradientReduction
10 Label_Stability_Difference -0.001415 -0.010944 -0.015355 ExponentiatedGradientReduction
11 Overall_Uncertainty_Difference 0.004374 0.028242 0.035866 ExponentiatedGradientReduction
12 Overall_Uncertainty_Ratio 1.239210 2.780147 3.070293 ExponentiatedGradientReduction
13 Statistical_Parity_Difference -0.006515 -0.011852 -0.014432 ExponentiatedGradientReduction
14 Disparate_Impact 0.993350 0.987890 0.985245 ExponentiatedGradientReduction
15 Std_Difference 0.001734 0.009583 0.013289 ExponentiatedGradientReduction
16 Std_Ratio 1.195550 2.175074 2.552202 ExponentiatedGradientReduction
17 Equalized_Odds_TNR 0.016505 -0.076428 -0.045638 ExponentiatedGradientReduction
18 Equalized_Odds_TPR -0.001302 -0.001559 -0.003110 ExponentiatedGradientReduction
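
models_composed_metrics_df is likewise a plain dataframe, so composed fairness metrics can be filtered by name before plotting. The snippet below is illustrative only and assumes the column names shown above.

# Illustrative only: compare selected composed fairness metrics across sensitive attributes.
fairness_rows = models_composed_metrics_df[
    models_composed_metrics_df['Metric'].isin(['Disparate_Impact', 'Statistical_Parity_Difference'])
]
print(fairness_rows[['Metric', 'male', 'race', 'male&race']])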

Metrics Visualization and Reporting

The Metrics Visualizer allows us to build static visualizations for the computed metrics. It unifies different preprocessing methods for the computed metrics and creates various data formats required for visualizations. Hence, users can simply call methods of the MetricsVisualizer class and get custom plots for diverse metric analysis.

visualizer = MetricsVisualizer(models_metrics_dct, models_composed_metrics_df, config.dataset_name,
                               model_names=list(models_config.keys()),
                               sensitive_attributes_dct=config.sensitive_attributes_dct)
visualizer.create_overall_metrics_bar_char(
    metric_names=['Accuracy', 'F1', 'TPR', 'TNR', 'PPV', 'Selection-Rate'],
    plot_title="Accuracy Metrics"
)
visualizer.create_overall_metrics_bar_char(
    metric_names=['Aleatoric_Uncertainty', 'Epistemic_Uncertainty', 'Std', 'IQR', 'Jitter'],
    plot_title="Stability and Uncertainty Metrics"
)