Multiple Models Interface For Multiple Test Sets¶
In this example, we conduct a deep performance profiling of two models on multiple test sets. This is particularly useful when we want to evaluate models on different test sets while avoiding the overhead of retraining a bootstrap of 50-200 estimators for each one. For that, we will use the compute_metrics_with_multiple_test_sets interface, which runs metric computation for multiple models and tests each model on multiple test sets.
Import dependencies¶
import os
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from virny.user_interfaces.multiple_models_with_multiple_test_sets_api import compute_metrics_with_multiple_test_sets
from virny.utils.custom_initializers import create_config_obj, create_models_metrics_dct_from_database_df
from virny.preprocessing.basic_preprocessing import preprocess_dataset
from virny.datasets import CompasWithoutSensitiveAttrsDataset
Initialize Input Variables¶
Based on the library flow, we need to create 3 input objects for a user interface:
- A config yaml, a file with configuration parameters for the different metric computation user interfaces.
- A dataset class, a wrapper around the user's raw dataset that includes descriptive attributes such as a target column, numerical columns, and categorical columns. This class must inherit from the BaseDataset class, which was created for user convenience.
- A models config, a Python dictionary where keys are model names and values are initialized models for analysis. This dictionary helps conduct audits for different analysis modes and analyze different types of models.
TEST_SET_FRACTION = 0.2
DATASET_SPLIT_SEED = 42
Create a config object¶
The compute_metrics_with_multiple_test_sets interface requires that your yaml file includes the following parameters:
- dataset_name: str, a name of your dataset; it will be used to name files with metrics.
- bootstrap_fraction: float, the fraction of the train set, in the range [0.0, 1.0], used to fit models in the bootstrap (usually more than 0.5).
- random_state: int, a seed to control the randomness of the whole model evaluation pipeline.
- n_estimators: int, the number of estimators for the bootstrap used to compute subgroup stability metrics.
- computation_mode: str, 'default' or 'error_analysis', the name of the computation mode. While the default mode measures metrics for sex_priv and sex_dis, the error_analysis mode measures metrics for (sex_priv, sex_priv_correct, sex_priv_incorrect) and (sex_dis, sex_dis_correct, sex_dis_incorrect). This lets a user analyze how certain a model is about its incorrect predictions.
- sensitive_attributes_dct: dict, a dictionary where keys are sensitive attribute names (including intersectional attributes) and values are the disadvantaged values of these attributes. Intersectional attributes must include '&' between the sensitive attributes. You do not need to specify disadvantaged values for intersectional groups, since they are derived from the disadvantaged values in sensitive_attributes_dct for each separate sensitive attribute in the intersectional pair.

Note that a disadvantaged value in the sensitive attributes dictionary must be the same as in the original dataset. For example, if the distinct values of the sex column in the original dataset are 'F' and 'M', and after preprocessing they become 0 and 1 respectively, you still need to set the disadvantaged value to 'F' or 'M' in the sensitive attributes dictionary.
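For instance, for a hypothetical dataset whose original sex column contains 'F' and 'M', the dictionary could look as follows (the config below instead uses 1 for sex, matching the encoding in this example's dataset):

sensitive_attributes_dct = {'sex': 'F', 'race': 'African-American', 'sex&race': None}
# 'F' is the original (pre-encoding) disadvantaged value for sex, and
# 'sex&race' is None because intersectional disadvantaged values are derived
# automatically from the separate 'sex' and 'race' entries.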
ROOT_DIR = os.getcwd()
config_yaml_path = os.path.join(ROOT_DIR, 'experiment_config.yaml')
config_yaml_content = \
"""dataset_name: COMPAS_Without_Sensitive_Attributes
bootstrap_fraction: 0.8
random_state: 42
n_estimators: 50 # In practice, use more than 100 estimators; 50 is set here only to keep this example fast
computation_mode: error_analysis
sensitive_attributes_dct: {'sex': 1, 'race': 'African-American', 'sex&race': None}
"""
with open(config_yaml_path, 'w', encoding='utf-8') as f:
f.write(config_yaml_content)
config = create_config_obj(config_yaml_path=config_yaml_path)
Create a Dataset class¶
Based on the BaseDataset class, your dataset class should include the following attributes:
- Obligatory attributes: dataset, target, features, numerical_columns, categorical_columns
- Optional attributes: X_data, y_data, columns_with_nulls
For more details, please refer to the library documentation.
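To create your own dataset class, you can follow this minimal sketch. The BaseDataset import path and constructor keywords below are assumptions that simply mirror the attribute list above and may differ across library versions; the file path and column names are purely illustrative. In this example, we use the built-in CompasWithoutSensitiveAttrsDataset instead.

import pandas as pd
from virny.datasets.base import BaseDataset  # assumed import path

class MyCustomDataset(BaseDataset):
    def __init__(self, dataset_path: str):
        df = pd.read_csv(dataset_path)  # hypothetical raw dataset
        super().__init__(
            dataset=df,
            target='label',                                     # hypothetical target column
            features=['age', 'priors_count', 'charge_degree'],  # hypothetical feature columns
            numerical_columns=['age', 'priors_count'],
            categorical_columns=['charge_degree'],
        )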
data_loader = CompasWithoutSensitiveAttrsDataset()
data_loader.X_data[data_loader.X_data.columns[:5]].head()
|   | juv_fel_count | juv_misd_count | juv_other_count | priors_count | age_cat_25 - 45 |
|---|---|---|---|---|---|
| 0 | 0.0 | -2.340451 | 1.0 | -15.010999 | 1 |
| 1 | 0.0 | 0.000000 | 0.0 | 0.000000 | 1 |
| 2 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0 |
| 3 | 0.0 | 0.000000 | 0.0 | 6.000000 | 1 |
| 4 | 0.0 | 0.000000 | 0.0 | 7.513697 | 1 |
column_transformer = ColumnTransformer(transformers=[
('categorical_features', OneHotEncoder(handle_unknown='ignore', sparse_output=False), data_loader.categorical_columns),
('numerical_features', StandardScaler(), data_loader.numerical_columns),
])
base_flow_dataset = preprocess_dataset(data_loader=data_loader,
column_transformer=column_transformer,
sensitive_attributes_dct=config.sensitive_attributes_dct,
test_set_fraction=TEST_SET_FRACTION,
dataset_split_seed=DATASET_SPLIT_SEED)
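Optionally, we can sanity-check the produced split before moving on. A minimal sketch, assuming the BaseFlowDataset split attributes are named X_train_val and X_test (X_test and y_test are also used later in this example):

# Inspect the shapes of the train and test splits produced by preprocess_dataset
print('Train shape:', base_flow_dataset.X_train_val.shape)
print('Test shape: ', base_flow_dataset.X_test.shape)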
Create a models config¶
models_config is a Python dictionary where keys are model names and values are initialized models for analysis.
models_config = {
'DecisionTreeClassifier': DecisionTreeClassifier(criterion='gini',
max_depth=20,
max_features=0.6,
min_samples_split=0.1),
'RandomForestClassifier': RandomForestClassifier(max_depth=4,
max_features=0.6,
min_samples_leaf=1,
n_estimators=50),
}
Subgroup Metric Computation¶
After that, we need to pass the BaseFlowDataset object, the models config, and the config yaml to a metric computation interface and execute it. The interface uses subgroup analyzers to compute different sets of metrics for each privileged and disadvantaged group. Currently, our library supports the Subgroup Variance Analyzer and the Subgroup Error Analyzer, but it is easily extensible to other analyzers. When the variance and error analyzers complete metric computation, their metrics are combined, returned in a matrix format, and stored using the defined writer function (in this example, a database writer).
import os
from dotenv import load_dotenv
from pymongo import MongoClient
load_dotenv(os.path.join(ROOT_DIR, 'secrets.env'))  # Load environment variables from secrets.env
# Provide the MongoDB Atlas URL to connect Python to MongoDB using pymongo
CONNECTION_STRING = os.getenv("CONNECTION_STRING")
# Create a connection using MongoClient
client = MongoClient(CONNECTION_STRING)
collection = client[os.getenv("DB_NAME")]['preprocessing_results']
def db_writer_func(run_models_metrics_df, collection=collection):
run_models_metrics_df.columns = run_models_metrics_df.columns.str.lower() # Rename Pandas columns to lower case
collection.insert_many(run_models_metrics_df.to_dict('records'))
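If a MongoDB instance is not available, any callable with the same signature can serve as a writer. A minimal sketch, assuming a local CSV file is preferred (the file name is hypothetical):

def csv_writer_func(run_models_metrics_df, csv_path=os.path.join(ROOT_DIR, 'metrics.csv')):
    # Append each run's metrics to a local CSV file instead of a database collection
    run_models_metrics_df.columns = run_models_metrics_df.columns.str.lower()
    write_header = not os.path.exists(csv_path)  # write the header only once
    run_models_metrics_df.to_csv(csv_path, mode='a', header=write_header, index=False)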
import uuid
custom_table_fields_dct = {
'session_uuid': str(uuid.uuid4()),
'preprocessing_techniques': 'one hot encoder and scaler',
}
print('Current session uuid: ', custom_table_fields_dct['session_uuid'])
Current session uuid: 8d31eaab-5d6d-4830-9b23-c29355efa90b
extra_test_sets_lst = [(base_flow_dataset.X_test, base_flow_dataset.y_test, base_flow_dataset.init_sensitive_attrs_df)]
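Each tuple in extra_test_sets_lst contains a feature dataframe, a label series, and a dataframe with the original sensitive attribute values. Here we reuse the interface's own test set for simplicity; additional test sets would be appended as further (X, y, sensitive attributes) tuples from other data splits.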
compute_metrics_with_multiple_test_sets(dataset=base_flow_dataset,
extra_test_sets_lst=extra_test_sets_lst,
config=config,
models_config=models_config,
custom_tbl_fields_dct=custom_table_fields_dct,
db_writer_func=db_writer_func)
Analyze multiple models:   0%|          | 0/2 [00:00<?, ?it/s]
Classifiers testing by bootstrap: 100%|██████████| 50/50 [00:00<00:00, 112.87it/s]
Analyze multiple models:  50%|█████     | 1/2 [00:06<00:06, 6.70s/it]
Classifiers testing by bootstrap: 100%|██████████| 50/50 [00:03<00:00, 16.63it/s]
Analyze multiple models: 100%|██████████| 2/2 [00:16<00:00, 8.05s/it]
def read_model_metric_dfs_from_db(collection, session_uuid):
cursor = collection.find({'session_uuid': session_uuid})
records = []
for record in cursor:
del record['_id']
records.append(record)
model_metric_dfs = pd.DataFrame(records)
# Capitalize column names to be consistent across the whole library
new_column_names = []
for col in model_metric_dfs.columns:
new_col_name = '_'.join([c.capitalize() for c in col.split('_')])
new_column_names.append(new_col_name)
model_metric_dfs.columns = new_column_names
return model_metric_dfs
model_metric_dfs = read_model_metric_dfs_from_db(collection, custom_table_fields_dct['session_uuid'])
models_metrics_dct = create_models_metrics_dct_from_database_df(model_metric_dfs)
client.close()
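As a quick check, models_metrics_dct maps each model name to a dataframe of its computed metrics. A minimal sketch to inspect it (the exact metric columns depend on the library version):

# Each value is a pandas DataFrame with one row per computed metric record
for model_name, metrics_df in models_metrics_dct.items():
    print(model_name, metrics_df.shape)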