{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import numpy as np\n",
"import pandas as pd\n",
"from scipy.stats import norm\n",
"# NOTE: nutrition_label_utility.py must be in the same folder as this notebook\n",
"from nutrition_label_utility import *\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data and Automated Decision System (ADS)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### College Ranking\n",
"The data come from CS Rankings (CSR) (https://csrankings.org), with additional attributes from the NRC assessment dataset (http://www.nap.edu/rdp). The dataset has the following attributes:\n",
"- Average Count (CSR) is the geometric mean of the adjusted number of publications in each area, computed per institution\n",
"- Faculty (CSR) is the number of faculty in the department\n",
"- GRE (NRC) is the average GRE score (2004-2006)\n",
"- Department Size (CSR) is a binary attribute derived from Faculty: a department is Small if it has 30 or fewer faculty and Large otherwise\n",
"- Regional Code (NRC) is one of the US regions Northeast (NE), Midwest (MW), South Atlantic (SA), South Central (SC), or West (W)\n",
"- Pub Count (NRC) is the average number of publications per faculty member (2000-2006)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Table 1"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"                               College Name  Average Count  Faculty      GRE  \\\n",
"0                Carnegie Mellon University           18.3      122  791.376   \n",
"1     Massachusetts Institute of Technology           15.0       64  771.894   \n",
"2                       Stanford University           14.3       55  800.000   \n",
"3        University of California--Berkeley           11.4       50  789.451   \n",
"4  University of Illinois--Urbana-Champaign           10.5       55  771.894   \n",
"\n",
"   Department Size  Regional Code  Pub Count  \n",
"0            Large             NE      2.319  \n",
"1            Large             NE      2.667  \n",
"2            Large              W      4.504  \n",
"3            Large              W      3.198  \n",
"4            Large             MW      2.704  "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = pd.read_csv(\"CSranking.csv\")\n",
"data.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ADS - Algorithmic rankers\n",
"- The ranking methodology is inspired by U.S. News & World Report and CS Rankings.\n",
"- The ranker is a rule-based system: scores are generated by the rule f(x) = w_1 * Faculty(x) + w_2 * Average Count(x) + w_3 * GRE(x).\n",
"- CS departments are then ranked in descending order of these scores.\n",
"- Other possible rankers: [Learning to Rank methods](http://www.morganclaypool.com/doi/abs/10.2200/S00607ED2V01Y201410HLT026)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"                               College Name  Average Count  Faculty      GRE  \\\n",
"0                Carnegie Mellon University           18.3      122  791.376   \n",
"1     Massachusetts Institute of Technology           15.0       64  771.894   \n",
"2                       Stanford University           14.3       55  800.000   \n",
"3        University of California--Berkeley           11.4       50  789.451   \n",
"4  University of Illinois--Urbana-Champaign           10.5       55  771.894   \n",
"\n",
"   Department Size  Regional Code  Pub Count    Score  \n",
"0            Large             NE      2.319  931.676  \n",
"1            Large             NE      2.667  850.894  \n",
"2            Large              W      4.504  869.300  \n",
"3            Large              W      3.198  850.851  \n",
"4            Large             MW      2.704  837.394  "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weights = [1, 1, 1]\n",
"chosen_atts = [\"Average Count\", \"Faculty\", \"GRE\"]\n",
"\n",
"data[\"Score\"] = sum([weights[idx]*data[atti] for idx, atti in enumerate(chosen_atts)])\n",
"data.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Questions considering interpretability\n",
"1) (Data) What would be a good feature set for the decisions?\n",
"\n",
"2) (Outcomes) Is 10.5 a high score or a low score?\n",
"    - The score generation rule alone does not indicate the relative rank of an item.\n",
"    - It depends on how 10.5 compares to the scores of other items.\n",
"\n",
"3) (Outcomes) Are the generated scores stable?\n",
"    - Unless raw scores are disclosed, the user has no information about the magnitude of the difference in scores between items that appear in consecutive ranks.\n",
"    - In Table 1, CMU (18.3) has a much higher score than the immediately following MIT (15.0). This is in contrast to UIUC (10.5, rank 5) and UW (10.3, rank 6), which are nearly tied.\n",
"\n",
"4) (Outcomes) Is there any unfair treatment?\n",
"\n",
"5) (Outcomes) How are departments represented in the ranking?\n",
"\n",
"6) (Rankers) How should the weights be interpreted?\n",
"    - The weight of an attribute in the score generation rule does not determine its impact on the outcome.\n",
"    - For example, in f(x) = 0.2 * Faculty(x) + 0.3 * Average Count(x) + 0.5 * GRE(x), Faculty has the smallest weight, yet for the data in Table 1 it will be the deciding factor that sets top-ranked departments apart from those in lower ranks.\n",
"    - This is because the value of Faculty changes most dramatically in the data, and because it correlates with Average Count (in effect, double-counting).\n",
"\n",
"7) (Rankers) Is the ranking methodology stable?\n",
"    - For example, the score generation rule f(x) = Pub Count(x) + GRE(x) would be unstable, because for many of the items the values of both attributes are very close, yet the two attributes induce different rankings.\n",
"    - Prioritizing one attribute slightly over the other would cause significant re-shuffling.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Data Analysis\n",
"- Pre-process the dataset as needed.\n",
"- Explore the dataset before choosing the ranker."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preprocess data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"                               College Name  Average Count   Faculty  \\\n",
"0                Carnegie Mellon University       1.000000  1.000000   \n",
"1     Massachusetts Institute of Technology       0.804734  0.462963   \n",
"2                       Stanford University       0.763314  0.379630   \n",
"3        University of California--Berkeley       0.591716  0.333333   \n",
"4  University of Illinois--Urbana-Champaign       0.538462  0.379630   \n",
"\n",
"        GRE  Department Size  Regional Code  Pub Count  \n",
"0  0.795577            Large             NE   0.338280  \n",
"1  0.333776            Large             NE   0.443671  \n",
"2  1.000000            Large              W   1.000000  \n",
"3  0.749947            Large              W   0.604482  \n",
"4  0.333776            Large             MW   0.454876  "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cur_data = data.iloc[:,:-1] # drop the Score column generated above to recover the original attributes\n",
"ignore_atts = [\"College Name\",\"Department Size\", \"Regional Code\"] # exclude the categorical attributes from preprocessing\n",
"norm_data = normalizeDataset(cur_data, ignore_atts)\n",
"# standarded_data = standardizeData(norm_data, ignore_atts)\n",
"norm_data.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Distribution Visualization"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def visualize_att_dist(_data, _att, _category=False):\n",
"    \"\"\"Plot the distribution of one attribute: bar chart if categorical, histogram otherwise.\"\"\"\n",
"    plt.figure(figsize=[6, 5], dpi=100)\n",
"    sns.set(style=\"darkgrid\", font_scale=2)\n",
"    if _category:  # categorical attribute: one bar per category\n",
"        vis_data = _data[_att].value_counts()\n",
"        ax = sns.barplot(x=vis_data.index, y=vis_data.values)\n",
"        # show only even tick values on the count axis\n",
"        ax.set_yticks(list(range(0, max(vis_data.values) + 1, 2)))\n",
"        ax.set_xlabel(_att)\n",
"    else:  # numeric attribute: histogram (histplot replaces the deprecated distplot)\n",
"        ax = sns.histplot(_data[_att], kde=False, color='steelblue')\n",
"    ax.set_ylabel(\"Count\")\n",
"    plt.tight_layout()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[figure: lower-triangle heatmap of the Pearson correlations between the numeric attributes]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Pearson correlations between the numeric attributes (categorical columns are excluded)\n",
"corr_df = data.select_dtypes(include=\"number\").corr(method='pearson')\n",
"print(\"--------------- CORRELATIONS ---------------\")\n",
"print(corr_df)\n",
"print(\"--------------- CREATE A HEATMAP ---------------\")\n",
"# Mask the upper triangle (diagonal included) so that only the lower triangle is displayed\n",
"mask = np.triu(np.ones_like(corr_df, dtype=bool))\n",
"sns.heatmap(corr_df, cmap='RdYlGn_r', vmax=1.0, vmin=-1.0, mask=mask, linewidths=2.5)\n",
"plt.yticks(rotation=0)\n",
"plt.xticks(rotation=90)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Nutritional Labels\n",
"- Web application of Nutritional Labels for Rankings is live at http://demo.dataresponsibly.com/rankingfacts/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Available tools\n",
"    - AI Fairness 360 for fairness validation\n",
"    - Data profiling for input data analysis\n",
"    - Data cleaning\n",
"    - Pyplot and Seaborn for visualizations\n",
"    - ..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Recipe and Ingredients\n",
"- These two labels help to explain the ranking methodology.\n",
"- The Recipe label succinctly describes the ranking algorithm. It states the explicit intentions of the designer of the score generation rule about which attributes matter and to what extent.\n",
"    - For example, for a linear score generation rule, each attribute would be listed together with its weight.\n",
"- The Ingredients label lists the attributes most material to the ranked outcome, in order of importance.\n",
"    - It may show additional attributes associated with high rank. Such associations can be derived with linear models or with other black-box methods.\n",
"    - For example, for a linear model, this list could present the attributes with the highest learned weights.\n",
"- The Recipe and Ingredients labels also list statistics of the attributes they contain:\n",
"    - minimum, maximum, and median values at the top-10 and overall."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def compute_statistic(_data, _atts):\n",
" \"\"\"\n",
" Compute the statistics of input attributes.\n",
"\n",
" Attributes:\n",
" _data: dataframe that stored the data\n",
" _atts: array that stores the attributes to be computed\n",
" Return: json data of computed statistics\n",
" \"\"\"\n",
" output_df = pd.DataFrame(columns=[\"attribute\", \"median\", \"mean\", \"min\", \"max\"])\n",
" for atti in _atts:\n",
" atti_stats = _data.describe().loc[[\"50%\", \"mean\", \"min\", \"max\"], atti].tolist()\n",
" output_df.loc[output_df.shape[0]] = [atti] + atti_stats\n",
" return output_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recipe for top-10 ranking"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------- RANKER USE FOLLOWING ATTRIBUTES ---------------\n",
"['Average Count', 'Faculty', 'GRE']\n",
"--------------- STATISTICALS OF ATTRIBUTES FOR TOP-10 RANKING ---------------\n"
]
},
{
"data": {
"text/plain": [
"       attribute   median      mean      min    max\n",
"0  Average Count   10.400   11.0200    6.800   18.3\n",
"1        Faculty   55.000   63.4000   45.000  122.0\n",
"2            GRE  796.254  791.4109  771.894  800.0"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(\"--------------- RANKER USE FOLLOWING ATTRIBUTES ---------------\")\n",
"print(chosen_atts)\n",
"print(\"--------------- STATISTICALS OF ATTRIBUTES FOR TOP-10 RANKING ---------------\")\n",
"compute_statistic(data.head(10), chosen_atts)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recipe for overall ranking"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------- STATISTICALS OF ATTRIBUTES FOR OVERALL RANKING ---------------\n"
]
},
{
"data": {
"text/plain": [
"       attribute  median        mean      min    max\n",
"0  Average Count     2.9    4.447059    1.400   18.3\n",
"1        Faculty    32.0   36.470588   14.000  122.0\n",
"2            GRE   790.0  787.264255  757.813  800.0"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(\"--------------- STATISTICALS OF ATTRIBUTES FOR OVERALL RANKING ---------------\")\n",
"compute_statistic(data, chosen_atts)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ingredients"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def linear_correlation(_data, _y_col=\"Score\", top_threshold=0.5, round_default=2):\n",
" \"\"\"\n",
" Compute the correlation between attributes and generated scores using linear regression.\n",
"\n",
" Attributes:\n",
" _data: dataframe that stored the data\n",
" _y_col: column name of Y variable\n",
" top_threshold: threshold of attribute coefficient\n",
" round_default: threshold of round function for the returned coefficient\n",
" Return: list of correlated attributes and its coefficients\n",
" \"\"\"\n",
" num_atts = list(_data.iloc[:,:-1].describe().columns)\n",
" X = data[num_atts].values\n",
" y = data[_y_col].values\n",
"\n",
" regr = linear_model.LinearRegression(normalize=False)\n",
" regr.fit(X, y)\n",
" for i in range(len(regr.coef_)):\n",
" regr.coef_[i] = round(regr.coef_[i], round_default)\n",
" # normalize coefficients to [-1,1]\n",
" max_coef = max(regr.coef_)\n",
" min_coef = min(regr.coef_)\n",
" abs_max = max(abs(max_coef),abs(min_coef))\n",
" norm_coef = []\n",
" for ci in regr.coef_:\n",
" new_ci = round(ci/abs_max,round_default)\n",
" norm_coef.append(new_ci)\n",
" coeff_zip = zip(norm_coef, num_atts)\n",
" return_coeff = {}\n",
" for ci, atti in coeff_zip:\n",
" if ci > top_threshold:\n",
" return_coeff[atti] = ci\n",
" return return_coeff"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------- IMPORTANT ATTRIBUTES ---------------\n"
]
},
{
"data": {
"text/plain": [
"{'Average Count': 0.6, 'GRE': 1.0}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# re-run the ranker with new weights (on the raw data, not on norm_data)\n",
"weights = [0.3, 0.2, 0.5]\n",
"chosen_atts = [\"Average Count\", \"Faculty\", \"GRE\"]\n",
"data[\"Score\"] = sum([weights[idx]*data[atti] for idx, atti in enumerate(chosen_atts)])\n",
"print(\"--------------- IMPORTANT ATTRIBUTES ---------------\")\n",
"lg_weights = linear_correlation(data)\n",
"lg_weights"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### More details in Ingredients"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------- STATISTICS OF IMPORTANT ATTRIBUTES for TOP-10 RANKING ---------------\n",
"       attribute   median      mean      min    max\n",
"0  Average Count   10.400   11.0200    6.800   18.3\n",
"1            GRE  796.254  791.4109  771.894  800.0\n"
]
}
],
"source": [
"learned_atts = lg_weights.keys()\n",
"print(\"--------------- STATISTICS OF IMPORTANT ATTRIBUTES for TOP-10 RANKING ---------------\")\n",
"# report statistics only when the learned attributes do not cover every Recipe attribute\n",
"if not set(chosen_atts).issubset(learned_atts):\n",
"    print(compute_statistic(data.head(10), learned_atts))"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------- STATISTICS OF IMPORTANT ATTRIBUTES FOR OVERALL RANKING ---------------\n",
"       attribute  median        mean      min    max\n",
"0  Average Count     2.9    4.447059    1.400   18.3\n",
"1            GRE   790.0  787.264255  757.813  800.0\n"
]
}
],
"source": [
"print(\"--------------- STATISTICS OF IMPORTANT ATTRIBUTES FOR OVERALL RANKING ---------------\")\n",
"# report statistics only when the learned attributes do not cover every Recipe attribute\n",
"if not set(chosen_atts).issubset(learned_atts):\n",
"    print(compute_statistic(data, learned_atts))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stability\n",
"- The Stability label explains whether the ranking methodology is robust on this particular dataset. \n",
" - An unstable ranking is one where slight changes to the data (e.g., due to uncertainty and noise), or to the methodology (e.g., by slightly adjusting the weights in a score-based ranker) could lead to a significant change in the output. \n",
"- This Stability label reports a stability score, as a single number that indicates the extent of the change required for the ranking to change.\n",
" - The stability of the ranking is quantified as the slope of the line that is fit to the score distribution, at the top-10 and over-all.\n",
" - A score distribution is unstable if scores of items in adjacent ranks are close to each other (|slope| <= 0.25), and so a very small change in scores will lead to a change in the ranking. "
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"def visualize_stability(_data, _y_col=\"Score\", _top_n=100):\n",
"    \"\"\"Scatter plot of the top-n scores, sorted in descending order, against rank position.\"\"\"\n",
"    plt.figure(figsize=[6, 5], dpi=100)\n",
"    sns.set(style=\"darkgrid\", font_scale=2)\n",
"    vis_data = _data.head(_top_n)\n",
"    vis_x = [x + 1 for x in vis_data.index]\n",
"    vis_y = sorted(vis_data[_y_col], reverse=True)\n",
"    # x and y are passed as keywords, which recent seaborn versions require\n",
"    ax = sns.scatterplot(x=vis_x, y=vis_y, color=\"steelblue\")\n",
"    ax.set_xlabel(\"Position\")\n",
"    ax.set_ylabel(_y_col)\n",
"    plt.tight_layout()\n",
"\n",
"\n",
"def compute_slope_scores(_data, _y_col=\"Score\", round_default=2):\n",
"    \"\"\"\n",
"    Compute the slope of a list of scores.\n",
"\n",
"    Attributes:\n",
"        _data: dataframe that stores the data\n",
"        _y_col: column name of the Y variable\n",
"        round_default: number of decimal places used when rounding the stability\n",
"    Return: absolute slope of the line fit to the scores in _data, in row order\n",
"    \"\"\"\n",
"    xd = list(range(1, len(_data) + 1))  # positions 1..n\n",
"    yd = _data[_y_col].values\n",
"    slope = np.polyfit(xd, yd, 1)[0]  # coefficient of the degree-1 term\n",
"    return abs(round(float(slope), round_default))"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------- UNSTABLE AT TOP 10 (STABILITY AS 0.18) ---------------\n",
"--------------- STABLE OVERALL (STABILITY AS 0.35) ---------------\n"
]
},
{
"data": {
"text/plain": [
"[figure: scatter plot of Score against rank position]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"top10_stability = compute_slope_scores(data.head(10))\n",
"if top10_stability <= 0.25:\n",
" print(\"--------------- UNSTABLE AT TOP 10 (STABILITY AS \"+str(top10_stability)+\") ---------------\") \n",
"else:\n",
" print(\"--------------- STABLE AT TOP 10 (STABILITY AS \"+str(top10_stability)+\") ---------------\") \n",
"all_stability = compute_slope_scores(data)\n",
"if all_stability <= 0.25:\n",
" print(\"--------------- UNSTABLE OVERALL (STABILITY AS \"+str(all_stability)+\") ---------------\") \n",
"else:\n",
" print(\"--------------- STABLE OVERALL (STABILITY AS \"+str(all_stability)+\") ---------------\") \n",
"visualize_stability(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fairness\n",
"- The Fairness label quantifies whether the ranked output exhibits statistical parity with respect to one or more sensitive attributes, such as gender or race of individuals.\n",
"- All these measures are statistical tests, and whether a result is fair is determined by the computed p-value.\n",
"- The null hypothesis is that the ranking process is fair to the protected group.\n",
"- The results of three measures of statistical parity are shown: [FA*IR](https://dl.acm.org/citation.cfm?doid=3132847.3132938), [proportion](https://doi.org/10.1007/s10618-017-0506-1), and [pairwise comparison](https://arxiv.org/pdf/1804.07890.pdf).\n",
"- A ranking is considered unfair when the p-value of the corresponding statistical test falls below 0.05, or below the adjusted significance level (alpha) in the case of FA*IR."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### FA*IR measure"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"def compute_p_FAIR(_data, _att, _protected_group, _y_col=\"Score\", top_k=100, round_default=2):\n",
"    \"\"\"\n",
"    Compute the p-value using the FA*IR algorithm.\n",
"\n",
"    Attributes:\n",
"        _data: dataframe that stores the data (sorted in place by _y_col)\n",
"        _att: sensitive attribute name\n",
"        _protected_group: the value of the sensitive attribute for the protected group\n",
"        _y_col: the column that stores the ranking scores\n",
"        top_k: the number of top-ranked items used to verify group fairness\n",
"        round_default: number of decimal places used when rounding\n",
"    Return: p-value, fairness flag, failing rank position, and rounded adjusted significance level in FA*IR\n",
"    \"\"\"\n",
"    _data.sort_values(by=_y_col, ascending=False, inplace=True)\n",
"    _data.reset_index(drop=True, inplace=True)\n",
"    # cap top_k at half of the dataset\n",
"    if len(_data)/2 < top_k:\n",
"        top_k = int(len(_data)/2)\n",
"    pos_protected = _data[_data[_att]==_protected_group].index+1\n",
"    pro_prob = len(pos_protected)/len(_data)\n",
"\n",
"    # transform ranking to a ranking of tuples with (id,\"pro\")/(id,\"unpro\") to run FA*IR\n",
"    transformed_ranking = []\n",
"    for index, row in _data.head(top_k).iterrows():\n",
"        if row[_att] == _protected_group:\n",
"            transformed_ranking.append([index,\"pro\"])\n",
"        else:\n",
"            transformed_ranking.append([index,\"unpro\"])\n",
"\n",
"    p_value, isFair, posiFail, alpha_c, pro_needed_list = computeFairRankingProbability(top_k, pro_prob, transformed_ranking)\n",
"    return p_value, isFair, posiFail, round(alpha_c,round_default)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------- GROUP FAIRNESS VERIFICATION CONSIDERING Department Size ---------------\n",
"--------------- FAIR FOR Large (p=0.9860885630366442, alpha=0.87) ---------------\n",
"--------------- UNFAIR FOR Small (p=0.038976119236976886, alpha=0.65) FAIL AT RANK POSITION 5 ---------\n"
]
}
],
"source": [
"sensi_att = \"Department Size\"\n",
"protected_groups = data[sensi_att].unique()\n",
"print(\"--------------- GROUP FAIRNESS VERIFICATION CONSIDERING \"+sensi_att+\" ---------------\")\n",
"for vi in protected_groups:\n",
"    vi_p, fair_res, pos_fail, vi_alpha = compute_p_FAIR(data, sensi_att, vi)\n",
"    if fair_res:\n",
"        print(\"--------------- FAIR FOR \"+vi+\" (p=\"+str(vi_p)+\", alpha=\"+str(vi_alpha)+\") ---------------\")\n",
"    else:\n",
"        print(\"--------------- UNFAIR FOR \"+vi+\" (p=\"+str(vi_p)+\", alpha=\"+str(vi_alpha)+\") FAIL AT RANK POSITION \"+str(pos_fail)+\" ---------\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pairwise comparison"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"def compute_p_pairs(_data, _att, _protected_group, _y_col=\"Score\", run_time=100, round_default=2):\n",
"    \"\"\"\n",
"    Compute the p-value using the pairwise oracle.\n",
"\n",
"    Attributes:\n",
"        _data: dataframe that stores the data (sorted in place by _y_col)\n",
"        _att: sensitive attribute name\n",
"        _protected_group: the value of the sensitive attribute for the protected group\n",
"        _y_col: the column that stores the ranking scores\n",
"        run_time: number of simulation runs for the pairwise comparison\n",
"        round_default: number of decimal places used when rounding\n",
"    Return: rounded p-value\n",
"    \"\"\"\n",
"    _data.sort_values(by=_y_col, ascending=False, inplace=True)\n",
"    _data.reset_index(drop=True, inplace=True)\n",
"    pos_protected = _data[_data[_att]==_protected_group].index+1\n",
"    pro_prob = len(pos_protected)/len(_data)\n",
"    total_n = len(_data)\n",
"    pro_n = len(pos_protected)\n",
"    seed_random_ranking = [x for x in range(total_n)] # list of IDs\n",
"    seed_f_index = [x for x in range(pro_n)] # list of IDs\n",
"\n",
"    sim_df = pd.DataFrame(columns=[\"Run\", \"pair_n\"])\n",
"    # run the simulation of ranking generation; in each simulation, generate a fair ranking with input N and size of sensitive group\n",
"    for ri in range(run_time):\n",
"        output_ranking = mergeUnfairRanking(seed_random_ranking, seed_f_index, pro_prob)\n",
"        position_pro_list = [i for i in range(total_n) if output_ranking[i] in seed_f_index]\n",
"        count_sensi_prefered_pairs = 0\n",
"        for i in range(len(position_pro_list)):\n",
"            cur_position = position_pro_list[i]\n",
"            left_sensi = pro_n - (i + 1)\n",
"            count_sensi_prefered_pairs = count_sensi_prefered_pairs + (total_n - cur_position - left_sensi)\n",
"        cur_row = [ri + 1, count_sensi_prefered_pairs]\n",
"        sim_df.loc[sim_df.shape[0]] = cur_row\n",
"\n",
"    input_pair_n, _, _ = computePairN(_att, _protected_group, _data)\n",
"    pair_samples = list(sim_df[\"pair_n\"].dropna())\n",
"    return round(Cdf(pair_samples, input_pair_n), round_default)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------- GROUP FAIRNESS VERIFICATION CONSIDERING Department Size ---------------\n",
"--------------- FAIR FOR Large (p=0.96, alpha=0.05) ---------------\n",
"--------------- UNFAIR FOR Small (p=0.04, alpha=0.05) --------------\n"
]
}
],
"source": [
"sensi_att = \"Department Size\"\n",
"protected_groups = data[sensi_att].unique()\n",
"print(\"--------------- GROUP FAIRNESS VERIFICATION CONSIDERING \"+sensi_att+\" ---------------\")\n",
"for vi in protected_groups:\n",
"    vi_p = compute_p_pairs(data, sensi_att, vi)\n",
"    if vi_p > 0.05:\n",
"        print(\"--------------- FAIR FOR \"+vi+\" (p=\"+str(vi_p)+\", alpha=0.05) ---------------\")\n",
"    else:\n",
"        print(\"--------------- UNFAIR FOR \"+vi+\" (p=\"+str(vi_p)+\", alpha=0.05) --------------\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Proportion"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"def compute_p_proportion(_data, _att, _protected_group, _y_col=\"Score\", top_k=100, round_default=2):\n",
"    \"\"\"\n",
"    Compute the p-value using the proportion oracle, i.e., the z-test method of 4.1.3 in \"A survey on measuring indirect discrimination in machine learning\".\n",
"\n",
"    Attributes:\n",
"        _data: dataframe that stores the data (sorted in place by _y_col)\n",
"        _att: sensitive attribute name\n",
"        _protected_group: the value of the sensitive attribute for the protected group\n",
"        _y_col: the column that stores the ranking scores\n",
"        top_k: the number of top-ranked items used to verify group fairness\n",
"        round_default: number of decimal places used when rounding\n",
"    Return: rounded p-value\n",
"    \"\"\"\n",
"    _data.sort_values(by=_y_col, ascending=False, inplace=True)\n",
"    _data.reset_index(drop=True, inplace=True)\n",
"    # cap top_k at half of the dataset\n",
"    if len(_data)/2 < top_k:\n",
"        top_k = int(len(_data)/2)\n",
"    total_n = len(_data)\n",
"    pro_n = len(_data[_data[_att]==_protected_group])\n",
"    unpro_n = total_n - pro_n\n",
"\n",
"    # group counts within the top-k\n",
"    top_data = _data.head(top_k)\n",
"    pro_k = len(top_data[top_data[_att]==_protected_group])\n",
"    unpro_k = top_k - pro_k\n",
"\n",
"    # standard error of the difference between the two top-k proportions\n",
"    pooledSE = math.sqrt((pro_k / pro_n * (1-pro_k/pro_n) / pro_n) + (unpro_k/unpro_n * (1-unpro_k/unpro_n) / unpro_n))\n",
"\n",
"    z_test = (unpro_k/unpro_n - pro_k/pro_n) / pooledSE\n",
"    p_value = norm.sf(z_test)  # one-sided upper-tail probability\n",
"\n",
"    return round(p_value,round_default)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------- GROUP FAIRNESS VERIFICATION CONSIDERING Department Size ---------------\n",
"--------------- FAIR FOR Large (p=1.0, alpha=0.05) ---------------\n",
"--------------- UNFAIR FOR Small (p=0.0, alpha=0.05) --------------\n"
]
}
],
"source": [
"sensi_att = \"Department Size\"\n",
"protected_groups = data[sensi_att].unique()\n",
"print(\"--------------- GROUP FAIRNESS VERIFICATION CONSIDERING \"+sensi_att+\" ---------------\")\n",
"for vi in protected_groups:\n",
"    vi_p = compute_p_proportion(data, sensi_att, vi)\n",
"    if vi_p > 0.05:\n",
"        print(\"--------------- FAIR FOR \"+vi+\" (p=\"+str(vi_p)+\", alpha=0.05) ---------------\")\n",
"    else:\n",
"        print(\"--------------- UNFAIR FOR \"+vi+\" (p=\"+str(vi_p)+\", alpha=0.05) --------------\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diversity\n",
"- The Diversity label shows diversity with respect to a set of demographic categories of individuals, or a set of categorical attributes of other kinds of items.\n",
"- This label displays the proportion of each category in the top-10 ranked list and overall.\n",
"- Like other labels, it is updated as the user selects different ranking methods or sets different weights."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"def visualize_diversity(_data, _att, _y_col=\"Score\"):\n",
"    \"\"\"Pie chart of the share of each category of _att in _data.\"\"\"\n",
"    plt.figure(figsize=[6, 5], dpi=100)\n",
"    sns.set(font_scale=2)\n",
"    sns.set_palette(palette=\"pastel\")\n",
"    sort_data = _data.sort_values(by=_y_col, ascending=False)\n",
"    vis_count = sort_data[_att].value_counts()\n",
"    plt.pie(list(vis_count.values), labels=list(vis_count.index))\n",
"    plt.title(_att)\n",
"    plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Diversity for top-10 ranking"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[figure: pie chart of Regional Code shares in the top-10]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"[figure: pie chart of Department Size shares in the top-10]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"chosen_atts = [\"Regional Code\", \"Department Size\"]\n",
"for atti in chosen_atts:\n",
"    visualize_diversity(data.head(10), atti)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Diversity for overall ranking"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[figure: pie chart of Regional Code shares in the overall ranking]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"[figure: pie chart of Department Size shares in the overall ranking]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"for atti in chosen_atts:\n",
"    visualize_diversity(data, atti)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}