The Issue of Fairness in Machine Learning, and the What-If Tool.
Google’s open-source tool for analysing machine learning models without writing code.
INTRODUCTION
There is no standard definition of fairness, whether decisions are made by humans or by machines. Far from a solved problem, fairness in AI presents both an opportunity and a challenge. We all know that ML models are prone to bias, because they learn from data that very often reflects societal biases. If we are not careful about this, the model’s results can themselves be heavily biased, which is unacceptable if we are committed to living in a fair society.
What is fairness?
Fairness is a subjective term; it can mean different things to different people.
To understand why, consider this well-known scenario:
Suppose a mortgage lender uses an ML model to scan loan applications and sort them into two categories, “Approved” and “Rejected”. Out of all the applications, only 30% come from women. To figure out what gender mix of approved and rejected applications would be fair, the company convenes several groups of fairness experts and customer advocates.
After hours of discussion, each group arrives at a different conclusion.
Group #1 (Group unaware) argues that fairness is achieved when we completely disregard the gender of the applicants who are given loans. Even if the approved applicants turn out to be only men or only women, the outcome is still fair, because the applicants were chosen purely on merit, with gender playing no role. Otherwise, the company would have to reject someone more qualified for a loan in favour of someone less qualified because of their gender. This type of fairness is sometimes called “group unaware” fairness.
Group #2 (Group thresholds) strongly disagrees with group #1. They argue that the data used to train the model, being drawn from the real world, reflects historical biases and social constructs that work against women and make men appear more loan-worthy. For instance, some women’s work histories may contain gaps because of maternity leave, and the model will count those interruptions against them. To compensate, this group suggests varying the model’s threshold between men and women: if a loan is granted to a man when the model’s confidence exceeds 0.6, the same loan might be granted to a woman when the confidence exceeds 0.4.
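To make groups #1 and #2 concrete, here is a minimal sketch on made-up data (random scores, not the lender’s real dataset): one global threshold for the “group unaware” approach versus the per-group thresholds of 0.6 and 0.4 suggested above.

```python
# Toy illustration (hypothetical data) of "group unaware" vs. "group thresholds".
import numpy as np

rng = np.random.default_rng(0)
n = 1000
gender = rng.choice(["male", "female"], size=n, p=[0.7, 0.3])  # 70/30 applicant mix
score = rng.uniform(0.0, 1.0, size=n)  # model confidence that the applicant will repay

# Group unaware: one threshold, gender is never consulted.
approved_unaware = score >= 0.6

# Group thresholds: a lower bar for women to offset bias baked into the data.
per_group_threshold = np.where(gender == "male", 0.6, 0.4)
approved_thresholds = score >= per_group_threshold

for name, approved in [("group unaware", approved_unaware),
                       ("group thresholds", approved_thresholds)]:
    women_share = (gender[approved] == "female").mean()
    print(f"{name}: {approved.sum()} approved, {women_share:.0%} of them women")
```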
Group #3 (Demographic parity) rejects both views. They point to the composition of the applicant pool (in this case 70% men, 30% women) and argue that the set of people approved for a loan should have the same composition as the applicant pool: 70% of approvals should go to men and 30% to women.
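Demographic parity can be read as “approve each group at the same rate”, which automatically makes the approved pool mirror the 70/30 applicant mix. Here is a minimal sketch of that idea, again on made-up data:

```python
# Toy illustration (hypothetical data) of demographic parity:
# approve the same fraction of each group, so the approved pool
# has the same gender composition as the applicant pool.
import numpy as np

rng = np.random.default_rng(0)
gender = rng.choice(["male", "female"], size=1000, p=[0.7, 0.3])
score = rng.uniform(0.0, 1.0, size=1000)

approval_rate = 0.2              # overall share of applicants the lender can approve
approved = np.zeros(len(score), dtype=bool)
for g in ("male", "female"):
    mask = gender == g
    # per-group threshold chosen so roughly `approval_rate` of this group passes
    threshold = np.quantile(score[mask], 1 - approval_rate)
    approved |= mask & (score >= threshold)

print("men among approved:  ", f"{(gender[approved] == 'male').mean():.0%}")
print("women among approved:", f"{(gender[approved] == 'female').mean():.0%}")
```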
The debate goes on for hours without reaching a single conclusion; each group keeps presenting scenarios in which its definition of fairness fits best.
These three are far from the only varieties or definitions of fairness.
From the discussion above, one can conclude only one thing: fairness is complex.
ROLE OF WHAT-IF TOOL
Though the question of fairness has no single answer, Google’s What-If Tool (abbreviated WIT) helps us visualise each of these fairness strategies, and many more, with a few clicks and without having to code everything explicitly.
It helps us ask “what if” questions of our model.
Released by Google under the PAIR (People + AI Research) initiative, WIT is an open-source visualisation tool designed to examine machine learning models in an interactive, visual way. It can be used on both classification and regression models, and it lets users compare, evaluate and probe their models. Anyone can use WIT, be it a product manager, a developer, or a research scholar, thanks to its user-friendly interface and its minimal dependence on coding.
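WIT can also be embedded straight into a Jupyter or Colab notebook through the witwidget package. The snippet below is only a rough sketch of that setup: dataset_rows (a list of feature dicts) and predict_fn (a function that returns class probabilities for a list of examples) are placeholders for your own data and model, and the exact WitConfigBuilder options are best checked against the official documentation.

```python
# Rough sketch: launching the What-If Tool inside a notebook.
# Assumes `pip install witwidget` and TensorFlow; `dataset_rows` and
# `predict_fn` are placeholders you supply yourself.
import tensorflow as tf
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

def row_to_example(row):
    """Pack one dict of numeric features into a tf.Example proto (WIT's input format)."""
    feature = {name: tf.train.Feature(float_list=tf.train.FloatList(value=[float(v)]))
               for name, v in row.items()}
    return tf.train.Example(features=tf.train.Features(feature=feature))

examples = [row_to_example(row) for row in dataset_rows]

config = (WitConfigBuilder(examples)
          .set_custom_predict_fn(predict_fn)        # wraps your trained model
          .set_label_vocab(["Rejected", "Approved"]))
WitWidget(config, height=800)                       # renders the interactive UI in the notebook
```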
ADVANTAGES
WIT lets anyone play with a trained ML model. It is simple and powerful, and its visual user interface gives it several advantages.
OVERVIEW OF THE TOOL
As you can see, the UI is divided into two parts: a left panel and a right panel.
- The right panel is where you visualise your data on a graph, select datapoints, and see what the ML model predicts for each datapoint.
- The left panel is where you edit individual datapoints, modify your data, experiment with fairness settings, and so on. It controls what gets presented on the right side of the screen.
Above the left panel you can see 3 different tabs:
- Datapoint Editor Tab,
- Performance & Fairness Tab, and
- Features Tab
We’ll briefly look at what each of these tabs does.
1. Datapoint Editor Tab
As the name suggests, the Datapoint Editor tab lets you edit individual datapoints and see what change each edit brings to the model’s output.
In the example above, simply changing a person’s age from 51 to 58 changes the model’s prediction for that datapoint. Note that the model’s confidence score for this person also shifts, even though only one feature was edited.
The Datapoint Editor tab also lets us find nearest counterfactuals and analyse partial dependence plots.
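For intuition, a “nearest counterfactual” is simply the most similar datapoint that the model classifies differently from the one you selected. The sketch below illustrates the idea with a plain L1 distance on toy data; it is not WIT’s exact implementation, though WIT offers similar L1/L2 distance options.

```python
# Sketch of the nearest-counterfactual idea (not WIT's exact implementation):
# find the closest datapoint, by L1 distance, that gets a different prediction.
import numpy as np

def nearest_counterfactual(X, preds, idx):
    """Index of the datapoint nearest to X[idx] whose predicted class differs."""
    distances = np.abs(X - X[idx]).sum(axis=1)   # L1 distance to every datapoint
    distances[preds == preds[idx]] = np.inf      # ignore same-class points (and itself)
    return int(np.argmin(distances))

# Toy usage: 5 datapoints with 2 features (age, income) and their predicted classes.
X = np.array([[51, 40000], [58, 41000], [30, 52000], [45, 39000], [52, 40500]], dtype=float)
preds = np.array([0, 1, 1, 0, 0])
print(nearest_counterfactual(X, preds, idx=0))   # nearest differently-classified point to datapoint 0
```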
2. Performance & Fairness Tab
This tab lets you inspect ROC curves and the confusion matrix (for classification models) or other evaluation metrics (for regression models). In short, it helps analyse the model’s performance.
In the image above, we have selected a ground-truth feature (the feature to be predicted); we can set different classification thresholds and compare different ROC curves and the area under them. We can also analyse the model’s fairness and apply some preset fairness definitions (highlighted by the green box in the image). The tab also lets us slice our dataset by features and compare performance across those slices.
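As a rough analogue of what this tab computes behind the scenes, the sketch below (scikit-learn, toy random data, a hypothetical “gender” slicing feature) builds a confusion matrix at a chosen threshold, reports the ROC AUC, and compares accuracy across slices.

```python
# Sketch (toy data) of the kind of numbers the Performance & Fairness tab shows:
# confusion matrix at a threshold, ROC AUC, and a metric sliced by a feature.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)                              # ground truth (feature to be predicted)
y_score = np.clip(0.3 * y_true + rng.uniform(0, 0.7, 500), 0, 1)   # model confidence
gender = rng.choice(["male", "female"], size=500, p=[0.7, 0.3])    # slicing feature

threshold = 0.5                                                    # classification threshold
y_pred = (y_score >= threshold).astype(int)

print(confusion_matrix(y_true, y_pred))
print("ROC AUC:", round(roc_auc_score(y_true, y_score), 3))

# Slice the dataset by a feature and compare performance across slices.
for g in ("male", "female"):
    mask = gender == g
    print(g, "accuracy:", round((y_pred[mask] == y_true[mask]).mean(), 3))
```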
3. Features Tab
The Features tab shows bar charts, histograms, quantile charts and other visualisations for each of our features, along with summary statistics. It also lets us look at the distribution of values and the minimum/maximum value of each feature in the dataset.
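The same kind of per-feature summary can be reproduced with pandas for intuition; the sketch below uses a small made-up DataFrame rather than anything from WIT itself.

```python
# Sketch (made-up data) of per-feature summaries like those in the Features tab.
import pandas as pd

df = pd.DataFrame({
    "age":    [51, 58, 34, 45, 29],
    "income": [42000, 61000, 38000, 55000, 31000],
    "gender": ["female", "male", "male", "female", "male"],
})

print(df.describe(include="all"))    # count, mean, min/max, quartiles per feature
print(df["gender"].value_counts())   # distribution of a categorical feature
```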
CONCLUSION
Machine learning is not just about creating and deploying a model; its true sense lies in understanding how the model was built and why it behaves the way it does. The problem of fairness in machine learning goes far beyond what I have covered in this article: it is a vast social topic and a continuously developing field, with a great deal of research still to be done. Google’s What-If Tool is a handy way to play with a model, whether in terms of individual datapoints or the fairness of the algorithm. Its use cases and functionality go well beyond what I was able to cover here; for further reading, you can go through the references.