What If Tool
What If...
you could inspect a machine learning model,
with minimal coding required?
Building effective machine learning models means asking a lot of questions. Look for answers using the What-If Tool, an interactive visual interface designed to help you probe your models.
Compatible with TensorBoard and with Jupyter and Colaboratory notebooks. Works on TensorFlow and Python-accessible models.


We believe in making it easier for a broad set of people to examine, evaluate, and compare machine learning models, whether you're a developer, a product manager, a researcher, or a student. Now everyone on your team can participate in probing, shaping, and improving models.

The What-If Tool makes it easy to efficiently and intuitively explore up to two models' performance on a dataset. Investigate model performance across a range of dataset features, optimization strategies, and even manipulations to individual datapoint values. All this and more, in a visual way that requires minimal code.
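To make the "minimal code" claim concrete, here is a rough sketch of the notebook workflow, assuming the open-source `witwidget` package. Only the list of datapoints is built here (the feature names are hypothetical); the widget launch is shown in comments because it only renders inside a Jupyter or Colab cell.

```python
# Datapoints for the tool: one feature dict per example. The feature
# names below are illustrative, not from any particular demo dataset.
examples = [
    {"age": 39, "education": "Bachelors", "hours-per-week": 40},
    {"age": 52, "education": "HS-grad", "hours-per-week": 45},
]

# In a notebook cell, the tool would then be launched roughly like this
# (requires the `witwidget` package and a prediction function):
#   from witwidget.notebook.visualization import WitConfigBuilder, WitWidget
#   config = WitConfigBuilder(examples).set_custom_predict_fn(my_predict_fn)
#   WitWidget(config, height=800)

print(len(examples))
```

From there, all of the probing described below happens in the rendered interface rather than in code.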
Check out this walkthrough to find out what you can learn about your models with the What-If Tool.

For more usage details, refer to the documentation.

For a system design overview and case studies of real-world usage, read our peer-reviewed paper.
Join the What-If Tool community on our Google Group.
What can you do with the What-If Tool?
Compare multiple models within the same workflow
Compare and experiment on the performance of two different models on the same dataset simultaneously. Identify individual datapoints or entire dataset slices on which the models differ the most.
Visualize inference results
A functional integration with Facets Dive helps you create custom visualizations using features in your dataset. Compare the performance of two models, or inspect a single model's performance by organizing inference results into confusion matrices, scatterplots or histograms.
Arrange datapoints by similarity
Create distance features from any datapoint and apply them to your visualizations for further analysis.
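A distance feature is just a per-datapoint number measuring similarity to a chosen anchor point. A plain Euclidean distance over numeric features captures the idea; this is an illustrative sketch with hypothetical feature names, not the tool's own distance implementation.

```python
import math

def distance_feature(anchor, datapoints, features):
    """For each datapoint, compute its Euclidean distance to `anchor`
    over the given numeric features -- the kind of per-point similarity
    score that can then be sorted and plotted."""
    def dist(point):
        return math.sqrt(sum((point[f] - anchor[f]) ** 2 for f in features))
    return [dist(p) for p in datapoints]

anchor = {"age": 40, "hours-per-week": 40}
points = [{"age": 40, "hours-per-week": 40},
          {"age": 43, "hours-per-week": 44}]
print(distance_feature(anchor, points, ["age", "hours-per-week"]))
# [0.0, 5.0]
```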
Edit a datapoint and see how your model performs
Edit, add or remove features or feature values for any selected datapoint and then run inference to test model performance. Alternatively, you can duplicate or upload a whole new example to see where it stands vis-à-vis loaded examples.
Compare counterfactuals to datapoints
For any selected datapoint, find the most similar datapoint with a different classification [1]. Compare them side-by-side to see what makes them similar and what makes them different.
Use feature values as lenses into model performance
Auto-generated partial dependence plots for individual features show changes in inference results across their different valid values.
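The idea behind a partial dependence plot is simple: hold a datapoint fixed, sweep one feature across its valid values, and record the model's score at each value. A sketch with a stand-in model (the scoring rule is purely illustrative):

```python
def partial_dependence(predict, datapoint, feature, values):
    """Sweep `feature` over `values` while holding the rest of the
    datapoint fixed, returning (value, score) pairs -- the data behind
    a partial dependence plot."""
    curve = []
    for v in values:
        probe = dict(datapoint, **{feature: v})  # copy with one feature changed
        curve.append((v, predict(probe)))
    return curve

# Stand-in model: score grows with hours worked (illustrative only).
predict = lambda p: min(1.0, p["hours-per-week"] / 80)

curve = partial_dependence(predict, {"age": 39, "hours-per-week": 40},
                           "hours-per-week", [20, 40, 60])
print(curve)  # [(20, 0.25), (40, 0.5), (60, 0.75)]
```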
Experiment using confusion matrices and ROC curves
Experiment with thresholds and cost ratios in binary models interactively. Observe results as ROC curves and numeric confusion matrices.
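Moving the decision threshold reweights the confusion matrix, and each point on an ROC curve corresponds to one threshold. The bookkeeping behind this, sketched with toy scores and labels, looks roughly like:

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count TP/FP/TN/FN for a binary classifier whose positive-class
    scores are compared against `threshold`."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        pred = score >= threshold
        if pred and label:
            tp += 1
        elif pred and not label:
            fp += 1
        elif not pred and label:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

scores = [0.9, 0.7, 0.4, 0.2]
labels = [True, False, True, False]
print(confusion_at_threshold(scores, labels, 0.5))
# {'tp': 1, 'fp': 1, 'tn': 1, 'fn': 1}
```

Recomputing this over a range of thresholds yields the true-positive and false-positive rates that trace out the ROC curve.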
Test algorithmic fairness constraints
Examine the effect of preset algorithmic fairness constraints, such as equality of opportunity [2], for binary classifiers. Go deeper in your analysis by slicing your dataset into meaningful subgroups.

Read more about using the What-If Tool to investigate AI Fairness, written by David Weinberger.
Take the What-If Tool for a spin!

Income Classification

Compare two binary classification models that predict whether a person earns more than $50k a year, based on their census information [3]. Examine how different features affect each model's prediction, in relation to each other.

Age Prediction

Explore the performance of a regression model which predicts a person's age from their census information [3]. Slice your dataset to evaluate performance metrics such as aggregated inference error for each subgroup.

Smile Detection

Predict whether an image contains a smiling face using this binary classification model [5], trained on the CelebA dataset. Can you identify which group was missing from the training data, resulting in a biased model?

Flower Species Classification

This multi-class classification model predicts the species of iris flowers from sepal and petal measurements [4]. Look for correlations between different features and flower types.

COMPAS Recidivism Classifier

Inspired by ProPublica's analysis, investigate fairness using this linear classifier that mimics the behavior of the COMPAS recidivism classifier. Trained on the COMPAS dataset, this model determines whether a person belongs in the "Low" risk (negative) or "Medium or High" risk (positive) class for recidivism according to COMPAS.

Text Toxicity

Use the What-If Tool to compare two pre-trained models from ConversationAI that determine sentence toxicity, one of which was trained on a more balanced dataset. Examine their performance side by side on the Wikipedia Comments dataset. These are Keras models that do not use TensorFlow Examples as an input format.
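For models that do not consume TensorFlow Examples, the tool can instead call a plain Python prediction function: it receives a list of datapoints and returns one list of class scores per datapoint. A toy stand-in showing that contract (the punctuation-counting "toxicity" heuristic is purely illustrative, not the ConversationAI model):

```python
def custom_predict_fn(examples):
    """Toy stand-in for a real model: maps each input sentence to
    [p_not_toxic, p_toxic] scores -- the per-datapoint class-score
    shape a custom prediction function is expected to return."""
    results = []
    for sentence in examples:
        p_toxic = min(1.0, sentence.count("!") / 4)  # silly heuristic
        results.append([1.0 - p_toxic, p_toxic])
    return results

scores = custom_predict_fn(["hello there", "no!!"])
print(scores)  # [[1.0, 0.0], [0.5, 0.5]]
```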

Mortgage Classification with AI Platform

Explore a mortgage classification model that has been deployed on Cloud AI Platform. This model was built with XGBoost rather than TensorFlow.

Training a Mortgage Classification Model with AI Platform

Train a mortgage classification model with XGBoost, deploy it to Cloud AI Platform, and use the What-If Tool to analyze it. This demo requires a Google Cloud Platform account.

Training and Comparing Wine Quality Models with AI Platform

Train both a scikit-learn model and a Keras model to predict wine quality and deploy them to Cloud AI Platform. Then use the What-If Tool to compare the two models. This demo requires a Google Cloud Platform account.

About

PAIR
The People + AI Research initiative (PAIR) brings together researchers across Google to study and redesign the ways people interact with AI systems. We focus on the "human side" of AI: the relationship between users and technology, the new applications it enables, and how to make it broadly inclusive. Our goal isn't just to publish research; we're also releasing open source tools for researchers and other experts to use.

Acknowledgements
The What-If Tool was a collaborative effort. We would like to thank the Google teams that piloted the tool and provided valuable feedback and the TensorBoard team for all their help.

References
[1] Wachter, S., Mittelstadt, B., & Russell, C. (2017). Counterfactual explanations without opening the black box: Automated decisions and the GDPR. https://arxiv.org/abs/1711.00399

[2] Hardt, M., Price, E., & Srebro, N. (2016). Equality of opportunity in supervised learning. In Advances in neural information processing systems (pp. 3315-3323). https://arxiv.org/abs/1610.02413

[3] Fisher, R. A. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. http://archive.ics.uci.edu/ml/datasets/Census+Income

[4] Lichman, M. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. https://archive.ics.uci.edu/ml/datasets/iris

[5] Liu, Z, Luo, P, Wang, X, Tang, X. Deep Learning Face Attributes in the Wild. Proceedings of International Conference on Computer Vision (ICCV) 2015. http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html