Demos #
The LIT team maintains a number of hosted demos, as well as pre-built launchers for some common tasks and model types.
For publicly-visible demos hosted on Google Cloud, see https://pair-code.github.io/lit/demos/.
Classification #
Sentiment and NLI #
Hosted instance: https://pair-code.github.io/lit/demos/glue.html
Code: examples/glue_demo.py
Multi-task demo:
Sentiment analysis as a binary classification task (SST-2) on single sentences.
Natural Language Inference (NLI) using MultiNLI, as a three-way classification task with two-segment input (premise, hypothesis).
STS-B textual similarity task (see Regression / Scoring below).
Switch tasks using the Settings (⚙️) menu.
BERT models of different sizes, built on HuggingFace TF2 (Keras).
Supports the widest range of LIT interpretability features:
Model output probabilities, custom thresholds, and multiclass metrics.
Jitter plot of output scores, to find confident examples or ones near the margin.
Embedding projector to find clusters in representation space.
Integrated Gradients, LIME, and other salience methods.
Attention visualization.
Counterfactual generators, including HotFlip for targeted adversarial perturbations.
Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/sentiment
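As a rough illustration of the custom thresholds and near-the-margin examples mentioned above, the sketch below applies a decision threshold to positive-class probabilities and lists the examples whose scores fall close to it. The function names, the `width` parameter, and the toy probabilities are hypothetical; this is plain Python, not LIT's implementation.

```python
def classify(probs, threshold=0.5):
    """Label each positive-class probability 1 if it meets the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probs]


def near_margin(probs, threshold=0.5, width=0.1):
    """Return indices of examples whose score lies within `width` of the threshold."""
    return [i for i, p in enumerate(probs) if abs(p - threshold) <= width]


# Toy positive-class probabilities (illustrative values, not model output).
probs = [0.95, 0.55, 0.48, 0.10]

labels_default = classify(probs)                # threshold of 0.5
labels_strict = classify(probs, threshold=0.6)  # a stricter custom threshold
borderline = near_margin(probs)                 # candidates for closer inspection
```

Raising the threshold flips the second example from positive to negative, which is the kind of change the threshold controls in the UI let you explore interactively.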
Multilingual (XNLI) #
Code: examples/xnli_demo.py
The XNLI dataset translates a subset of MultiNLI into 14 different languages.
Specify the --languages=en,jp,hi,... flag to select which languages to load.
NLI as a three-way classification task with two-segment input (premise, hypothesis).
Fine-tuned multilingual BERT model.
Salience methods work with non-whitespace-delimited text, by using the model's wordpiece tokenization.
Regression / Scoring #
Textual Similarity (STS-B) #
Hosted instance: https://pair-code.github.io/lit/demos/glue.html?models=stsb&dataset=stsb_dev
Code: examples/glue_demo.py
STS-B textual similarity task, predicting scores on a range from 0 (unrelated) to 5 (very similar).
BERT models built on HuggingFace TF2 (Keras).
Supports a wide range of LIT interpretability features:
Model output scores and metrics.
Scatter plot of scores and error, and jitter plot of true labels for quick filtering.
Embedding projector to find clusters in representation space.
Integrated Gradients, LIME, and other salience methods.
Attention visualization.
Sequence-to-Sequence #
Gemma #
Code: examples/lm_salience_demo.py
Supports Gemma 2B and 7B models using KerasNLP and TensorFlow.
Interactively debug LLM prompts using sequence salience.
Multiple salience methods (grad-l2 and grad-dot-input), at multiple granularities: token-, word-, sentence-, and paragraph-level.
Tip: check out the in-depth walkthrough at https://ai.google.dev/responsible/model_behavior, part of the Responsible Generative AI Toolkit.
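The two salience methods named above can be sketched from per-token gradients and input embeddings: grad-l2 takes the L2 norm of each token's gradient vector (an unsigned score), while grad-dot-input takes the dot product of the gradient with the token's embedding (a signed score). The code below is a minimal pure-Python sketch on toy numbers. The helper names are hypothetical, and summing token scores into word-level scores is an assumption made for illustration, not necessarily how the demo aggregates.

```python
import math


def grad_l2(grads):
    """Per-token salience: L2 norm of each token's gradient vector."""
    return [math.sqrt(sum(g * g for g in vec)) for vec in grads]


def grad_dot_input(grads, embs):
    """Per-token salience: dot product of gradient and input embedding."""
    return [sum(g * e for g, e in zip(gv, ev)) for gv, ev in zip(grads, embs)]


def aggregate(token_scores, groups):
    """Sum token scores into coarser segments.

    groups[i] is the segment index (e.g. word id) of token i.
    Summation is an assumed aggregation rule for this sketch.
    """
    totals = [0.0] * (max(groups) + 1)
    for score, g in zip(token_scores, groups):
        totals[g] += score
    return totals


# Toy gradients and embeddings for three tokens with two dimensions each.
grads = [[3.0, 4.0], [0.0, 1.0], [1.0, 0.0]]
embs = [[1.0, 1.0], [2.0, -2.0], [0.5, 0.5]]
word_ids = [0, 0, 1]  # tokens 0 and 1 belong to word 0, token 2 to word 1

token_l2 = grad_l2(grads)
token_dot = grad_dot_input(grads, embs)
word_l2 = aggregate(token_l2, word_ids)
```

The same `aggregate` helper could map tokens to sentences or paragraphs by passing a different `groups` list.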
T5 #
Hosted instance: https://pair-code.github.io/lit/demos/t5.html
Code: examples/t5_demo.py
Supports HuggingFace TF2 (Keras) models as well as TensorFlow SavedModel formats.
Visualize beam candidates and highlight diffs against references.
Visualize per-token decoder hypotheses to see where the model veers away from desired output.
Filter examples by ROUGE score against reference.
Embeddings from last layer of model, visualized with UMAP or PCA.
Task wrappers to handle pre- and post-processing for summarization and machine translation tasks.
Pre-loaded eval sets for CNNDM and WMT.
Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/generation
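To make the idea of filtering by ROUGE score concrete, here is a simplified sketch that computes a unigram-overlap (ROUGE-1 style) F1 score with whitespace tokenization and keeps the examples that score below a cutoff. The field names, the 0.5 cutoff, and the example texts are hypothetical, and a real evaluation would typically rely on an established ROUGE package with proper tokenization and stemming.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate and a reference string."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Toy model outputs paired with a reference (illustrative only).
examples = [
    {"output": "the cat sat on the mat", "reference": "the cat sat on the mat"},
    {"output": "a dog barked", "reference": "the cat sat on the mat"},
    {"output": "the cat sat", "reference": "the cat sat on the mat"},
]

# Keep only the examples that score poorly against their reference.
low_scoring = [
    ex for ex in examples if rouge1_f1(ex["output"], ex["reference"]) < 0.5
]
```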
Language Modeling #
BERT and GPT-2 #
Hosted instance: https://pair-code.github.io/lit/demos/lm.html
Code: examples/lm_demo.py
Compare multiple BERT and GPT-2 models side-by-side on a variety of plain-text corpora.
LM visualization supports different modes:
BERT masked language model: click-to-mask, and query model at that position.
GPT-2 shows left-to-right hypotheses for each target token.
Embedding projector to show latent space of the model.
Structured Prediction #
Gender Bias in Coreference #
Hosted instance: https://pair-code.github.io/lit/demos/coref.html
Code: examples/coref/coref_demo.py
Gold-mention coreference model, trained on OntoNotes.
Evaluate on the Winogender schemas (Rudinger et al. 2018), which test for gendered associations with profession names.
Visualizations of coreference edges, as well as binary classification between two candidate referents.
Stratified metrics for quantifying model bias as a function of pronoun gender or Bureau of Labor Statistics profession data.
Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/coref
Multimodal #
Tabular Data: Penguin Classification #
Hosted instance: https://pair-code.github.io/lit/demos/penguins.html
Code: examples/penguin_demo.py
Binary classification on penguin dataset.
Shows the use of LIT on non-text data (numeric and categorical features).
Use partial-dependence plots to understand feature importance on individual examples, selections, or the entire evaluation dataset.
Use binary classifier threshold setters to find best thresholds for slices of examples to achieve specific fairness constraints, such as demographic parity.
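One way to picture the demographic parity constraint mentioned above: choose a separate threshold for each slice so that every slice ends up with roughly the same fraction of positive predictions. The sketch below does this by ranking scores within each group. The group names, scores, and target rate are hypothetical, and this is an illustrative procedure, not the algorithm LIT's threshold setters use.

```python
def threshold_for_rate(scores, target_rate):
    """Pick a threshold so that roughly `target_rate` of scores are >= it."""
    ranked = sorted(scores, reverse=True)
    k = round(target_rate * len(ranked))
    if k <= 0:
        return float("inf")  # no example is predicted positive
    return ranked[k - 1]


def positive_rate(scores, threshold):
    """Fraction of examples predicted positive at the given threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Toy classifier scores for two slices of the data (illustrative values).
scores_by_group = {
    "group_a": [0.9, 0.8, 0.7, 0.2],
    "group_b": [0.6, 0.4, 0.3, 0.1],
}
target = 0.5  # desired positive-prediction rate in every group

thresholds = {
    g: threshold_for_rate(s, target) for g, s in scores_by_group.items()
}
rates = {
    g: positive_rate(s, thresholds[g]) for g, s in scores_by_group.items()
}
```

With a single shared threshold of 0.5, group_a would receive positives at a rate of 0.75 and group_b at 0.25. The per-group thresholds equalize the two rates. Tied scores can push a group's rate slightly above the target.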
Image Classification with MobileNet #
Hosted instance: https://pair-code.github.io/lit/demos/images.html
Code: examples/image_demo.py
Classification on ImageNet labels using a MobileNet model.
Shows the use of LIT on image data.
Explore results of multiple gradient-based image saliency techniques in the Salience Maps module.