Demos

The LIT team maintains a number of hosted demos, as well as pre-built launchers for some common tasks and model types.

For publicly-visible demos hosted on Google Cloud, see https://pair-code.github.io/lit/demos/.


Classification

Sentiment and NLI

Hosted instance: https://pair-code.github.io/lit/demos/glue.html
Code: examples/glue_demo.py

  • Multi-task demo:

    • Sentiment analysis as a binary classification task (SST-2) on single sentences.

    • Natural Language Inference (NLI) using MultiNLI, as a three-way classification task with two-segment input (premise, hypothesis).

    • STS-B textual similarity task (see Regression / Scoring below).

    • Switch tasks using the Settings (⚙️) menu.

  • BERT models of different sizes, built on HuggingFace TF2 (Keras).

  • Supports the widest range of LIT interpretability features:

    • Model output probabilities, custom thresholds, and multiclass metrics.

    • Jitter plot of output scores, to find confident examples or ones near the margin.

    • Embedding projector to find clusters in representation space.

    • Integrated Gradients, LIME, and other salience methods.

    • Attention visualization.

    • Counterfactual generators, including HotFlip for targeted adversarial perturbations.

Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/sentiment
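The custom-threshold and jitter-plot features above can be sketched in a few lines. This is an illustrative example, not LIT's implementation: it applies a user-chosen decision threshold to binary classification probabilities and flags examples near the margin, the same examples a jitter plot of output scores helps you find.

```python
# Illustrative sketch (not LIT's code): custom decision threshold for a
# binary classifier, plus a helper to flag near-margin examples.

def classify(p_positive: float, threshold: float = 0.5) -> int:
    """Predict 1 (positive) if the positive-class probability clears the threshold."""
    return 1 if p_positive >= threshold else 0

def near_margin(p_positive: float, threshold: float = 0.5, band: float = 0.1) -> bool:
    """True when the score falls within +/-band of the decision threshold."""
    return abs(p_positive - threshold) < band

scores = [0.95, 0.52, 0.48, 0.10]          # made-up model probabilities
preds = [classify(p) for p in scores]       # [1, 1, 0, 0]
uncertain = [p for p in scores if near_margin(p)]  # the two scores near 0.5
```

Raising or lowering `threshold` trades precision for recall; the multiclass metrics in LIT update as the threshold moves.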

Multilingual (XNLI)

Code: examples/xnli_demo.py

  • The XNLI dataset translates a subset of MultiNLI into 14 different languages.

  • Specify the --languages=en,jp,hi,... flag to select which languages to load.

  • NLI as a three-way classification task with two-segment input (premise, hypothesis).

  • Fine-tuned multilingual BERT model.

  • Salience methods work with non-whitespace-delimited text by using the model's wordpiece tokenization.
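One way salience over subword tokens can be shown at the word level is to merge continuation pieces back into their parent word, summing their scores. The sketch below is illustrative (the `##` continuation marker is BERT's wordpiece convention; the function name is made up):

```python
# Illustrative sketch: aggregate per-wordpiece salience scores back to
# whole words using BERT-style "##" continuation markers, so salience can
# be displayed even for subword- or non-whitespace-tokenized text.

def merge_wordpiece_salience(pieces, scores):
    """Fold each continuation piece ("##...") and its score into the preceding word."""
    words, word_scores = [], []
    for piece, score in zip(pieces, scores):
        if piece.startswith("##") and words:
            words[-1] += piece[2:]
            word_scores[-1] += score
        else:
            words.append(piece)
            word_scores.append(score)
    return words, word_scores

words, sal = merge_wordpiece_salience(
    ["un", "##break", "##able", "glass"], [0.1, 0.3, 0.2, 0.4])
# words == ["unbreakable", "glass"]
```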


Regression / Scoring

Textual Similarity (STS-B)

Hosted instance: https://pair-code.github.io/lit/demos/glue.html?models=stsb&dataset=stsb_dev
Code: examples/glue_demo.py

  • STS-B textual similarity task, predicting scores on a range from 0 (unrelated) to 5 (very similar).

  • BERT models built on HuggingFace TF2 (Keras).

  • Supports a wide range of LIT interpretability features:

    • Model output scores and metrics.

    • Scatter plot of scores and error, and jitter plot of true labels for quick filtering.

    • Embedding projector to find clusters in representation space.

    • Integrated Gradients, LIME, and other salience methods.

    • Attention visualization.
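The score-and-error plots above rest on simple quantities: a signed per-example error, and an aggregate correlation between predictions and gold labels. A minimal sketch with made-up values (STS-B metrics include Pearson correlation; this hand-rolled version is for illustration only):

```python
# Illustrative sketch: per-example error (for the scatter plot) and
# Pearson correlation (an aggregate metric) for a 0-5 similarity task.
# The labels and predictions below are made up.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

labels = [0.0, 2.5, 5.0]
preds = [0.4, 2.0, 4.8]
errors = [p - y for p, y in zip(preds, labels)]  # signed, one per example
r = pearson(preds, labels)                       # close to 1.0 here
```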


Sequence-to-Sequence

Gemma

Code: examples/lm_salience_demo.py

  • Supports Gemma 2B and 7B models using KerasNLP and TensorFlow.

  • Interactively debug LLM prompts using sequence salience.

  • Multiple salience methods (grad-l2 and grad-dot-input), at multiple granularities: token-, word-, sentence-, and paragraph-level.

Tip: check out the in-depth walkthrough at https://ai.google.dev/responsible/model_behavior, part of the Responsible Generative AI Toolkit.
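The two salience formulas named above differ in what they measure. A toy sketch, given a token's input embedding x and the gradient g of the loss with respect to that embedding (vectors here are tiny made-up values; real models compute g via autograd):

```python
# Illustrative sketch of the two token-salience formulas:
#   grad-l2:        ||g||_2        (magnitude only, always >= 0)
#   grad-dot-input: g . x          (signed; direction matters)
import math

def grad_l2(g):
    return math.sqrt(sum(v * v for v in g))

def grad_dot_input(g, x):
    return sum(gv * xv for gv, xv in zip(g, x))

g = [3.0, 4.0]    # toy gradient w.r.t. one token embedding
x = [1.0, -2.0]   # toy token embedding
# grad_l2(g) == 5.0; grad_dot_input(g, x) == -5.0
```

Coarser granularities (word, sentence, paragraph) can be derived by aggregating per-token scores over each span.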

T5

Hosted instance: https://pair-code.github.io/lit/demos/t5.html
Code: examples/t5_demo.py

  • Supports HuggingFace TF2 (Keras) models as well as TensorFlow SavedModel formats.

  • Visualize beam candidates and highlight diffs against references.

  • Visualize per-token decoder hypotheses to see where the model veers away from desired output.

  • Filter examples by ROUGE score against reference.

  • Embeddings from last layer of model, visualized with UMAP or PCA.

  • Task wrappers to handle pre- and post-processing for summarization and machine translation tasks.

  • Pre-loaded eval sets for CNNDM and WMT.

Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/generation
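Filtering by ROUGE score boils down to an n-gram overlap computation. As an illustration only (not the metric implementation LIT uses), here is unigram recall in the spirit of ROUGE-1, used to keep the generations with low overlap against a reference:

```python
# Illustrative sketch: unigram recall in the spirit of ROUGE-1, used to
# filter generated outputs by overlap with a reference. Real ROUGE also
# covers bigrams, longest common subsequence, stemming, etc.
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

outputs = ["the cat sat on the mat", "dogs bark loudly"]  # made-up generations
reference = "the cat sat on a mat"
low_overlap = [o for o in outputs if rouge1_recall(o, reference) < 0.5]
```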


Language Modeling

BERT and GPT-2

Hosted instance: https://pair-code.github.io/lit/demos/lm.html
Code: examples/lm_demo.py

  • Compare multiple BERT and GPT-2 models side-by-side on a variety of plain-text corpora.

  • LM visualization supports different modes:

    • BERT masked language model: click-to-mask, and query model at that position.

    • GPT-2 shows left-to-right hypotheses for each target token.

  • Embedding projector to show latent space of the model.
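The click-to-mask interaction amounts to replacing one token with the mask token and ranking the model's fill-in candidates. A minimal sketch with a hard-coded toy distribution standing in for a real BERT model (all names and probabilities below are made up):

```python
# Illustrative sketch of click-to-mask: mask one position, then rank
# fill-in candidates by probability. The distribution is a toy stand-in
# for real masked-LM output.

def mask_token(tokens, position):
    masked = list(tokens)
    masked[position] = "[MASK]"
    return masked

def top_k(distribution, k=2):
    """Return the k highest-probability candidate tokens."""
    return sorted(distribution, key=distribution.get, reverse=True)[:k]

tokens = mask_token(["the", "cat", "sat"], 1)     # ["the", "[MASK]", "sat"]
fake_probs = {"cat": 0.6, "dog": 0.3, "car": 0.1}  # stand-in for model output
candidates = top_k(fake_probs, k=2)                # ["cat", "dog"]
```

The GPT-2 view is analogous but left-to-right: at each position the model's next-token hypotheses are ranked the same way.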


Structured Prediction

Gender Bias in Coreference

Hosted instance: https://pair-code.github.io/lit/demos/coref.html
Code: examples/coref/coref_demo.py

  • Gold-mention coreference model, trained on OntoNotes.

  • Evaluate on the Winogender schemas (Rudinger et al. 2018) which test for gendered associations with profession names.

  • Visualizations of coreference edges, as well as binary classification between two candidate referents.

  • Stratified metrics for quantifying model bias as a function of pronoun gender or Bureau of Labor Statistics profession data.

Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/coref
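Stratified metrics are ordinary metrics computed per slice. A sketch with made-up records, grouping accuracy by pronoun so a gap between slices becomes visible (field names here are illustrative, not LIT's schema):

```python
# Illustrative sketch of stratified (sliced) metrics: accuracy per
# pronoun slice, to surface gaps between groups. Records are made up.
from collections import defaultdict

def accuracy_by_slice(examples, slice_key):
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ex in examples:
        key = ex[slice_key]
        totals[key] += 1
        hits[key] += int(ex["pred"] == ex["label"])
    return {k: hits[k] / totals[k] for k in totals}

examples = [
    {"pronoun": "she", "pred": 1, "label": 1},
    {"pronoun": "she", "pred": 0, "label": 1},
    {"pronoun": "he",  "pred": 1, "label": 1},
    {"pronoun": "he",  "pred": 1, "label": 1},
]
acc = accuracy_by_slice(examples, "pronoun")  # {"she": 0.5, "he": 1.0}
```

The same grouping works for any slice key, e.g. buckets of Bureau of Labor Statistics profession statistics.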


Multimodal

Tabular Data: Penguin Classification

Hosted instance: https://pair-code.github.io/lit/demos/penguins.html
Code: examples/penguin_demo.py

  • Binary classification on the penguin dataset.

  • Demonstrates use of LIT on non-text data (numeric and categorical features).

  • Use partial-dependence plots to understand feature importance on individual examples, selections, or the entire evaluation dataset.

  • Use binary classifier threshold setters to find best thresholds for slices of examples to achieve specific fairness constraints, such as demographic parity.
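The per-slice threshold search can be sketched directly: demographic parity asks that each group's positive-prediction rate match, so for each group we look for the threshold that hits a common target rate. An illustrative example with made-up scores (not LIT's optimizer):

```python
# Illustrative sketch: per-group threshold search for demographic parity,
# i.e. equal positive-prediction rates across groups. Scores are made up.

def positive_rate(scores, threshold):
    return sum(s >= threshold for s in scores) / len(scores)

def threshold_for_rate(scores, target_rate):
    """Smallest observed score usable as a threshold whose positive rate
    does not exceed the target rate."""
    for t in sorted(scores):
        if positive_rate(scores, t) <= target_rate:
            return t
    return max(scores)

group_a = [0.9, 0.8, 0.4, 0.2]
group_b = [0.7, 0.3, 0.2, 0.1]
ta = threshold_for_rate(group_a, 0.5)  # group A needs a higher threshold
tb = threshold_for_rate(group_b, 0.5)
# positive_rate(group_a, ta) == positive_rate(group_b, tb) == 0.5
```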

Image Classification with MobileNet

Hosted instance: https://pair-code.github.io/lit/demos/images.html
Code: examples/image_demo.py

  • Classification on ImageNet labels using a MobileNet model.

  • Demonstrates use of LIT on image data.

  • Explore results of multiple gradient-based image saliency techniques in the Salience Maps module.
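One common family of gradient-based image saliency reduces a per-pixel, per-channel gradient signal to a single salience value per pixel. A toy sketch of gradient-times-input reduced over channels (arrays are tiny made-up values; real demos obtain gradients from framework autograd, and LIT's module offers several such techniques):

```python
# Illustrative sketch of a gradient-based image saliency map:
# reduce |gradient * input| over channels to one value per pixel.
# Arrays are [rows][cols][channels] nested lists with toy values.

def saliency_map(image, gradients):
    return [
        [
            sum(abs(g * x) for g, x in zip(grad_px, img_px))
            for img_px, grad_px in zip(img_row, grad_row)
        ]
        for img_row, grad_row in zip(image, gradients)
    ]

image = [[[1.0, 0.5], [0.0, 2.0]]]   # 1x2 image with 2 channels
grads = [[[0.2, -0.4], [0.5, 0.1]]]  # toy gradients w.r.t. the input
smap = saliency_map(image, grads)    # one salience value per pixel
```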