Demos

The LIT team maintains a number of hosted demos, as well as pre-built launchers for some common tasks and model types.

For publicly-visible demos hosted on Google Cloud, see https://pair-code.github.io/lit/demos/.


Classification

Sentiment and NLI

Hosted instance: https://pair-code.github.io/lit/demos/glue.html
Code: examples/glue_demo.py

  • Multi-task demo:

    • Sentiment analysis as a binary classification task (SST-2) on single sentences.

    • Natural Language Inference (NLI) using MultiNLI, as a three-way classification task with two-segment input (premise, hypothesis).

    • STS-B textual similarity task (see Regression / Scoring below).

    • Switch tasks using the Settings (⚙️) menu.

  • BERT models of different sizes, built on HuggingFace TF2 (Keras).

  • Supports the widest range of LIT interpretability features:

    • Model output probabilities, custom thresholds, and multiclass metrics.

    • Jitter plot of output scores, to find confident examples or ones near the margin.

    • Embedding projector to find clusters in representation space.

    • Integrated Gradients, LIME, and other salience methods.

    • Attention visualization.

    • Counterfactual generators, including HotFlip for targeted adversarial perturbations.

Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/sentiment
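The custom-threshold and jitter-plot features above can be sketched in a few lines. This is an illustrative example, not LIT's implementation: it applies a user-chosen decision threshold to binary classification probabilities and flags examples near the margin, the same examples a jitter plot of output scores helps you find.

```python
# Illustrative sketch (not LIT's code): custom decision threshold for a
# binary classifier, plus a helper to flag near-margin examples.

def classify(p_positive: float, threshold: float = 0.5) -> int:
    """Predict 1 (positive) if the positive-class probability clears the threshold."""
    return 1 if p_positive >= threshold else 0

def near_margin(p_positive: float, threshold: float = 0.5, band: float = 0.1) -> bool:
    """True when the score falls within +/-band of the decision threshold."""
    return abs(p_positive - threshold) < band

scores = [0.95, 0.52, 0.48, 0.10]          # made-up model probabilities
preds = [classify(p) for p in scores]       # [1, 1, 0, 0]
uncertain = [p for p in scores if near_margin(p)]  # the two scores near 0.5
```

Raising or lowering `threshold` trades precision for recall; the multiclass metrics in LIT update as the threshold moves.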

Multilingual (XNLI)

Code: examples/xnli_demo.py

  • The XNLI dataset translates a subset of MultiNLI into 14 different languages.

  • Specify the --languages=en,jp,hi,... flag to select which languages to load.

  • NLI as a three-way classification task with two-segment input (premise, hypothesis).

  • Fine-tuned multilingual BERT model.

  • Salience methods work with non-whitespace-delimited text by using the model's wordpiece tokenization.
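One way salience over subword tokens can be shown at the word level is to merge continuation pieces back into their parent word, summing their scores. The sketch below is illustrative (the `##` continuation marker is BERT's wordpiece convention; the function name is made up):

```python
# Illustrative sketch: aggregate per-wordpiece salience scores back to
# whole words using BERT-style "##" continuation markers, so salience can
# be displayed even for subword- or non-whitespace-tokenized text.

def merge_wordpiece_salience(pieces, scores):
    """Fold each continuation piece ("##...") and its score into the preceding word."""
    words, word_scores = [], []
    for piece, score in zip(pieces, scores):
        if piece.startswith("##") and words:
            words[-1] += piece[2:]
            word_scores[-1] += score
        else:
            words.append(piece)
            word_scores.append(score)
    return words, word_scores

words, sal = merge_wordpiece_salience(
    ["un", "##break", "##able", "glass"], [0.1, 0.3, 0.2, 0.4])
# words == ["unbreakable", "glass"]
```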


Regression / Scoring

Textual Similarity (STS-B)

Hosted instance: https://pair-code.github.io/lit/demos/glue.html?models=stsb&dataset=stsb_dev
Code: examples/glue_demo.py

  • STS-B textual similarity task, predicting scores on a range from 0 (unrelated) to 5 (very similar).

  • BERT models built on HuggingFace TF2 (Keras).

  • Supports a wide range of LIT interpretability features:

    • Model output scores and metrics.

    • Scatter plot of scores and error, and jitter plot of true labels for quick filtering.

    • Embedding projector to find clusters in representation space.

    • Integrated Gradients, LIME, and other salience methods.

    • Attention visualization.
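The score-and-error plots above rest on simple quantities: a signed per-example error, and an aggregate correlation between predictions and gold labels. A minimal sketch with made-up values (STS-B metrics include Pearson correlation; this hand-rolled version is for illustration only):

```python
# Illustrative sketch: per-example error (for the scatter plot) and
# Pearson correlation (an aggregate metric) for a 0-5 similarity task.
# The labels and predictions below are made up.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

labels = [0.0, 2.5, 5.0]
preds = [0.4, 2.0, 4.8]
errors = [p - y for p, y in zip(preds, labels)]  # signed, one per example
r = pearson(preds, labels)                       # close to 1.0 here
```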


Sequence-to-Sequence

Gemma

Code: examples/lm_salience_demo.py

  • Supports Gemma 2B and 7B models using KerasNLP and TensorFlow.

  • Interactively debug LLM prompts using sequence salience.

  • Multiple salience methods (grad-l2 and grad-dot-input), at multiple granularities: token-, word-, sentence-, and paragraph-level.

Tip: check out the in-depth walkthrough at https://ai.google.dev/responsible/model_behavior, part of the Responsible Generative AI Toolkit.
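The two salience formulas named above differ in what they measure. A toy sketch, given a token's input embedding x and the gradient g of the loss with respect to that embedding (vectors here are tiny made-up values; real models compute g via autograd):

```python
# Illustrative sketch of the two token-salience formulas:
#   grad-l2:        ||g||_2        (magnitude only, always >= 0)
#   grad-dot-input: g . x          (signed; direction matters)
import math

def grad_l2(g):
    return math.sqrt(sum(v * v for v in g))

def grad_dot_input(g, x):
    return sum(gv * xv for gv, xv in zip(g, x))

g = [3.0, 4.0]    # toy gradient w.r.t. one token embedding
x = [1.0, -2.0]   # toy token embedding
# grad_l2(g) == 5.0; grad_dot_input(g, x) == -5.0
```

Coarser granularities (word, sentence, paragraph) can be derived by aggregating per-token scores over each span.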

T5

Hosted instance: https://pair-code.github.io/lit/demos/t5.html
Code: examples/t5_demo.py

  • Supports HuggingFace TF2 (Keras) models as well as TensorFlow SavedModel formats.

  • Visualize beam candidates and highlight diffs against references.

  • Visualize per-token decoder hypotheses to see where the model veers away from desired output.

  • Filter examples by ROUGE score against reference.

  • Embeddings from last layer of model, visualized with UMAP or PCA.

  • Task wrappers to handle pre- and post-processing for summarization and machine translation tasks.

  • Pre-loaded eval sets for CNNDM and WMT.

Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/generation
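Filtering by ROUGE score boils down to an n-gram overlap computation. As an illustration only (not the metric implementation LIT uses), here is unigram recall in the spirit of ROUGE-1, used to keep the generations with low overlap against a reference:

```python
# Illustrative sketch: unigram recall in the spirit of ROUGE-1, used to
# filter generated outputs by overlap with a reference. Real ROUGE also
# covers bigrams, longest common subsequence, stemming, etc.
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

outputs = ["the cat sat on the mat", "dogs bark loudly"]  # made-up generations
reference = "the cat sat on a mat"
low_overlap = [o for o in outputs if rouge1_recall(o, reference) < 0.5]
```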


Language Modeling

BERT and GPT-2

Hosted instance: https://pair-code.github.io/lit/demos/lm.html
Code: examples/lm_demo.py

  • Compare multiple BERT and GPT-2 models side-by-side on a variety of plain-text corpora.

  • LM visualization supports different modes:

    • BERT masked language model: click-to-mask, and query model at that position.

    • GPT-2 shows left-to-right hypotheses for each target token.

  • Embedding projector to show latent space of the model.
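The click-to-mask interaction amounts to replacing one token with the mask token and ranking the model's fill-in candidates. A minimal sketch with a hard-coded toy distribution standing in for a real BERT model (all names and probabilities below are made up):

```python
# Illustrative sketch of click-to-mask: mask one position, then rank
# fill-in candidates by probability. The distribution is a toy stand-in
# for real masked-LM output.

def mask_token(tokens, position):
    masked = list(tokens)
    masked[position] = "[MASK]"
    return masked

def top_k(distribution, k=2):
    """Return the k highest-probability candidate tokens."""
    return sorted(distribution, key=distribution.get, reverse=True)[:k]

tokens = mask_token(["the", "cat", "sat"], 1)     # ["the", "[MASK]", "sat"]
fake_probs = {"cat": 0.6, "dog": 0.3, "car": 0.1}  # stand-in for model output
candidates = top_k(fake_probs, k=2)                # ["cat", "dog"]
```

The GPT-2 view is analogous but left-to-right: at each position the model's next-token hypotheses are ranked the same way.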


Structured Prediction

Gender Bias in Coreference

Hosted instance: https://pair-code.github.io/lit/demos/coref.html
Code: examples/coref/coref_demo.py

  • Gold-mention coreference model, trained on OntoNotes.

  • Evaluate on the Winogender schemas (Rudinger et al. 2018) which test for gendered associations with profession names.

  • Visualizations of coreference edges, as well as binary classification between two candidate referents.

  • Stratified metrics for quantifying model bias as a function of pronoun gender or Bureau of Labor Statistics profession data.

Tip: check out a case study for this demo on the public LIT website: https://pair-code.github.io/lit/tutorials/coref
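Stratified metrics are ordinary metrics computed per slice. A sketch with made-up records, grouping accuracy by pronoun so a gap between slices becomes visible (field names here are illustrative, not LIT's schema):

```python
# Illustrative sketch of stratified (sliced) metrics: accuracy per
# pronoun slice, to surface gaps between groups. Records are made up.
from collections import defaultdict

def accuracy_by_slice(examples, slice_key):
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ex in examples:
        key = ex[slice_key]
        totals[key] += 1
        hits[key] += int(ex["pred"] == ex["label"])
    return {k: hits[k] / totals[k] for k in totals}

examples = [
    {"pronoun": "she", "pred": 1, "label": 1},
    {"pronoun": "she", "pred": 0, "label": 1},
    {"pronoun": "he",  "pred": 1, "label": 1},
    {"pronoun": "he",  "pred": 1, "label": 1},
]
acc = accuracy_by_slice(examples, "pronoun")  # {"she": 0.5, "he": 1.0}
```

The same grouping works for any slice key, e.g. buckets of Bureau of Labor Statistics profession statistics.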


Multimodal

Tabular Data: Penguin Classification

Hosted instance: https://pair-code.github.io/lit/demos/penguins.html
Code: examples/penguin_demo.py

  • Binary classification on the penguin dataset.

  • Demonstrates use of LIT on non-text data (numeric and categorical features).

  • Use partial-dependence plots to understand feature importance on individual examples, selections, or the entire evaluation dataset.

  • Use binary classifier threshold setters to find best thresholds for slices of examples to achieve specific fairness constraints, such as demographic parity.
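The per-slice threshold search can be sketched directly: demographic parity asks that each group's positive-prediction rate match, so for each group we look for the threshold that hits a common target rate. An illustrative example with made-up scores (not LIT's optimizer):

```python
# Illustrative sketch: per-group threshold search for demographic parity,
# i.e. equal positive-prediction rates across groups. Scores are made up.

def positive_rate(scores, threshold):
    return sum(s >= threshold for s in scores) / len(scores)

def threshold_for_rate(scores, target_rate):
    """Smallest observed score usable as a threshold whose positive rate
    does not exceed the target rate."""
    for t in sorted(scores):
        if positive_rate(scores, t) <= target_rate:
            return t
    return max(scores)

group_a = [0.9, 0.8, 0.4, 0.2]
group_b = [0.7, 0.3, 0.2, 0.1]
ta = threshold_for_rate(group_a, 0.5)  # group A needs a higher threshold
tb = threshold_for_rate(group_b, 0.5)
# positive_rate(group_a, ta) == positive_rate(group_b, tb) == 0.5
```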

Image Classification with MobileNet

Hosted instance: https://pair-code.github.io/lit/demos/images.html
Code: examples/image_demo.py

  • Classification on ImageNet labels using a MobileNet model.

  • Demonstrates use of LIT on image data.

  • Explore results of multiple gradient-based image saliency techniques in the Salience Maps module.
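One common family of gradient-based image saliency reduces a per-pixel, per-channel gradient signal to a single salience value per pixel. A toy sketch of gradient-times-input reduced over channels (arrays are tiny made-up values; real demos obtain gradients from framework autograd, and LIT's module offers several such techniques):

```python
# Illustrative sketch of a gradient-based image saliency map:
# reduce |gradient * input| over channels to one value per pixel.
# Arrays are [rows][cols][channels] nested lists with toy values.

def saliency_map(image, gradients):
    return [
        [
            sum(abs(g * x) for g, x in zip(grad_px, img_px))
            for img_px, grad_px in zip(img_row, grad_row)
        ]
        for img_row, grad_row in zip(image, gradients)
    ]

image = [[[1.0, 0.5], [0.0, 2.0]]]   # 1x2 image with 2 channels
grads = [[[0.2, -0.4], [0.5, 0.1]]]  # toy gradients w.r.t. the input
smap = saliency_map(image, grads)    # one salience value per pixel
```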