
The LIT team maintains a number of hosted demos, as well as pre-built launchers for some common tasks and model types.

For publicly-visible demos hosted on Google Cloud, see

Classification

Sentiment and NLI

Hosted instance:
Code: examples/glue/

  • Multi-task demo:

    • Sentiment analysis as a binary classification task (SST-2) on single sentences.

    • Natural Language Inference (NLI) using MultiNLI, as a three-way classification task with two-segment input (premise, hypothesis).

    • STS-B textual similarity task (see Regression / Scoring below).

    • Switch tasks using the Settings (⚙️) menu.

  • BERT models of different sizes, built on HuggingFace TF2 (Keras).

  • Supports the widest range of LIT interpretability features:

    • Model output probabilities, custom thresholds, and multiclass metrics.

    • Jitter plot of output scores, to find confident examples or ones near the margin.

    • Embedding projector to find clusters in representation space.

    • Integrated Gradients, LIME, and other salience methods.

    • Counterfactual generators, including HotFlip for targeted adversarial perturbations.
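As a rough illustration of what "near the margin" means for a binary classifier with a custom threshold, the sketch below ranks examples by how close their positive-class score is to the threshold. It is plain Python rather than LIT code; the function name `rank_by_margin` and the sample scores are made up for illustration.

```python
def rank_by_margin(scores, threshold=0.5):
    """Return (index, predicted_label, margin) tuples, closest to the threshold first.

    scores: positive-class probabilities, one per example.
    threshold: custom decision threshold; scores >= threshold are labeled 1.
    """
    ranked = []
    for i, p in enumerate(scores):
        label = 1 if p >= threshold else 0
        margin = abs(p - threshold)  # small margin = uncertain prediction
        ranked.append((i, label, margin))
    # Python's sort is stable, so ties keep their input order.
    return sorted(ranked, key=lambda item: item[2])


# Hypothetical sentiment scores for five sentences.
example_scores = [0.95, 0.52, 0.10, 0.47, 0.80]
near_margin = rank_by_margin(example_scores, threshold=0.5)
```

The first entries of `near_margin` are the examples a small change in threshold would flip, which is the same set a jitter plot of scores makes visible around the threshold line.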

Tip: check out a case study for this demo on the public LIT website:

Regression / Scoring

Textual Similarity (STS-B)

Hosted instance:
Code: examples/glue/

  • STS-B textual similarity task, predicting scores on a range from 0 (unrelated) to 5 (very similar).

  • BERT models built on HuggingFace TF2 (Keras).

  • Supports a wide range of LIT interpretability features:

    • Model output scores and metrics.

    • Scatter plot of scores and error, and jitter plot of true labels for quick filtering.

    • Embedding projector to find clusters in representation space.

    • Integrated Gradients, LIME, and other salience methods.
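For readers unfamiliar with Integrated Gradients, the following is a minimal sketch of the idea on a toy model, not the implementation used in the demo. It approximates the path integral with a midpoint Riemann sum; the logistic-regression model, its weights, and the all-zeros baseline are assumptions chosen only to keep the example self-contained.

```python
import math


def integrated_gradients(f_grad, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    f_grad: function returning the gradient of the model output w.r.t. its input.
    x, baseline: input and reference point, as equal-length lists of floats.
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th interval on [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = f_grad(point)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the average gradient along the path by the input difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Toy model (an assumption, not a LIT model): logistic regression on 3 features.
WEIGHTS = [2.0, -1.0, 0.5]


def model(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x):
    p = model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(model_grad, x, baseline)
```

A useful sanity check is that the attributions sum approximately to `model(x) - model(baseline)`; a large gap usually means `steps` is too small for the model in question.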

Sequence-to-Sequence

Gemma

Code: examples/prompt_debugging/

  • Supports Gemma 2B and 7B models using KerasNLP (with TensorFlow or PyTorch) and Transformers (with PyTorch).

  • Interactively debug LLM prompts using sequence salience.

  • Multiple salience methods (grad-l2 and grad-dot-input), at multiple granularities: token-, word-, line-, sentence-, and paragraph-level.

Tip: check out the in-depth walkthrough that is part of the Responsible Generative AI Toolkit.
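The two salience methods named above can be sketched in a few lines, assuming you already have per-token gradients of the target with respect to the input embeddings. This is an illustration in plain Python, not the demo's code: the helper names, the tiny 2-d vectors, and the choice to aggregate by summation are all assumptions.

```python
import math


def grad_l2(token_grads):
    """Per-token salience as the L2 norm of each token's gradient vector."""
    return [math.sqrt(sum(g * g for g in grad)) for grad in token_grads]


def grad_dot_input(token_grads, token_embs):
    """Per-token salience as the dot product of gradient and input embedding."""
    return [
        sum(g * e for g, e in zip(grad, emb))
        for grad, emb in zip(token_grads, token_embs)
    ]


def aggregate(token_scores, group_ids):
    """Sum token scores into coarser segments (e.g. words or lines).

    group_ids[i] is the segment index of token i. Summation is an assumed
    aggregation rule for this sketch.
    """
    totals = [0.0] * (max(group_ids) + 1)
    for score, gid in zip(token_scores, group_ids):
        totals[gid] += score
    return totals


# Made-up 2-d gradients and embeddings for three tokens: "un", "happy", "!"
grads = [[3.0, 4.0], [0.0, 1.0], [1.0, 0.0]]
embs = [[1.0, 0.0], [2.0, -2.0], [0.5, 0.5]]

l2_scores = grad_l2(grads)                     # unsigned magnitudes
dot_scores = grad_dot_input(grads, embs)       # signed scores
word_scores = aggregate(l2_scores, [0, 0, 1])  # "un" + "happy" form one word
```

Note the difference in sign: grad-l2 scores are always non-negative, while grad-dot-input scores can be negative, so the two methods can rank the same tokens differently.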

Multimodal

Tabular Data: Penguin Classification

Hosted instance:
Code: examples/penguin/

  • Binary classification on the penguin dataset.

  • Shows the use of LIT on non-text data (numeric and categorical features).

  • Use partial-dependence plots to understand feature importance on individual examples, selections, or the entire evaluation dataset.

  • Use binary classifier threshold setters to find the best thresholds for slices of examples to achieve specific fairness constraints, such as demographic parity.
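To make the demographic parity constraint concrete, here is a small sketch of choosing one threshold per slice so that every slice ends up with roughly the same positive-prediction rate. It is not how LIT's threshold setters are implemented; the function names, group labels, and scores are hypothetical, and the approach assumes no tied scores at the cut point.

```python
def positive_rate(scores, threshold):
    """Fraction of examples predicted positive at the given threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def parity_thresholds(scores_by_group, target_rate):
    """Pick one threshold per group so positive rates are close to target_rate."""
    thresholds = {}
    for group, scores in scores_by_group.items():
        ranked = sorted(scores, reverse=True)
        k = round(target_rate * len(ranked))  # number of positives wanted
        if k == 0:
            thresholds[group] = ranked[0] + 1e-9  # just above the top score
        else:
            thresholds[group] = ranked[k - 1]  # k-th highest score
    return thresholds


# Hypothetical classifier scores for two slices of the evaluation set.
scores_by_group = {
    "group_a": [0.9, 0.8, 0.6, 0.3],
    "group_b": [0.5, 0.4, 0.2, 0.1],
}
thresholds = parity_thresholds(scores_by_group, target_rate=0.5)
rates = {g: positive_rate(s, thresholds[g]) for g, s in scores_by_group.items()}
```

With a single shared threshold of 0.5, these made-up scores would label 75% of `group_a` and 25% of `group_b` as positive; the per-group thresholds bring both rates to 50%, at the cost of treating identical scores differently across groups.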