# Demos

The LIT team maintains a number of hosted demos, as well as pre-built launchers
for some common tasks and model types.

For publicly-visible demos hosted on Google Cloud, see
https://pair-code.github.io/lit/demos/.

--------------------------------------------------------------------------------

## Classification

### Sentiment and NLI

**Hosted instance:** https://pair-code.github.io/lit/demos/glue.html \
**Code:** [examples/glue/demo.py](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue/demo.py)

*   Multi-task demo:
    *   Sentiment analysis as a binary classification task
        ([SST-2](https://nlp.stanford.edu/sentiment/treebank.html)) on single
        sentences.
    *   Natural Language Inference (NLI) using
        [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/), as a three-way
        classification task with two-segment input (premise, hypothesis).
    *   STS-B textual similarity task (see
        [Regression / Scoring](#regression--scoring) below).
    *   Switch tasks using the Settings (⚙️) menu.
*   BERT models of different sizes, built on HuggingFace TF2 (Keras).
*   Supports the widest range of LIT interpretability features:
    *   Model output probabilities, custom thresholds, and multiclass metrics.
    *   Jitter plot of output scores, to find confident examples or ones near
        the margin.
    *   Embedding projector to find clusters in representation space.
    *   Integrated Gradients, LIME, and other salience methods.
    *   Counterfactual generators, including HotFlip for targeted adversarial
        perturbations.

Tip: check out a case study for this demo on the public LIT website:
https://pair-code.github.io/lit/tutorials/sentiment

--------------------------------------------------------------------------------

## Regression / Scoring

### Textual Similarity (STS-B)

**Hosted instance:** https://pair-code.github.io/lit/demos/glue.html?models=stsb&dataset=stsb_dev \
**Code:** [examples/glue/demo.py](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue/demo.py)

*   STS-B textual similarity task, predicting scores on a range from 0
    (unrelated) to 5 (very similar).
*   BERT models built on HuggingFace TF2 (Keras).
*   Supports a wide range of LIT interpretability features:
    *   Model output scores and metrics.
    *   Scatter plot of scores and error, and jitter plot of true labels for
        quick filtering.
    *   Embedding projector to find clusters in representation space.
    *   Integrated Gradients, LIME, and other salience methods.

--------------------------------------------------------------------------------

## Sequence-to-Sequence

### Gemma

**Code:** [examples/prompt_debugging/server.py](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/prompt_debugging/server.py)

*   Supports Gemma 2B and 7B models using KerasNLP (with TensorFlow or PyTorch)
    and Transformers (with PyTorch); see the loading sketch below.
*   Interactively debug LLM prompts using
    [sequence salience](./components.md#sequence-salience).
*   Multiple salience methods (grad-l2 and grad-dot-input), at multiple
    granularities: token-, word-, line-, sentence-, and paragraph-level.

Tip: check out the in-depth walkthrough at
https://ai.google.dev/responsible/model_behavior, part of the Responsible
Generative AI Toolkit.
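For orientation, here is a hedged sketch of what loading Gemma through each of
the two backends looks like on its own. The preset and checkpoint names
(`gemma_2b_en`, `google/gemma-2b`) are the published ones, both checkpoints
are license-gated, and the wiring into the LIT server itself is handled by the
demo script rather than shown here:

```python
# Sketch only: loading Gemma 2B through each backend the demo supports.
# Both checkpoints are gated; accept the license on Kaggle / Hugging Face
# and authenticate before running.
import os

# Backend 1: KerasNLP. Choose the Keras backend before importing keras_nlp.
os.environ["KERAS_BACKEND"] = "torch"  # or "tensorflow"
import keras_nlp

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
print(gemma_lm.generate("What is interpretability?", max_length=32))

# Backend 2: Hugging Face Transformers with PyTorch.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
inputs = tokenizer("What is interpretability?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=24)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```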
--------------------------------------------------------------------------------

## Multimodal

### Tabular Data: Penguin Classification

**Hosted instance:** https://pair-code.github.io/lit/demos/penguins.html \
**Code:** [examples/penguin/demo.py](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin/demo.py)

*   Binary classification on the
    [penguin dataset](https://www.tensorflow.org/datasets/catalog/penguins).
*   Demonstrates LIT on non-text data (numeric and categorical features); see
    the dataset sketch below.
*   Use partial-dependence plots to understand feature importance on individual
    examples, selections, or the entire evaluation dataset.
*   Use binary classifier threshold setters to find the best thresholds for
    slices of examples to achieve specific fairness constraints, such as
    demographic parity.
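To give a sense of how non-text data enters LIT, here is a minimal sketch of a
tabular dataset wrapper using LIT's numeric and categorical feature types. The
field names follow the TFDS penguins dataset; the class name and loading logic
are hypothetical, and the demo's actual wrapper lives in the code linked above:

```python
# Minimal sketch, not the demo's actual wrapper: exposing tabular data to
# LIT via numeric (Scalar) and categorical (CategoryLabel) feature types.
# Field names follow the TFDS penguins dataset; loading is illustrative.
from lit_nlp.api import dataset as lit_dataset
from lit_nlp.api import types as lit_types


class PenguinDataset(lit_dataset.Dataset):
  """Palmer Penguins examples as flat dicts of features."""

  def __init__(self, records):
    # `records` is a list of dicts keyed by the spec fields below.
    self._examples = list(records)

  def spec(self) -> lit_types.Spec:
    return {
        "body_mass_g": lit_types.Scalar(),
        "culmen_depth_mm": lit_types.Scalar(),
        "culmen_length_mm": lit_types.Scalar(),
        "flipper_length_mm": lit_types.Scalar(),
        "island": lit_types.CategoryLabel(
            vocab=["Biscoe", "Dream", "Torgersen"]),
        "sex": lit_types.CategoryLabel(vocab=["female", "male"]),
    }
```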
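Finally, all of the demo scripts linked on this page share the same launch
pattern from LIT's Python API. A minimal sketch, assuming `lit_nlp` is
installed; `MyModel`, `MyDataset`, and `my_project` are hypothetical stand-ins
for the wrappers each demo script defines:

```python
# Minimal sketch of the launch pattern the demo scripts above share.
# MyModel and MyDataset are hypothetical stand-ins for the wrappers that
# each demo builds (e.g. the GLUE models in examples/glue/demo.py).
from absl import app

from lit_nlp import dev_server
from lit_nlp import server_flags

from my_project import MyDataset, MyModel  # hypothetical


def main(argv):
  models = {"my-model": MyModel("path/to/checkpoint")}
  datasets = {"my-data": MyDataset("path/to/data")}

  # server_flags.get_flags() forwards standard options such as --port.
  lit_demo = dev_server.Server(models, datasets, **server_flags.get_flags())
  return lit_demo.serve()


if __name__ == "__main__":
  app.run(main)
```

Once serving, the UI is available at http://localhost:5432 by default; use
`--port` to pick a different port.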