Getting Started

This tool helps you assess, without relying on sight, the uncertainty in image descriptions generated by multimodal large language models (MLLMs). It generates several descriptions of the same image and presents how they vary as text-based summaries.

See Examples: In the Examples tab, you can browse 9 examples, each with an image, a prompt, a list of 9 AI-generated descriptions, and a variation summary and description.

Try it out: First, go to API Keys and enter your API keys for OpenAI, Google Gemini, and Anthropic. Then go to the Try it out! tab, provide an image (upload your own, enter an image URL, or take a photo), enter a prompt describing what you want to know about the image, and click Submit. You can then read the descriptions along with the variation summary and description.

The variation summary shows you:

  • Agreements: Claims that multiple models agree on
  • Disagreements: Claims where models differ
  • Unique Mentions: Information mentioned by only one model
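
To make the three categories concrete, here is a toy sketch in Python. It is not how the tool builds its summary; it assumes, purely for illustration, that each description has already been reduced to simple topic/value claims, and the function name summarize_variation and the sample data are hypothetical.

```python
from collections import defaultdict


def summarize_variation(claims_by_model):
    """Sort claims into agreements, disagreements, and unique mentions.

    claims_by_model maps a model name to a dict of {topic: value} claims.
    This structure is an assumption made for the sketch.
    """
    # Collect, for every topic, what each model said about it.
    values_by_topic = defaultdict(dict)  # topic -> {model: value}
    for model, claims in claims_by_model.items():
        for topic, value in claims.items():
            values_by_topic[topic][model] = value

    summary = {"agreements": {}, "disagreements": {}, "unique_mentions": {}}
    for topic, by_model in values_by_topic.items():
        distinct_values = set(by_model.values())
        if len(by_model) == 1:
            # Only one model mentioned this topic at all.
            summary["unique_mentions"][topic] = by_model
        elif len(distinct_values) == 1:
            # Several models mentioned it and all said the same thing.
            summary["agreements"][topic] = by_model
        else:
            # Several models mentioned it but gave different values.
            summary["disagreements"][topic] = by_model
    return summary


# Hypothetical claims extracted from three descriptions of one image.
example = {
    "GPT": {"animal": "dog", "color": "brown"},
    "Gemini": {"animal": "dog", "color": "black", "setting": "park"},
    "Claude": {"animal": "dog", "color": "brown"},
}
result = summarize_variation(example)
for category, topics in result.items():
    print(category, sorted(topics))
```

In this sample, all three models say the animal is a dog (an agreement), they differ on its color (a disagreement), and only one mentions the setting (a unique mention).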

You can customize variations by:

  • Models: Select which MLLMs to use (GPT, Gemini, Claude)
  • Trials: Set the number of times to query each model (default = 3)
  • Prompts: Choose whether to use the original prompt only, paraphrased prompts, or persona-based prompts (default = original prompt only)
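
As a rough guide to how these settings affect the number of descriptions, the sketch below lists one query per combination of model, prompt, and trial. It assumes the settings combine multiplicatively, which this page does not confirm, and the function name plan_queries and the prompt labels are hypothetical.

```python
from itertools import product


def plan_queries(models, trials=3, prompts=("original",)):
    """List one query per (model, prompt, trial) combination.

    Assumes every selected model is queried with every prompt variant,
    `trials` times each. This is an assumption, not the tool's documented behavior.
    """
    return [
        {"model": model, "prompt": prompt, "trial": trial}
        for model, prompt, trial in product(models, prompts, range(1, trials + 1))
    ]


# All three models, default trials, original prompt only.
queries = plan_queries(["GPT", "Gemini", "Claude"])
print(len(queries))  # 9
```

Under that assumption, three models with three trials each and the original prompt only give 9 descriptions; adding prompt variants or trials increases the count proportionally.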