@baberabb
Created February 6, 2025 13:55

Mathvista 100 samples

llama: 51.5

hf-multimodal (pretrained=meta-llama/Llama-3.2-11B-Vision-Instruct), gen_kwargs: (None), limit: 50.0, num_fewshot: None, batch_size: 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mathvista | 1 | extract_answer | 0 | acc | 0.46 | ± 0.0712 |

hf-multimodal (pretrained=llava-hf/llava-onevision-qwen2-7b-ov-chat-hf), gen_kwargs: (None), limit: 50.0, num_fewshot: None, batch_size: 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mathvista | 1 | extract_answer | 0 | acc | 0.62 | ± 0.0693 |
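
The Stderr values above can be sanity-checked by hand. A minimal sketch, assuming the harness reports the sample standard error of the mean over per-example 0/1 scores (an assumption, though both MathVista rows are consistent with it at n = 50, matching `limit: 50.0`); `accuracy_stderr` is a hypothetical helper name, not part of any library:

```python
import math


def accuracy_stderr(acc: float, n: int) -> float:
    """Sample standard error of the mean for n binary (0/1) scores.

    For 0/1 scores with mean `acc`, the sample variance (n - 1 denominator)
    is n * acc * (1 - acc) / (n - 1); dividing by n and taking the square
    root gives the expression below.
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Compare against the two MathVista rows (limit: 50.0).
print(round(accuracy_stderr(0.46, 50), 4))
print(round(accuracy_stderr(0.62, 50), 4))
```

With only 50 samples the intervals are wide, so the gap between 0.46 and 0.62 is roughly two standard errors and should be read with caution.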

ChartQA

hf-multimodal (pretrained=llava-hf/llava-onevision-qwen2-7b-ov-chat-hf,attn_implementation=flash_attention_2), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| chartqa | 0 | none | 0 | anywhere_accuracy | 0.7136 | ± 0.0090 |
| | | none | 0 | exact_match | 0.5912 | ± 0.0098 |
| | | none | 0 | relaxed_accuracy | 0.6712 | ± 0.0094 |

llama: 83.4

hf-multimodal (pretrained=meta-llama/Llama-3.2-11B-Vision-Instruct), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| chartqa_llama | 0 | none | 0 | anywhere_accuracy | 0.7704 | ± 0.0084 |
| | | none | 0 | exact_match | 0.0776 | ± 0.0054 |
| | | none | 0 | relaxed_accuracy | 0.7132 | ± 0.0090 |
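
The three ChartQA metrics diverge sharply for Llama (exact_match 0.0776 vs. anywhere_accuracy 0.7704). The sketch below shows one plausible reading of each metric; it is not the harness's actual implementation. The 5% relative tolerance for numeric answers follows the usual ChartQA "relaxed accuracy" convention, the substring rule for `anywhere_match` is an assumption based on the metric name, and all function names are hypothetical:

```python
from typing import Optional


def _to_float(text: str) -> Optional[float]:
    """Parse a numeric answer, tolerating a trailing '%' and thousands commas."""
    try:
        return float(text.strip().rstrip("%").replace(",", ""))
    except ValueError:
        return None


def exact_match(pred: str, target: str) -> bool:
    """Whole-string match after trimming whitespace and lowercasing."""
    return pred.strip().lower() == target.strip().lower()


def relaxed_match(pred: str, target: str, tol: float = 0.05) -> bool:
    """Numeric answers may deviate by up to `tol` relative error;
    non-numeric answers fall back to exact match."""
    p, t = _to_float(pred), _to_float(target)
    if p is not None and t is not None:
        if t == 0:
            return p == 0
        return abs(p - t) / abs(t) <= tol
    return exact_match(pred, target)


def anywhere_match(pred: str, target: str) -> bool:
    """Assumed rule: the target appears anywhere in the response."""
    return target.strip().lower() in pred.lower()


response = "The answer is 42"
print(exact_match(response, "42"), anywhere_match(response, "42"))
print(relaxed_match("102", "100"), relaxed_match("110", "100"))
```

Under these assumptions, a model that wraps its answer in a full sentence fails `exact_match` but passes `anywhere_match`, which would be consistent with the Llama numbers above.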

ai2d

llama: 91.1

hf-multimodal (pretrained=meta-llama/Llama-3.2-11B-Vision-Instruct), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| ai2d | 0 | flexible-extract | 0 | exact_match | 0.7529 | ± 0.0078 |
| | | strict-match | 0 | exact_match | 0.5058 | ± 0.0090 |