llama: 51.5 hf-multimodal (pretrained=meta-llama/Llama-3.2-11B-Vision-Instruct), gen_kwargs: (None), limit: 50.0, num_fewshot: None, batch_size: 8
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mathvista | 1 | extract_answer | 0 | acc | ↑ | 0.46 | ± | 0.0712 |
hf-multimodal (pretrained=llava-hf/llava-onevision-qwen2-7b-ov-chat-hf), gen_kwargs: (None), limit: 50.0, num_fewshot: None, batch_size: 8
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mathvista | 1 | extract_answer | 0 | acc | ↑ | 0.62 | ± | 0.0693 |
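
The Stderr values in the two mathvista tables above are consistent with the sample standard error of a mean over 50 binary scores (the `limit: 50.0` runs). A minimal sketch of that arithmetic, assuming each item is scored 0 or 1 and the variance uses the n-1 denominator; the helper name `sample_stderr` is made up for illustration and is not taken from the harness:

```python
import math


def sample_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) scores.

    Assumes the n-1 (sample) variance, so the variance of the scores is
    acc * (1 - acc) * n / (n - 1) and the standard error is its square
    root divided by sqrt(n).
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Accuracies copied from the two mathvista tables (limit = 50 samples each).
runs = [
    ("meta-llama/Llama-3.2-11B-Vision-Instruct", 0.46),
    ("llava-hf/llava-onevision-qwen2-7b-ov-chat-hf", 0.62),
]

for name, acc in runs:
    print(f"{name}: acc={acc:.2f} stderr={sample_stderr(acc, 50):.4f}")
```

With only 50 samples each standard error is about 0.07, so the 0.16 accuracy gap between the two models is roughly 1.6 combined standard errors and should be read with caution.
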
hf-multimodal (pretrained=llava-hf/llava-onevision-qwen2-7b-ov-chat-hf,attn_implementation=flash_attention_2), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| chartqa | 0 | none | 0 | anywhere_accuracy | ↑ | 0.7136 | ± | 0.0090 |
| | | none | 0 | exact_match | ↑ | 0.5912 | ± | 0.0098 |
| | | none | 0 | relaxed_accuracy | ↑ | 0.6712 | ± | 0.0094 |
llama: 83.4 hf-multimodal (pretrained=meta-llama/Llama-3.2-11B-Vision-Instruct), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| chartqa_llama | 0 | none | 0 | anywhere_accuracy | ↑ | 0.7704 | ± | 0.0084 |
| | | none | 0 | exact_match | ↑ | 0.0776 | ± | 0.0054 |
| | | none | 0 | relaxed_accuracy | ↑ | 0.7132 | ± | 0.0090 |
llama: 91.1 hf-multimodal (pretrained=meta-llama/Llama-3.2-11B-Vision-Instruct), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| ai2d | 0 | flexible-extract | 0 | exact_match | ↑ | 0.7529 | ± | 0.0078 |
| | | strict-match | 0 | exact_match | ↑ | 0.5058 | ± | 0.0090 |