@albertbn
Last active January 22, 2026 13:59
Ad Memorability Scorer: Context-Aware Quality Prediction

Project Overview

Native advertising effectiveness depends critically on both the intrinsic quality of the ad creative and its contextual fit with the surrounding article. However, automated quality assessment remains challenging: most approaches rely on post-hoc engagement metrics or expensive manual review. This project develops a machine learning model that predicts ad memorability scores (1-10 scale) by combining multimodal creative analysis with contextual alignment features.

We trained an XGBoost regression model on 29,032 ad-article pairs scored by GPT-5.2 vision, achieving strong performance (MAE=0.605, R²=0.785). The model extracts 21 interpretable features spanning visual attention, copy quality, and contextual fit, enabling real-time quality prediction for ad placement optimization.

The key innovation is the decomposition of memorability into intrinsic ad properties (M features) and contextual fit metrics (F features), allowing the model to evaluate both "is this ad good?" and "does it fit this article?" simultaneously.


Model Architecture

Feature Extraction Pipeline

┌─────────────┐
│  Ad Image   │──────────┐
└─────────────┘          │
                         ▼
                  ┌──────────────────┐
                  │  GPT-5.2 Vision  │──► 15 M Features
                  │  + CLIP (512-d)  │    (clarity, faces, OCR, twist...)
                  └──────────────────┘
┌─────────────┐          │
│  Article    │──────────┤
│  Text       │          │
└─────────────┘          ▼
                  ┌──────────────────┐
                  │  GPT-5.2 Text +  │──► 4 Article Features
                  │  Embeddings      │    (topic, entities, sentiment, arousal)
                  └──────────────────┘
                         │
                         ▼
                  ┌──────────────────┐
                  │  Compute Pair    │──► 6 F Features
                  │  Features (F)    │    (similarity, overlap, alignment...)
                  └──────────────────┘
                         │
                         ▼
                  ┌──────────────────┐
                  │  XGBoost         │──► Score: 1-10
                  │  Regressor       │
                  └──────────────────┘

Mathematical Formulation

Overall Scoring Function:

Score(1-10) = f(M, F)

where:

  • M = {m₁, m₂, ..., m₁₅} ∈ ℝ¹⁵ represents Memorability features (ad intrinsic quality)
  • F = {f₁, f₂, ..., f₆} ∈ ℝ⁶ represents Contextual Fit features (ad-article alignment)

Key Pair Features:

Cosine similarity between embeddings:

f_sim_text = (e_ad · e_article) / (||e_ad|| × ||e_article||)

Entity overlap (Jaccard similarity):

f_entity_overlap = |E_ad ∩ E_article| / |E_ad ∪ E_article|

Sentiment alignment:

f_sentiment_align = 1 - |v_ad - v_article|  where v ∈ [-1, 1]
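The three pair-feature formulas above translate directly into code. A minimal sketch in plain Python (function names are mine, not from the project's codebase):

```python
import math

def cosine_similarity(a, b):
    """f_sim_text: cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def entity_overlap(ents_ad, ents_article):
    """f_entity_overlap: Jaccard similarity of the two entity sets."""
    union = ents_ad | ents_article
    return len(ents_ad & ents_article) / len(union) if union else 0.0

def sentiment_alignment(v_ad, v_article):
    """f_sentiment_align: 1 - |v_ad - v_article|, valences in [-1, 1]."""
    return 1.0 - abs(v_ad - v_article)
```

For example, an ad mentioning only "tesla" against an article mentioning "tesla" and "musk" gives an entity overlap of 1/2 = 0.5.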

Teacher Scoring Criteria (Reference Framework):

High Quality (8-10):

  • Memorable & original (clever twist)
  • Clear in < 2 seconds
  • Emotionally engaging
  • Contextually relevant

Mediocre (4-6):

  • Generic, cliché
  • Takes > 2 seconds to understand
  • Forgettable

Low Quality (1-3):

  • Boring, confusing
  • Irrelevant to context
  • Poor fit

Training Dataset

Dataset Statistics

Metric                      Value
Unique Ads                  7,258
Synthetic Articles          7,258
Total Pairs                 29,032
  - Positive pairs          7,258 (25%)
  - Random negatives        14,516 (50%)
  - Safe contrast           7,258 (25%)
Features                    21 (15 M + 6 F)
Teacher Scores (GPT-5.2)    29,032
Dataset Size                8.1 GB
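The pair counts follow from a 4-pairs-per-ad scheme: each ad is paired with its matched article (positive), two random non-matching articles, and one contrasting article, giving 7,258 × 4 = 29,032. A sketch of that construction, assuming the safe-contrast article is simply drawn from the non-matching pool (the project's exact selection rule is not specified here):

```python
import random

def build_pairs(ad_ids, seed=0):
    """Build (ad, article, pair_type) triples: 1 positive, 2 random
    negatives, and 1 safe contrast per ad -- 4 pairs per ad in total.
    Articles are identified by the ad id they were generated for."""
    rng = random.Random(seed)
    pairs = []
    for ad in ad_ids:
        others = [a for a in ad_ids if a != ad]
        pairs.append((ad, ad, "positive"))  # matched synthetic article
        for neg in rng.sample(others, 2):
            pairs.append((ad, neg, "random_negative"))
        pairs.append((ad, rng.choice(others), "safe_contrast"))
    return pairs
```

With all 7,258 ads this yields exactly the 29,032 pairs in the table above.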

Feature Breakdown

M Features (15) - Ad Intrinsic Quality:

  1. is_ad - Classification confidence (0-1)
  2. ocr_word_count - Visible text word count
  3. ocr_legible - Text legibility (0-1)
  4. face_count - Number of faces
  5. face_emotion - Dominant emotion or "none"
  6. clarity - Message clarity at a glance (0-1)
  7. clutter - Visual clutter (0-1, lower=cleaner)
  8. contrast_m - Visual contrast (0-1)
  9. color_palette_size - Count of dominant colors
  10. copy_word_count - Ad copy word count
  11. copy_concrete - Concreteness (0-1)
  12. copy_emotion_valence - Emotional tone (-1 to 1)
  13. copy_arousal - Emotional intensity (0-1)
  14. twist_present - Has clever twist (bool)
  15. twist_resolves_fast - Twist resolves quickly (0-1)

F Features (6) - Contextual Fit:

  1. sim_adtext_article - Cosine similarity (ad text ↔ article)
  2. sim_adimage_article - Cosine similarity (ad image ↔ article)
  3. entity_overlap_rate - Jaccard similarity of entities
  4. sentiment_alignment - 1 - |ad_valence - article_valence|
  5. topic_match - 1.0 if topics match, else 0.0
  6. contrast - 1 - sim_adtext_article
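For inference, the 15 M features and 6 F features are flattened into one fixed-order 21-column vector for the regressor. A minimal sketch (the column order is an assumption; note that categorical fields such as face_emotion would need encoding, e.g. integer codes, before being fed to XGBoost):

```python
# Feature names as listed above; order within each group is assumed.
M_FEATURES = [
    "is_ad", "ocr_word_count", "ocr_legible", "face_count",
    "face_emotion", "clarity", "clutter", "contrast_m",
    "color_palette_size", "copy_word_count", "copy_concrete",
    "copy_emotion_valence", "copy_arousal", "twist_present",
    "twist_resolves_fast",
]
F_FEATURES = [
    "sim_adtext_article", "sim_adimage_article", "entity_overlap_rate",
    "sentiment_alignment", "topic_match", "contrast",
]

def feature_vector(m, f):
    """Flatten M and F feature dicts into the 21-column row the
    regressor consumes (missing keys raise, catching schema drift)."""
    return [m[k] for k in M_FEATURES] + [f[k] for k in F_FEATURES]
```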

Score Distribution

By Pair Type:

  • Positive pairs: μ = 4.77, σ = 2.25 (range: 1.2-9.3)
  • Random negatives: μ = 3.01, σ = 1.34 (range: 1.0-8.6)
  • Safe contrast: μ = 3.00, σ = 1.31 (range: 1.0-8.2)

Key Observation: The teacher model separates positive pairs from negatives by a ~1.8-point gap, suggesting reliable ground-truth labels.


Model Performance

Test Set Metrics (80/20 Split, Ad-Level Grouping)

Metric                      Value
MAE (Mean Absolute Error)   0.605
RMSE                        0.842
R²                          0.785

Interpretation:

  • MAE of 0.605 means predictions are within ±0.6 points on average (about 6% of the 10-point scale)
  • R² of 0.785 indicates model explains 78.5% of score variance
  • These metrics are strong for ad quality prediction where human judgments are inherently noisy

Performance by Pair Type

Pair Type         MAE    Mean True Score   Mean Predicted
Positive          0.682  4.93              4.67
Random Negative   0.587  3.09              3.14
Safe Contrast     0.562  3.05              3.10

Key Takeaway: The model maintains a ~1.6-point separation between positive and negative pairs, preserving ranking quality for ad placement optimization.

Top 10 Feature Importances

  1. contrast (32.9%) - Inverse text similarity; ads benefit from differentiation
  2. sim_adtext_article (24.0%) - Text-article semantic alignment
  3. is_ad (15.3%) - Ad classification confidence
  4. clarity (10.4%) - Message clarity at a glance
  5. twist_resolves_fast (3.2%) - Creative twist resolution speed
  6. entity_overlap_rate (1.8%) - Shared entities (brands, products, people)
  7. topic_match (1.7%) - Coarse topic alignment
  8. twist_present (1.7%) - Creative twist presence
  9. copy_word_count (1.2%) - Ad copy length
  10. clutter (1.1%) - Visual clutter level

Insight: Contextual fit features (contrast, similarity, entity overlap) dominate importance (58.7% combined), validating the M+F decomposition approach.


Example Predictions

Example 1: Best Score - Tesla × Sports Cars Article

Predicted Score: 5.67/10

Tesla Ad

Ad Copy: "It takes 3.1 seconds to read this ad. The same time it takes a Model S to go from 0 to 60mph. TESLA"

Article Context:

"Why Do Men Like Sports Cars So Much?"

It's become almost a cliche over the years…men love sports cars and high-performance cars. There's a feeling that goes along with punching a gas pedal...

Feature Highlights:

  • M Features: clarity=0.92, twist_present=true, twist_resolves_fast=0.95, copy_concrete=0.90
  • F Features: sim_adtext_article=0.24, sentiment_alignment=0.95, contrast=0.76
  • Why it scores highest: Creative twist linking reading time to 0-60mph acceleration, high clarity, concrete copy, aligns with sports car theme

Example 2: Tesla × Blog Niche Article

Predicted Score: 4.94/10

Tesla Ad

Ad Copy: "It takes 3.1 seconds to read this ad. The same time it takes a Model S to go from 0 to 60mph. TESLA"

Article Context:

"120 profitable blog niche ideas and how to pick the right one"

So, you want to create a blog that will draw in tons of readers and eventually make you a profit...

Feature Highlights:

  • M Features: clarity=0.95, twist_present=true, copy_concrete=0.90
  • F Features: sim_adtext_article=0.09, topic_match=0.0, contrast=0.91
  • Why mid-range: Strong ad creative but weak contextual fit (blogging ≠ cars), low similarity scores

Example 3: Medical Scrubs × Netflix Article

Predicted Score: 3.99/10

Scrubs Ad

Ad Copy: "AS COMFORTABLE AS YOUR FAVORITE SHOW SHOP SCRUBS UA Uniform Advantage"

Article Context:

"5 Best Netflix Shows to Watch on New Year's Day: 'Stranger Things' and More"

So you don't have any plans on New Year's Day — that's not a big deal! There are plenty of shows to keep you company...

Feature Highlights:

  • M Features: clarity=0.92, twist_present=false, copy_arousal=0.28
  • F Features: sim_adtext_article=0.13, entity_overlap=0.0, topic_match=0.0
  • Why it underperforms: Metaphorical connection ("comfortable as show") doesn't translate to strong contextual fit, no entities overlap, generic positioning

Example 4: Lowest Score - Medical Scrubs × Neuralink Article

Predicted Score: 3.77/10

Scrubs Ad

Ad Copy: "AS COMFORTABLE AS YOUR FAVORITE SHOW SHOP SCRUBS UA Uniform Advantage"

Article Context:

"Elon Musk plans 'high-volume production' of Neuralink brain chips..."

Elon Musk's Neuralink startup develops brain-chip implants...

Feature Highlights:

  • M Features: clarity=0.95, twist_present=false, copy_arousal=0.25
  • F Features: sim_adtext_article=-0.03 (negative!), entity_overlap=0.0, topic_match=0.0
  • Why it fails: Complete contextual mismatch (medical scrubs ≠ brain chips), negative text similarity, no shared entities, sentiment misalignment

Implementation Notes

Model Architecture

  • Algorithm: XGBoost Regressor
  • Hyperparameters: 215 trees (early stopped from max 300), depth=6, learning_rate=0.05
  • Regularization: L1=1, L2=10, min_child_weight=10
  • Training: 80/20 split with GroupShuffleSplit on ad_id (prevents data leakage)
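The leakage-prevention idea in the last bullet is that every pair derived from a given ad must land entirely in train or entirely in test, since the same creative appears in four pairs. A stdlib-only sketch of that grouped split (the project uses sklearn's GroupShuffleSplit; this mirrors its behavior):

```python
import random

def group_split(pairs, ad_id_of, test_frac=0.2, seed=0):
    """Hold out whole ads: all pairs sharing an ad_id go to the same
    side of the split, so the model never sees a test ad in training."""
    ids = sorted({ad_id_of(p) for p in pairs})
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = max(1, int(len(ids) * test_frac))
    test_ids = set(ids[:n_test])
    train = [p for p in pairs if ad_id_of(p) not in test_ids]
    test = [p for p in pairs if ad_id_of(p) in test_ids]
    return train, test
```

A naive row-level split would put some pairs of an ad in train and others in test, letting the model memorize per-ad quirks and inflating the metrics.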

Feature Extraction

  • GPT-5.2 Vision API: M features extraction from ad images (downsampled to 1024×1024, JPEG q=85)
  • GPT-5.2 Text API: Article features (topic, entities, sentiment, arousal)
  • CLIP: openai/clip-vit-base-patch32 for 512-dim image embeddings
  • Text Embeddings: text-embedding-3-small (512-dim) for ad text and articles
  • Timeouts: 120s per API call with automatic retry logic
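The retry logic in the last bullet can be sketched as a small wrapper; this is a generic pattern, not the project's actual code, and it assumes the 120 s per-call timeout is enforced by the HTTP client itself (so a timed-out call surfaces as an exception):

```python
import time

def call_with_retry(fn, retries=3, backoff_s=2.0):
    """Retry a flaky API call with exponential backoff.
    fn is a zero-argument callable that raises on timeout or error;
    the last failure is re-raised once retries are exhausted."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff_s * (2 ** attempt))
```

Usage would look like `call_with_retry(lambda: client.responses.create(...))`, with the client's own timeout set to 120 s.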

Cost Analysis

  • Full feature extraction: ~$0.02 per ad-article pair (2× GPT-5.2 calls + embeddings)
  • Inference only (with cached features): ~$0.0001 per pair (model prediction only)
  • Training dataset cost: Several hundred USD for 29K pairs

Dataset Availability

  • HuggingFace: albertbn/ad-memorability-scorer-v0 (private dataset)
  • Contents: Consolidated parquet file (8.1 GB) with images as binary bytes + all features + scores
  • Access: Contact for permissions

Future Directions

Near-Term Improvements

  1. Fine-tune on real engagement data: Replace teacher scores with actual CTR/viewability metrics
  2. Add brand safety dimension (S factor): Extend to full M × F × S formula from original concept
  3. Multi-task learning: Jointly predict score + generate natural language reasoning
  4. Feature ablation studies: Identify minimal feature set for real-time inference

Long-Term Research

  1. Video ad support: Extend to temporal/motion features for video creatives
  2. Pairwise ranking objective: Switch from regression to learning-to-rank for better separation
  3. Multimodal fusion: End-to-end neural model replacing hand-crafted features
  4. Personalization: Condition on user demographics/interests for individualized scoring

Citation

If you use this dataset or model, please cite:

@dataset{albertbn_ad_memorability_v0,
  title={Ad Memorability Scorer: Context-Aware Quality Prediction},
  author={FrameAI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/datasets/albertbn/ad-memorability-scorer-v0}
}

Contact: For dataset access, collaboration, or questions about this work, please reach out via GitHub or HuggingFace.


Generated with GPT-5.2 teacher scoring and XGBoost baseline regression • January 2026
