Native advertising effectiveness depends critically on both the intrinsic quality of the ad creative and its contextual fit with the surrounding article. Automated quality assessment remains challenging, however; in practice, evaluation relies primarily on post-hoc engagement metrics or expensive manual review. This project develops a machine learning model that predicts ad memorability scores (1-10 scale) by combining multimodal creative analysis with contextual alignment features.
We trained an XGBoost regression model on 29,032 ad-article pairs scored by GPT-5.2 vision, achieving strong performance (MAE=0.605, R²=0.785). The model is built on 21 interpretable features spanning visual attention, copy quality, and contextual fit, enabling real-time quality prediction for ad placement optimization.
The key innovation is the decomposition of memorability into intrinsic ad properties (M features) and contextual fit metrics (F features), allowing the model to evaluate both "is this ad good?" and "does it fit this article?" simultaneously.
```
┌────────────────────┐          ┌────────────────────┐
│      Ad Image      │          │    Article Text    │
└─────────┬──────────┘          └─────────┬──────────┘
          ▼                               ▼
┌────────────────────┐          ┌────────────────────┐
│  GPT-5.2 Vision    │          │  GPT-5.2 Text +    │
│  + CLIP (512-d)    │          │  Embeddings        │
└─────────┬──────────┘          └─────────┬──────────┘
          │ 15 M Features                 │ 4 Article Features
          │ (clarity, faces,              │ (topic, entities,
          │  OCR, twist, ...)             │  sentiment, arousal)
          └───────────────┬───────────────┘
                          ▼
                ┌────────────────────┐
                │  Compute Pair      │──► 6 F Features
                │  Features (F)      │    (similarity, overlap, alignment, ...)
                └─────────┬──────────┘
                          ▼
                ┌────────────────────┐
                │      XGBoost       │──► Score: 1-10
                │      Regressor     │
                └────────────────────┘
```
Overall Scoring Function:
Score(1-10) = f(M, F)
where:
- M = (m₁, m₂, ..., m₁₅) ∈ ℝ¹⁵ represents Memorability features (ad intrinsic quality)
- F = (f₁, f₂, ..., f₆) ∈ ℝ⁶ represents Contextual Fit features (ad-article alignment)
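To make the decomposition concrete, here is a minimal scoring sketch in Python: the 15 M values and 6 F values are concatenated into a single 21-dimensional input for the regressor. The zero placeholder vectors and the stand-in model fitted on random data are purely illustrative so the snippet runs on its own; they are not the project's actual artifacts.

```python
import numpy as np
import xgboost as xgb

# Minimal sketch of Score = f(M, F). In the real pipeline, m and f come from
# the extraction stages above and the model is the trained regressor
# described in the Model Configuration section.
m = np.zeros(15)                                   # 15 intrinsic ad (M) features
f = np.zeros(6)                                    # 6 contextual fit (F) features
x = np.concatenate([m, f]).reshape(1, -1)          # one 21-dim example

rng = np.random.default_rng(0)
model = xgb.XGBRegressor(n_estimators=20, max_depth=3)
model.fit(rng.random((64, 21)), rng.uniform(1, 10, size=64))   # stand-in fit

score = float(np.clip(model.predict(x)[0], 1.0, 10.0))         # keep on the 1-10 scale
print(f"predicted memorability: {score:.2f}")
```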
Key Pair Features:
Cosine similarity between embeddings:
f_sim_text = (e_ad · e_article) / (||e_ad|| × ||e_article||)
Entity overlap (Jaccard similarity):
f_entity_overlap = |E_ad ∩ E_article| / |E_ad ∪ E_article|
Sentiment alignment:
f_sentiment_align = 1 - |v_ad - v_article| where v ∈ [-1, 1]
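These three formulas translate directly into code. The sketch below is a plain restatement of them, assuming the embeddings, entity sets, and valence scores have already been extracted; the function names and toy inputs are illustrative.

```python
import numpy as np

def sim_text(e_ad: np.ndarray, e_article: np.ndarray) -> float:
    """Cosine similarity between ad and article embeddings (f_sim_text)."""
    return float(e_ad @ e_article / (np.linalg.norm(e_ad) * np.linalg.norm(e_article)))

def entity_overlap(ents_ad: set, ents_article: set) -> float:
    """Jaccard similarity of extracted entity sets (f_entity_overlap)."""
    union = ents_ad | ents_article
    return len(ents_ad & ents_article) / len(union) if union else 0.0

def sentiment_align(v_ad: float, v_article: float) -> float:
    """1 - |v_ad - v_article| with valences in [-1, 1] (f_sentiment_align)."""
    return 1.0 - abs(v_ad - v_article)

# Toy values for illustration only
print(sim_text(np.ones(512), np.ones(512)))                    # 1.0: identical direction
print(entity_overlap({"tesla", "model s"}, {"tesla", "bmw"}))  # 1/3
print(sentiment_align(0.6, 0.4))                               # 0.8
```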
Teacher Scoring Criteria (Reference Framework):
High Quality (8-10):
- Memorable & original (clever twist)
- Clear in < 2 seconds
- Emotionally engaging
- Contextually relevant
Mediocre (4-6):
- Generic, cliché
- Takes > 2 seconds to understand
- Forgettable
Low Quality (1-3):
- Boring, confusing
- Irrelevant to context
- Poor fit
| Metric | Value |
|---|---|
| Unique Ads | 7,258 |
| Synthetic Articles | 7,258 |
| Total Pairs | 29,032 |
| - Positive pairs | 7,258 (25%) |
| - Random negatives | 14,516 (50%) |
| - Safe contrast | 7,258 (25%) |
| Features | 21 (15 M + 6 F) |
| Teacher Scores (GPT-5.2) | 29,032 |
| Dataset Size | 8.1 GB |
M Features (15) - Ad Intrinsic Quality:
- `is_ad` - Classification confidence (0-1)
- `ocr_word_count` - Visible text word count
- `ocr_legible` - Text legibility (0-1)
- `face_count` - Number of faces
- `face_emotion` - Dominant emotion or "none"
- `clarity` - Message clarity at a glance (0-1)
- `clutter` - Visual clutter (0-1, lower = cleaner)
- `contrast_m` - Visual contrast (0-1)
- `color_palette_size` - Count of dominant colors
- `copy_word_count` - Ad copy word count
- `copy_concrete` - Concreteness (0-1)
- `copy_emotion_valence` - Emotional tone (-1 to 1)
- `copy_arousal` - Emotional intensity (0-1)
- `twist_present` - Has clever twist (bool)
- `twist_resolves_fast` - Twist resolves quickly (0-1)
F Features (6) - Contextual Fit:
- `sim_adtext_article` - Cosine similarity (ad text ↔ article)
- `sim_adimage_article` - Cosine similarity (ad image ↔ article)
- `entity_overlap_rate` - Jaccard similarity of entities
- `sentiment_alignment` - 1 - |ad_valence - article_valence|
- `topic_match` - 1.0 if topics match, else 0.0
- `contrast` - 1 - sim_adtext_article
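For reference, the 21 input names above can be collected into ordered column lists. The ordering shown here is an assumption for illustration (the parquet schema may differ), and categorical fields such as `face_emotion` would need encoding before reaching XGBoost.

```python
# Feature names as used throughout this card; ordering is illustrative,
# not necessarily the column order in the released parquet file.
M_FEATURES = [
    "is_ad", "ocr_word_count", "ocr_legible", "face_count", "face_emotion",
    "clarity", "clutter", "contrast_m", "color_palette_size",
    "copy_word_count", "copy_concrete", "copy_emotion_valence",
    "copy_arousal", "twist_present", "twist_resolves_fast",
]
F_FEATURES = [
    "sim_adtext_article", "sim_adimage_article", "entity_overlap_rate",
    "sentiment_alignment", "topic_match", "contrast",
]
ALL_FEATURES = M_FEATURES + F_FEATURES   # 21 columns fed to XGBoost
assert len(ALL_FEATURES) == 21
```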
By Pair Type:
- Positive pairs: μ = 4.77, σ = 2.25 (range: 1.2-9.3)
- Random negatives: μ = 3.01, σ = 1.34 (range: 1.0-8.6)
- Safe contrast: μ = 3.00, σ = 1.31 (range: 1.0-8.2)
Key Observation: The teacher model cleanly separates positive pairs (μ = 4.77) from both negative types (μ ≈ 3.0) by a ~1.8-point gap, suggesting reliable ground-truth labels.
| Metric | Value |
|---|---|
| MAE (Mean Absolute Error) | 0.605 |
| RMSE | 0.842 |
| R² | 0.785 |
Interpretation:
- MAE of 0.605 means predictions differ from teacher scores by about 0.6 points on average (roughly 6% of the 10-point scale)
- R² of 0.785 indicates model explains 78.5% of score variance
- These metrics are strong for ad quality prediction where human judgments are inherently noisy
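These metrics correspond to standard scikit-learn calls on the held-out split. The sketch below assumes arrays of teacher scores (`y_true`) and model predictions (`y_pred`) are already in hand; the numbers in the example call are made up.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def report(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MAE / RMSE / R² as reported in the table above."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "R2": r2_score(y_true, y_pred),
    }

# Toy check with made-up numbers (not the project's test split)
print(report(np.array([5.0, 3.0, 7.5]), np.array([4.6, 3.4, 7.0])))
```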
| Pair Type | MAE | Mean True Score | Mean Predicted |
|---|---|---|---|
| Positive | 0.682 | 4.93 | 4.67 |
| Random Negative | 0.587 | 3.09 | 3.14 |
| Safe Contrast | 0.562 | 3.05 | 3.10 |
Key Takeaway: The model preserves a ~1.5-point separation between positive and negative pairs in its predictions (4.67 vs ~3.1), maintaining the ranking quality needed for ad placement optimization.
Top 10 Features by Importance:
- contrast (32.9%) - Inverse text similarity; ads benefit from differentiation
- sim_adtext_article (24.0%) - Text-article semantic alignment
- is_ad (15.3%) - Ad classification confidence
- clarity (10.4%) - Message clarity at a glance
- twist_resolves_fast (3.2%) - Creative twist resolution speed
- entity_overlap_rate (1.8%) - Shared entities (brands, products, people)
- topic_match (1.7%) - Coarse topic alignment
- twist_present (1.7%) - Creative twist presence
- copy_word_count (1.2%) - Ad copy length
- clutter (1.1%) - Visual clutter level
Insight: Contextual fit features (contrast, similarity, entity overlap) dominate importance (58.7% combined), validating the M+F decomposition approach.
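Rankings like this can be read directly off the fitted regressor's importance attribute. The sketch assumes the trained `model` and the `ALL_FEATURES` column list from the earlier sketches; in recent xgboost versions the sklearn wrapper reports gain-based fractions comparable to the percentages above.

```python
import pandas as pd

# Assumes `model` is the fitted XGBRegressor and ALL_FEATURES is the column
# list from the sketches above; feature_importances_ values sum to 1.0.
importances = pd.Series(model.feature_importances_, index=ALL_FEATURES)
print(importances.sort_values(ascending=False).head(10).round(3))
```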
Predicted Score: 5.67/10
Ad Copy: "It takes 3.1 seconds to read this ad. The same time it takes a Model S to go from 0 to 60mph. TESLA"
Article Context:
"Why Do Men Like Sports Cars So Much?"
It's become almost a cliche over the years…men love sports cars and high-performance cars. There's a feeling that goes along with punching a gas pedal...
Feature Highlights:
- M Features: clarity=0.92, twist_present=true, twist_resolves_fast=0.95, copy_concrete=0.90
- F Features: sim_adtext_article=0.24, sentiment_alignment=0.95, contrast=0.76
- Why it scores highest: Creative twist linking reading time to 0-60mph acceleration, high clarity, concrete copy, aligns with sports car theme
Predicted Score: 4.94/10
Ad Copy: "It takes 3.1 seconds to read this ad. The same time it takes a Model S to go from 0 to 60mph. TESLA"
Article Context:
"120 profitable blog niche ideas and how to pick the right one"
So, you want to create a blog that will draw in tons of readers and eventually make you a profit...
Feature Highlights:
- M Features: clarity=0.95, twist_present=true, copy_concrete=0.90
- F Features: sim_adtext_article=0.09, topic_match=0.0, contrast=0.91
- Why mid-range: Strong ad creative but weak contextual fit (blogging ≠ cars), low similarity scores
Predicted Score: 3.99/10
Ad Copy: "AS COMFORTABLE AS YOUR FAVORITE SHOW SHOP SCRUBS UA Uniform Advantage"
Article Context:
"5 Best Netflix Shows to Watch on New Year's Day: 'Stranger Things' and More"
So you don't have any plans on New Year's Day — that's not a big deal! There are plenty of shows to keep you company...
Feature Highlights:
- M Features: clarity=0.92, twist_present=false, copy_arousal=0.28
- F Features: sim_adtext_article=0.13, entity_overlap=0.0, topic_match=0.0
- Why it underperforms: Metaphorical connection ("comfortable as show") doesn't translate to strong contextual fit, no entities overlap, generic positioning
Predicted Score: 3.77/10
Ad Copy: "AS COMFORTABLE AS YOUR FAVORITE SHOW SHOP SCRUBS UA Uniform Advantage"
Article Context:
"Elon Musk plans 'high-volume production' of Neuralink brain chips..."
Elon Musk's Neuralink startup develops brain-chip implants...
Feature Highlights:
- M Features: clarity=0.95, twist_present=false, copy_arousal=0.25
- F Features: sim_adtext_article=-0.03 (negative!), entity_overlap=0.0, topic_match=0.0
- Why it fails: Complete contextual mismatch (medical scrubs ≠ brain chips), negative text similarity, no shared entities, sentiment misalignment
- Algorithm: XGBoost Regressor
- Hyperparameters: 215 trees (early stopped from max 300), depth=6, learning_rate=0.05
- Regularization: L1=1, L2=10, min_child_weight=10
- Training: 80/20 split with GroupShuffleSplit on ad_id (prevents data leakage)
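A minimal training sketch consistent with this configuration, assuming a dataframe `df` that already contains the 21 feature columns (categoricals encoded), a `score` column with teacher scores, and `ad_id`. The random seed and early-stopping patience are illustrative choices, not values from the actual run.

```python
import xgboost as xgb
from sklearn.model_selection import GroupShuffleSplit

X, y, groups = df[ALL_FEATURES], df["score"], df["ad_id"]

# Group the split by ad_id so the same ad never appears in both train and test
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups))

model = xgb.XGBRegressor(
    n_estimators=300,          # cap; early stopping landed on ~215 trees
    max_depth=6,
    learning_rate=0.05,
    reg_alpha=1,               # L1
    reg_lambda=10,             # L2
    min_child_weight=10,
    early_stopping_rounds=50,  # illustrative patience
)
# Here the held-out fold doubles as the early-stopping eval set; the actual
# pipeline may use a separate validation split.
model.fit(
    X.iloc[train_idx], y.iloc[train_idx],
    eval_set=[(X.iloc[test_idx], y.iloc[test_idx])],
    verbose=False,
)
```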
- GPT-5.2 Vision API: M features extraction from ad images (downsampled to 1024×1024, JPEG q=85)
- GPT-5.2 Text API: Article features (topic, entities, sentiment, arousal)
- CLIP: openai/clip-vit-base-patch32 for 512-dim image embeddings
- Text Embeddings: text-embedding-3-small (512-dim) for ad text and articles
- Timeouts: 120s per API call with automatic retry logic
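The embeddings feeding the similarity features can be produced with the stock libraries named above. This is a sketch only: batching, retries, and the GPT-5.2 feature-extraction calls are omitted, and it assumes an `OPENAI_API_KEY` is configured.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from openai import OpenAI

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def image_embedding(path: str) -> torch.Tensor:
    """512-dim CLIP image embedding for an ad creative."""
    inputs = proc(images=Image.open(path).convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        return clip.get_image_features(**inputs)[0]

def text_embedding(text: str) -> list[float]:
    """512-dim text embedding for ad copy or article text."""
    resp = client.embeddings.create(
        model="text-embedding-3-small", input=text, dimensions=512
    )
    return resp.data[0].embedding
```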
- Full feature extraction: ~$0.02 per ad-article pair (2× GPT-5.2 calls + embeddings)
- Inference only (with cached features): ~$0.0001 per pair (model prediction only)
- Training dataset cost: Several hundred USD for the 29K pairs (roughly 29,032 × $0.02 ≈ $580 at the per-pair rate above)
- HuggingFace: `albertbn/ad-memorability-scorer-v0` (private dataset)
- Contents: Consolidated parquet file (8.1 GB) with images as binary bytes + all features + scores
- Access: Contact for permissions
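Once access is granted, the consolidated parquet can be pulled and read with standard tooling. The filename below is a guess, since the repo layout isn't documented here.

```python
import pandas as pd
from huggingface_hub import hf_hub_download

# Requires accepted access to the private dataset and a valid HF token.
# "data.parquet" is a hypothetical filename; check the repo for the real one.
path = hf_hub_download(
    repo_id="albertbn/ad-memorability-scorer-v0",
    filename="data.parquet",
    repo_type="dataset",
)
df = pd.read_parquet(path)
print(df.shape)  # expected: 29,032 rows of features, scores, and metadata
```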
- Fine-tune on real engagement data: Replace teacher scores with actual CTR/viewability metrics
- Add brand safety dimension (S factor): Extend to full M × F × S formula from original concept
- Multi-task learning: Jointly predict score + generate natural language reasoning
- Feature ablation studies: Identify minimal feature set for real-time inference
- Video ad support: Extend to temporal/motion features for video creatives
- Pairwise ranking objective: Switch from regression to learning-to-rank for better separation
- Multimodal fusion: End-to-end neural model replacing hand-crafted features
- Personalization: Condition on user demographics/interests for individualized scoring
If you use this dataset or model, please cite:
```bibtex
@dataset{albertbn_ad_memorability_v0,
  title={Ad Memorability Scorer: Context-Aware Quality Prediction},
  author={FrameAI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/datasets/albertbn/ad-memorability-scorer-v0}
}
```

Contact: For dataset access, collaboration, or questions about this work, please reach out via GitHub or HuggingFace.
Generated with GPT-5.2 teacher scoring and XGBoost baseline regression • January 2026



