Reading materials for class AAI520

Natural Language Processing and GenAI - AAI 520

Textbook: Speech and Language Processing - Dan Jurafsky and James H. Martin

Reading Resources and Modules

Module 1:

Textbook: Speech and Language Processing, 3rd ed. (Book website). Jurafsky, D., & Martin, J. H. (2025)

  • Read Chapter 2: Regular Expressions, Tokenization, Edit Distance, pages 13 to 29 (Sections 2.2-2.8); a minimum edit distance sketch follows this list:
    • Section 2.2: Words
    • Section 2.3: Corpora
    • Section 2.4: Simple Unix Tools for Word Tokenization
    • Section 2.5: Word and Subword Tokenization
    • Section 2.6: Word Normalization, Lemmatization and Stemming
    • Section 2.7: Sentence Segmentation
    • Section 2.8: Minimum Edit Distance
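
For a hands-on companion to Section 2.8, here is a minimal Python sketch of the minimum edit distance dynamic program. The cost convention (insert 1, delete 1, substitute 2) follows the Levenshtein variant used in the chapter's intention/execution example; the function name and layout are illustrative, not the textbook's.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 2) -> int:
    n, m = len(source), len(target)
    # D[i][j] holds the edit distance between source[:i] and target[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i                      # i deletions from source
    for j in range(1, m + 1):
        D[0][j] = j                      # j insertions into source
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j] + 1,         # deletion
                          D[i][j - 1] + 1,         # insertion
                          D[i - 1][j - 1] + sub)   # substitution or match
    return D[n][m]

print(min_edit_distance("intention", "execution"))  # 8, the chapter's worked example
```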

Articles:

Required Media

Module 2: Named Entity Recognition (NER) and Part-of-Speech (PoS) Tagging

Textbook: Speech and Language Processing, 3rd ed. (Book website). Jurafsky, D., & Martin, J. H. (2025)

Articles:

  • Read the Linguistic Features - Entity Linking
  • Read the Part of Speech Tagging for Beginners
  • Read the Understanding Named Entity Recognition: What Is It And How To Use It In Natural Language Processing?
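
Both tasks covered in these readings can be tried out in a few lines with spaCy. A minimal sketch, assuming spaCy and its small English model en_core_web_sm are installed (the example sentence is made up):

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening its first store in San Diego in May.")

# Part-of-speech tags for every token
for token in doc:
    print(token.text, token.pos_, token.tag_)

# Named entities recognized in the same parse
for ent in doc.ents:
    print(ent.text, ent.label_)
```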

Required Media

Watch the Text processing, POS tagging, and Named entity recognition - Part 2

Module 3: Transformers

Tunstall, L., Von Werra, L., & Wolf, T. (2022). Natural Language Processing with Transformers. O'Reilly Media

  • Read Chapter 1: Hello Transformers, pages 1 to 14.
  • Read Chapter 3: Transformer Anatomy, pages 57 to 75.

Textbook: Speech and Language Processing, 3rd ed. (Book website). Jurafsky, D., & Martin, J. H. (2025)

  • Read Chapter 9: The Transformer, pages 184 to 200 (Sections 9.1-9.5); an attention sketch follows this list:
    • 9.1 Attention
    • 9.2 Transformer Blocks
    • 9.3 Parallelizing computation using a single matrix
    • 9.4 The input: embeddings for token and position
    • 9.5 The Language Modeling Head
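
To make Sections 9.1 and 9.3 concrete, here is a minimal NumPy sketch of causal scaled dot-product self-attention, the operation the chapter builds Transformer blocks from. Using the token matrix itself as queries, keys, and values is a simplification; real models apply learned projections first.

```python
import numpy as np

def causal_self_attention(X):
    # X: (seq_len, d) matrix of token vectors; here Q = K = V = X.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                # query-key similarities
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)        # position i attends only to <= i
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ X                           # weighted sum of values

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))                 # 4 tokens, vector dim 8
print(causal_self_attention(tokens).shape)       # (4, 8)
```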

Articles:

Required Media

Module 4: Large Language Models (LLMs)

Textbook: Speech and Language Processing, 3rd ed. (Book website). Jurafsky, D., & Martin, J. H. (2025)

  • Read Chapter 10: Large Language Models, pages 203 to 219 (Sections 10.1-10.6); a sampling sketch follows this list:
    • 10.1 Large Language Models with Transformers
    • 10.2 Sampling for LLM Generation
    • 10.3 Pretraining Large Language Models
    • 10.4 Evaluating Large Language Models
    • 10.5 Dealing with Scale
    • 10.6 Potential Harms from Language Models
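
As a companion to Section 10.2, here is a minimal NumPy sketch of two common decoding strategies, temperature scaling and top-k truncation. The toy logits and parameter defaults are illustrative assumptions, not values from the chapter.

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_k=50, rng=None):
    # Temperature reshapes the distribution; top-k truncates it to the
    # k most likely tokens before renormalizing.
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature
    if top_k is not None and top_k < logits.size:
        kth = np.sort(logits)[-top_k]            # k-th largest logit
        logits = np.where(logits < kth, -np.inf, logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # softmax over surviving tokens
    return int(rng.choice(logits.size, p=probs))

toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]         # pretend vocabulary of 5 tokens
print(sample_next_token(toy_logits, temperature=0.7, top_k=3))
```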

Articles:

Required Media

Module 5: Prompt Engineering

Textbook: Speech and Language Processing, 3rd ed. (Book website). Jurafsky, D., & Martin, J. H. (2025)

  • Read Chapter 12: Model Alignment, Prompting, and In-Context Learning, pages 242 to 258 (Sections 12.1-12.7); a chain-of-thought prompt example follows this list:
    • 12.1 Prompting
    • 12.2 Post-training and Model Alignment
    • 12.3 Model Alignment: Instruction Tuning
    • 12.4 Chain-of-Thought Prompting
    • 12.5 Automatic Prompt Optimization
    • 12.6 Evaluating Prompted Language Models
    • 12.7 Model Alignment with Human Preferences: RLHF and DPO
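
To illustrate Section 12.4, here is a small Python snippet that assembles a one-shot chain-of-thought prompt. The demonstration is the well-known tennis-ball example from the chain-of-thought literature; the model call itself is left out, since any instruction-tuned LLM API can consume the resulting string.

```python
# A chain-of-thought demonstration includes the intermediate reasoning,
# so the model imitates step-by-step solutions on the new question.
demonstration = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
)

question = ("The cafeteria had 23 apples. If they used 20 to make lunch "
            "and bought 6 more, how many apples do they have?")

prompt = demonstration + f"Q: {question}\nA:"
print(prompt)   # send this string to any instruction-tuned LLM
```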

Articles:

Required Media

Watch the Prompt Engineering Tutorial (41:36) video

Module 6: Building solutions with Hugging Face

Textbook: Speech and Language Processing, 3rd ed. (Book website). Jurafsky, D., & Martin, J. H. (2025)

Articles:

Required Media

  • Watch the What is HuggingFace?
  • Watch the What is Retrieval-Augmented Generation (RAG)?
  • Watch the RAG vs Fine-Tuning vs Prompt Engineering: Optimizing AI Models
  • Watch the LoRA explained (and a bit about precision and quantization)
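
These videos center on the Hugging Face transformers library, whose pipeline API can be exercised in a few lines. A minimal sketch, assuming the package is installed and the library's default models can be downloaded on first run (the input sentences are made up):

```python
from transformers import pipeline

# Sentiment analysis with the library's default model (downloaded on first run).
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face pipelines make inference a one-liner."))

# The same API covers other tasks; here, named entity recognition.
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Hugging Face Inc. is based in New York City."))
```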

Module 7: Agentic AI

Required Readings

Required Media

Watch the New Hugging Face Agents - Full Tutorial (24:37)
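
For orientation before the tutorial: at its core, an agent is a loop in which a model alternates proposed tool calls with observed results until it emits a final answer. Below is a toy, dependency-free sketch of that loop; the llm() stub, the Action/Observation format, and the calculator tool are all hypothetical stand-ins, not the Hugging Face Agents API.

```python
def calculator(expression: str) -> str:
    """Arithmetic tool the agent can call (demo only; never eval untrusted input)."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def llm(history: str) -> str:
    # Hard-coded stub policy standing in for a real model: call the
    # calculator once, then report the observed result as the answer.
    if "Observation:" in history:
        return "Final Answer: " + history.rsplit("Observation: ", 1)[-1].strip()
    return "Action: calculator[37 * 21]"

def run_agent(question: str, max_steps: int = 3) -> str:
    history = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(history)
        if step.startswith("Final Answer:"):
            return step
        tool, arg = step[len("Action: "):].rstrip("]").split("[", 1)
        history += f"{step}\nObservation: {TOOLS[tool](arg)}\n"
    return "Final Answer: (step limit reached)"

print(run_agent("What is 37 * 21?"))   # Final Answer: 777
```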

Recommended Readings
