@tanaypratap
Created October 5, 2025 12:08
HWs for Day 2: Training a deep neural network

📚 Homework: Explore, Observe, Explain

Goal: tweak small things, watch outcomes, and write short notes on what you observe. No extra code required beyond your lab notebook.


1) Reproducibility

  • What to change: Set torch.manual_seed(0) before model init and training.
  • What to observe: Does the loss trajectory stay the same across reruns?
  • Deliverable: One sentence on why fixed seeds matter.
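
A quick way to see what the seed controls, as a minimal sketch: the single `nn.Linear(4, 16)` layer below is only a stand-in for the first layer of your lab model, and `init_weights_with_seed` is a made-up helper name.

```python
import torch
import torch.nn as nn

def init_weights_with_seed(seed: int) -> torch.Tensor:
    """Seed PyTorch's RNG, build a small layer, and return a copy of its weights."""
    torch.manual_seed(seed)
    layer = nn.Linear(4, 16)  # stand-in for the first layer of the lab model
    return layer.weight.detach().clone()

first_run = init_weights_with_seed(0)
second_run = init_weights_with_seed(0)
other_seed = init_weights_with_seed(1)

print(torch.equal(first_run, second_run))  # True: same seed, same starting weights
print(torch.equal(first_run, other_seed))  # False: different seed, different start
```

Identical starting weights plus identical data order are what make the loss curve repeat across reruns.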

2) Hidden Size Sweep

  • What to change: hidden_dim in IrisClassifier → try 8, 16, 32, 64.
  • What to observe: Final loss after 200 epochs. Convergence speed.
  • Deliverable: A tiny table (hidden_dim vs final loss) and one-sentence takeaway.
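
One possible shape for the sweep, sketched under assumptions: the data here is synthetic (random 4-feature, 3-class tensors standing in for Iris), the model is an `nn.Sequential` stand-in for `IrisClassifier`, and Adam with `lr=0.01` is a guess at the lab's optimizer. In your notebook you would swap in the real model and tensors.

```python
import torch
import torch.nn as nn

def make_toy_data(num_samples=150, seed=0):
    """Synthetic 4-feature, 3-class data standing in for the Iris tensors."""
    generator = torch.Generator().manual_seed(seed)
    features = torch.randn(num_samples, 4, generator=generator)
    labels = torch.randint(0, 3, (num_samples,), generator=generator)
    return features, labels

def final_loss_for(hidden_dim, features, labels, num_epochs=200, learning_rate=0.01):
    """Train a 4 -> hidden_dim -> 3 model and return the loss from the last epoch."""
    torch.manual_seed(0)  # same seed for every hidden_dim so runs are comparable
    model = nn.Sequential(nn.Linear(4, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 3))
    loss_function = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = loss_function(model(features), labels)
        loss.backward()
        optimizer.step()
    return loss.item()

features, labels = make_toy_data()
results = {h: final_loss_for(h, features, labels) for h in (8, 16, 32, 64)}

print("hidden_dim | final loss")
for hidden_dim, final_loss in results.items():
    print(f"{hidden_dim:>10} | {final_loss:.4f}")
```

The printed rows are already the tiny table the deliverable asks for.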

3) Learning Rate Sweep

  • What to change: learning_rate in optimizer → try 0.001, 0.01, 0.05, 0.1.
  • What to observe: Does loss decrease smoothly or oscillate/diverge?
  • Deliverable: Name the best LR for your runs and why.
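
If you record the loss once per epoch in a list, a small helper can put a number on "smooth vs oscillating". This is just a sketch; `count_loss_increases` is a made-up name and the loss values are invented for illustration.

```python
def count_loss_increases(loss_history):
    """Count epochs where the loss went up compared with the previous epoch."""
    return sum(1 for prev, curr in zip(loss_history, loss_history[1:]) if curr > prev)

smooth = [1.10, 0.90, 0.75, 0.64, 0.60]        # invented values, steadily falling
oscillating = [1.10, 0.70, 1.30, 0.65, 1.20]   # invented values, bouncing around

print(count_loss_increases(smooth))       # 0
print(count_loss_increases(oscillating))  # 2
```

A count near zero suggests a stable learning rate; a large count, or a loss that becomes `nan`, suggests the learning rate is too high.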

4) Training Duration

  • What to change: num_epochs → try 50, 100, 200, 400.
  • What to observe: Diminishing returns. When does the curve "flatten"?
  • Deliverable: One sentence on when you'd stop training and why.
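
To make "flatten" less of a judgment call, here is one possible rule of thumb as a sketch: call the curve flat at the first epoch where the next `window` epochs improve the loss by less than `min_improvement`. The helper name, both thresholds, and the loss curve are assumptions for illustration.

```python
def first_flat_epoch(loss_history, window=10, min_improvement=1e-3):
    """Return the first epoch after which the loss improves by less than
    min_improvement over the next `window` epochs, or None if it never does."""
    for epoch in range(len(loss_history) - window):
        if loss_history[epoch] - loss_history[epoch + window] < min_improvement:
            return epoch
    return None

# Invented curve: loss = 1 / (1 + epoch), which keeps falling but ever more slowly.
made_up_losses = [1.0 / (1 + epoch) for epoch in range(400)]

print(first_flat_epoch(made_up_losses))  # 95
```

Try different `window` and `min_improvement` values and see how much your stopping point moves.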

5) Activation Function

  • What to change: Replace nn.ReLU() with nn.GELU() or nn.Tanh().
  • What to observe: Any change in final loss or convergence speed?
  • Deliverable: One-sentence comparison.

6) Weight Decay (L2)

  • What to change: In create_loss_and_optimizer, set weight_decay to 0.0, 1e-4, 1e-3.
  • What to observe: Subtle change in loss; weights slightly smaller (inspect weight.norm() if curious).
  • Deliverable: One sentence on whether L2 helped or not.
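
A sketch of the comparison, under assumptions: synthetic data stands in for Iris, an `nn.Sequential` model stands in for `IrisClassifier`, and the optimizer is assumed to be Adam (your `create_loss_and_optimizer` may use something else). `weight_decay` is a real argument of `torch.optim.Adam` and `torch.optim.SGD`.

```python
import torch
import torch.nn as nn

def train_with_weight_decay(weight_decay, num_epochs=200):
    """Train a toy 4 -> 16 -> 3 model; return (final loss, first-layer weight norm)."""
    torch.manual_seed(0)  # same data and same starting weights for every setting
    features = torch.randn(150, 4)        # synthetic stand-in for Iris features
    labels = torch.randint(0, 3, (150,))  # synthetic stand-in for Iris labels
    model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
    loss_function = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=weight_decay)
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = loss_function(model(features), labels)
        loss.backward()
        optimizer.step()
    return loss.item(), model[0].weight.norm().item()

decay_results = {wd: train_with_weight_decay(wd) for wd in (0.0, 1e-4, 1e-3)}
for wd, (final_loss, weight_norm) in decay_results.items():
    print(f"weight_decay={wd:g}  loss={final_loss:.4f}  layer-1 weight norm={weight_norm:.4f}")
```

Compare the norm column across rows; that is the "weights slightly smaller" effect to look for.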

7) Batch vs Full-Batch (Optional)

  • What to do: Wrap data in a DataLoader with batch_size=16 and train for the same epochs.
  • What to observe: Noise in loss per step, but similar final loss.
  • Deliverable: One sentence on pros/cons of mini-batching.
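
A minimal mini-batch loop might look like the sketch below. The tensors are synthetic stand-ins for the Iris data, and the model, optimizer, and 5-epoch count are assumptions; `TensorDataset` and `DataLoader` are the standard `torch.utils.data` classes.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
features = torch.randn(150, 4)        # synthetic stand-in for Iris features
labels = torch.randint(0, 3, (150,))  # synthetic stand-in for Iris labels

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

step_losses = []
for epoch in range(5):
    for batch_features, batch_labels in loader:
        optimizer.zero_grad()
        loss = loss_function(model(batch_features), batch_labels)
        loss.backward()
        optimizer.step()
        step_losses.append(loss.item())  # one entry per optimizer step, not per epoch

print(len(loader))       # 10 batches per epoch: nine of 16 samples and one of 6
print(len(step_losses))  # 50 steps in total
```

Plotting `step_losses` shows the per-step noise directly; averaging each group of 10 gives a per-epoch curve you can compare with full-batch training.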

8) Weight Initialization Peek (Optional)

  • What to do: Print a slice of fully_connected_layer1.weight before and after training.
  • What to observe: Small random numbers → shifted after learning.
  • Deliverable: One sentence explaining what changed and why.
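
One gotcha when peeking: a slice of `weight` without `.clone()` shares memory with the layer, so the "before" snapshot silently changes during training. A sketch with a stand-in layer (the lab's `fully_connected_layer1` is assumed to be an `nn.Linear`; the data and the single SGD step are made up):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 16)  # stand-in for fully_connected_layer1

weights_before = layer.weight[:2, :].detach().clone()  # real copy, frozen in time
alias_not_copy = layer.weight[:2, :].detach()          # shares memory with the layer

# One made-up training step, just to move the weights.
features = torch.randn(8, 4)
targets = torch.randn(8, 16)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
loss = nn.functional.mse_loss(layer(features), targets)
loss.backward()
optimizer.step()

weights_after = layer.weight[:2, :].detach().clone()

print(torch.equal(weights_before, weights_after))  # False: the copy kept the old values
print(torch.equal(alias_not_copy, weights_after))  # True: the alias followed the update
```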

🎁 Bonus: Add One More Hidden Layer

Goal: Extend the model to 4 β†’ 16 β†’ 8 β†’ 3.

Hints (no full code):

  • Add another nn.Linear after the first ReLU:
    • self.fully_connected_layer2 = nn.Linear(16, 8)
    • self.output_layer = nn.Linear(8, 3)
  • Update forward:
    • x = self.fully_connected_layer1(input_features) → ReLU
    • x = self.fully_connected_layer2(x) → ReLU
    • logits = self.output_layer(x)
  • Keep everything else the same (loss, optimizer, loop).
  • What to observe: Compare final loss vs the single-hidden-layer model. Any faster convergence? Overfitting signs?

Deliverable: 2–3 sentences reflecting on whether the extra layer helped and under what setting (hidden sizes, learning rate).


Submission

  • Do it either in Colab and send a screenshot, or in a lab notebook (remember labs in school?) and send a picture.
  • Short notes under each item.
  • Keep outputs concise. Focus on observations and one-line explanations.

📘 Day2.HW2: Train/Validation Split + Save & Load + Validate

⚠️ Important:
Do this homework in a fresh new Colab notebook; do not use your Lesson-1 classwork notebook.
This helps you start clean and keeps class code separate from your experiments.


🎯 Objectives

  1. Add a train/validation split with a configurable ratio.
  2. Train on the training split only.
  3. Save the trained model to disk.
  4. Load the model from disk in a new instance.
  5. Validate the loaded model on the held-out validation set and report accuracy.

📝 Part 1: Train/Validation Split (Configurable)

  • Create a helper function:

    def split_train_val(features_tensor, labels_tensor, val_ratio=0.2, seed=0):
        ...
    
  • Use sklearn.model_selection.train_test_split directly on the PyTorch tensors (no need to convert to NumPy).

  • Pass val_ratio as the test_size argument so you can call:

    X_train, X_val, y_train, y_val = split_train_val(features_tensor, labels_tensor, val_ratio=0.3)
  • Pass seed as the random_state argument so the split is reproducible.
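
To get a feel for what `test_size` and `random_state` do before writing the helper, here is a small sketch on made-up NumPy arrays (10 samples, not the Iris tensors):

```python
import numpy as np
from sklearn.model_selection import train_test_split

toy_features = np.arange(20).reshape(10, 2)  # 10 made-up samples, 2 features each
toy_labels = np.array([0, 1] * 5)

X_tr, X_va, y_tr, y_va = train_test_split(
    toy_features, toy_labels, test_size=0.3, random_state=0
)
print(len(X_tr), len(X_va))  # 7 3

# Same random_state, same split.
_, X_va_again, _, _ = train_test_split(
    toy_features, toy_labels, test_size=0.3, random_state=0
)
print(np.array_equal(X_va, X_va_again))  # True
```

With 150 Iris rows, a `val_ratio` of 0.2 would leave 120 rows for training and 30 for validation.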


📝 Part 2: Train on Train-Split & Save the Model

  • Re-instantiate your IrisClassifier and train only on (X_train, y_train).

  • After training, save the trained weights:

    torch.save(model.state_dict(), "iris_model.pt")
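
If you are curious what actually lands in the file, the sketch below round-trips a tiny stand-in layer through a temporary file. The `nn.Linear(4, 3)` model and the `toy_model.pt` file name are assumptions for illustration; the save and load calls are the same ones this homework uses.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
toy_model = nn.Linear(4, 3)  # stand-in for a trained model

# state_dict() is an ordered mapping from parameter names to tensors.
print(list(toy_model.state_dict().keys()))  # ['weight', 'bias']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pt")
    torch.save(toy_model.state_dict(), path)

    restored = nn.Linear(4, 3)  # fresh instance with different random weights
    restored.load_state_dict(torch.load(path))

print(torch.equal(toy_model.weight, restored.weight))  # True
print(torch.equal(toy_model.bias, restored.bias))      # True
```

Only the tensors are stored, not the class itself, which is why the loading side needs the same model definition.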

📝 Part 3: Load & Validate

  1. Create a new model instance:

    new_model = IrisClassifier()
  2. Load the saved weights:

    new_model.load_state_dict(torch.load("iris_model.pt"))
    new_model.eval()
  3. Write a function named evaluate_on_validation:

    def evaluate_on_validation(model, X_val, y_val):
        # disable gradients
        # run forward pass on validation set
        # pick predicted class IDs with torch.argmax
        # compute accuracy = correct / total
        # print the accuracy
  4. Call it:

    evaluate_on_validation(new_model, X_val, y_val)
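
The argmax-then-compare step is easy to check by hand on made-up numbers first. A sketch with invented logits for 4 samples and 3 classes (not model output):

```python
import torch

toy_logits = torch.tensor([
    [2.0, 0.1, -1.0],  # highest score in column 0
    [0.2, 1.5, 0.3],   # highest score in column 1
    [0.0, 0.4, 2.2],   # highest score in column 2
    [1.2, 0.9, 0.1],   # highest score in column 0
])
toy_true_labels = torch.tensor([0, 1, 2, 1])

predicted_ids = torch.argmax(toy_logits, dim=1)  # one class ID per row
toy_accuracy = (predicted_ids == toy_true_labels).float().mean().item()

print(predicted_ids.tolist())  # [0, 1, 2, 0]
print(toy_accuracy)            # 0.75
```

Three of the four predictions match, so the accuracy is 3 / 4.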

📤 Deliverables

  • The val_ratio you used (e.g., 0.2 or 0.3).

  • The accuracy on the validation set printed by your function.

  • 2–3 sentences reflecting on:

    • how accuracy changes with different val_ratio,
    • whether training longer or changing hidden size helped.

💡 Tips

  • Always call model.eval() before validating.
  • Use torch.no_grad() while running inference to save memory and speed things up.
  • Ensure the model class definition is identical when loading weights.
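
To see the first two tips in action, here is a sketch with a toy model that includes `nn.Dropout`. That layer is an assumption added only to make the effect visible; your IrisClassifier may not have one, in which case `eval()` changes nothing but is still a good habit.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
toy_net = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 3))
toy_inputs = torch.randn(5, 4)

toy_net.eval()  # dropout becomes a no-op, so outputs are deterministic
with torch.no_grad():  # no computation graph is recorded inside this block
    eval_out_a = toy_net(toy_inputs)
    eval_out_b = toy_net(toy_inputs)

print(torch.equal(eval_out_a, eval_out_b))  # True
print(eval_out_a.requires_grad)             # False

toy_net.train()  # back to training mode; gradients are tracked again
train_out = toy_net(toy_inputs)
print(train_out.requires_grad)              # True
```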