Goal: tweak small things, watch outcomes, and write short notes on what you observe. No extra code required beyond your lab notebook.
1) Fix the Random Seed
- What to change: Set `torch.manual_seed(0)` before model init and training (sketch below).
- What to observe: Does the loss trajectory stay the same across reruns?
- Deliverable: One sentence on why fixed seeds matter.
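A tiny, self-contained check (illustrative only; `fresh_layer` is not part of the lab code): with the same seed, freshly created layers receive identical random weights, which is why the whole loss trajectory becomes repeatable.

```python
import torch
import torch.nn as nn

def fresh_layer():
    torch.manual_seed(0)       # reset the RNG right before the weights are drawn
    return nn.Linear(4, 16)    # same shape as the lab's first layer (4 -> 16)

a, b = fresh_layer(), fresh_layer()
print(torch.equal(a.weight, b.weight))  # True: identical init -> identical training runs
```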
2) Hidden Size Sweep
- What to change: `hidden_dim` in `IrisClassifier` → try 8, 16, 32, 64 (sketch below).
- What to observe: Final loss after 200 epochs. Convergence speed.
- Deliverable: A tiny table (hidden_dim vs final loss) and one-sentence takeaway.
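A possible shape for the sweep, shown with `nn.Sequential` as a stand-in for the lab's `IrisClassifier`, and assuming Adam with `lr=0.01` and full-batch training for 200 epochs; adapt it to whatever your notebook already does. The later sketches reuse `X`, `y`, and `train_once` from here.

```python
import torch
import torch.nn as nn
from sklearn.datasets import load_iris

data = load_iris()
X = torch.tensor(data.data, dtype=torch.float32)   # 150 samples x 4 features
y = torch.tensor(data.target, dtype=torch.long)    # 150 class labels (0, 1, 2)

def train_once(model, lr=0.01, epochs=200):
    """Full-batch training; returns the final loss."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    return loss.item()

for hidden_dim in (8, 16, 32, 64):
    torch.manual_seed(0)  # same starting point for every width, so only hidden_dim varies
    model = nn.Sequential(nn.Linear(4, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 3))
    print(f"hidden_dim={hidden_dim:<3} final loss={train_once(model):.4f}")
```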
3) Learning Rate Sweep
- What to change: `learning_rate` in the optimizer → try 0.001, 0.01, 0.05, 0.1 (sketch below).
- What to observe: Does the loss decrease smoothly, or does it oscillate/diverge?
- Deliverable: Name the best LR for your runs and why.
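Same pattern as the hidden-size sweep, only the learning rate changes; this assumes the `X`, `y`, and `train_once` helper defined in that sketch.

```python
import torch
import torch.nn as nn

for lr in (0.001, 0.01, 0.05, 0.1):
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
    # watch whether the loss falls smoothly or starts to oscillate/diverge
    print(f"lr={lr:<5} final loss={train_once(model, lr=lr):.4f}")
```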
4) Number of Epochs
- What to change: `num_epochs` → try 50, 100, 200, 400 (sketch below).
- What to observe: Diminishing returns. When does the curve "flatten"?
- Deliverable: One sentence on when you'd stop training and why.
5) Activation Function Swap
- What to change: Replace `nn.ReLU()` with `nn.GELU()` or `nn.Tanh()` (sketch below).
- What to observe: Any change in final loss or convergence speed?
- Deliverable: One-sentence comparison.
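The swap is a one-line change in the model definition; a rough sketch, again assuming the `train_once` helper from the hidden-size sweep.

```python
import torch
import torch.nn as nn

for act in (nn.ReLU(), nn.GELU(), nn.Tanh()):
    torch.manual_seed(0)  # seed right before the Linear layers so every run starts identically
    model = nn.Sequential(nn.Linear(4, 16), act, nn.Linear(16, 3))
    print(f"{act.__class__.__name__:<5} final loss={train_once(model):.4f}")
```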
6) Weight Decay (L2 Regularization)
- What to change: In `create_loss_and_optimizer`, set `weight_decay` to 0.0, 1e-4, 1e-3 (sketch below).
- What to observe: Subtle change in loss; weights slightly smaller (inspect `weight.norm()` if curious).
- Deliverable: One sentence on whether L2 helped or not.
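Assuming your `create_loss_and_optimizer` simply forwards `weight_decay` to the optimizer, the sketch below sets it directly on Adam and also peeks at the first layer's weight norm (reuses `X`, `y` from the hidden-size sketch).

```python
import torch
import torch.nn as nn

for wd in (0.0, 1e-4, 1e-3):
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=wd)  # L2 penalty
    for _ in range(200):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    norm = model[0].weight.norm().item()   # magnitude of the first layer's weights
    print(f"weight_decay={wd:<6} final loss={loss.item():.4f}  ||W1||={norm:.3f}")
```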
7) Mini-Batch Training with a DataLoader
- What to do: Wrap the data in a `DataLoader` with `batch_size=16` and train for the same number of epochs (sketch below).
- What to observe: Noise in the loss per step, but a similar final loss.
- Deliverable: One sentence on pros/cons of mini-batching.
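One way to wrap the existing tensors, assuming the `X`, `y` tensors from the hidden-size sketch; `TensorDataset` and `DataLoader` come from the standard `torch.utils.data` module.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

for epoch in range(200):
    for xb, yb in loader:              # 150 samples -> 10 mini-batches per epoch
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
print(f"last mini-batch loss={loss.item():.4f}")
```

With `shuffle=True` each epoch sees the samples in a different order, which is where the per-step noise in the loss comes from.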
8) Inspect the Weights
- What to do: Print a slice of `fully_connected_layer1.weight` before and after training (sketch below).
- What to observe: Small random numbers at initialization → shifted after learning.
- Deliverable: One sentence explaining what changed and why.
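A possible way to snapshot a slice before training and compare it afterwards, shown on the `nn.Sequential` stand-in and reusing `train_once`; with the lab's class you would index `model.fully_connected_layer1.weight` instead of `model[0].weight`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))

before = model[0].weight[:3, :].clone()   # clone, otherwise the "before" view updates in place
train_once(model)
after = model[0].weight[:3, :]

print("before:\n", before)   # small random values from initialization
print("after:\n", after)     # same shape, values shifted by gradient descent
```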
Bonus: Add One More Hidden Layer
Goal: Extend the model to 4 → 16 → 8 → 3.
Hints (no full code):
- Add another `nn.Linear` after the first ReLU: `self.fully_connected_layer2 = nn.Linear(16, 8)` and `self.output_layer = nn.Linear(8, 3)`.
- Update `forward`: `x = self.fully_connected_layer1(input_features)` → ReLU → `x = self.fully_connected_layer2(x)` → ReLU → `logits = self.output_layer(x)`.
- Keep everything else the same (loss, optimizer, loop).
- What to observe: Compare final loss vs the single-hidden-layer model. Any faster convergence? Overfitting signs?
Deliverable: 2–3 sentences reflecting on whether the extra layer helped and under what setting (hidden sizes, learning rate). A rough sketch of one possible extended model follows below.
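If you want something to check your version against, here is one possible shape of the extended model. Treat it as a sketch built from the hints above, not the answer key; the class name `IrisClassifierDeep` is made up here, and your loss, optimizer, and training loop stay unchanged.

```python
import torch.nn as nn

class IrisClassifierDeep(nn.Module):
    """4 -> 16 -> 8 -> 3: one extra hidden layer compared to the original model."""

    def __init__(self):
        super().__init__()
        self.fully_connected_layer1 = nn.Linear(4, 16)
        self.fully_connected_layer2 = nn.Linear(16, 8)
        self.output_layer = nn.Linear(8, 3)
        self.activation = nn.ReLU()

    def forward(self, input_features):
        x = self.activation(self.fully_connected_layer1(input_features))
        x = self.activation(self.fully_connected_layer2(x))
        logits = self.output_layer(x)   # raw scores; CrossEntropyLoss applies softmax itself
        return logits
```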
- Do it either in Colab and send a screenshot, or in a lab notebook (remember labs in school) and send a pic.
- Short notes under each item.
- Keep outputs concise. Focus on observations and one-line explanations.