tanaypratap/day2_hw1.md

## day2_hw1.md

      
📚 Homework: Explore, Observe, Explain

Goal: tweak small things, watch outcomes, and write short notes on what you observe. No extra code required beyond your lab notebook.

1) Reproducibility


What to change: Set torch.manual_seed(0) before model init and training.
What to observe: Does the loss trajectory stay the same across reruns?
Deliverable: One sentence on why fixed seeds matter.
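
A minimal sketch of the effect, assuming a bare nn.Linear(4, 16) as a stand-in for your model (the helper name make_layer is made up for illustration). Seeding right before construction should give identical initial weights on every run:

```python
import torch
import torch.nn as nn


def make_layer(seed: int) -> nn.Linear:
    # Hypothetical helper: seed first, then build, because the initial
    # weights are drawn from PyTorch's global random number generator.
    torch.manual_seed(seed)
    return nn.Linear(4, 16)


layer_a = make_layer(0)
layer_b = make_layer(0)
layer_c = make_layer(1)

same_seed_match = torch.equal(layer_a.weight, layer_b.weight)
different_seed_match = torch.equal(layer_a.weight, layer_c.weight)
print("same seed, same weights:", same_seed_match)
print("different seed, same weights:", different_seed_match)
```

In your notebook the same idea applies to the whole run: call torch.manual_seed(0) once, before the model is created and before training starts.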


2) Hidden Size Sweep


What to change: hidden_dim in IrisClassifier → try 8, 16, 32, 64.
What to observe: Final loss after 200 epochs. Convergence speed.
Deliverable: A tiny table (hidden_dim vs final loss) and one-sentence takeaway.
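
One way to organise the sweep, as a rough sketch. TinyClassifier and final_loss_for are hypothetical names standing in for your IrisClassifier and training loop, the data is synthetic rather than Iris, and Adam with lr=0.01 is an assumption, so swap in your own pieces:

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for IrisClassifier: 4 -> hidden_dim -> 3."""

    def __init__(self, hidden_dim: int = 16):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(4, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 3)
        )

    def forward(self, input_features):
        return self.layers(input_features)


def final_loss_for(hidden_dim, features, labels, num_epochs=200):
    torch.manual_seed(0)  # same seed for every setting, so runs are comparable
    model = TinyClassifier(hidden_dim)
    loss_function = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # assumed optimizer
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = loss_function(model(features), labels)
        loss.backward()
        optimizer.step()
    with torch.no_grad():  # loss after the last update
        return loss_function(model(features), labels).item()


# Synthetic placeholder data shaped like Iris (150 rows, 4 features, 3 classes).
generator = torch.Generator().manual_seed(0)
features = torch.randn(150, 4, generator=generator)
labels = (features[:, 0] > 0).long() + (features[:, 1] > 0).long()

results = {h: final_loss_for(h, features, labels) for h in (8, 16, 32, 64)}
print("hidden_dim | final loss")
for hidden_dim, loss_value in results.items():
    print(f"{hidden_dim:>10} | {loss_value:.4f}")
```

The printed rows are already the "tiny table" for the deliverable; only the takeaway sentence is left to write.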


3) Learning Rate Sweep


What to change: learning_rate in optimizer → try 0.001, 0.01, 0.05, 0.1.
What to observe: Does loss decrease smoothly or oscillate/diverge?
Deliverable: Name the best LR for your runs and why.
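
To build intuition for "smooth vs oscillate vs diverge", here is a toy sketch on the one-parameter loss w², not on the Iris model. The learning rates are deliberately exaggerated and sgd_trajectory is a made-up helper. With plain SGD each step multiplies w by (1 - 2·lr), so the three regimes are easy to see:

```python
import torch


def sgd_trajectory(learning_rate, steps=5):
    # Minimise loss = w^2 starting from w = 1 with plain SGD.
    w = torch.tensor([1.0], requires_grad=True)
    optimizer = torch.optim.SGD([w], lr=learning_rate)
    values = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = (w ** 2).sum()
        loss.backward()
        optimizer.step()
        values.append(w.item())
    return values


smooth = sgd_trajectory(0.1)       # factor 0.8 per step: steady decay
oscillating = sgd_trajectory(0.9)  # factor -0.8: sign flips, still shrinks
diverging = sgd_trajectory(1.1)    # factor -1.2: sign flips and grows

for name, values in [("smooth", smooth), ("oscillating", oscillating), ("diverging", diverging)]:
    print(name, [round(v, 3) for v in values])
```

Your real loss surface is not a parabola and your optimizer may not be plain SGD, so the thresholds will differ; the point is only what each regime looks like in a loss printout.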


4) Training Duration


What to change: num_epochs → try 50, 100, 200, 400.
What to observe: Diminishing returns. When does the curve “flatten”?
Deliverable: One sentence on when you’d stop training and why.
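
If you record the loss at every epoch in a list, "flatten" can be made concrete with a small check like the one below. This is only a sketch: first_flat_epoch, the window of 10 epochs, and the 1e-3 tolerance are arbitrary choices for illustration, and the loss curve is made up:

```python
def first_flat_epoch(losses, window=10, tolerance=1e-3):
    """Return the first epoch whose loss improved by less than `tolerance`
    compared with `window` epochs earlier, or None if that never happens."""
    for epoch in range(window, len(losses)):
        improvement = losses[epoch - window] - losses[epoch]
        if improvement < tolerance:
            return epoch
    return None


# Made-up, smoothly decaying curve standing in for your recorded losses.
example_losses = [1.0 / (epoch + 1) for epoch in range(400)]
flat_epoch = first_flat_epoch(example_losses)
print("curve flattens around epoch:", flat_epoch)
```

Try a couple of tolerances on your own curve; the answer to "when would you stop" depends on how small an improvement you still care about.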


5) Activation Function


What to change: Replace nn.ReLU() with nn.GELU() or nn.Tanh().
What to observe: Any change in final loss or convergence speed?
Deliverable: One-sentence comparison.
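
A quick look at how the three activations treat the same inputs, plus one possible way to make the activation swappable. build_model is a hypothetical helper, not part of the class code:

```python
import torch
import torch.nn as nn


def build_model(activation: nn.Module, hidden_dim: int = 16) -> nn.Sequential:
    # Hypothetical helper: same 4 -> hidden_dim -> 3 shape, pluggable activation.
    return nn.Sequential(nn.Linear(4, hidden_dim), activation, nn.Linear(hidden_dim, 3))


sample = torch.tensor([-2.0, 0.0, 2.0])
activation_outputs = {
    "ReLU": nn.ReLU()(sample),  # clips negatives to exactly zero
    "GELU": nn.GELU()(sample),  # lets a small negative value through
    "Tanh": nn.Tanh()(sample),  # squashes everything into (-1, 1)
}
for name, output in activation_outputs.items():
    print(name, output)

models = {name: build_model(layer) for name, layer in
          [("ReLU", nn.ReLU()), ("GELU", nn.GELU()), ("Tanh", nn.Tanh())]}
```

Train each entry of models with the same seed and settings so that the activation is the only thing that changes between runs.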


6) Weight Decay (L2)


What to change: In create_loss_and_optimizer, set weight_decay to 0.0, 1e-4, 1e-3.
What to observe: Subtle change in loss; weights slightly smaller (inspect weight.norm() if curious).
Deliverable: One sentence on whether L2 helped or not.
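
A sketch of how the comparison could be set up. The signature of create_loss_and_optimizer below is a guess (your class version may take different arguments), Adam is an assumed optimizer, and the data is synthetic. Note that Adam's weight_decay adds an L2 term to the gradient; AdamW applies decay differently.

```python
import torch
import torch.nn as nn


def create_loss_and_optimizer(model, learning_rate=0.01, weight_decay=0.0):
    # Hypothetical signature, modelled on the function named in the homework.
    loss_function = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(
        model.parameters(), lr=learning_rate, weight_decay=weight_decay
    )
    return loss_function, optimizer


def train_with_weight_decay(weight_decay, features, labels, num_epochs=200):
    torch.manual_seed(0)  # identical starting weights for every setting
    model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
    loss_function, optimizer = create_loss_and_optimizer(model, weight_decay=weight_decay)
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = loss_function(model(features), labels)
        loss.backward()
        optimizer.step()
    # Last recorded loss and the norm of the first layer's weight matrix.
    return loss.item(), model[0].weight.norm().item()


generator = torch.Generator().manual_seed(0)
features = torch.randn(150, 4, generator=generator)  # synthetic placeholder data
labels = (features[:, 0] > 0).long() + (features[:, 1] > 0).long()

decay_results = {wd: train_with_weight_decay(wd, features, labels) for wd in (0.0, 1e-4, 1e-3)}
for weight_decay, (loss_value, weight_norm) in decay_results.items():
    print(f"weight_decay={weight_decay:g}  loss={loss_value:.4f}  weight norm={weight_norm:.4f}")
```

Whether the norms actually shrink on your data is exactly the observation the exercise asks for, so treat the printout as evidence rather than expecting a particular outcome.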


7) Batch vs Full-Batch (Optional)


What to do: Wrap data in a DataLoader with batch_size=16 and train for the same epochs.
What to observe: Noise in loss per step, but similar final loss.
Deliverable: One sentence on pros/cons of mini-batching.
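
A minimal sketch of the wrapping step, using random placeholder tensors in place of the Iris features and labels and an assumed Adam optimizer. With 150 rows and batch_size=16 you get 10 steps per epoch, the last one with only 6 rows:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
features = torch.randn(150, 4)          # placeholder for the Iris features
labels = torch.randint(0, 3, (150,))    # placeholder for the Iris labels

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # assumed optimizer

batch_sizes, step_losses = [], []
for batch_features, batch_labels in loader:  # one epoch = one pass over the loader
    optimizer.zero_grad()
    loss = loss_function(model(batch_features), batch_labels)
    loss.backward()
    optimizer.step()
    batch_sizes.append(batch_features.shape[0])
    step_losses.append(loss.item())

print("steps per epoch:", len(loader))
print("batch sizes:", batch_sizes)
print("per-step losses:", [round(value, 3) for value in step_losses])
```

Wrap the inner loop in your usual for-epoch loop to train for the same number of epochs as the full-batch run.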


8) Weight Initialization Peek (Optional)


What to do: Print a slice of fully_connected_layer1.weight before and after training.
What to observe: Small random numbers → shifted after learning.
Deliverable: One sentence explaining what changed and why.
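
One detail worth getting right here: take a copy of the weights before training, because the optimizer updates them in place. A rough sketch with stand-in layers, random data, and plain SGD (all assumptions, not the class code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fully_connected_layer1 = nn.Linear(4, 16)
output_layer = nn.Linear(16, 3)
parameters = list(fully_connected_layer1.parameters()) + list(output_layer.parameters())
optimizer = torch.optim.SGD(parameters, lr=0.1)

features = torch.randn(30, 4)          # placeholder data
labels = torch.randint(0, 3, (30,))

# .detach().clone() makes an independent snapshot; without clone() the
# "before" tensor would silently follow the updated weights.
weights_before = fully_connected_layer1.weight.detach().clone()

for _ in range(50):
    optimizer.zero_grad()
    logits = output_layer(torch.relu(fully_connected_layer1(features)))
    loss = nn.functional.cross_entropy(logits, labels)
    loss.backward()
    optimizer.step()

weights_after = fully_connected_layer1.weight.detach().clone()
max_change = (weights_after - weights_before).abs().max().item()

print("before:\n", weights_before[:2])
print("after:\n", weights_after[:2])
print("largest absolute change:", max_change)
```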


🎁 Bonus: Add One More Hidden Layer

Goal: Extend the model to 4 → 16 → 8 → 3.
Hints (no full code):

Add another nn.Linear after the first ReLU:

self.fully_connected_layer2 = nn.Linear(16, 8)
self.output_layer = nn.Linear(8, 3)


Update forward:

x = self.fully_connected_layer1(input_features) → ReLU
x = self.fully_connected_layer2(x) → ReLU
logits = self.output_layer(x)


Keep everything else the same (loss, optimizer, loop).
What to observe: Compare final loss vs the single-hidden-layer model. Any faster convergence? Overfitting signs?

Deliverable: 2–3 sentences reflecting on whether the extra layer helped and under what setting (hidden sizes, learning rate).

Submission


Do it either in Colab and send a screenshot, or in a paper lab notebook (remember labs in school?) and send a photo.
Short notes under each item.
Keep outputs concise. Focus on observations and one-line explanations.


## day2_hw2.md

      
📘 Day2.HW2: Train/Validation Split + Save & Load + Validate

⚠️ Important:

Do this homework in a fresh new Colab notebook — do not use your Lesson-1 classwork notebook.

This helps you start clean and keeps class code separate from your experiments.

🎯 Objectives


Add a train/validation split with a configurable ratio.
Train on the training split only.
Save the trained model to disk.
Load the model from disk in a new instance.
Validate the loaded model on the held-out validation set and report accuracy.


📝 Part 1 — Train/Validation Split (Configurable)


Create a helper function:
def split_train_val(features_tensor, labels_tensor, val_ratio=0.2, seed=0):
    ...


Use sklearn.model_selection.train_test_split directly on the PyTorch tensors (no need to convert to NumPy).


Pass val_ratio as the test_size argument so you can call:
X_train, X_val, y_train, y_val = split_train_val(features_tensor, labels_tensor, val_ratio=0.3)


Keep a fixed seed for reproducibility.
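
The steps above could look roughly like this. It is a sketch, not the only valid answer: the random tensors are placeholders for the real Iris tensors, and passing seed through as random_state is one reasonable reading of "keep a fixed seed":

```python
import torch
from sklearn.model_selection import train_test_split


def split_train_val(features_tensor, labels_tensor, val_ratio=0.2, seed=0):
    # val_ratio becomes test_size; random_state fixes the shuffle.
    X_train, X_val, y_train, y_val = train_test_split(
        features_tensor, labels_tensor, test_size=val_ratio, random_state=seed
    )
    return X_train, X_val, y_train, y_val


torch.manual_seed(0)
features_tensor = torch.randn(150, 4)         # placeholder for the Iris features
labels_tensor = torch.randint(0, 3, (150,))   # placeholder for the Iris labels

X_train, X_val, y_train, y_val = split_train_val(features_tensor, labels_tensor, val_ratio=0.3)
print(len(X_train), len(X_val))
```

With 150 rows and val_ratio=0.3 this gives 105 training rows and 45 validation rows.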


📝 Part 2 — Train on Train-Split & Save the Model


Re-instantiate your IrisClassifier and train only on (X_train, y_train).


After training, save the trained weights:
torch.save(model.state_dict(), "iris_model.pt")
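
A quick way to convince yourself that the saved file really holds the trained weights is a round trip: save, load into a fresh instance, and compare outputs. In this sketch build_model is a hypothetical stand-in for IrisClassifier(), and a temporary directory is used only to keep the example self-contained:

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # Hypothetical stand-in for IrisClassifier().
    return nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))


model = build_model()

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "iris_model.pt")
    torch.save(model.state_dict(), path)         # weights only, not the class
    new_model = build_model()                    # fresh, randomly initialised
    new_model.load_state_dict(torch.load(path))  # overwrite with saved weights

model.eval()
new_model.eval()
sample = torch.randn(5, 4)
with torch.no_grad():
    outputs_match = torch.equal(model(sample), new_model(sample))
print("loaded model reproduces the original outputs:", outputs_match)
```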


📝 Part 3 — Load & Validate


Create a new model instance:
new_model = IrisClassifier()


Load the saved weights:
new_model.load_state_dict(torch.load("iris_model.pt"))
new_model.eval()


Write a function named evaluate_on_validation:
def evaluate_on_validation(model, X_val, y_val):
    # disable gradients
    # run forward pass on validation set
    # pick predicted class IDs with torch.argmax
    # compute accuracy = correct / total
    # print the accuracy


Call it:
evaluate_on_validation(new_model, X_val, y_val)
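
The comments in the function skeleton map onto code roughly as follows. Treat it as one possible sketch: returning the accuracy in addition to printing it is an addition for convenience, and the nn.Identity "model" with hand-written logits exists only so the arithmetic can be checked by eye:

```python
import torch
import torch.nn as nn


def evaluate_on_validation(model, X_val, y_val):
    model.eval()
    with torch.no_grad():                         # disable gradients
        logits = model(X_val)                     # forward pass on the validation set
    predicted = torch.argmax(logits, dim=1)       # predicted class IDs
    correct = (predicted == y_val).sum().item()
    accuracy = correct / y_val.shape[0]           # accuracy = correct / total
    print(f"Validation accuracy: {accuracy:.3f}")
    return accuracy                               # returned as well as printed (an addition)


# Toy check: Identity passes the "features" straight through as logits.
toy_logits = torch.tensor([[2.0, 0.0, 0.0],
                           [0.0, 3.0, 0.0],
                           [0.0, 0.0, 1.0],
                           [5.0, 0.0, 0.0]])
toy_labels = torch.tensor([0, 1, 2, 1])           # last row is predicted 0, so it is wrong
toy_accuracy = evaluate_on_validation(nn.Identity(), toy_logits, toy_labels)
```

Three of the four toy rows are classified correctly, so the function reports 0.750.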


📤 Deliverables


The val_ratio you used (e.g., 0.2 or 0.3).


The accuracy on the validation set printed by your function.


2–3 sentences reflecting on:

how accuracy changes with different val_ratio,
whether training longer or changing hidden size helped.


💡 Tips


Always call model.eval() before validating.
Use torch.no_grad() while running inference; skipping gradient tracking saves memory and time.
Ensure the model class definition (layer names and shapes) matches the one used for saving; otherwise load_state_dict raises an error.
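
To see what the first two tips actually change, here is a small sketch. It assumes a model with a Dropout layer, which the class IrisClassifier may not have; dropout is simply the easiest way to make the train/eval difference visible:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(4, 16), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(16, 3)
)
sample = torch.randn(8, 4)

model.train()                       # training mode: dropout randomly zeroes units
train_first = model(sample)
train_second = model(sample)
train_outputs_differ = not torch.equal(train_first, train_second)

model.eval()                        # eval mode: dropout is switched off
with torch.no_grad():               # no gradient bookkeeping during inference
    eval_first = model(sample)
    eval_second = model(sample)
eval_is_deterministic = torch.equal(eval_first, eval_second)

print("train-mode outputs differ between calls:", train_outputs_differ)
print("eval-mode outputs are repeatable:", eval_is_deterministic)
print("train output tracks gradients:", train_first.requires_grad)
print("no_grad output tracks gradients:", eval_first.requires_grad)
```

Even if your model has no dropout or batch norm today, keeping the eval() plus no_grad() habit means validation stays correct when you add such layers later.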