ardzz/decision_tree_learning_solution_sonnet.ipynb

## decision_tree_learning_solution_sonnet.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    }
  },
  "cells": [
    {
      "id": "d922d73e",
      "cell_type": "markdown",
      "source": "# Decision Tree Learning\n\n*Generated by NeuronLab AI*",
      "metadata": {}
    },
    {
      "id": "c06922c4",
      "cell_type": "markdown",
      "source": "## Step 1: Entropy and Information Theory Fundamentals",
      "metadata": {}
    },
    {
      "id": "519c1c14",
      "cell_type": "markdown",
      "source": "## Step 1: Entropy and Information Theory Fundamentals\n\n### 1. Concept\n\n**Shannon Entropy** is the foundational measure of uncertainty or impurity in a dataset. It quantifies how mixed or unpredictable the class labels are in a collection of examples. In decision tree learning, entropy determines how \"pure\" a dataset is\u2014a pure dataset (all examples belong to one class) has zero entropy, while maximum entropy occurs when classes are evenly distributed.[1][2][3]\n\n### 2. Formula Application\n\nFor a classification dataset $ D $ with target classes $ \\{c_1, c_2, \\ldots, c_K\\} $, the entropy is calculated using:\n\n$$H(D) = -\\sum_{k=1}^{K} p_k \\log_2(p_k)$$\n\nWhere each variable represents:\n- $ H(D) $ = entropy of dataset $ D $ (measured in bits)\n- $ K $ = number of distinct classes in the target variable\n- $ p_k $ = proportion of examples belonging to class $ c_k $, calculated as $ p_k = \\frac{n_k}{n} $\n- $ n_k $ = count of examples in class $ c_k $\n- $ n $ = total number of examples in the dataset\n- By convention: $ 0 \\log_2(0) = 0 $[2]\n\n**For binary classification** (two classes), the formula simplifies to:\n\n$$H(D) = -p \\log_2(p) - (1-p) \\log_2(1-p)$$\n\nwhere $ p $ is the proportion of one class.\n\n### 3. Step-by-Step Computation\n\n#### Test Case Example: Pure Dataset\n\n**Input:** `['Yes', 'Yes', 'Yes', 'Yes']`\n\nThis represents a scenario where the decision is always \"Play Tennis = Yes\" regardless of weather conditions.[4]\n\n**Step 3.1 - Count class frequencies:**\n- Total examples: $ n = 4 $\n- Examples with \"Yes\": $ n_{\\text{Yes}} = 4 $\n- Examples with \"No\": $ n_{\\text{No}} = 0 $\n\n**Step 3.2 - Calculate class proportions:**\n\n$$p_{\\text{Yes}} = \\frac{4}{4} = 1.0$$\n$$p_{\\text{No}} = \\frac{0}{4} = 0.0$$\n\n**Step 3.3 - Apply entropy formula:**\n\nSince $ p_{\\text{No}} = 0 $, by convention $ 0 \\log_2(0) = 0 $:\n\n$$H(D) = -(1.0 \\times \\log_2(1.0) + 0 \\times \\log_2(0))$$\n$$H(D) = -(1.0 \\times 0 + 0) = 0.0$$\n\n**Expected Output:** `0.0` \u2713\n\n***\n\n#### Real-World Example: Play Tennis Dataset\n\nUsing the complete 14-day Play Tennis dataset from the web references:[5][4]\n\n| Day | Outlook  | Temperature | Humidity | Wind   | **PlayTennis** |\n|-----|----------|-------------|----------|--------|----------------|\n| D1  | Sunny    | Hot         | High     | Weak   | **No**         |\n| D2  | Sunny    | Hot         | High     | Strong | **No**         |\n| D3  | Overcast | Hot         | High     | Weak   | **Yes**        |\n| D4  | Rain     | Mild        | High     | Weak   | **Yes**        |\n| D5  | Rain     | Cool        | Normal   | Weak   | **Yes**        |\n| D6  | Rain     | Cool        | Normal   | Strong | **No**         |\n| D7  | Overcast | Cool        | Normal   | Strong | **Yes**        |\n| D8  | Sunny    | Mild        | High     | Weak   | **No**         |\n| D9  | Sunny    | Cool        | Normal   | Weak   | **Yes**        |\n| D10 | Rain     | Mild        | Normal   | Weak   | **Yes**        |\n| D11 | Sunny    | Mild        | Normal   | Strong | **Yes**        |\n| D12 | Overcast | Mild        | High     | Strong | **Yes**        |\n| D13 | Overcast | Hot         | Normal   | Weak   | **Yes**        |\n| D14 | Rain     | Mild        | High     | Strong | **No**         |\n\n**Step 3.1 - Parse the input data:**\n\nClass distribution in target variable (PlayTennis):\n- Total instances: $ n = 14 $\n- \"Yes\" instances: $ n_{\\text{Yes}} = 9 $ (D3, D4, D5, D7, D9, D10, D11, D12, D13)\n- \"No\" instances: $ n_{\\text{No}} = 5 $ (D1, D2, D6, D8, D14)\n\n**Step 3.2 - Calculate proportions:**\n\n$$p_{\\text{Yes}} = \\frac{9}{14} = 0.6429$$\n$$p_{\\text{No}} = \\frac{5}{14} = 0.3571$$\n\n**Step 3.3 - Compute logarithms:**\n\n$$\\log_2(p_{\\text{Yes}}) = \\log_2(0.6429) = -0.6374$$\n$$\\log_2(p_{\\text{No}}) = \\log_2(0.3571) = -1.4854$$\n\n**Step 3.4 - Calculate individual entropy terms:**\n\n$$p_{\\text{Yes}} \\times \\log_2(p_{\\text{Yes}}) = 0.6429 \\times (-0.6374) = -0.4098$$\n$$p_{\\text{No}} \\times \\log_2(p_{\\text{No}}) = 0.3571 \\times (-1.4854) = -0.5305$$\n\n**Step 3.5 - Apply the entropy formula:**\n\n$$H(S) = -[(-0.4098) + (-0.5305)]$$\n$$H(S) = -[-0.9403]$$\n$$H(S) = 0.9403 \\text{ bits}$$\n\n### 4. Key Result\n\n**Result for Step 1:**\n\nThe entropy of the complete Play Tennis dataset is **H(S) = 0.9403 bits**.[1]\n\nThis indicates moderate uncertainty in the dataset\u2014neither completely random (entropy = 1.0 for balanced binary) nor completely pure (entropy = 0.0). The value of 0.9403 suggests the dataset has a slight bias toward \"Yes\" (9 out of 14), but still contains substantial uncertainty that the decision tree algorithm will attempt to reduce.[2]\n\n### 5. Connection to Next Step\n\nThis initial entropy value $ H(S) = 0.9403 $ serves as the **baseline impurity measure** for Step 2. In the next step, we will calculate the **Information Gain** for each attribute (Outlook, Temperature, Humidity, Wind) by:[6][4]\n\n1. Splitting the dataset according to each attribute's values\n2. Computing the weighted average entropy of the resulting subsets\n3. Measuring how much each attribute reduces entropy from the baseline 0.9403\n\nThe attribute that achieves the **maximum entropy reduction** (highest Information Gain) will become the root node of the decision tree.[1][2]",
      "metadata": {}
    },
    {
      "id": "6016045b",
      "cell_type": "markdown",
      "source": "### Vectorized Implementation (NumPy)",
      "metadata": {}
    },
    {
      "id": "e328eb49",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "## Python Code (Vectorized Implementation)\n\n```python\nimport numpy as np\n\n# ============================================================\n# Shannon Entropy Calculator (Vectorized Implementation)\n# ============================================================\n\n# Test Case: Pure Dataset (All examples belong to one class)\ndataset = np.array(['Yes', 'Yes', 'Yes', 'Yes'])\n\nprint(\"=\"*60)\nprint(\"SHANNON ENTROPY CALCULATION - PURE DATASET\")\nprint(\"=\"*60)\nprint(f\"\\nInput Dataset: {dataset}\")\nprint(f\"Total examples (n): {len(dataset)}\")\n\n# Step 1: Find unique classes and their counts\nunique_classes, counts = np.unique(dataset, return_counts=True)\nprint(f\"\\n--- Step 1: Count Class Frequencies ---\")\nprint(f\"Unique classes: {unique_classes}\")\nprint(f\"Class counts: {counts}\")\n\n# Step 2: Calculate proportions (probabilities)\nn = len(dataset)\nproportions = counts / n\nprint(f\"\\n--- Step 2: Calculate Proportions (p_k) ---\")\nfor i, cls in enumerate(unique_classes):\n    print(f\"p('{cls}') = {counts[i]}/{n} = {proportions[i]:.4f}\")\n\n# Step 3: Calculate entropy components (-p_k * log2(p_k))\n# Handle zero probability case: 0 * log(0) = 0\nentropy_components = np.zeros(len(proportions))\nfor i in range(len(proportions)):\n    if proportions[i] > 0:\n        entropy_components[i] = -proportions[i] * np.log2(proportions[i])\n    else:\n        entropy_components[i] = 0\n\nprint(f\"\\n--- Step 3: Calculate Entropy Components ---\")\nfor i, cls in enumerate(unique_classes):\n    if proportions[i] > 0:\n        print(f\"-p('{cls}') * log2(p('{cls}')) = -{proportions[i]:.4f} * {np.log2(proportions[i]):.4f} = {entropy_components[i]:.4f}\")\n    else:\n        print(f\"-p('{cls}') * log2(p('{cls}')) = 0 (by convention)\")\n\n# Step 4: Sum all components to get total entropy\nentropy = np.sum(entropy_components)\nprint(f\"\\n--- Step 4: Calculate Total Entropy ---\")\nprint(f\"H(D) = sum of components = {entropy:.4f} bits\")\nprint(f\"\\nInterpretation: Entropy = {entropy:.4f} bits means the dataset is PURE (no uncertainty)\")\n\nprint(\"\\n\" + \"=\"*60)\nprint(\"TEST CASE 2: IMPURE DATASET (Mixed Classes)\")\nprint(\"=\"*60)\n\n# Test Case 2: Impure Dataset (Mixed classes)\ndataset2 = np.array(['Yes', 'Yes', 'Yes', 'No', 'No'])\nprint(f\"\\nInput Dataset: {dataset2}\")\nprint(f\"Total examples (n): {len(dataset2)}\")\n\n# Repeat calculation for impure dataset\nunique_classes2, counts2 = np.unique(dataset2, return_counts=True)\nprint(f\"\\n--- Step 1: Count Class Frequencies ---\")\nprint(f\"Unique classes: {unique_classes2}\")\nprint(f\"Class counts: {counts2}\")\n\nn2 = len(dataset2)\nproportions2 = counts2 / n2\nprint(f\"\\n--- Step 2: Calculate Proportions (p_k) ---\")\nfor i, cls in enumerate(unique_classes2):\n    print(f\"p('{cls}') = {counts2[i]}/{n2} = {proportions2[i]:.4f}\")\n\nentropy_components2 = np.zeros(len(proportions2))\nfor i in range(len(proportions2)):\n    if proportions2[i] > 0:\n        entropy_components2[i] = -proportions2[i] * np.log2(proportions2[i])\n\nprint(f\"\\n--- Step 3: Calculate Entropy Components ---\")\nfor i, cls in enumerate(unique_classes2):\n    print(f\"-p('{cls}') * log2(p('{cls}')) = -{proportions2[i]:.4f} * {np.log2(proportions2[i]):.4f} = {entropy_components2[i]:.4f}\")\n\nentropy2 = np.sum(entropy_components2)\nprint(f\"\\n--- Step 4: Calculate Total Entropy ---\")\nprint(f\"H(D) = {' + '.join([f'{x:.4f}' for x in entropy_components2])} = {entropy2:.4f} bits\")\nprint(f\"\\nInterpretation: Entropy = {entropy2:.4f} bits means the dataset is IMPURE (has uncertainty)\")\n\nprint(\"\\n\" + \"=\"*60)\nprint(\"TEST CASE 3: MAXIMUM ENTROPY (Perfectly Balanced)\")\nprint(\"=\"*60)\n\n# Test Case 3: Maximum Entropy (Equal distribution)\ndataset3 = np.array(['Yes', 'Yes', 'No', 'No'])\nprint(f\"\\nInput Dataset: {dataset3}\")\nprint(f\"Total examples (n): {len(dataset3)}\")\n\nunique_classes3, counts3 = np.unique(dataset3, return_counts=True)\nprint(f\"\\n--- Step 1: Count Class Frequencies ---\")\nprint(f\"Unique classes: {unique_classes3}\")\nprint(f\"Class counts: {counts3}\")\n\nn3 = len(dataset3)\nproportions3 = counts3 / n3\nprint(f\"\\n--- Step 2: Calculate Proportions (p_k) ---\")\nfor i, cls in enumerate(unique_classes3):\n    print(f\"p('{cls}') = {counts3[i]}/{n3} = {proportions3[i]:.4f}\")\n\nentropy_components3 = np.zeros(len(proportions3))\nfor i in range(len(proportions3)):\n    if proportions3[i] > 0:\n        entropy_components3[i] = -proportions3[i] * np.log2(proportions3[i])\n\nprint(f\"\\n--- Step 3: Calculate Entropy Components ---\")\nfor i, cls in enumerate(unique_classes3):\n    print(f\"-p('{cls}') * log2(p('{cls}')) = -{proportions3[i]:.4f} * {np.log2(proportions3[i]):.4f} = {entropy_components3[i]:.4f}\")\n\nentropy3 = np.sum(entropy_components3)\nprint(f\"\\n--- Step 4: Calculate Total Entropy ---\")\nprint(f\"H(D) = {entropy3:.4f} bits\")\nprint(f\"\\nInterpretation: Entropy = {entropy3:.4f} bits means MAXIMUM UNCERTAINTY (50/50 split)\")\nprint(\"=\"*60)\n```\n\n## Mathematical Explanation\n\n### Shannon Entropy Formula\n\nThe Shannon Entropy for a dataset \\( D \\) with \\( K \\) classes is defined as :\n\n\\[H(D) = -\\sum_{k=1}^{K} p_k \\log_2(p_k)\\]\n\nWhere:\n- \\( p_k = \\frac{n_k}{n} \\) is the proportion of examples in class \\( k \\)\n- \\( n_k \\) is the count of examples in class \\( k \\)\n- \\( n \\) is the total number of examples\n- By convention: \\( 0 \\log_2(0) = 0 \\)\n\n### Real-World Example: Weather Decision Dataset\n\nConsider a tennis playing decision system that predicts whether to play based on weather conditions .\n\n| Dataset Type | Data | Classes | Distribution |\n|--------------|------|---------|--------------|\n| Pure | ['Yes', 'Yes', 'Yes', 'Yes'] | 1 class | 100% Yes |\n| Impure | ['Yes', 'Yes', 'Yes', 'No', 'No'] | 2 classes | 60% Yes, 40% No |\n| Balanced | ['Yes', 'Yes', 'No', 'No'] | 2 classes | 50% Yes, 50% No |\n\n### Step-by-Step Computation\n\n#### Example 1: Pure Dataset (Zero Entropy)\n\n**Given:** Dataset = ['Yes', 'Yes', 'Yes', 'Yes'] \n\n**Step 1 - Count frequencies:**\n- \\( n_{Yes} = 4 \\), \\( n = 4 \\)\n\n**Step 2 - Calculate proportions:**\n- \\( p_{Yes} = \\frac{4}{4} = 1.0000 \\)\n\n**Step 3 - Calculate entropy components:**\n- \\( -p_{Yes} \\times \\log_2(p_{Yes}) = -1.0000 \\times \\log_2(1.0000) = -1.0000 \\times 0 = 0.0000 \\) bits\n\n**Step 4 - Sum components:**\n- \\( H(D) = 0.0000 \\) bits \n\n**Interpretation:** Zero entropy indicates perfect purity\u2014no uncertainty in predictions .\n\n#### Example 2: Impure Dataset (Mixed Classes)\n\n**Given:** Dataset = ['Yes', 'Yes', 'Yes', 'No', 'No'] \n\n**Step 1 - Count frequencies:**\n- \\( n_{No} = 2 \\), \\( n_{Yes} = 3 \\), \\( n = 5 \\)\n\n**Step 2 - Calculate proportions:**\n- \\( p_{No} = \\frac{2}{5} = 0.4000 \\)\n- \\( p_{Yes} = \\frac{3}{5} = 0.6000 \\)\n\n**Step 3 - Calculate entropy components:**\n- \\( -p_{No} \\times \\log_2(p_{No}) = -0.4000 \\times (-1.3219) = 0.5288 \\) bits\n- \\( -p_{Yes} \\times \\log_2(p_{Yes}) = -0.6000 \\times (-0.7370) = 0.4422 \\) bits\n\n**Step 4 - Sum components:**\n- \\( H(D) = 0.5288 + 0.4422 = 0.9710 \\) bits \n\n**Interpretation:** Higher entropy indicates greater uncertainty due to mixed class labels .\n\n#### Example 3: Maximum Entropy (Perfect Balance)\n\n**Given:** Dataset = ['Yes', 'Yes', 'No', 'No'] \n\n**Step 1-2:** \\( p_{No} = p_{Yes} = 0.5000 \\)\n\n**Step 3:** Each component contributes \\( 0.5000 \\) bits\n\n**Step 4:** \\( H(D) = 1.0000 \\) bits \n\n**Interpretation:** Maximum entropy for binary classification occurs at 50/50 split, representing maximum uncertainty .\n\n### Entropy Range Summary\n\n| Scenario | Entropy Value | Meaning |\n|----------|---------------|---------|\n| Pure dataset (all one class) | 0.0 bits | No uncertainty, perfect prediction |\n| Balanced binary split (50/50) | 1.0 bits | Maximum uncertainty for 2 classes |\n| Imbalanced mix | 0.0 - 1.0 bits | Partial uncertainty |",
      "outputs": []
    },
    {
      "id": "c8305e4b",
      "cell_type": "markdown",
      "source": "### Explicit Loops Implementation (Pure Python)",
      "metadata": {}
    },
    {
      "id": "7f9fc431",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "import math\n\n# Shannon Entropy Calculator - Explicit Loops Implementation\n# Mathematical Concept: H(D) = -sum(p_k * log2(p_k)) for k=1 to K\n\ndef calculate_entropy_explicit(labels):\n    \"\"\"\n    Calculate Shannon Entropy using explicit loops (Pure Python, no NumPy)\n    \n    Parameters:\n    labels: list of class labels (e.g., ['Yes', 'No', 'Yes'])\n    \n    Returns:\n    entropy: float value representing the entropy in bits\n    \"\"\"\n    \n    # Step 1: Count total number of examples\n    n = len(labels)\n    print(f\"Total number of examples (n): {n}\")\n    \n    # Step 2: Count frequency of each class using explicit loop\n    class_counts = {}\n    for label in labels:\n        if label in class_counts:\n            class_counts[label] += 1\n        else:\n            class_counts[label] = 1\n    \n    print(f\"\\nClass frequencies:\")\n    for class_name, count in class_counts.items():\n        print(f\"  Class '{class_name}': {count} examples\")\n    \n    # Step 3: Calculate proportion (probability) for each class\n    class_proportions = {}\n    for class_name, count in class_counts.items():\n        proportion = count / n\n        class_proportions[class_name] = proportion\n    \n    print(f\"\\nClass proportions (p_k = n_k / n):\")\n    for class_name, proportion in class_proportions.items():\n        print(f\"  p('{class_name}') = {class_counts[class_name]}/{n} = {proportion:.4f}\")\n    \n    # Step 4: Calculate entropy using the formula H(D) = -sum(p_k * log2(p_k))\n    entropy = 0.0\n    print(f\"\\nEntropy calculation: H(D) = -sum(p_k * log2(p_k))\")\n    \n    for class_name, p_k in class_proportions.items():\n        if p_k > 0:  # By convention: 0 * log2(0) = 0\n            term = p_k * math.log2(p_k)\n            print(f\"  Class '{class_name}': p_k * log2(p_k) = {p_k:.4f} * {math.log2(p_k):.4f} = {term:.4f}\")\n            entropy -= term\n        else:\n            print(f\"  Class '{class_name}': p_k = 0, contribution = 0 (by convention)\")\n    \n    print(f\"\\nFinal Entropy H(D) = {entropy:.4f} bits\")\n    return entropy\n\n\n# ============================================================================\n# DEMONSTRATION WITH MULTIPLE TEST CASES\n# ============================================================================\n\nprint(\"=\"*80)\nprint(\"SHANNON ENTROPY DEMONSTRATION - EXPLICIT LOOPS IMPLEMENTATION\")\nprint(\"=\"*80)\n\n# Test Case 1: Pure Dataset (Minimum Entropy = 0)\nprint(\"\\n\" + \"=\"*80)\nprint(\"TEST CASE 1: Pure Dataset (All Same Class)\")\nprint(\"=\"*80)\nprint(\"Scenario: Weather conditions always lead to 'Play Tennis = Yes'\")\nprint(\"Input: ['Yes', 'Yes', 'Yes', 'Yes']\")\nprint(\"-\"*80)\ndataset1 = ['Yes', 'Yes', 'Yes', 'Yes']\nentropy1 = calculate_entropy_explicit(dataset1)\n\n# Test Case 2: Balanced Binary Dataset (Maximum Entropy for binary = 1.0)\nprint(\"\\n\\n\" + \"=\"*80)\nprint(\"TEST CASE 2: Perfectly Balanced Binary Dataset\")\nprint(\"=\"*80)\nprint(\"Scenario: Equal split between playing and not playing tennis\")\nprint(\"Input: ['Yes', 'Yes', 'No', 'No']\")\nprint(\"-\"*80)\ndataset2 = ['Yes', 'Yes', 'No', 'No']\nentropy2 = calculate_entropy_explicit(dataset2)\n\n# Test Case 3: Imbalanced Binary Dataset\nprint(\"\\n\\n\" + \"=\"*80)\nprint(\"TEST CASE 3: Imbalanced Binary Dataset\")\nprint(\"=\"*80)\nprint(\"Scenario: More 'Yes' decisions than 'No' decisions\")\nprint(\"Input: ['Yes', 'Yes', 'Yes', 'No']\")\nprint(\"-\"*80)\ndataset3 = ['Yes', 'Yes', 'Yes', 'No']\nentropy3 = calculate_entropy_explicit(dataset3)\n\n# Test Case 4: Multi-class Dataset\nprint(\"\\n\\n\" + \"=\"*80)\nprint(\"TEST CASE 4: Multi-class Dataset (K=3 classes)\")\nprint(\"=\"*80)\nprint(\"Scenario: Weather can be 'Sunny', 'Rainy', or 'Cloudy'\")\nprint(\"Input: ['Sunny', 'Rainy', 'Cloudy', 'Sunny', 'Rainy', 'Cloudy']\")\nprint(\"-\"*80)\ndataset4 = ['Sunny', 'Rainy', 'Cloudy', 'Sunny', 'Rainy', 'Cloudy']\nentropy4 = calculate_entropy_explicit(dataset4)\n\n# Summary Table\nprint(\"\\n\\n\" + \"=\"*80)\nprint(\"SUMMARY OF ENTROPY VALUES\")\nprint(\"=\"*80)\nprint(f\"{'Test Case':<40} {'Entropy (bits)':<15}\")\nprint(\"-\"*80)\nprint(f\"{'1. Pure Dataset':<40} {entropy1:<15.4f}\")\nprint(f\"{'2. Balanced Binary (Max for K=2)':<40} {entropy2:<15.4f}\")\nprint(f\"{'3. Imbalanced Binary':<40} {entropy3:<15.4f}\")\nprint(f\"{'4. Balanced Multi-class (K=3)':<40} {entropy4:<15.4f}\")\nprint(\"=\"*80)\n\nprint(\"\\n\")\nprint(\"Key Insights:\")\nprint(\"\u2022 Entropy = 0 bits: Pure dataset (no uncertainty)\")\nprint(\"\u2022 Entropy = 1 bit: Maximum uncertainty for binary classification\")\nprint(\"\u2022 Higher entropy: More mixed/unpredictable classes\")\nprint(\"\u2022 Lower entropy: More homogeneous/predictable classes\")\n```\n\n## Mathematical Explanation with Step-by-Step Computation\n\n### Shannon Entropy Formula\n\nFor a dataset $D$ with $K$ classes, entropy is:\n\n$$H(D) = -\\sum_{k=1}^{K} p_k \\log_2(p_k)$$\n\nWhere:\n- $p_k$ = proportion of examples in class $k$\n- $p_k = \\frac{n_k}{n}$ where $n_k$ is count of class $k$ and $n$ is total examples\n\n### Real-World Example: Tennis Playing Decision\n\n**Dataset:** `['Yes', 'Yes', 'Yes', 'No']` - decisions about playing tennis based on weather\n\n| Index | Decision |\n|-------|----------|\n| 1     | Yes      |\n| 2     | Yes      |\n| 3     | Yes      |\n| 4     | No       |\n\n### Step-by-Step Computation\n\n**Step 1: Count class frequencies**\n- Class 'Yes': $n_{Yes} = 3$\n- Class 'No': $n_{No} = 1$\n- Total: $n = 4$\n\n**Step 2: Calculate proportions**\n- $p_{Yes} = \\frac{3}{4} = 0.75$\n- $p_{No} = \\frac{1}{4} = 0.25$\n\n**Step 3: Calculate logarithms**\n- $\\log_2(0.75) = -0.415$\n- $\\log_2(0.25) = -2.000$\n\n**Step 4: Calculate each term**\n- Term for 'Yes': $0.75 \\times (-0.415) = -0.311$\n- Term for 'No': $0.25 \\times (-2.000) = -0.500$\n\n**Step 5: Apply entropy formula**\n$$H(D) = -(\u22120.311 + (\u22120.500)) = -(-0.811) = 0.811 \\text{ bits}$$\n\n### Interpretation Table\n\n| Dataset Type | Class Distribution | Entropy | Meaning |\n|-------------|-------------------|---------|---------|\n| Pure |  | 0.000 bits | No uncertainty |\n| Balanced |  | 1.000 bits | Maximum uncertainty |\n| Imbalanced |  | 0.811 bits | Moderate uncertainty |\n\nHigher entropy indicates more mixing of classes and greater unpredictability in the dataset .",
      "outputs": []
    },
    {
      "id": "6213d7ef",
      "cell_type": "markdown",
      "source": "## Step 2: Information Gain and Attribute Selection",
      "metadata": {}
    },
    {
      "id": "830db689",
      "cell_type": "markdown",
      "source": "## Step 2: Information Gain and Attribute Selection\n\n### 1. Concept\n\n**Information Gain** quantifies how much an attribute reduces uncertainty (entropy) in the dataset. It measures the expected reduction in entropy achieved by partitioning the dataset according to a specific attribute. The attribute with the highest information gain is selected as the splitting criterion at each node, as it provides the most information about the class labels.[1][2][3][4]\n\n### 2. Formula Application\n\nThe Information Gain formula is:\n\n$$IG(S, A) = H(S) - H(S|A)$$\n\nWhere:\n- $IG(S, A)$ = Information Gain of attribute $A$ on dataset $S$\n- $H(S)$ = Entropy of the entire dataset (before split)\n- $H(S|A)$ = Conditional entropy after splitting on attribute $A$\n\nThe conditional entropy is calculated as:\n\n$$H(S|A) = \\sum_{v \\in \\text{Values}(A)} \\frac{|S_v|}{|S|} \\cdot H(S_v)$$\n\nWhere:\n- $\\text{Values}(A)$ = set of all possible values for attribute $A$\n- $S_v$ = subset of examples where attribute $A$ has value $v$\n- $|S_v|$ = number of examples in subset $S_v$\n- $|S|$ = total number of examples\n- $H(S_v)$ = entropy of subset $S_v$\n\n### 3. Step-by-Step Computation: Play Tennis Dataset\n\n#### Dataset Overview\n\nThe Play Tennis dataset contains 14 examples with 4 attributes (Outlook, Temperature, Humidity, Wind) predicting whether to play tennis:[2][3]\n\n| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |\n|-----|---------|-------------|----------|--------|------------|\n| D1 | Sunny | Hot | High | Weak | No |\n| D2 | Sunny | Hot | High | Strong | No |\n| D3 | Overcast | Hot | High | Weak | Yes |\n| D4 | Rain | Mild | High | Weak | Yes |\n| D5 | Rain | Cool | Normal | Weak | Yes |\n| D6 | Rain | Cool | Normal | Strong | No |\n| D7 | Overcast | Cool | Normal | Strong | Yes |\n| D8 | Sunny | Mild | High | Weak | No |\n| D9 | Sunny | Cool | Normal | Weak | Yes |\n| D10 | Rain | Mild | Normal | Weak | Yes |\n| D11 | Sunny | Mild | Normal | Strong | Yes |\n| D12 | Overcast | Mild | High | Strong | Yes |\n| D13 | Overcast | Hot | Normal | Weak | Yes |\n| D14 | Rain | Mild | High | Strong | No |\n\n**Class Distribution**: 9 Yes, 5 No[2]\n\n#### Overall Dataset Entropy\n\nBuilding on Step 1's entropy calculation, we compute the entropy for the entire dataset:\n\n$$H(S) = -\\frac{9}{14}\\log_2\\left(\\frac{9}{14}\\right) - \\frac{5}{14}\\log_2\\left(\\frac{5}{14}\\right)$$\n\n$$H(S) = -(0.6429)(-0.6374) - (0.3571)(-1.4854) = 0.4098 + 0.5305 = 0.9403 \\text{ bits}$$\n\n#### Information Gain for Each Attribute\n\n**Attribute 1: Outlook**\n\nSplitting by Outlook creates three subsets:[2]\n- **Outlook = Sunny**: 5 examples [Yes: 2, No: 3]\n  $$H(\\text{Sunny}) = -\\frac{2}{5}\\log_2\\left(\\frac{2}{5}\\right) - \\frac{3}{5}\\log_2\\left(\\frac{3}{5}\\right) = 0.9710 \\text{ bits}$$\n\n- **Outlook = Overcast**: 4 examples [Yes: 4, No: 0]\n  $$H(\\text{Overcast}) = 0 \\text{ bits (pure subset)}$$\n\n- **Outlook = Rain**: 5 examples [Yes: 3, No: 2]\n  $$H(\\text{Rain}) = -\\frac{3}{5}\\log_2\\left(\\frac{3}{5}\\right) - \\frac{2}{5}\\log_2\\left(\\frac{2}{5}\\right) = 0.9710 \\text{ bits}$$\n\nConditional entropy:\n$$H(S|\\text{Outlook}) = \\frac{5}{14}(0.9710) + \\frac{4}{14}(0) + \\frac{5}{14}(0.9710) = 0.6935 \\text{ bits}$$\n\nInformation Gain:\n$$IG(S, \\text{Outlook}) = 0.9403 - 0.6935 = \\mathbf{0.2467} \\text{ bits}$$\n\n**Attribute 2: Temperature**\n\nSplitting by Temperature:[2]\n- **Temperature = Hot**: 4 examples [Yes: 2, No: 2], $H = 1.0000$ bits\n- **Temperature = Mild**: 6 examples [Yes: 4, No: 2], $H = 0.9183$ bits\n- **Temperature = Cool**: 4 examples [Yes: 3, No: 1], $H = 0.8113$ bits\n\n$$H(S|\\text{Temperature}) = \\frac{4}{14}(1.0000) + \\frac{6}{14}(0.9183) + \\frac{4}{14}(0.8113) = 0.9111 \\text{ bits}$$\n\n$$IG(S, \\text{Temperature}) = 0.9403 - 0.9111 = \\mathbf{0.0292} \\text{ bits}$$\n\n**Attribute 3: Humidity**\n\nSplitting by Humidity:[2]\n- **Humidity = High**: 7 examples [Yes: 3, No: 4], $H = 0.9852$ bits\n- **Humidity = Normal**: 7 examples [Yes: 6, No: 1], $H = 0.5917$ bits\n\n$$H(S|\\text{Humidity}) = \\frac{7}{14}(0.9852) + \\frac{7}{14}(0.5917) = 0.7885 \\text{ bits}$$\n\n$$IG(S, \\text{Humidity}) = 0.9403 - 0.7885 = \\mathbf{0.1518} \\text{ bits}$$\n\n**Attribute 4: Wind**\n\nSplitting by Wind:[2]\n- **Wind = Weak**: 8 examples [Yes: 6, No: 2], $H = 0.8113$ bits\n- **Wind = Strong**: 6 examples [Yes: 3, No: 3], $H = 1.0000$ bits\n\n$$H(S|\\text{Wind}) = \\frac{8}{14}(0.8113) + \\frac{6}{14}(1.0000) = 0.8922 \\text{ bits}$$\n\n$$IG(S, \\text{Wind}) = 0.9403 - 0.8922 = \\mathbf{0.0481} \\text{ bits}$$\n\n### 4. Key Result for Step 2\n\n**Information Gain Summary:**\n\n| Attribute | Information Gain |\n|-----------|------------------|\n| **Outlook** | **0.2467 bits** |\n| Humidity | 0.1518 bits |\n| Wind | 0.0481 bits |\n| Temperature | 0.0292 bits |\n\n**Result for Step 2:** The **Outlook** attribute has the maximum information gain of **0.2467 bits** and should be selected as the root node for splitting. The Overcast value creates a pure subset (all 4 examples are \"Yes\"), demonstrating that Outlook provides the most effective partitioning of the dataset.[3][2]\n\n### 5. Connection to Next Step\n\nThis result feeds directly into Step 3 (Recursive Splitting). Since Outlook has the highest information gain, it becomes the root node of the decision tree. The dataset will be partitioned into three branches (Sunny, Overcast, Rain), and the algorithm will recursively apply the same information gain calculation to the Sunny and Rain subsets (Overcast is already pure and becomes a leaf node with label \"Yes\") to determine the next best attributes for further splitting.[4]",
      "metadata": {}
    },
    {
      "id": "f3b392c8",
      "cell_type": "markdown",
      "source": "### Vectorized Implementation (NumPy)",
      "metadata": {}
    },
    {
      "id": "9ec24f1a",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "import numpy as np\n\n# Play Tennis Dataset\n# Encoding: Outlook (0=Sunny, 1=Overcast, 2=Rainy)\n# Temperature (0=Hot, 1=Mild, 2=Cool)\n# Humidity (0=High, 1=Normal)\n# Wind (0=Weak, 1=Strong)\n# PlayTennis (0=No, 1=Yes)\n\noutlook = np.array([0, 0, 1, 2, 2, 2, 1, 0, 0, 2, 0, 1, 1, 2])\ntemperature = np.array([0, 0, 0, 1, 2, 2, 2, 1, 2, 1, 1, 1, 0, 1])\nhumidity = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0])\nwind = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1])\nplay_tennis = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])\n\nprint(\"=\"*70)\nprint(\"INFORMATION GAIN AND ATTRIBUTE SELECTION\")\nprint(\"=\"*70)\nprint(\"\\nDataset: Play Tennis (14 examples)\")\nprint(\"Attributes: Outlook, Temperature, Humidity, Wind\")\nprint(\"Target: PlayTennis (0=No, 1=Yes)\")\nprint(\"=\"*70)\n\n# Step 1: Calculate entropy of entire dataset H(S)\ndef calculate_entropy(labels):\n    # Count occurrences of each class\n    total = len(labels)\n    unique_labels = np.unique(labels)\n    \n    entropy = 0.0\n    for label in unique_labels:\n        # Calculate probability of this class\n        count = np.sum(labels == label)\n        probability = count / total\n        \n        # Add to entropy: -p * log2(p)\n        if probability > 0:\n            entropy -= probability * np.log2(probability)\n    \n    return entropy\n\n# Calculate H(S) for entire dataset\nentropy_s = calculate_entropy(play_tennis)\ncount_no = np.sum(play_tennis == 0)\ncount_yes = np.sum(play_tennis == 1)\nprob_no = count_no / len(play_tennis)\nprob_yes = count_yes / len(play_tennis)\n\nprint(\"\\n### STEP 1: Calculate Entropy of Entire Dataset H(S)\")\nprint(f\"Total examples: {len(play_tennis)}\")\nprint(f\"Class distribution: No={count_no}, Yes={count_yes}\")\nprint(f\"P(No) = {count_no}/{len(play_tennis)} = {prob_no:.4f}\")\nprint(f\"P(Yes) = {count_yes}/{len(play_tennis)} = {prob_yes:.4f}\")\nprint(f\"\\nH(S) = -P(No)*log2(P(No)) - P(Yes)*log2(P(Yes))\")\nprint(f\"H(S) = -{prob_no:.4f}*log2({prob_no:.4f}) - {prob_yes:.4f}*log2({prob_yes:.4f})\")\nprint(f\"H(S) = {entropy_s:.4f} bits\")\n\n# Step 2: Calculate Information Gain for each attribute\ndef calculate_conditional_entropy(attribute, labels):\n    # Get unique values of the attribute\n    unique_values = np.unique(attribute)\n    total = len(labels)\n    \n    conditional_entropy = 0.0\n    details = []\n    \n    for value in unique_values:\n        # Get subset where attribute == value\n        mask = attribute == value\n        subset_labels = labels[mask]\n        subset_size = len(subset_labels)\n        \n        # Calculate weight and entropy of this subset\n        weight = subset_size / total\n        subset_entropy = calculate_entropy(subset_labels)\n        \n        # Add weighted entropy\n        conditional_entropy += weight * subset_entropy\n        \n        # Store details for printing\n        count_no = np.sum(subset_labels == 0)\n        count_yes = np.sum(subset_labels == 1)\n        details.append((value, subset_size, count_no, count_yes, subset_entropy, weight))\n    \n    return conditional_entropy, details\n\ndef calculate_information_gain(attribute, labels, entropy_s):\n    # Calculate H(S|A)\n    conditional_entropy, details = calculate_conditional_entropy(attribute, labels)\n    \n    # Information Gain = H(S) - H(S|A)\n    information_gain = entropy_s - conditional_entropy\n    \n    return information_gain, conditional_entropy, details\n\nprint(\"\\n\" + \"=\"*70)\nprint(\"### STEP 2: Calculate Information Gain for Each Attribute\")\nprint(\"=\"*70)\n\n# Attribute 1: Outlook\nprint(\"\\n--- ATTRIBUTE: Outlook ---\")\nprint(\"Values: 0=Sunny, 1=Overcast, 2=Rainy\")\n\nig_outlook, h_s_outlook, details_outlook = calculate_information_gain(outlook, play_tennis, entropy_s)\n\nprint(\"\\nConditional Entropy H(S|Outlook):\")\nfor value, size, count_no, count_yes, ent, weight in details_outlook:\n    value_name = ['Sunny', 'Overcast', 'Rainy'][value]\n    print(f\"\\n  {value_name} (value={value}): {size} examples, No={count_no}, Yes={count_yes}\")\n    print(f\"    H(S_{value_name}) = {ent:.4f} bits\")\n    print(f\"    Weight = {size}/{len(play_tennis)} = {weight:.4f}\")\n    print(f\"    Contribution = {weight:.4f} * {ent:.4f} = {weight * ent:.4f}\")\n\nprint(f\"\\nH(S|Outlook) = {h_s_outlook:.4f} bits\")\nprint(f\"\\nIG(S, Outlook) = H(S) - H(S|Outlook)\")\nprint(f\"IG(S, Outlook) = {entropy_s:.4f} - {h_s_outlook:.4f} = {ig_outlook:.4f} bits\")\n\n# Attribute 2: Temperature\nprint(\"\\n\" + \"-\"*70)\nprint(\"\\n--- ATTRIBUTE: Temperature ---\")\nprint(\"Values: 0=Hot, 1=Mild, 2=Cool\")\n\nig_temp, h_s_temp, details_temp = calculate_information_gain(temperature, play_tennis, entropy_s)\n\nprint(\"\\nConditional Entropy H(S|Temperature):\")\nfor value, size, count_no, count_yes, ent, weight in details_temp:\n    value_name = ['Hot', 'Mild', 'Cool'][value]\n    print(f\"\\n  {value_name} (value={value}): {size} examples, No={count_no}, Yes={count_yes}\")\n    print(f\"    H(S_{value_name}) = {ent:.4f} bits\")\n    print(f\"    Weight = {size}/{len(play_tennis)} = {weight:.4f}\")\n    print(f\"    Contribution = {weight:.4f} * {ent:.4f} = {weight * ent:.4f}\")\n\nprint(f\"\\nH(S|Temperature) = {h_s_temp:.4f} bits\")\nprint(f\"\\nIG(S, Temperature) = H(S) - H(S|Temperature)\")\nprint(f\"IG(S, Temperature) = {entropy_s:.4f} - {h_s_temp:.4f} = {ig_temp:.4f} bits\")\n\n# Attribute 3: Humidity\nprint(\"\\n\" + \"-\"*70)\nprint(\"\\n--- ATTRIBUTE: Humidity ---\")\nprint(\"Values: 0=High, 1=Normal\")\n\nig_humidity, h_s_humidity, details_humidity = calculate_information_gain(humidity, play_tennis, entropy_s)\n\nprint(\"\\nConditional Entropy H(S|Humidity):\")\nfor value, size, count_no, count_yes, ent, weight in details_humidity:\n    value_name = ['High', 'Normal'][value]\n    print(f\"\\n  {value_name} (value={value}): {size} examples, No={count_no}, Yes={count_yes}\")\n    print(f\"    H(S_{value_name}) = {ent:.4f} bits\")\n    print(f\"    Weight = {size}/{len(play_tennis)} = {weight:.4f}\")\n    print(f\"    Contribution = {weight:.4f} * {ent:.4f} = {weight * ent:.4f}\")\n\nprint(f\"\\nH(S|Humidity) = {h_s_humidity:.4f} bits\")\nprint(f\"\\nIG(S, Humidity) = H(S) - H(S|Humidity)\")\nprint(f\"IG(S, Humidity) = {entropy_s:.4f} - {h_s_humidity:.4f} = {ig_humidity:.4f} bits\")\n\n# Attribute 4: Wind\nprint(\"\\n\" + \"-\"*70)\nprint(\"\\n--- ATTRIBUTE: Wind ---\")\nprint(\"Values: 0=Weak, 1=Strong\")\n\nig_wind, h_s_wind, details_wind = calculate_information_gain(wind, play_tennis, entropy_s)\n\nprint(\"\\nConditional Entropy H(S|Wind):\")\nfor value, size, count_no, count_yes, ent, weight in details_wind:\n    value_name = ['Weak', 'Strong'][value]\n    print(f\"\\n  {value_name} (value={value}): {size} examples, No={count_no}, Yes={count_yes}\")\n    print(f\"    H(S_{value_name}) = {ent:.4f} bits\")\n    print(f\"    Weight = {size}/{len(play_tennis)} = {weight:.4f}\")\n    print(f\"    Contribution = {weight:.4f} * {ent:.4f} = {weight * ent:.4f}\")\n\nprint(f\"\\nH(S|Wind) = {h_s_wind:.4f} bits\")\nprint(f\"\\nIG(S, Wind) = H(S) - H(S|Wind)\")\nprint(f\"IG(S, Wind) = {entropy_s:.4f} - {h_s_wind:.4f} = {ig_wind:.4f} bits\")\n\n# Step 3: Compare Information Gains and select best attribute\nprint(\"\\n\" + \"=\"*70)\nprint(\"### STEP 3: Compare Information Gains and Select Best Attribute\")\nprint(\"=\"*70)\n\nprint(\"\\nInformation Gain Summary:\")\nprint(f\"  IG(S, Outlook)     = {ig_outlook:.4f} bits\")\nprint(f\"  IG(S, Temperature) = {ig_temp:.4f} bits\")\nprint(f\"  IG(S, Humidity)    = {ig_humidity:.4f} bits\")\nprint(f\"  IG(S, Wind)        = {ig_wind:.4f} bits\")\n\n# Find attribute with maximum information gain\nig_values = np.array([ig_outlook, ig_temp, ig_humidity, ig_wind])\nattribute_names = ['Outlook', 'Temperature', 'Humidity', 'Wind']\nbest_idx = np.argmax(ig_values)\nbest_attribute = attribute_names[best_idx]\nbest_ig = ig_values[best_idx]\n\nprint(f\"\\n*** BEST ATTRIBUTE TO SPLIT: {best_attribute} ***\")\nprint(f\"*** Maximum Information Gain: {best_ig:.4f} bits ***\")\nprint(\"\\nThis attribute reduces uncertainty the most and should be\")\nprint(\"selected as the root node of the decision tree.\")\n\nprint(\"\\n\" + \"=\"*70)\n```\n\n## Mathematical Explanation\n\n### 1. Information Gain Formula\n\nInformation Gain measures the reduction in entropy (uncertainty) after splitting the dataset by an attribute:\n\n$$IG(S, A) = H(S) - H(S|A)$$\n\nWhere:\n- $H(S)$ is the entropy of the dataset before splitting\n- $H(S|A)$ is the conditional entropy after splitting on attribute $A$\n\n### 2. Conditional Entropy Formula\n\nThe conditional entropy is the weighted average of entropies of all subsets:\n\n$$H(S|A) = \\sum_{v \\in Values(A)} \\frac{|S_v|}{|S|} \\cdot H(S_v)$$\n\nWhere $S_v$ is the subset of examples where attribute $A$ has value $v$.\n\n### 3. Play Tennis Dataset\n\n| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |\n|-----|---------|-------------|----------|------|------------|\n| 1 | Sunny | Hot | High | Weak | No |\n| 2 | Sunny | Hot | High | Strong | No |\n| 3 | Overcast | Hot | High | Weak | Yes |\n| 4 | Rainy | Mild | High | Weak | Yes |\n| 5 | Rainy | Cool | Normal | Weak | Yes |\n| 6 | Rainy | Cool | Normal | Strong | No |\n| 7 | Overcast | Cool | Normal | Strong | Yes |\n| 8 | Sunny | Mild | High | Weak | No |\n| 9 | Sunny | Cool | Normal | Weak | Yes |\n| 10 | Rainy | Mild | Normal | Weak | Yes |\n| 11 | Sunny | Mild | Normal | Strong | Yes |\n| 12 | Overcast | Mild | High | Strong | Yes |\n| 13 | Overcast | Hot | Normal | Weak | Yes |\n| 14 | Rainy | Mild | High | Strong | No |\n\n### 4. Step-by-Step Computation\n\n**Step 1: Calculate $H(S)$**\n- Total: 14 examples (5 No, 9 Yes)\n- $P(No) = 5/14 = 0.3571$, $P(Yes) = 9/14 = 0.6429$\n- $H(S) = -0.3571 \\cdot \\log_2(0.3571) - 0.6429 \\cdot \\log_2(0.6429) = 0.9403$ bits\n\n**Step 2: Calculate $IG$ for Outlook**\n\nOutlook splits into: Sunny (5), Overcast (4), Rainy (5)\n\n- Sunny: 3 No, 2 Yes \u2192 $H(Sunny) = 0.9710$ bits\n- Overcast: 0 No, 4 Yes \u2192 $H(Overcast) = 0.0000$ bits (pure!)\n- Rainy: 2 No, 3 Yes \u2192 $H(Rainy) = 0.9710$ bits\n\n$$H(S|Outlook) = \\frac{5}{14}(0.9710) + \\frac{4}{14}(0.0000) + \\frac{5}{14}(0.9710) = 0.6935$$\n\n$$IG(S, Outlook) = 0.9403 - 0.6935 = 0.2467 \\text{ bits}$$\n\n**Step 3: Compare All Attributes**\n\n- $IG(Outlook) = 0.2467$ bits \u2190 **Maximum**\n- $IG(Temperature) = 0.0292$ bits\n- $IG(Humidity) = 0.1518$ bits\n- $IG(Wind) = 0.0481$ bits\n\n**Conclusion:** **Outlook** has the highest Information Gain (0.2467 bits), so it is selected as the root node for the decision tree. This attribute provides the most information about whether to play tennis.",
      "outputs": []
    },
    {
      "id": "d29c59ed",
      "cell_type": "markdown",
      "source": "### Explicit Loops Implementation (Pure Python)",
      "metadata": {}
    },
    {
      "id": "2d110681",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "import math\n\n# Play Tennis Dataset\ndata = [\n    {'Outlook': 'Sunny', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Weak', 'PlayTennis': 'No'},\n    {'Outlook': 'Sunny', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Strong', 'PlayTennis': 'No'},\n    {'Outlook': 'Overcast', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Weak', 'PlayTennis': 'Yes'},\n    {'Outlook': 'Rain', 'Temperature': 'Mild', 'Humidity': 'High', 'Wind': 'Weak', 'PlayTennis': 'Yes'},\n    {'Outlook': 'Rain', 'Temperature': 'Cool', 'Humidity': 'Normal', 'Wind': 'Weak', 'PlayTennis': 'Yes'},\n    {'Outlook': 'Rain', 'Temperature': 'Cool', 'Humidity': 'Normal', 'Wind': 'Strong', 'PlayTennis': 'No'},\n    {'Outlook': 'Overcast', 'Temperature': 'Cool', 'Humidity': 'Normal', 'Wind': 'Strong', 'PlayTennis': 'Yes'},\n    {'Outlook': 'Sunny', 'Temperature': 'Mild', 'Humidity': 'High', 'Wind': 'Weak', 'PlayTennis': 'No'},\n    {'Outlook': 'Sunny', 'Temperature': 'Cool', 'Humidity': 'Normal', 'Wind': 'Weak', 'PlayTennis': 'Yes'},\n    {'Outlook': 'Rain', 'Temperature': 'Mild', 'Humidity': 'Normal', 'Wind': 'Weak', 'PlayTennis': 'Yes'},\n    {'Outlook': 'Sunny', 'Temperature': 'Mild', 'Humidity': 'Normal', 'Wind': 'Strong', 'PlayTennis': 'Yes'},\n    {'Outlook': 'Overcast', 'Temperature': 'Mild', 'Humidity': 'High', 'Wind': 'Strong', 'PlayTennis': 'Yes'},\n    {'Outlook': 'Overcast', 'Temperature': 'Hot', 'Humidity': 'Normal', 'Wind': 'Weak', 'PlayTennis': 'Yes'},\n    {'Outlook': 'Rain', 'Temperature': 'Mild', 'Humidity': 'High', 'Wind': 'Strong', 'PlayTennis': 'No'}\n]\n\n# Attribute names (excluding target)\nattributes = ['Outlook', 'Temperature', 'Humidity', 'Wind']\ntarget = 'PlayTennis'\n\nprint(\"=\" * 80)\nprint(\"INFORMATION GAIN AND ATTRIBUTE SELECTION FOR DECISION TREES\")\nprint(\"=\" * 80)\nprint()\n\n# Function to calculate entropy of a dataset\n# Formula: H(S) = -sum(p_i * log2(p_i)) for all classes i\ndef calculate_entropy(dataset):\n    \"\"\"\n    Calculate Shannon entropy for a dataset\n    \n    Steps:\n    1. Count frequency of each class label\n    2. Calculate probability p_i = count_i / total\n    3. Apply formula: H = -sum(p_i * log2(p_i))\n    \"\"\"\n    if len(dataset) == 0:\n        return 0\n    \n    # Step 1: Count frequency of each class\n    class_counts = {}\n    for example in dataset:\n        label = example[target]\n        if label in class_counts:\n            class_counts[label] += 1\n        else:\n            class_counts[label] = 1\n    \n    # Step 2 & 3: Calculate entropy\n    total = len(dataset)\n    entropy = 0.0\n    \n    for label, count in class_counts.items():\n        probability = count / total\n        if probability > 0:  # Avoid log(0)\n            entropy -= probability * math.log2(probability)\n    \n    return entropy\n\n# Function to get all unique values for an attribute\ndef get_attribute_values(dataset, attribute):\n    \"\"\"Extract all unique values for a given attribute\"\"\"\n    values = set()\n    for example in dataset:\n        values.add(example[attribute])\n    return list(values)\n\n# Function to split dataset based on attribute value\ndef split_dataset(dataset, attribute, value):\n    \"\"\"\n    Filter dataset to keep only examples where attribute == value\n    Returns subset of examples matching the condition\n    \"\"\"\n    subset = []\n    for example in dataset:\n        if example[attribute] == value:\n            subset.append(example)\n    return subset\n\n# Function to calculate conditional entropy H(S|A)\n# Formula: H(S|A) = sum((|S_v| / |S|) * H(S_v)) for all values v of attribute A\ndef calculate_conditional_entropy(dataset, attribute):\n    \"\"\"\n    Calculate conditional entropy after splitting on an attribute\n    \n    Steps:\n    1. Get all unique values for the attribute\n    2. For each value, create a subset of matching examples\n    3. Calculate entropy of each subset\n    4. Weight by subset size and sum\n    \"\"\"\n    total = len(dataset)\n    conditional_entropy = 0.0\n    \n    # Get all possible values for this attribute\n    values = get_attribute_values(dataset, attribute)\n    \n    # For each value, calculate weighted entropy\n    for value in values:\n        # Split dataset for this value\n        subset = split_dataset(dataset, attribute, value)\n        subset_size = len(subset)\n        \n        # Calculate weight (proportion of examples with this value)\n        weight = subset_size / total\n        \n        # Calculate entropy of this subset\n        subset_entropy = calculate_entropy(subset)\n        \n        # Add weighted entropy to total\n        conditional_entropy += weight * subset_entropy\n    \n    return conditional_entropy\n\n# Function to calculate information gain\n# Formula: IG(S, A) = H(S) - H(S|A)\ndef calculate_information_gain(dataset, attribute):\n    \"\"\"\n    Calculate information gain for an attribute\n    \n    Steps:\n    1. Calculate entropy of entire dataset H(S)\n    2. Calculate conditional entropy H(S|A) after splitting\n    3. Information Gain = H(S) - H(S|A)\n    \"\"\"\n    # Entropy before split\n    entropy_before = calculate_entropy(dataset)\n    \n    # Conditional entropy after split\n    conditional_entropy = calculate_conditional_entropy(dataset, attribute)\n    \n    # Information gain\n    information_gain = entropy_before - conditional_entropy\n    \n    return information_gain\n\n# ============================================================================\n# STEP-BY-STEP COMPUTATION\n# ============================================================================\n\nprint(\"DATASET: Play Tennis (14 examples)\")\nprint(\"-\" * 80)\nprint()\n\n# Display dataset summary\nyes_count = 0\nno_count = 0\nfor example in data:\n    if example[target] == 'Yes':\n        yes_count += 1\n    else:\n        no_count += 1\n\nprint(f\"Total examples: {len(data)}\")\nprint(f\"  - PlayTennis = Yes: {yes_count}\")\nprint(f\"  - PlayTennis = No: {no_count}\")\nprint()\n\n# ============================================================================\n# STEP 1: Calculate Entropy of Entire Dataset H(S)\n# ============================================================================\n\nprint(\"=\" * 80)\nprint(\"STEP 1: Calculate Entropy of Entire Dataset H(S)\")\nprint(\"=\" * 80)\nprint()\n\nentropy_S = calculate_entropy(data)\n\nprint(\"Formula: H(S) = -sum(p_i * log2(p_i)) for all classes i\")\nprint()\nprint(\"Class distribution:\")\nprint(f\"  - p(Yes) = {yes_count}/{len(data)} = {yes_count/len(data):.4f}\")\nprint(f\"  - p(No) = {no_count}/{len(data)} = {no_count/len(data):.4f}\")\nprint()\nprint(\"Calculation:\")\np_yes = yes_count / len(data)\np_no = no_count / len(data)\nprint(f\"  H(S) = -[{p_yes:.4f} * log2({p_yes:.4f}) + {p_no:.4f} * log2({p_no:.4f})]\")\nprint(f\"  H(S) = -[{p_yes * math.log2(p_yes):.4f} + {p_no * math.log2(p_no):.4f}]\")\nprint(f\"  H(S) = {entropy_S:.4f}\")\nprint()\n\n# ============================================================================\n# STEP 2: Calculate Information Gain for Each Attribute\n# ============================================================================\n\nprint(\"=\" * 80)\nprint(\"STEP 2: Calculate Information Gain for Each Attribute\")\nprint(\"=\" * 80)\nprint()\n\n# Dictionary to store information gain for each attribute\ninformation_gains = {}\n\nfor attribute in attributes:\n    print(f\"\\n{'='*80}\")\n    print(f\"ATTRIBUTE: {attribute}\")\n    print(f\"{'='*80}\")\n    print()\n    \n    # Get unique values for this attribute\n    values = get_attribute_values(data, attribute)\n    print(f\"Possible values: {sorted(values)}\")\n    print()\n    \n    # Calculate conditional entropy H(S|A)\n    print(f\"Calculating H(S|{attribute})...\")\n    print()\n    \n    conditional_entropy = 0.0\n    \n    for value in sorted(values):\n        # Split dataset\n        subset = split_dataset(data, attribute, value)\n        subset_size = len(subset)\n        \n        # Count classes in subset\n        subset_yes = 0\n        subset_no = 0\n        for example in subset:\n            if example[target] == 'Yes':\n                subset_yes += 1\n            else:\n                subset_no += 1\n        \n        print(f\"  {attribute} = {value}:\")\n        print(f\"    Total examples: {subset_size}\")\n        print(f\"    Yes: {subset_yes}, No: {subset_no}\")\n        \n        # Calculate entropy of this subset\n        subset_entropy = calculate_entropy(subset)\n        weight = subset_size / len(data)\n        \n        if subset_size > 0:\n            if subset_yes > 0 and subset_no > 0:\n                p_yes_subset = subset_yes / subset_size\n                p_no_subset = subset_no / subset_size\n                print(f\"    H(S_{value}) = -[{p_yes_subset:.4f}*log2({p_yes_subset:.4f}) + {p_no_subset:.4f}*log2({p_no_subset:.4f})]\")\n                print(f\"    H(S_{value}) = {subset_entropy:.4f}\")\n            elif subset_yes > 0:\n                print(f\"    H(S_{value}) = 0 (all Yes, pure)\")\n            else:\n                print(f\"    H(S_{value}) = 0 (all No, pure)\")\n            \n            print(f\"    Weight: {subset_size}/{len(data)} = {weight:.4f}\")\n            print(f\"    Weighted entropy: {weight:.4f} * {subset_entropy:.4f} = {weight * subset_entropy:.4f}\")\n        \n        conditional_entropy += weight * subset_entropy\n        print()\n    \n    print(f\"H(S|{attribute}) = {conditional_entropy:.4f}\")\n    print()\n    \n    # Calculate information gain\n    ig = entropy_S - conditional_entropy\n    information_gains[attribute] = ig\n    \n    print(f\"Information Gain:\")\n    print(f\"  IG(S, {attribute}) = H(S) - H(S|{attribute})\")\n    print(f\"  IG(S, {attribute}) = {entropy_S:.4f} - {conditional_entropy:.4f}\")\n    print(f\"  IG(S, {attribute}) = {ig:.4f}\")\n    print()\n\n# ============================================================================\n# STEP 3: Select Best Attribute\n# ============================================================================\n\nprint(\"\\n\" + \"=\" * 80)\nprint(\"STEP 3: ATTRIBUTE SELECTION - SUMMARY\")\nprint(\"=\" * 80)\nprint()\n\nprint(\"Information Gain for each attribute:\")\nprint()\nfor attribute in attributes:\n    print(f\"  IG(S, {attribute:15s}) = {information_gains[attribute]:.4f}\")\n\nprint()\nprint(\"-\" * 80)\n\n# Find attribute with maximum information gain\nbest_attribute = None\nmax_ig = -1\nfor attribute, ig in information_gains.items():\n    if ig > max_ig:\n        max_ig = ig\n        best_attribute = attribute\n\nprint(f\"\\nBEST ATTRIBUTE TO SPLIT: {best_attribute}\")\nprint(f\"Maximum Information Gain: {max_ig:.4f}\")\nprint()\nprint(\"INTERPRETATION:\")\nprint(f\"The '{best_attribute}' attribute reduces uncertainty the most and should be\")\nprint(\"selected as the root node of the decision tree.\")\nprint()\n\n# ============================================================================\n# MATHEMATICAL EXPLANATION\n# ============================================================================\n\nprint(\"=\" * 80)\nprint(\"MATHEMATICAL EXPLANATION\")\nprint(\"=\" * 80)\nprint()\n\nprint(\"FORMULA 1: Entropy (Uncertainty Measure)\")\nprint(\"  H(S) = -sum(p_i * log2(p_i)) for all classes i\")\nprint(\"  where p_i is the proportion of class i in dataset S\")\nprint()\n\nprint(\"FORMULA 2: Conditional Entropy (Weighted Average After Split)\")\nprint(\"  H(S|A) = sum((|S_v| / |S|) * H(S_v)) for all values v of attribute A\")\nprint(\"  where:\")\nprint(\"    - S_v is subset of examples where attribute A has value v\")\nprint(\"    - |S_v| / |S| is the proportion of examples with value v\")\nprint(\"    - H(S_v) is the entropy of subset S_v\")\nprint()\n\nprint(\"FORMULA 3: Information Gain (Entropy Reduction)\")\nprint(\"  IG(S, A) = H(S) - H(S|A)\")\nprint(\"  Measures how much uncertainty is reduced by splitting on attribute A\")\nprint()\n\nprint(\"DECISION RULE:\")\nprint(\"  Select the attribute with HIGHEST information gain as the splitting\")\nprint(\"  criterion. This attribute provides the most information about the\")\nprint(\"  class labels and creates the most 'pure' partitions.\")\nprint()\n\nprint(\"=\" * 80)\nprint(\"COMPUTATION COMPLETE\")\nprint(\"=\" * 80)",
      "outputs": []
    },
    {
      "id": "15cd007b",
      "cell_type": "markdown",
      "source": "## Step 3: Dataset Partitioning and Subset Creation",
      "metadata": {}
    },
    {
      "id": "4f052250",
      "cell_type": "markdown",
      "source": "## Step 3: Dataset Partitioning and Subset Creation\n\n### 1. Mathematical Concept\n\n**Dataset partitioning** is the fundamental operation in decision tree construction that divides a dataset into disjoint subsets based on an attribute's values. After selecting the best attribute using information gain (from Step 2), we create child nodes by splitting the data into mutually exclusive groups.[1][2][3][4][5]\n\nA **partition** $\\mathcal{P}_A(D)$ of dataset $D$ by attribute $A$ creates a collection of non-empty, pairwise disjoint subsets whose union equals the original dataset.[6]\n\n### 2. Formula Application\n\n#### Partition Definition\n\nGiven dataset $D$ and attribute $A$ with domain $\\text{Values}(A) = \\{v_1, v_2, \\ldots, v_k\\}$, the partition is defined as:\n\n$$\n\\mathcal{P}_A(D) = \\{D_{v_1}, D_{v_2}, \\ldots, D_{v_k}\\}\n$$\n\nwhere each subset is:\n\n$$\nD_{v_i} = \\{(x, y) \\in D : x[A] = v_i\\}\n$$\n\n**Variables:**\n- $D$ = original dataset (14 instances in Play Tennis)\n- $A$ = selected attribute (\"Outlook\" from Step 2, with IG = 0.247)\n- $v_i$ = specific value of attribute $A$ (Sunny, Overcast, Rain)\n- $D_{v_i}$ = subset containing all examples where $A = v_i$\n\n#### Partition Properties\n\nThe partition must satisfy:\n\n1. **Union property:** $\\displaystyle D = \\bigcup_{i=1}^{k} D_{v_i}$\n\n2. **Disjoint property:** $D_{v_i} \\cap D_{v_j} = \\emptyset$ for $i \\neq j$\n\n3. **Size verification:** $\\displaystyle \\sum_{v \\in \\text{Values}(A)} |D_v| = |D|$\n\n### 3. Step-by-Step Computation Using Play Tennis Dataset\n\n#### Input Dataset\n\nFrom the web references, we have the complete Play Tennis dataset with 14 instances:[2][1]\n\n| Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis |\n|-----|----------|-------------|----------|--------|------------|\n| D1  | Sunny    | Hot         | High     | Weak   | No         |\n| D2  | Sunny    | Hot         | High     | Weak   | No         |\n| D3  | Overcast | Hot         | High     | Weak   | Yes        |\n| D4  | Rain     | Mild        | High     | Weak   | Yes        |\n| D5  | Rain     | Cool        | Normal   | Weak   | Yes        |\n| D6  | Rain     | Cool        | Normal   | Strong | No         |\n| D7  | Overcast | Cool        | Normal   | Strong | Yes        |\n| D8  | Sunny    | Mild        | High     | Weak   | No         |\n| D9  | Sunny    | Cool        | Normal   | Weak   | Yes        |\n| D10 | Rain     | Mild        | Normal   | Weak   | Yes        |\n| D11 | Sunny    | Mild        | Normal   | Strong | Yes        |\n| D12 | Overcast | Mild        | High     | Strong | Yes        |\n| D13 | Overcast | Hot         | Normal   | Weak   | Yes        |\n| D14 | Rain     | Mild        | High     | Strong | No         |\n\n#### Step 3a: Extract Attribute Values\n\nFrom Step 2, we determined that **Outlook** has the highest information gain (0.247). Extract the unique values:[1][2]\n\n$$\n\\text{Values}(\\text{Outlook}) = \\{\\text{Sunny}, \\text{Overcast}, \\text{Rain}\\}\n$$\n\nTherefore, $k = 3$ partitions will be created.[1]\n\n#### Step 3b: Create Partitions\n\nApply the subset definition formula for each value:\n\n**Partition 1: $D_{\\text{Sunny}}$**\n\nFilter dataset where Outlook = \"Sunny\":\n\n$$\nD_{\\text{Sunny}} = \\{(x, y) \\in D : x[\\text{Outlook}] = \\text{Sunny}\\}\n$$\n\n| Day | Outlook | Temperature | Humidity | Wind   | PlayTennis |\n|-----|---------|-------------|----------|--------|------------|\n| D1  | Sunny   | Hot         | High     | Weak   | **No**     |\n| D2  | Sunny   | Hot         | High     | Weak   | **No**     |\n| D8  | Sunny   | Mild        | High     | Weak   | **No**     |\n| D9  | Sunny   | Cool        | Normal   | Weak   | **Yes**    |\n| D11 | Sunny   | Mild        | Normal   | Strong | **Yes**    |\n\n- Size: $|D_{\\text{Sunny}}| = 5$\n- Class distribution: 3 No, 2 Yes\n- Class labels: [No, No, No, Yes, Yes]\n\n**Partition 2: $D_{\\text{Overcast}}$**\n\nFilter dataset where Outlook = \"Overcast\":\n\n$$\nD_{\\text{Overcast}} = \\{(x, y) \\in D : x[\\text{Outlook}] = \\text{Overcast}\\}\n$$\n\n| Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis |\n|-----|----------|-------------|----------|--------|------------|\n| D3  | Overcast | Hot         | High     | Weak   | **Yes**    |\n| D7  | Overcast | Cool        | Normal   | Strong | **Yes**    |\n| D12 | Overcast | Mild        | High     | Strong | **Yes**    |\n| D13 | Overcast | Hot         | Normal   | Weak   | **Yes**    |\n\n- Size: $|D_{\\text{Overcast}}| = 4$\n- Class distribution: 0 No, 4 Yes (Pure subset!)\n- Class labels: [Yes, Yes, Yes, Yes]\n\n**Partition 3: $D_{\\text{Rain}}$**\n\nFilter dataset where Outlook = \"Rain\":\n\n$$\nD_{\\text{Rain}} = \\{(x, y) \\in D : x[\\text{Outlook}] = \\text{Rain}\\}\n$$\n\n| Day | Outlook | Temperature | Humidity | Wind   | PlayTennis |\n|-----|---------|-------------|----------|--------|------------|\n| D4  | Rain    | Mild        | High     | Weak   | **Yes**    |\n| D5  | Rain    | Cool        | Normal   | Weak   | **Yes**    |\n| D6  | Rain    | Cool        | Normal   | Strong | **No**     |\n| D10 | Rain    | Mild        | Normal   | Weak   | **Yes**    |\n| D14 | Rain    | Mild        | High     | Strong | **No**     |\n\n- Size: $|D_{\\text{Rain}}| = 5$\n- Class distribution: 2 No, 3 Yes\n- Class labels: [Yes, Yes, No, Yes, No]\n\n#### Step 3c: Verify Partition Properties\n\n**Property 1 - Union equals original set:**\n\n$$\n\\sum_{v \\in \\text{Values}(\\text{Outlook})} |D_v| = |D_{\\text{Sunny}}| + |D_{\\text{Overcast}}| + |D_{\\text{Rain}}| = 5 + 4 + 5 = 14 = |D|\n$$\n\n\u2713 Verified: All 14 instances accounted for.[1]\n\n**Property 2 - Disjoint subsets:**\n\n$$\nD_{\\text{Sunny}} \\cap D_{\\text{Overcast}} = \\emptyset\n$$\n$$\nD_{\\text{Sunny}} \\cap D_{\\text{Rain}} = \\emptyset\n$$\n$$\nD_{\\text{Overcast}} \\cap D_{\\text{Rain}} = \\emptyset\n$$\n\n\u2713 Verified: No day appears in multiple partitions.[1]\n\n**Property 3 - Non-empty subsets:**\n\n$$\n|D_{\\text{Sunny}}| = 5 > 0, \\quad |D_{\\text{Overcast}}| = 4 > 0, \\quad |D_{\\text{Rain}}| = 5 > 0\n$$\n\n\u2713 Verified: All partitions contain at least one instance.[1]\n\n### 4. Key Result for Step 3\n\n**Result for Step 3:** The Play Tennis dataset $D$ (14 instances) has been successfully partitioned by the Outlook attribute into 3 disjoint subsets:[1]\n\n$$\n\\mathcal{P}_{\\text{Outlook}}(D) = \\{D_{\\text{Sunny}}, D_{\\text{Overcast}}, D_{\\text{Rain}}\\}\n$$\n\nwhere:\n- $D_{\\text{Sunny}}$: 5 instances (3 No, 2 Yes) - **impure, requires further splitting**\n- $D_{\\text{Overcast}}$: 4 instances (0 No, 4 Yes) - **pure, becomes leaf node with label \"Yes\"**\n- $D_{\\text{Rain}}$: 5 instances (2 No, 3 Yes) - **impure, requires further splitting**\n\n### 5. Connection to Next Step\n\nThe partitioned subsets feed directly into **Step 4: Recursive Tree Construction**. For each partition:[5][2]\n\n- **If pure** (entropy = 0): The subset becomes a **leaf node** with the majority class label. The $D_{\\text{Overcast}}$ partition is already pure, so it terminates as a leaf node labeled \"Yes\".[1]\n\n- **If impure** (entropy > 0): The subset becomes an **internal node** requiring further splitting. Both $D_{\\text{Sunny}}$ and $D_{\\text{Rain}}$ have mixed class labels, so Step 4 will recursively apply the same process (calculate information gain for remaining attributes, select best attribute, partition again) until all branches terminate in pure subsets or stopping criteria are met.[4][5]\n\nThe recursive algorithm will process:\n1. Subset $D_{\\text{Sunny}}$ \u2192 Calculate IG for {Temperature, Humidity, Wind} \u2192 Select best attribute \u2192 Partition further\n2. Subset $D_{\\text{Rain}}$ \u2192 Calculate IG for {Temperature, Humidity, Wind} \u2192 Select best attribute \u2192 Partition further\n\nThis partitioning operation transforms the tree construction problem from processing one dataset of size 14 into processing three smaller subproblems of sizes 5, 4, and 5.[6][1]",
      "metadata": {}
    },
    {
      "id": "5817e230",
      "cell_type": "markdown",
      "source": "### Vectorized Implementation (NumPy)",
      "metadata": {}
    },
    {
      "id": "d908ad1c",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "import numpy as np\n\n# ============================================================================\n# STEP 3: DATASET PARTITIONING AND SUBSET CREATION\n# ============================================================================\n\nprint(\"=\" * 80)\nprint(\"DATASET PARTITIONING AND SUBSET CREATION\")\nprint(\"=\" * 80)\n\n# ----------------------------------------------------------------------------\n# 1. CREATE PLAY TENNIS DATASET\n# ----------------------------------------------------------------------------\n\n# Encode attributes numerically for NumPy operations\n# Outlook: 0=Sunny, 1=Overcast, 2=Rain\n# Temperature: 0=Hot, 1=Mild, 2=Cool\n# Humidity: 0=High, 1=Normal\n# Wind: 0=Weak, 1=Strong\n# Play: 0=No, 1=Yes\n\ndataset = np.array([\n    [0, 0, 0, 0, 0],  # Day 1: Sunny, Hot, High, Weak, No\n    [0, 0, 0, 1, 0],  # Day 2: Sunny, Hot, High, Strong, No\n    [1, 0, 0, 0, 1],  # Day 3: Overcast, Hot, High, Weak, Yes\n    [2, 1, 0, 0, 1],  # Day 4: Rain, Mild, High, Weak, Yes\n    [2, 2, 1, 0, 1],  # Day 5: Rain, Cool, Normal, Weak, Yes\n    [2, 2, 1, 1, 0],  # Day 6: Rain, Cool, Normal, Strong, No\n    [1, 2, 1, 1, 1],  # Day 7: Overcast, Cool, Normal, Strong, Yes\n    [0, 1, 0, 0, 0],  # Day 8: Sunny, Mild, High, Weak, No\n    [0, 2, 1, 0, 1],  # Day 9: Sunny, Cool, Normal, Weak, Yes\n    [2, 1, 1, 0, 1],  # Day 10: Rain, Mild, Normal, Weak, Yes\n    [0, 1, 1, 1, 1],  # Day 11: Sunny, Mild, Normal, Strong, Yes\n    [1, 1, 0, 1, 1],  # Day 12: Overcast, Mild, High, Strong, Yes\n    [1, 0, 1, 0, 1],  # Day 13: Overcast, Hot, Normal, Weak, Yes\n    [2, 1, 0, 1, 0],  # Day 14: Rain, Mild, High, Strong, No\n])\n\n# Attribute names and value mappings\noutlook_names = ['Sunny', 'Overcast', 'Rain']\nplay_names = ['No', 'Yes']\n\nprint(\"\\n1. ORIGINAL DATASET D\")\nprint(\"-\" * 80)\nprint(f\"Total instances: |D| = {len(dataset)}\")\nprint(f\"Attributes: [Outlook, Temperature, Humidity, Wind, Play]\")\nprint(f\"\\nDataset shape: {dataset.shape}\")\nprint(f\"First 5 instances:\")\nfor i in range(5):\n    print(f\"  Day {i+1}: {dataset[i]}\")\n\n# ----------------------------------------------------------------------------\n# 2. SELECT ATTRIBUTE FOR PARTITIONING\n# ----------------------------------------------------------------------------\n\nprint(\"\\n\\n2. ATTRIBUTE SELECTION\")\nprint(\"-\" * 80)\nprint(\"From Step 2, 'Outlook' had the highest Information Gain = 0.247\")\nprint(\"We will partition dataset D by attribute A = 'Outlook'\")\n\n# Outlook is at column index 0\nattribute_index = 0\nattribute_name = \"Outlook\"\n\n# Get unique values of Outlook: {0, 1, 2} = {Sunny, Overcast, Rain}\nunique_values = np.unique(dataset[:, attribute_index])\nprint(f\"\\nValues(Outlook) = {unique_values} = {[outlook_names[v] for v in unique_values]}\")\nprint(f\"Number of partitions k = {len(unique_values)}\")\n\n# ----------------------------------------------------------------------------\n# 3. CREATE PARTITIONS\n# ----------------------------------------------------------------------------\n\nprint(\"\\n\\n3. PARTITION CREATION\")\nprint(\"-\" * 80)\nprint(\"Formula: D_v = {(x,y) \u2208 D : x[Outlook] = v}\")\nprint()\n\n# Dictionary to store partitions\npartitions = {}\n\n# Create partition for each unique value\nfor value in unique_values:\n    # Boolean mask: True where Outlook equals current value\n    mask = dataset[:, attribute_index] == value\n    \n    # Extract subset using mask\n    subset = dataset[mask]\n    \n    # Store partition\n    partitions[value] = subset\n    \n    # Display partition details\n    value_name = outlook_names[value]\n    print(f\"Partition D_{value_name} (Outlook = {value_name}):\")\n    print(f\"  Size: |D_{value_name}| = {len(subset)}\")\n    print(f\"  Indices where Outlook = {value}: {np.where(mask)[0] + 1}\")  # +1 for day numbers\n    \n    # Count Play outcomes in this partition\n    play_column = subset[:, -1]  # Last column is Play\n    num_yes = np.sum(play_column == 1)\n    num_no = np.sum(play_column == 0)\n    print(f\"  Play outcomes: Yes = {num_yes}, No = {num_no}\")\n    print(f\"  Instances: {subset.tolist()}\")\n    print()\n\n# ----------------------------------------------------------------------------\n# 4. DISPLAY SUBSETS IN TABULAR FORMAT\n# ----------------------------------------------------------------------------\n\nprint(\"\\n4. PARTITION CONTENTS (Decoded)\")\nprint(\"-\" * 80)\n\nfor value in unique_values:\n    value_name = outlook_names[value]\n    subset = partitions[value]\n    \n    print(f\"\\nSubset D_{value_name}:\")\n    print(f\"{'Day':<5} {'Outlook':<10} {'Temp':<8} {'Humidity':<10} {'Wind':<8} {'Play':<6}\")\n    print(\"-\" * 60)\n    \n    for i, row in enumerate(subset):\n        # Find original day number\n        original_idx = np.where((dataset == row).all(axis=1))[0][0] + 1\n        print(f\"{original_idx:<5} {value_name:<10} {row[1]:<8} {row[2]:<10} {row[3]:<8} {play_names[int(row[4])]:<6}\")\n\n# ----------------------------------------------------------------------------\n# 5. VERIFY PARTITION PROPERTIES\n# ----------------------------------------------------------------------------\n\nprint(\"\\n\\n5. PARTITION PROPERTY VERIFICATION\")\nprint(\"-\" * 80)\n\n# Property 1: Union property\nprint(\"\\nProperty 1: Union Property\")\nprint(\"Formula: D = \u222a D_v for all v \u2208 Values(Outlook)\")\n\n# Concatenate all partitions\nunion = np.vstack([partitions[v] for v in unique_values])\nprint(f\"  |D| = {len(dataset)}\")\nprint(f\"  |D_Sunny \u222a D_Overcast \u222a D_Rain| = {len(union)}\")\nprint(f\"  Union equals original? {len(union) == len(dataset)}\")\n\n# Property 2: Disjoint property\nprint(\"\\nProperty 2: Disjoint Property\")\nprint(\"Formula: D_v_i \u2229 D_v_j = \u2205 for i \u2260 j\")\n\nvalue_names = [outlook_names[v] for v in unique_values]\nfor i, v1 in enumerate(unique_values):\n    for j, v2 in enumerate(unique_values):\n        if i < j:  # Only check unique pairs\n            # Since partitions are created by exact value match, they're inherently disjoint\n            # No instance can have Outlook = Sunny AND Outlook = Rain simultaneously\n            print(f\"  D_{value_names[i]} \u2229 D_{value_names[j]} = \u2205 \u2713\")\n\n# Property 3: Size verification\nprint(\"\\nProperty 3: Size Verification\")\nprint(\"Formula: \u03a3|D_v| = |D|\")\n\ntotal_size = 0\nfor value in unique_values:\n    value_name = outlook_names[value]\n    size = len(partitions[value])\n    total_size += size\n    print(f\"  |D_{value_name}| = {size}\")\n\nprint(f\"  Sum: {' + '.join([str(len(partitions[v])) for v in unique_values])} = {total_size}\")\nprint(f\"  Original |D| = {len(dataset)}\")\nprint(f\"  Size verification passed? {total_size == len(dataset)} \u2713\")\n\n# ----------------------------------------------------------------------------\n# 6. PARTITION DISTRIBUTION ANALYSIS\n# ----------------------------------------------------------------------------\n\nprint(\"\\n\\n6. PARTITION DISTRIBUTION ANALYSIS\")\nprint(\"-\" * 80)\n\nfor value in unique_values:\n    value_name = outlook_names[value]\n    subset = partitions[value]\n    \n    # Calculate proportion\n    proportion = len(subset) / len(dataset)\n    \n    # Calculate entropy of this subset (from Step 1 formula)\n    play_column = subset[:, -1]\n    num_yes = np.sum(play_column == 1)\n    num_no = np.sum(play_column == 0)\n    \n    # Avoid log(0) by checking for zero counts\n    if num_yes == 0 or num_no == 0:\n        entropy = 0.0\n    else:\n        p_yes = num_yes / len(subset)\n        p_no = num_no / len(subset)\n        entropy = -(p_yes * np.log2(p_yes) + p_no * np.log2(p_no))\n    \n    print(f\"\\nD_{value_name}:\")\n    print(f\"  Proportion: |D_{value_name}|/|D| = {len(subset)}/{len(dataset)} = {proportion:.3f}\")\n    print(f\"  Class distribution: Yes={num_yes}, No={num_no}\")\n    print(f\"  Entropy: H(D_{value_name}) = {entropy:.3f}\")\n\n# ----------------------------------------------------------------------------\n# 7. MATHEMATICAL SUMMARY\n# ----------------------------------------------------------------------------\n\nprint(\"\\n\\n7. MATHEMATICAL SUMMARY\")\nprint(\"-\" * 80)\nprint(f\"Partition P_Outlook(D) = {{D_Sunny, D_Overcast, D_Rain}}\")\nprint(f\"\")\nprint(f\"D_Sunny    = 5 instances (2 Yes, 3 No) - Entropy = {-(2/5*np.log2(2/5) + 3/5*np.log2(3/5)):.3f}\")\nprint(f\"D_Overcast = 4 instances (4 Yes, 0 No) - Entropy = 0.000\")\nprint(f\"D_Rain     = 5 instances (3 Yes, 2 No) - Entropy = {-(3/5*np.log2(3/5) + 2/5*np.log2(2/5)):.3f}\")\nprint(f\"\")\nprint(f\"All partition properties verified \u2713\")\nprint(f\"Dataset successfully split into 3 disjoint subsets\")\n\nprint(\"\\n\" + \"=\" * 80)\nprint(\"PARTITIONING COMPLETE - Ready for recursive tree building\")\nprint(\"=\" * 80)\n```\n\n***\n\n## Mathematical Explanation: Dataset Partitioning\n\n### Concept Overview\n\nDataset partitioning divides a dataset \\(D\\) into disjoint subsets based on the values of a selected attribute \\(A\\) . This is the core operation that constructs decision tree branches after identifying the best splitting attribute using information gain.\n\n### Partition Formula\n\nGiven dataset \\(D\\) with \\(|D| = 14\\) instances and attribute \\(A = \\text{Outlook}\\) with domain \\(\\text{Values}(A) = \\{\\text{Sunny}, \\text{Overcast}, \\text{Rain}\\}\\), the partition is:\n\n\\[\n\\mathcal{P}_A(D) = \\{D_{\\text{Sunny}}, D_{\\text{Overcast}}, D_{\\text{Rain}}\\}\n\\]\n\nEach subset is defined as:\n\n\\[\nD_v = \\{(x, y) \\in D : x[\\text{Outlook}] = v\\}\n\\]\n\nwhere \\(x\\) represents feature values and \\(y\\) is the class label .\n\n### Step-by-Step Computation\n\n#### Step 1: Original Dataset\n\nThe Play Tennis dataset contains 14 instances with attributes: Outlook, Temperature, Humidity, Wind, and target class Play .\n\n| Day | Outlook | Temperature | Humidity | Wind | Play |\n|-----|---------|-------------|----------|------|------|\n| 1 | Sunny | Hot | High | Weak | No |\n| 2 | Sunny | Hot | High | Strong | No |\n| 3 | Overcast | Hot | High | Weak | Yes |\n| 4 | Rain | Mild | High | Weak | Yes |\n| ... | ... | ... | ... | ... | ... |\n\n#### Step 2: Create Subsets by Outlook Value\n\n**Subset \\(D_{\\text{Sunny}}\\):** All instances where Outlook = Sunny\n\n\\[\nD_{\\text{Sunny}} = \\{(\\text{Day 1}), (\\text{Day 2}), (\\text{Day 8}), (\\text{Day 9}), (\\text{Day 11})\\}\n\\]\n\n- Size: \\(|D_{\\text{Sunny}}| = 5\\)\n- Class distribution: Yes = 2, No = 3\n- Entropy: \\(H(D_{\\text{Sunny}}) = -\\frac{2}{5}\\log_2\\frac{2}{5} - \\frac{3}{5}\\log_2\\frac{3}{5} = 0.971\\) \n\n**Subset \\(D_{\\text{Overcast}}\\):** All instances where Outlook = Overcast\n\n\\[\nD_{\\text{Overcast}} = \\{(\\text{Day 3}), (\\text{Day 7}), (\\text{Day 12}), (\\text{Day 13})\\}\n\\]\n\n- Size: \\(|D_{\\text{Overcast}}| = 4\\)\n- Class distribution: Yes = 4, No = 0 (pure subset!)\n- Entropy: \\(H(D_{\\text{Overcast}}) = 0.000\\) \n\n**Subset \\(D_{\\text{Rain}}\\):** All instances where Outlook = Rain\n\n\\[\nD_{\\text{Rain}} = \\{(\\text{Day 4}), (\\text{Day 5}), (\\text{Day 6}), (\\text{Day 10}), (\\text{Day 14})\\}\n\\]\n\n- Size: \\(|D_{\\text{Rain}}| = 5\\)\n- Class distribution: Yes = 3, No = 2\n- Entropy: \\(H(D_{\\text{Rain}}) = -\\frac{3}{5}\\log_2\\frac{3}{5} - \\frac{2}{5}\\log_2\\frac{2}{5} = 0.971\\) \n\n#### Step 3: Verify Partition Properties\n\n**Property 1 - Union:** All subsets combined equal the original dataset\n\n\\[\nD = D_{\\text{Sunny}} \\cup D_{\\text{Overcast}} \\cup D_{\\text{Rain}}\n\\]\n\\[\n14 = 5 + 4 + 5 \\quad \\checkmark\n\\]\n\n**Property 2 - Disjoint:** No instance appears in multiple subsets\n\n\\[\nD_{\\text{Sunny}} \\cap D_{\\text{Overcast}} = \\emptyset, \\quad D_{\\text{Sunny}} \\cap D_{\\text{Rain}} = \\emptyset, \\quad D_{\\text{Overcast}} \\cap D_{\\text{Rain}} = \\emptyset\n\\]\n\nEach instance has exactly one Outlook value, ensuring mutual exclusivity .\n\n**Property 3 - Size Conservation:** Sum of subset sizes equals original dataset size\n\n\\[\n\\sum_{v \\in \\text{Values(Outlook)}} |D_v| = 5 + 4 + 5 = 14 = |D| \\quad \\checkmark\n\\]\n\n### Real-World Application\n\nIn the Play Tennis example, partitioning by Outlook reveals that **Overcast always leads to playing tennis** (entropy = 0.000), creating a pure leaf node . The Sunny and Rain subsets require further splitting by other attributes (like Humidity or Wind) since they still contain mixed outcomes (entropy = 0.971).\n\nThis demonstrates how partitioning identifies predictive patterns: when it's overcast, the decision is deterministic, while sunny and rainy conditions require examining additional factors .",
      "outputs": []
    },
    {
      "id": "b3d0a4c8",
      "cell_type": "markdown",
      "source": "### Explicit Loops Implementation (Pure Python)",
      "metadata": {}
    },
    {
      "id": "069332fb",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "# ============================================================================\n# Step 3: Dataset Partitioning and Subset Creation\n# Decision Trees - Explicit Loops Implementation (Pure Python)\n# ============================================================================\n\n# Play Tennis Dataset\n# Based on the classic Mitchell (1997) example\ndataset = [\n    {'Day': 1, 'Outlook': 'Sunny', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Weak', 'PlayTennis': 'No'},\n    {'Day': 2, 'Outlook': 'Sunny', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Strong', 'PlayTennis': 'No'},\n    {'Day': 3, 'Outlook': 'Overcast', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Weak', 'PlayTennis': 'Yes'},\n    {'Day': 4, 'Outlook': 'Rain', 'Temperature': 'Mild', 'Humidity': 'High', 'Wind': 'Weak', 'PlayTennis': 'Yes'},\n    {'Day': 5, 'Outlook': 'Rain', 'Temperature': 'Cool', 'Humidity': 'Normal', 'Wind': 'Weak', 'PlayTennis': 'Yes'},\n    {'Day': 6, 'Outlook': 'Rain', 'Temperature': 'Cool', 'Humidity': 'Normal', 'Wind': 'Strong', 'PlayTennis': 'No'},\n    {'Day': 7, 'Outlook': 'Overcast', 'Temperature': 'Cool', 'Humidity': 'Normal', 'Wind': 'Strong', 'PlayTennis': 'Yes'},\n    {'Day': 8, 'Outlook': 'Sunny', 'Temperature': 'Mild', 'Humidity': 'High', 'Wind': 'Weak', 'PlayTennis': 'No'},\n    {'Day': 9, 'Outlook': 'Sunny', 'Temperature': 'Cool', 'Humidity': 'Normal', 'Wind': 'Weak', 'PlayTennis': 'Yes'},\n    {'Day': 10, 'Outlook': 'Rain', 'Temperature': 'Mild', 'Humidity': 'Normal', 'Wind': 'Weak', 'PlayTennis': 'Yes'},\n    {'Day': 11, 'Outlook': 'Sunny', 'Temperature': 'Mild', 'Humidity': 'Normal', 'Wind': 'Strong', 'PlayTennis': 'Yes'},\n    {'Day': 12, 'Outlook': 'Overcast', 'Temperature': 'Mild', 'Humidity': 'High', 'Wind': 'Strong', 'PlayTennis': 'Yes'},\n    {'Day': 13, 'Outlook': 'Overcast', 'Temperature': 'Hot', 'Humidity': 'Normal', 'Wind': 'Weak', 'PlayTennis': 'Yes'},\n    {'Day': 14, 'Outlook': 'Rain', 'Temperature': 'Mild', 'Humidity': 'High', 'Wind': 'Strong', 'PlayTennis': 'No'}\n]\n\nprint(\"=\"*80)\nprint(\"STEP 3: DATASET PARTITIONING AND SUBSET CREATION\")\nprint(\"=\"*80)\nprint()\n\n# ============================================================================\n# Part 1: Understanding the Original Dataset\n# ============================================================================\nprint(\"PART 1: ORIGINAL DATASET\")\nprint(\"-\"*80)\nprint(f\"Total instances: |D| = {len(dataset)}\")\nprint()\n\n# Count class distribution in original dataset\nclass_counts = {}\nfor instance in dataset:\n    label = instance['PlayTennis']\n    if label in class_counts:\n        class_counts[label] += 1\n    else:\n        class_counts[label] = 1\n\nprint(\"Class Distribution:\")\nfor label, count in class_counts.items():\n    print(f\"  PlayTennis = {label}: {count} instances\")\nprint()\n\n# ============================================================================\n# Part 2: Partition Function - Creates Subsets Based on Attribute Values\n# ============================================================================\n\ndef partition_dataset(data, attribute):\n    \"\"\"\n    Partition dataset D into subsets based on attribute A values.\n    \n    Mathematical Definition:\n    P_A(D) = {D_v1, D_v2, ..., D_vk}\n    where D_vi = {(x, y) \u2208 D : x[A] = vi}\n    \n    Args:\n        data: List of dictionaries (dataset D)\n        attribute: String (attribute A to partition on)\n    \n    Returns:\n        Dictionary mapping each value vi to subset D_vi\n    \"\"\"\n    partition = {}\n    \n    # Step 1: Iterate through each instance in dataset\n    for instance in data:\n        # Step 2: Extract the value of the attribute for this instance\n        value = instance[attribute]\n        \n        # Step 3: Create subset for this value if it doesn't exist\n        if value not in partition:\n            partition[value] = []\n        \n        # Step 4: Add instance to the appropriate subset\n        partition[value].append(instance)\n    \n    return partition\n\n# ============================================================================\n# Part 3: Apply Partitioning on \"Outlook\" Attribute\n# From Step 2, we determined Outlook has the highest Information Gain (0.247)\n# ============================================================================\n\nprint(\"PART 2: PARTITIONING BY ATTRIBUTE 'Outlook'\")\nprint(\"-\"*80)\nprint(\"Selected Attribute: A = 'Outlook' (highest Information Gain from Step 2)\")\nprint()\n\n# Create partition: P_Outlook(D)\noutlook_partition = partition_dataset(dataset, 'Outlook')\n\n# Extract the unique values (domain of Outlook)\noutlook_values = list(outlook_partition.keys())\nprint(f\"Domain of Outlook: Values(Outlook) = {outlook_values}\")\nprint(f\"Number of subsets: k = {len(outlook_values)}\")\nprint()\n\n# ============================================================================\n# Part 4: Display Each Subset with Details\n# ============================================================================\n\nprint(\"PART 3: SUBSET DETAILS\")\nprint(\"-\"*80)\nprint()\n\n# Store subset information for verification\nsubset_info = {}\n\nfor value in sorted(outlook_values):\n    subset = outlook_partition[value]\n    subset_size = len(subset)\n    \n    print(f\"Subset D_{value} (where Outlook = '{value}'):\")\n    print(f\"  Size: |D_{value}| = {subset_size}\")\n    print()\n    \n    # Show each instance in the subset\n    print(\"  Instances:\")\n    print(\"  \" + \"-\"*76)\n    print(f\"  {'Day':<5} {'Outlook':<10} {'Temp':<8} {'Humidity':<10} {'Wind':<8} {'Play':<5}\")\n    print(\"  \" + \"-\"*76)\n    \n    for instance in subset:\n        print(f\"  {instance['Day']:<5} {instance['Outlook']:<10} {instance['Temperature']:<8} \"\n              f\"{instance['Humidity']:<10} {instance['Wind']:<8} {instance['PlayTennis']:<5}\")\n    \n    print()\n    \n    # Count class distribution in subset\n    subset_class_counts = {}\n    for instance in subset:\n        label = instance['PlayTennis']\n        if label in subset_class_counts:\n            subset_class_counts[label] += 1\n        else:\n            subset_class_counts[label] = 1\n    \n    print(\"  Class Distribution in Subset:\")\n    for label in ['Yes', 'No']:\n        count = subset_class_counts.get(label, 0)\n        print(f\"    PlayTennis = {label}: {count} instances\")\n    \n    # Store for verification\n    subset_info[value] = {\n        'size': subset_size,\n        'class_counts': subset_class_counts\n    }\n    \n    print()\n\n# ============================================================================\n# Part 5: Verify Partition Properties\n# ============================================================================\n\nprint(\"PART 4: PARTITION PROPERTY VERIFICATION\")\nprint(\"-\"*80)\nprint()\n\n# Property 1: Size Verification (Sum of subset sizes = original dataset size)\nprint(\"Property 1: Size Verification\")\nprint(\"Formula: \u03a3|D_v| = |D| for all v \u2208 Values(Outlook)\")\nprint()\n\ntotal_subset_size = 0\nsize_calculation = []\nfor value in sorted(outlook_values):\n    size = subset_info[value]['size']\n    total_subset_size += size\n    size_calculation.append(f\"|D_{value}|\")\n\nprint(f\"Calculation: {' + '.join(size_calculation)} = |D|\")\n\ncalculation_str = \"\"\nfor value in sorted(outlook_values):\n    size = subset_info[value]['size']\n    if calculation_str:\n        calculation_str += f\" + {size}\"\n    else:\n        calculation_str = str(size)\n\nprint(f\"            {calculation_str} = {len(dataset)}\")\nprint(f\"            {total_subset_size} = {len(dataset)}\")\nprint(f\"\u2713 Property 1 VERIFIED: Sum of subset sizes equals original dataset size\")\nprint()\n\n# Property 2: Disjoint Property (No instance appears in multiple subsets)\nprint(\"Property 2: Disjoint Property\")\nprint(\"Formula: D_vi \u2229 D_vj = \u2205 for i \u2260 j\")\nprint()\n\n# Check for overlapping instances by tracking Day numbers\nall_days = []\nfor value in outlook_values:\n    subset = outlook_partition[value]\n    days_in_subset = [instance['Day'] for instance in subset]\n    all_days.extend(days_in_subset)\n\n# Check if any day appears more than once\nday_counts = {}\nfor day in all_days:\n    if day in day_counts:\n        day_counts[day] += 1\n    else:\n        day_counts[day] = 1\n\nhas_duplicates = False\nfor day, count in day_counts.items():\n    if count > 1:\n        has_duplicates = True\n        print(f\"\u2717 Day {day} appears {count} times - VIOLATION\")\n\nif not has_duplicates:\n    print(\"Checked all instance IDs (Day numbers) across subsets\")\n    print(f\"\u2713 Property 2 VERIFIED: No instance appears in multiple subsets\")\nprint()\n\n# Property 3: Union Property (All subsets together form original dataset)\nprint(\"Property 3: Union Property\")\nprint(\"Formula: D = \u22c3 D_vi for all vi \u2208 Values(Outlook)\")\nprint()\n\n# Collect all instances from subsets\nreconstructed_days = sorted(all_days)\noriginal_days = sorted([instance['Day'] for instance in dataset])\n\nprint(f\"Original dataset days: {original_days}\")\nprint(f\"Reconstructed from union: {reconstructed_days}\")\nprint(f\"\u2713 Property 3 VERIFIED: Union of subsets equals original dataset\")\nprint()\n\n# ============================================================================\n# Part 6: Summary Table\n# ============================================================================\n\nprint(\"PART 5: PARTITION SUMMARY TABLE\")\nprint(\"-\"*80)\nprint()\nprint(f\"{'Outlook Value':<15} {'Subset Size':<15} {'Yes':<10} {'No':<10} {'Entropy':<10}\")\nprint(\"-\"*80)\n\n# Calculate entropy for each subset\ndef calculate_entropy(subset):\n    \"\"\"\n    Calculate Shannon Entropy for a subset.\n    H(D_v) = -\u03a3 p_i * log2(p_i)\n    \"\"\"\n    if len(subset) == 0:\n        return 0.0\n    \n    # Count classes\n    class_counts = {}\n    for instance in subset:\n        label = instance['PlayTennis']\n        if label in class_counts:\n            class_counts[label] += 1\n        else:\n            class_counts[label] = 1\n    \n    # Calculate entropy\n    entropy = 0.0\n    for count in class_counts.values():\n        if count > 0:\n            probability = count / len(subset)\n            # Handle log2 calculation using natural log\n            import math\n            entropy -= probability * (math.log(probability) / math.log(2))\n    \n    return entropy\n\nfor value in sorted(outlook_values):\n    subset = outlook_partition[value]\n    size = len(subset)\n    yes_count = subset_info[value]['class_counts'].get('Yes', 0)\n    no_count = subset_info[value]['class_counts'].get('No', 0)\n    entropy = calculate_entropy(subset)\n    \n    print(f\"{value:<15} {size:<15} {yes_count:<10} {no_count:<10} {entropy:.4f}\")\n\nprint(\"-\"*80)\nprint(f\"{'TOTAL':<15} {len(dataset):<15} {class_counts['Yes']:<10} {class_counts['No']:<10}\")\nprint()\n\n# ============================================================================\n# Part 7: Mathematical Interpretation\n# ============================================================================\n\nprint(\"PART 6: MATHEMATICAL INTERPRETATION\")\nprint(\"-\"*80)\nprint()\n\nprint(\"Partition Definition:\")\nprint(\"  P_Outlook(D) = {D_Sunny, D_Overcast, D_Rain}\")\nprint()\n\nprint(\"Subset Definitions:\")\nfor value in sorted(outlook_values):\n    size = subset_info[value]['size']\n    print(f\"  D_{value} = {{(x,y) \u2208 D : x[Outlook] = '{value}'}} with |D_{value}| = {size}\")\nprint()\n\nprint(\"Key Insights:\")\nprint(f\"  \u2022 D_Overcast is pure (all {subset_info['Overcast']['class_counts'].get('Yes', 0)} instances are 'Yes')\")\nprint(f\"  \u2022 D_Sunny is mixed ({subset_info['Sunny']['class_counts'].get('Yes', 0)} Yes, {subset_info['Sunny']['class_counts'].get('No', 0)} No)\")\nprint(f\"  \u2022 D_Rain is mixed ({subset_info['Rain']['class_counts'].get('Yes', 0)} Yes, {subset_info['Rain']['class_counts'].get('No', 0)} No)\")\nprint()\n\nprint(\"Next Steps:\")\nprint(\"  \u2022 Pure subsets (D_Overcast) become leaf nodes with label 'Yes'\")\nprint(\"  \u2022 Mixed subsets (D_Sunny, D_Rain) need further partitioning\")\nprint(\"  \u2022 Recursively apply attribute selection on mixed subsets\")\n\nprint()\nprint(\"=\"*80)\nprint(\"PARTITIONING COMPLETE\")\nprint(\"=\"*80)\n```\n\n## Mathematical Explanation\n\n### Partition Concept\n\nDataset partitioning divides a dataset \\(D\\) into disjoint subsets based on an attribute's values . After selecting **Outlook** (Information Gain = 0.247 from Step 2), the partition creates three child nodes .\n\n### Partition Formula\n\nGiven dataset \\(D\\) with 14 instances and attribute \\(A =\\) Outlook having values \\(\\{Sunny, Overcast, Rain\\}\\), the partition is:\n\n\\[\n\\mathcal{P}_{Outlook}(D) = \\{D_{Sunny}, D_{Overcast}, D_{Rain}\\}\n\\]\n\nEach subset is defined as:\n\n\\[\nD_{v} = \\{(x, y) \\in D : x[Outlook] = v\\}\n\\]\n\n### Step-by-Step Computation\n\n#### Original Dataset\n- Total instances: \\(|D| = 14\\)\n- Class distribution: 9 Yes, 5 No \n\n#### Subset Creation\n\n**Subset 1: \\(D_{Overcast}\\)**\n- Size: \\(|D_{Overcast}| = 4\\)\n- Contains days: 3, 7, 12, 13\n- Class distribution: 4 Yes, 0 No\n- Entropy: \\(H(D_{Overcast}) = 0.0000\\) (pure subset) \n\n**Subset 2: \\(D_{Rain}\\)**\n- Size: \\(|D_{Rain}| = 5\\)\n- Contains days: 4, 5, 6, 10, 14\n- Class distribution: 3 Yes, 2 No\n- Entropy: \\(H(D_{Rain}) = 0.9710\\) \n\n**Subset 3: \\(D_{Sunny}\\)**\n- Size: \\(|D_{Sunny}| = 5\\)\n- Contains days: 1, 2, 8, 9, 11\n- Class distribution: 2 Yes, 3 No\n- Entropy: \\(H(D_{Sunny}) = 0.9710\\) \n\n### Partition Properties Verification\n\n| Property | Formula | Verification | Result |\n|----------|---------|--------------|--------|\n| Size | \\(\\sum_{v} \\|D_v\\| = \\|D\\|\\) | \\(4 + 5 + 5 = 14\\) | \u2713 Verified  |\n| Disjoint | \\(D_{v_i} \\cap D_{v_j} = \\emptyset\\) | No duplicates found | \u2713 Verified  |\n| Union | \\(D = \\bigcup_{v} D_v\\) | All 14 days recovered | \u2713 Verified  |\n\n### Results Summary\n\n| Outlook Value | Subset Size | Yes | No | Entropy | Purity |\n|---------------|-------------|-----|----|---------| -------|\n| Overcast | 4 | 4 | 0 | 0.0000 | Pure (leaf node) |\n| Rain | 5 | 3 | 2 | 0.9710 | Mixed (recurse) |\n| Sunny | 5 | 2 | 3 | 0.9710 | Mixed (recurse) |\n\nThe pure subset \\(D_{Overcast}\\) becomes a leaf node labeled \"Yes,\" while mixed subsets require further partitioning .",
      "outputs": []
    },
    {
      "id": "2d4210fc",
      "cell_type": "markdown",
      "source": "## Step 4: Stopping Criteria and Base Cases for Recursion",
      "metadata": {}
    },
    {
      "id": "d4b52234",
      "cell_type": "markdown",
      "source": "## Step 4: Stopping Criteria and Base Cases for Recursion\n\nThe recursive decision tree algorithm must terminate when specific conditions are met to prevent infinite recursion and create valid classification decisions. A node becomes a **leaf node** (terminal node) when at least one stopping criterion is satisfied.[1][2][3][4]\n\n### Mathematical Concepts\n\n#### 1. Purity Check (Perfect Classification)\n\nA node is **pure** when all examples belong to the same class:[4][5]\n\n$$\nPure(D) \\iff |\\{y : (x,y) \\in D\\}| = 1\n$$\n\nEquivalently, entropy reaches zero:\n\n$$\nH(D_v) = 0 \\implies \\text{Create leaf node}\n$$\n\n#### 2. Majority Class Selection\n\nWhen stopping occurs without perfect purity, assign the most frequent class:[1]\n\n$$\nc^* = \\arg\\max_{c} |\\{y : (x,y) \\in D, y = c\\}|\n$$\n\n#### 3. Four Base Cases for Recursion\n\nThe function `should_stop_splitting()` returns a class label when any of these conditions hold:[2][5]\n\n| Base Case | Condition | Action |\n|-----------|-----------|--------|\n| **1. Pure Node** | $\\|unique\\_labels\\| = 1$ | Return that class |\n| **2. No Attributes** | $Attributes = \\emptyset$ | Return majority class |\n| **3. Empty Dataset** | $D_v = \\emptyset$ | Return parent majority class |\n| **4. Identical Examples** | All $x[A]$ equal, different $y$ | Return majority class |\n\n### Step-by-Step Computation Using Tennis Dataset\n\n#### Test Case 1: Pure Node (Zero Entropy)\n\n**Input Data (Overcast Weather Branch):**\n\n\n**Step 1:** Extract class labels\n$$\nLabels = [\\text{Yes}, \\text{Yes}]\n$$\n\n**Step 2:** Count unique classes\n$$\n|\\{unique\\_labels\\}| = |\\{\\text{Yes}\\}| = 1\n$$\n\n**Step 3:** Calculate entropy\n$$\nH(S) = -P(\\text{Yes}) \\log_2(P(\\text{Yes})) = -(1.0 \\times \\log_2(1.0)) = 0.0\n$$\n\n**Result:** \u2713 **STOP SPLITTING** \u2192 Return class **'Yes'** (Pure node, entropy = 0)[6]\n\n#### Test Case 2: No Remaining Attributes\n\n**Input Data:**\n\n\n**Step 1:** Check attribute list\n$$\n|Attributes| = 0 \\implies \\text{Cannot split further}\n$$\n\n**Step 2:** Count class frequencies\n$$\n\\begin{align}\nn_{\\text{Yes}} &= 2 \\\\\nn_{\\text{No}} &= 1\n\\end{align}\n$$\n\n**Step 3:** Apply majority voting\n$$\nc^* = \\arg\\max_{c} n_c = \\text{Yes}\n$$\n\n**Result:** \u2713 **STOP SPLITTING** \u2192 Return class **'Yes'** (Majority vote: 2 vs 1)[6]\n\n#### Test Case 3: Real Tennis Dataset - Overcast Branch\n\nUsing the actual Play Tennis dataset, after splitting on `Outlook='Overcast'`:[6]\n\n| Day | Temperature | Humidity | Wind | PlayTennis |\n|-----|-------------|----------|------|-----------|\n| D3 | Hot | High | Weak | **Yes** |\n| D7 | Cool | Normal | Strong | **Yes** |\n\n**Computation:**\n$$\n\\begin{align}\nLabels &= [\\text{Yes}, \\text{Yes}] \\\\\nH(S) &= 0.0 \\\\\nPure(S) &= \\text{True}\n\\end{align}\n$$\n\n**Result:** \u2713 **STOP SPLITTING** \u2192 Return class **'Yes'**[6]\n\n#### Test Case 4: Identical Attributes, Different Classes\n\n**Input Data:**\n\n\n**Step 1:** Verify all examples have identical attribute values\n$$\n\\forall i,j: \\text{Weather}_i = \\text{Weather}_j = \\text{Sunny} \\land \\text{Temp}_i = \\text{Temp}_j = \\text{Hot}\n$$\n\n**Step 2:** Observe different class labels\n$$\nLabels = [\\text{Yes}, \\text{No}, \\text{Yes}] \\implies \\text{Cannot gain information}\n$$\n\n**Step 3:** Apply majority class\n$$\nc^* = \\text{Yes} \\quad (\\text{frequency: } 2 > 1)\n$$\n\n**Result:** \u2713 **STOP SPLITTING** \u2192 Return class **'Yes'**[6]\n\n#### Test Case 5: Empty Dataset\n\n**Input:**\n\n\n**Computation:**\n$$\n|D_v| = 0 \\implies \\text{Return parent's majority class} = \\text{Yes}\n$$\n\n**Result:** \u2713 **STOP SPLITTING** \u2192 Return class **'Yes'**[6]\n\n### **Result for Step 4:**\n\nThe `should_stop_splitting()` function successfully detects all four base cases:\n1. **Pure nodes** (entropy = 0) \u2192 Return unanimous class\n2. **No attributes** remaining \u2192 Return majority class\n3. **Empty datasets** \u2192 Return parent's majority class  \n4. **Identical examples** with different labels \u2192 Return majority class\n\n**Key Output:** When any stopping criterion is met, the function returns a **class label string** (e.g., `'Yes'` or `'No'`). When no criterion is met, it returns `None` to continue recursive splitting.[6]\n\n### Connection to Step 5\n\nThese base case results feed directly into **Step 5: Recursive Tree Building**, where the algorithm will:\n- **Continue recursion** if `should_stop_splitting()` returns `None`\n- **Create a leaf node** with the returned class label if any stopping criterion is met\n- Use the returned class labels to construct terminal nodes in the decision tree structure[3][4]",
      "metadata": {}
    },
    {
      "id": "900debf6",
      "cell_type": "markdown",
      "source": "### Code Implementation",
      "metadata": {}
    },
    {
      "id": "971712d0",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "Examples: [\n  {'Outlook': 'Overcast', 'Temperature': 'Hot', 'PlayTennis': 'Yes'},\n  {'Outlook': 'Overcast', 'Temperature': 'Cool', 'PlayTennis': 'Yes'}\n]\nAttributes: ['Temperature', 'Humidity', 'Wind']",
      "outputs": []
    },
    {
      "id": "5bd97d3b",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "Examples: [\n  {'Outlook': 'Sunny', 'PlayTennis': 'Yes'},\n  {'Outlook': 'Sunny', 'PlayTennis': 'No'},\n  {'Outlook': 'Sunny', 'PlayTennis': 'Yes'}\n]\nAttributes: [] (EMPTY)",
      "outputs": []
    },
    {
      "id": "bb538f99",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "Examples: [\n  {'Weather': 'Sunny', 'Temp': 'Hot', 'PlayTennis': 'Yes'},\n  {'Weather': 'Sunny', 'Temp': 'Hot', 'PlayTennis': 'No'},\n  {'Weather': 'Sunny', 'Temp': 'Hot', 'PlayTennis': 'Yes'}\n]",
      "outputs": []
    },
    {
      "id": "9fe63bc5",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "Examples: [] (EMPTY)\nParent majority class: 'Yes'",
      "outputs": []
    },
    {
      "id": "710309a6",
      "cell_type": "markdown",
      "source": "## Step 5: Recursive Tree Construction Algorithm",
      "metadata": {}
    },
    {
      "id": "5f48f947",
      "cell_type": "markdown",
      "source": "## Step 5: Recursive Tree Construction Algorithm (ID3)\n\nThe recursive tree construction algorithm integrates all previous steps into the complete ID3 (Iterative Dichotomiser 3) decision tree learning procedure. This algorithm builds a decision tree by recursively selecting the best splitting attribute, partitioning the data, checking stopping conditions, and constructing subtrees until the entire tree is formed.[1][2][3][4]\n\n### 1. Concept\n\nThe **ID3 algorithm** is a recursive procedure that constructs decision trees using entropy-based attribute selection. At each node, the algorithm:[2][1]\n- Evaluates stopping criteria (base cases from Step 4)\n- Calculates information gain for all available attributes (Step 2)\n- Selects the attribute with maximum information gain\n- Partitions the dataset by that attribute's values (Step 3)\n- Recursively builds subtrees for each partition\n\nThis divide-and-conquer approach continues until leaf nodes are created when stopping criteria are met.[3][5]\n\n### 2. Formula Application\n\nThe recursive tree construction is formally defined as:\n\n$$\n\\text{BuildTree}(D, A, T) = \\begin{cases}\n\\text{Leaf}(c) & \\text{if all examples in } D \\text{ have class } c \\\\\n\\text{Leaf}(\\text{MajorityClass}(D)) & \\text{if } A = \\emptyset \\\\\n\\text{Node}(a^*, \\{v: \\text{BuildTree}(D_v, A \\setminus \\{a^*\\}, T)\\}) & \\text{otherwise}\n\\end{cases}\n$$\n\nWhere:\n- $D$ = dataset of training examples\n- $A$ = set of available attributes for splitting\n- $T$ = target attribute (class label)\n- $a^* = \\arg\\max_{a \\in A} IG(D, a)$ = best splitting attribute (maximum information gain)\n- $D_v = \\{x \\in D : x[a^*] = v\\}$ = subset where attribute $a^*$ has value $v$\n- $A \\setminus \\{a^*\\}$ = remaining attributes after removing $a^*$\n\n**Key Formulas:**\n- **Best Attribute Selection**: $a^* = \\arg\\max_{a \\in A} IG(D, a)$[1]\n- **Information Gain**: $IG(D, a) = H(D) - \\sum_{v \\in \\text{Values}(a)} \\frac{|D_v|}{|D|} H(D_v)$ [1]\n- **Tree Structure**: Tree represented as nested dictionary: $\\{a^*: \\{v_1: \\text{Subtree}_1, v_2: \\text{Subtree}_2, \\ldots\\}\\}$[5]\n\n### 3. Step-by-Step Computation\n\nUsing a real-world weather-based classification dataset:[2]\n\n**Input Dataset:**\n\n\n**Available Attributes:** `['Weather', 'Temperature']`  \n**Target Attribute:** `'PlayTennis'`\n\n#### **Iteration 1: Root Node Selection**\n\n**Step 1.1: Check Stopping Criteria**\n\nClasses in dataset: `{'Yes', 'No'}` \u2014 Mixed classes, so continue splitting.[1]\n\n**Step 1.2: Calculate Base Entropy**\n\nClass distribution: 4 \"Yes\", 2 \"No\"\n\n$$\nH(S) = -\\frac{4}{6}\\log_2\\left(\\frac{4}{6}\\right) - \\frac{2}{6}\\log_2\\left(\\frac{2}{6}\\right) = 0.9183\n$$\n\n**Step 1.3: Calculate Information Gain for Each Attribute**\n\n**For Weather attribute:**\n\nPartitions:\n- $\\text{Weather}=\\text{Sunny}$: 2 examples \u2014 ['No', 'Yes'] \u2192 $H = 1.0000$\n- $\\text{Weather}=\\text{Overcast}$: 2 examples \u2014 ['Yes', 'Yes'] \u2192 $H = 0.0000$\n- $\\text{Weather}=\\text{Rainy}$: 2 examples \u2014 ['Yes', 'No'] \u2192 $H = 1.0000$\n\n$$\nIG(S, \\text{Weather}) = 0.9183 - \\left(\\frac{2}{6} \\times 1.0 + \\frac{2}{6} \\times 0.0 + \\frac{2}{6} \\times 1.0\\right) = 0.2516\n$$\n\n**For Temperature attribute:**\n\nPartitions:\n- $\\text{Temperature}=\\text{Hot}$: 2 examples \u2192 $H = 1.0000$\n- $\\text{Temperature}=\\text{Cool}$: 3 examples \u2192 $H = 0.9183$\n- $\\text{Temperature}=\\text{Mild}$: 1 example \u2192 $H = 0.0000$\n\n$$\nIG(S, \\text{Temperature}) = 0.9183 - \\left(\\frac{2}{6} \\times 1.0 + \\frac{3}{6} \\times 0.9183 + \\frac{1}{6} \\times 0.0\\right) = 0.1258\n$$\n\n**Step 1.4: Select Best Attribute**\n\n$\\text{Weather}$ has highest information gain ($0.2516 > 0.1258$), so it becomes the root node.[2][1]\n\n#### **Iteration 2: Recursive Subtree Construction**\n\n**Branch 1: Weather = Overcast**\n\nSubset: 2 examples, both class 'Yes' \u2192 **Stopping criterion met** \u2192 Return leaf node: **'Yes'**[1]\n\n**Branch 2: Weather = Sunny**\n\nSubset: 2 examples with mixed classes ['No', 'Yes']  \nRemaining attribute: ['Temperature']  \nCalculate IG for Temperature on this subset \u2192 Split on Temperature  \n- $\\text{Temperature}=\\text{Hot}$ \u2192 1 example: 'No' \u2192 Leaf: **'No'**\n- $\\text{Temperature}=\\text{Cool}$ \u2192 1 example: 'Yes' \u2192 Leaf: **'Yes'**\n\n**Branch 3: Weather = Rainy**\n\nSimilar recursive process:\n- $\\text{Temperature}=\\text{Mild}$ \u2192 Leaf: **'Yes'**\n- $\\text{Temperature}=\\text{Cool}$ \u2192 Leaf: **'No'**\n\n### 4. Key Result\n\n**Result for Step 5:**\n\nThe complete decision tree structure constructed by the recursive algorithm is:\n\n\n\nThis tree structure represents:\n- **Root node**: Weather (best initial split)\n- **Internal nodes**: Temperature (for Sunny and Rainy branches)\n- **Leaf nodes**: Class predictions ('Yes' or 'No')\n\n**For the simple test case** with examples `[{'Weather': 'Overcast', 'Class': 'Yes'}, {'Weather': 'Overcast', 'Class': 'Yes'}]` and attributes `['Weather']`, the algorithm correctly returns **'Yes'** because all examples have the same class (base case).[1]\n\n### 5. Connection to Next Step\n\nThe recursive tree construction in Step 5 produces a complete decision tree that can now be used for **prediction on new, unseen examples** (Step 6). The tree structure encodes all decision rules learned from training data:[3][1]\n\n- **Step 6 (Prediction/Classification)** will traverse this tree structure from root to leaf\n- For a new example, it will follow branches based on attribute values\n- The leaf node reached determines the predicted class\n\nThe tree from Step 5 serves as the trained model that Step 6 will use to make predictions by recursively navigating the decision path.[4][1]",
      "metadata": {}
    },
    {
      "id": "f46a7b6e",
      "cell_type": "markdown",
      "source": "### Code Implementation",
      "metadata": {}
    },
    {
      "id": "d276aea7",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "[\n  {'Weather': 'Sunny', 'Temperature': 'Hot', 'PlayTennis': 'No'},\n  {'Weather': 'Sunny', 'Temperature': 'Cool', 'PlayTennis': 'Yes'},\n  {'Weather': 'Overcast', 'Temperature': 'Hot', 'PlayTennis': 'Yes'},\n  {'Weather': 'Overcast', 'Temperature': 'Cool', 'PlayTennis': 'Yes'},\n  {'Weather': 'Rainy', 'Temperature': 'Mild', 'PlayTennis': 'Yes'},\n  {'Weather': 'Rainy', 'Temperature': 'Cool', 'PlayTennis': 'No'}\n]",
      "outputs": []
    },
    {
      "id": "8191fe42",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "{\n  'Weather': {\n    'Sunny': {\n      'Temperature': {\n        'Hot': 'No',\n        'Cool': 'Yes'\n      }\n    },\n    'Overcast': 'Yes',\n    'Rainy': {\n      'Temperature': {\n        'Mild': 'Yes',\n        'Cool': 'No'\n      }\n    }\n  }\n}",
      "outputs": []
    },
    {
      "id": "59e14da6",
      "cell_type": "markdown",
      "source": "## Step 6: Tree Evaluation and Prediction",
      "metadata": {}
    },
    {
      "id": "f7b94a73",
      "cell_type": "markdown",
      "source": "## Step 6 of 6: Tree Evaluation and Prediction\n\n### 1. Concept: Decision Tree Traversal and Classification\n\nTree evaluation involves using the learned decision tree structure from Step 5 to classify new examples. The prediction process is a **recursive traversal algorithm** that navigates from the root node to a leaf node by following branches determined by the example's attribute values. This completes the decision tree learning system by applying the model to make predictions and evaluating its performance.[1][2][3]\n\n### 2. Formula Application\n\nThe prediction function is defined recursively as:\n\n$$\\text{Predict}(T, x) = \\begin{cases}\nT & \\text{if } T \\in \\mathcal{C} \\text{ (leaf node)} \\\\\n\\text{Predict}(T[a][x[a]], x) & \\text{if } T = \\{a: \\text{subtrees}\\} \\text{ (internal node)}\n\\end{cases}$$\n\nWhere:\n- $T$ = decision tree or subtree\n- $x$ = input example with attribute values\n- $a$ = splitting attribute at current node\n- $x[a]$ = value of attribute $a$ for example $x$\n- $T[a][x[a]]$ = subtree corresponding to value $x[a]$\n- $\\mathcal{C}$ = set of class labels ('Yes', 'No')\n\n**Classification Accuracy** measures model performance:\n\n$$\\text{Accuracy}(T, D) = \\frac{1}{|D|} \\sum_{(x,y) \\in D} \\mathbb{1}[\\text{Predict}(T, x) = y]$$\n\nWhere:\n- $D$ = dataset of examples\n- $|D|$ = total number of examples\n- $\\mathbb{1}[\\cdot]$ = indicator function (1 if true, 0 if false)\n- $y$ = true class label\n\n### 3. Step-by-Step Computation Using Play Tennis Dataset\n\n#### Decision Tree Structure from Step 5\n\nThe recursive tree construction from Step 5 produced this structure:[3][4]\n\n\n\n#### Example Prediction 1: Sunny Day with High Humidity\n\n**Input Example (Day D1):**\n- Outlook = Sunny\n- Temperature = Hot  \n- Humidity = High\n- Wind = Weak\n- **Actual Class:** No\n\n**Tree Traversal Steps:**\n\n1. **Root node (Outlook):** Check $x[\\text{Outlook}] = \\text{Sunny}$\n   - Navigate to Sunny subtree\n   \n2. **Sunny node (Humidity):** Check $x[\\text{Humidity}] = \\text{High}$\n   - Navigate to High branch\n   \n3. **Leaf node reached:** Return prediction = **'No'**\n\n**Verification:** $\\text{Predict}(T, x) = \\text{'No'}$ matches actual class 'No' \u2713\n\n#### Example Prediction 2: Overcast Day (Immediate Classification)\n\n**Input Example (Day D3):**\n- Outlook = Overcast\n- Temperature = Hot\n- Humidity = High\n- Wind = Weak\n- **Actual Class:** Yes\n\n**Tree Traversal Steps:**\n\n1. **Root node (Outlook):** Check $x[\\text{Outlook}] = \\text{Overcast}$\n   - Navigate to Overcast branch\n   \n2. **Leaf node reached immediately:** Return prediction = **'Yes'**\n\n**Verification:** $\\text{Predict}(T, x) = \\text{'Yes'}$ matches actual class 'Yes' \u2713\n\nNote: Overcast days always result in playing tennis regardless of other attributes (Information Gain selected Outlook because Overcast perfectly separates the data).[4]\n\n#### Example Prediction 3: Rainy Day with Strong Wind\n\n**Input Example (Day D6):**\n- Outlook = Rain\n- Temperature = Cool\n- Humidity = Normal\n- Wind = Strong\n- **Actual Class:** No\n\n**Tree Traversal Steps:**\n\n1. **Root node (Outlook):** Check $x[\\text{Outlook}] = \\text{Rain}$\n   - Navigate to Rain subtree\n   \n2. **Rain node (Wind):** Check $x[\\text{Wind}] = \\text{Strong}$\n   - Navigate to Strong branch\n   \n3. **Leaf node reached:** Return prediction = **'No'**\n\n**Verification:** $\\text{Predict}(T, x) = \\text{'No'}$ matches actual class 'No' \u2713\n\n#### Accuracy Calculation on Training Dataset\n\n| Day | Outlook | Temperature | Humidity | Wind | Actual | Predicted | Correct |\n|-----|---------|-------------|----------|------|--------|-----------|---------|\n| D1 | Sunny | Hot | High | Weak | No | No | \u2713 |\n| D2 | Sunny | Hot | High | Strong | No | No | \u2713 |\n| D3 | Overcast | Hot | High | Weak | Yes | Yes | \u2713 |\n| D4 | Rain | Mild | High | Weak | Yes | Yes | \u2713 |\n| D5 | Rain | Cool | Normal | Weak | Yes | Yes | \u2713 |\n| D6 | Rain | Cool | Normal | Strong | No | No | \u2713 |\n| D7 | Overcast | Cool | Normal | Strong | Yes | Yes | \u2713 |\n| D8 | Sunny | Mild | High | Weak | No | No | \u2713 |\n| D9 | Sunny | Cool | Normal | Weak | Yes | Yes | \u2713 |\n| D10 | Rain | Mild | Normal | Weak | Yes | Yes | \u2713 |\n| D11 | Sunny | Mild | Normal | Strong | Yes | Yes | \u2713 |\n| D12 | Overcast | Mild | High | Strong | Yes | Yes | \u2713 |\n| D13 | Overcast | Hot | Normal | Weak | Yes | Yes | \u2713 |\n| D14 | Rain | Mild | High | Strong | No | No | \u2713 |\n\n**Computing Accuracy:**\n\n$$\\text{Accuracy} = \\frac{\\text{Correct Predictions}}{\\text{Total Examples}} = \\frac{14}{14} = 1.0 = 100\\%$$\n\n**Error Rate:**\n\n$$\\text{Error Rate} = 1 - \\text{Accuracy} = 1 - 1.0 = 0.0 = 0\\%$$\n\n### 4. Key Result for Step 6\n\n**Result for Step 6:**\n\nThe decision tree achieves **perfect classification accuracy of 100%** (14/14 correct predictions) on the Play Tennis training dataset. The prediction function successfully traverses the tree structure built in Steps 1-5, using:[5]\n- Entropy (Step 1) to measure node purity\n- Information Gain (Step 2) to select optimal split attributes (Outlook, Humidity, Wind)\n- Dataset partitioning (Step 3) to create branches\n- Stopping criteria (Step 4) to identify pure leaf nodes\n- Recursive construction (Step 5) to build the complete tree\n\nThe tree correctly classifies all 14 examples by applying the learned decision rules: \"If Overcast \u2192 Yes\", \"If Sunny and High Humidity \u2192 No\", \"If Sunny and Normal Humidity \u2192 Yes\", \"If Rain and Weak Wind \u2192 Yes\", \"If Rain and Strong Wind \u2192 No\".[3][4]\n\n### 5. Connection to Applications\n\nThis final step validates the entire decision tree learning algorithm. The 100% training accuracy confirms that the recursive algorithm (Steps 1-5) successfully partitioned the feature space into homogeneous regions. In practice, decision trees are evaluated on separate test sets to measure generalization performance. The prediction mechanism demonstrated here extends to:[6][7][5]\n- **Real-time classification:** New weather observations can be instantly classified\n- **Model interpretability:** Each prediction path provides human-readable explanations\n- **Performance metrics:** Accuracy, precision, recall can be computed for different applications[8]",
      "metadata": {}
    },
    {
      "id": "87a22c80",
      "cell_type": "markdown",
      "source": "### Code Implementation",
      "metadata": {}
    },
    {
      "id": "b412fa22",
      "cell_type": "code",
      "metadata": {},
      "execution_count": null,
      "source": "Outlook (root)\n\u251c\u2500\u2500 Sunny \u2192 Humidity\n\u2502   \u251c\u2500\u2500 High \u2192 No\n\u2502   \u2514\u2500\u2500 Normal \u2192 Yes\n\u251c\u2500\u2500 Overcast \u2192 Yes\n\u2514\u2500\u2500 Rain \u2192 Wind\n    \u251c\u2500\u2500 Weak \u2192 Yes\n    \u2514\u2500\u2500 Strong \u2192 No",
      "outputs": []
    },
    {
      "id": "5b0f52be",
      "cell_type": "markdown",
      "source": "---\n\n## Summary\n\n## Connecting Summary\n\nThe ID3 decision tree algorithm works by recursively partitioning data based on maximum information gain: first computing entropy to measure dataset impurity, then selecting the attribute that maximizes information gain (entropy reduction), partitioning the dataset into subsets based on that attribute's values, and repeating this process until stopping criteria are met (pure nodes or no attributes remaining). Once constructed, the tree predicts new instances by traversing from root to leaf based on attribute values, assigning the majority class label at the reached leaf node.\n\n## Detailed Mathematical Explanation with Example\n\n### Problem: Build a Decision Tree to Predict \"Play Tennis?\"\n\n**Dataset: Weather conditions and whether tennis was played**\n\n| Day | Outlook | Temperature | Humidity | Wind | Play Tennis? |\n|-----|---------|-------------|----------|------|--------------|\n| 1 | Sunny | Hot | High | Weak | No |\n| 2 | Sunny | Hot | High | Strong | No |\n| 3 | Overcast | Hot | High | Weak | Yes |\n| 4 | Rain | Mild | High | Weak | Yes |\n| 5 | Rain | Cool | Normal | Weak | Yes |\n| 6 | Rain | Cool | Normal | Strong | No |\n| 7 | Overcast | Cool | Normal | Strong | Yes |\n| 8 | Sunny | Mild | High | Weak | No |\n| 9 | Sunny | Cool | Normal | Weak | Yes |\n| 10 | Rain | Mild | Normal | Weak | Yes |\n| 11 | Sunny | Mild | Normal | Strong | Yes |\n| 12 | Overcast | Mild | High | Strong | Yes |\n| 13 | Overcast | Hot | Normal | Weak | Yes |\n| 14 | Rain | Mild | High | Strong | No |\n\n### Step 1: Calculate Initial Entropy\n\n**Formula:** \\( H(S) = -\\sum_{i=1}^{c} p_i \\log_2(p_i) \\)\n\nWhere \\( p_i \\) is the proportion of class \\( i \\) in dataset \\( S \\).\n\n**Computation:**\n- Total instances: 14\n- Yes: 9, No: 5\n- \\( p_{yes} = \\frac{9}{14} \\), \\( p_{no} = \\frac{5}{14} \\)\n\n\\[ H(S) = -\\frac{9}{14}\\log_2\\left(\\frac{9}{14}\\right) - \\frac{5}{14}\\log_2\\left(\\frac{5}{14}\\right) \\]\n\\[ H(S) = -0.643(-0.637) - 0.357(-1.485) = 0.410 + 0.530 = 0.940 \\text{ bits} \\]\n\n### Step 2: Calculate Information Gain for Each Attribute\n\n**Formula:** \\( IG(S, A) = H(S) - \\sum_{v \\in Values(A)} \\frac{|S_v|}{|S|} H(S_v) \\)\n\n#### For Outlook:\n\n| Outlook Value | Yes | No | Total | Entropy |\n|---------------|-----|-----|-------|---------|\n| Sunny | 2 | 3 | 5 | \\( -\\frac{2}{5}\\log_2(\\frac{2}{5}) - \\frac{3}{5}\\log_2(\\frac{3}{5}) = 0.971 \\) |\n| Overcast | 4 | 0 | 4 | \\( 0 \\) (pure) |\n| Rain | 3 | 2 | 5 | \\( -\\frac{3}{5}\\log_2(\\frac{3}{5}) - \\frac{2}{5}\\log_2(\\frac{2}{5}) = 0.971 \\) |\n\n\\[ IG(S, Outlook) = 0.940 - \\left(\\frac{5}{14} \\times 0.971 + \\frac{4}{14} \\times 0 + \\frac{5}{14} \\times 0.971\\right) \\]\n\\[ IG(S, Outlook) = 0.940 - 0.693 = 0.247 \\text{ bits} \\]\n\n#### For Wind:\n\n| Wind Value | Yes | No | Total | Entropy |\n|------------|-----|-----|-------|---------|\n| Weak | 6 | 2 | 8 | \\( -\\frac{6}{8}\\log_2(\\frac{6}{8}) - \\frac{2}{8}\\log_2(\\frac{2}{8}) = 0.811 \\) |\n| Strong | 3 | 3 | 6 | \\( -\\frac{3}{6}\\log_2(\\frac{3}{6}) - \\frac{3}{6}\\log_2(\\frac{3}{6}) = 1.000 \\) |\n\n\\[ IG(S, Wind) = 0.940 - \\left(\\frac{8}{14} \\times 0.811 + \\frac{6}{14} \\times 1.000\\right) = 0.940 - 0.892 = 0.048 \\text{ bits} \\]\n\n*Similar calculations for Temperature and Humidity yield lower information gains.*\n\n### Step 3: Select Root Node and Partition\n\n**Outlook has maximum information gain (0.247 bits)**, so it becomes the root node.\n\n**Partition dataset into 3 subsets:**\n- Sunny subset (5 instances)\n- Overcast subset (4 instances) \u2192 All \"Yes\" \u2192 **Leaf node: Yes**\n- Rain subset (5 instances)\n\n### Step 4: Recursive Construction on Sunny Subset\n\nFor the Sunny subset (2 Yes, 3 No), calculate information gain for remaining attributes (Temperature, Humidity, Wind). Suppose Humidity has the highest gain:\n\n| Humidity | Yes | No |\n|----------|-----|-----|\n| High | 0 | 3 |\n| Normal | 2 | 0 |\n\nBoth are pure \u2192 Create leaf nodes: **High \u2192 No**, **Normal \u2192 Yes**\n\n### Step 5: Recursive Construction on Rain Subset\n\nFor Rain subset (3 Yes, 2 No), suppose Wind has highest gain:\n\n| Wind | Yes | No |\n|------|-----|-----|\n| Weak | 3 | 0 |\n| Strong | 0 | 2 |\n\nBoth pure \u2192 Create leaf nodes: **Weak \u2192 Yes**, **Strong \u2192 No**\n\n### Step 6: Final Tree Structure\n\n```\nOutlook\n\u251c\u2500 Sunny \u2192 Humidity\n\u2502  \u251c\u2500 High \u2192 No\n\u2502  \u2514\u2500 Normal \u2192 Yes\n\u251c\u2500 Overcast \u2192 Yes\n\u2514\u2500 Rain \u2192 Wind\n   \u251c\u2500 Weak \u2192 Yes\n   \u2514\u2500 Strong \u2192 No\n```\n\n### Prediction Example\n\n**New instance:** Outlook=Sunny, Temperature=Hot, Humidity=Normal, Wind=Strong\n\n**Traversal:**\n1. Start at root: Outlook = Sunny \u2192 Go to Humidity node\n2. Humidity = Normal \u2192 Reach leaf node\n3. **Prediction: Yes** (Play Tennis)\n\nThe algorithm successfully builds a tree where each path represents a decision rule, achieving perfect classification on this training set by maximizing information gain at each split.\n\n",
      "metadata": {}
    }
  ]
}
No results found