This document provides a comprehensive review of the SSA/Compiler optimization section of the midterm exam, including questions, multiple graph representations, detailed solutions, and explanations.
Document Purpose: Study resource for current and future students
Author: Created from midterm exam and solutions
Last Updated: 2025
- Question 1: Dominance Frontier Computation
- Question 2: Iterated Dominance Frontier and φ-Placement
- Question 3: Post-Dominator Tree and Control Dependence
- Question 4: Loop Invariant Code Motion with Exceptions
- Question 5: Dominance Frontier and φ-Placement
- Question 6: Strength Reduction in Loops
- Question 7: Strength Reduction Profitability
- Question 8: Loop Invariant Code Motion vs. Strength Reduction
- Question 9: Horner's Rule for Polynomial Evaluation
- Question 10: SSA-Based Constant Folding and Dead Code Elimination
- Question 11: SSA-Based Value Range Propagation and DCE
- Question 12: SSA Constant Propagation with CFG
- Question 13: SSA Constant Propagation with Nested Branches
- Question 14: SSA Dead Code Elimination with CFG
- Question 15: SSA Constant Folding with Multiple φ-Functions
Given the following Control Flow Graph (CFG), compute the dominance frontier DF(B1).
B0(x=..) -> B1,B2
B1(x=..) -> B3
B2(y=..) -> B3
B3(z=x+y) -> B4(exit)
┌─────┐
│ B0 │ (entry)
│x=.. │
└──┬──┘
│
┌────────┴───────┐
│ │
┌──▼──┐ ┌──▼──┐
│ B1 │ │ B2 │
│x=.. │ │y=.. │
└──┬──┘ └──┬──┘
│ │
│ ┌─────┐ │
└────►│ B3 │◄───┘
│z=x+y│
└──┬──┘
│
┌──▼──┐
│ B4 │ (exit)
└─────┘
flowchart TD
B0["B0<br/>x=.."] --> B1
B0 --> B2
B1["B1<br/>x=.."] --> B3
B2["B2<br/>y=.."] --> B3
B3["B3<br/>z=x+y"] --> B4
B4["B4<br/>(exit)"]
The dominance frontier DF(B1) is:
- A) {B3, B4}
- B) {B2, B3}
- C) {B4}
- D) {B3}
- E) {B1, B3}
- F) ∅ (empty set)
- G) {B0, B3}
- H) {B2, B3, B4}
D) {B3}
Definition of Dominance Frontier:
DF(X) = {Y | X dominates a predecessor of Y, but X does not strictly dominate Y}
Step 1: Compute Dominators
- dom(B0) = {B0}
- dom(B1) = {B0, B1}
- dom(B2) = {B0, B2}
- dom(B3) = {B0, B3} (both B1 and B2 can reach B3, so neither dominates it)
- dom(B4) = {B0, B3, B4}
Step 2: Check Each Node for DF(B1)
For B3:
- Predecessors of B3 are {B1, B2}
- Does B1 dominate the predecessor B1? Yes (a node dominates itself)
- Does B1 strictly dominate B3? No (the path B0→B2→B3 bypasses B1)
- Therefore, B3 ∈ DF(B1) ✓

For B4:
- Predecessors of B4 are {B3}
- Does B1 dominate the predecessor B3? No (the path B0→B2→B3 bypasses B1)
- Therefore, B4 ∉ DF(B1) ✗
Conclusion: DF(B1) = {B3}
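The same frontier can be computed mechanically. Below is a minimal C sketch (not part of the exam; block names are encoded as an enum and the immediate dominators are filled in by hand) of the predecessor-walk formulation of dominance-frontier computation: for every join node Y, walk from each predecessor up the dominator tree until idom(Y), adding Y to the frontier of every node visited.

```c
#include <stdio.h>

enum { B0, B1, B2, B3, B4, N };

int main(void) {
    int preds[N][3] = {                       /* predecessor lists, -1 terminated */
        [B0] = {-1}, [B1] = {B0, -1}, [B2] = {B0, -1},
        [B3] = {B1, B2, -1}, [B4] = {B3, -1}
    };
    int idom[N] = { [B0] = -1, [B1] = B0, [B2] = B0, [B3] = B0, [B4] = B3 };
    int df[N][N] = {0};                       /* df[x][y] == 1  <=>  y in DF(x) */

    for (int y = 0; y < N; y++) {
        int npred = 0;
        while (preds[y][npred] != -1) npred++;
        if (npred < 2) continue;              /* only join nodes contribute */
        for (int k = 0; k < npred; k++)
            for (int x = preds[y][k]; x != idom[y]; x = idom[x])
                df[x][y] = 1;
    }

    for (int x = 0; x < N; x++) {
        printf("DF(B%d) = {", x);
        for (int y = 0; y < N; y++)
            if (df[x][y]) printf(" B%d", y);
        printf(" }\n");                       /* DF(B1) = { B3 }, matching answer D */
    }
    return 0;
}
```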
Why Other Options Are Wrong:
- A: B4 is not in DF(B1) because B1 doesn't dominate B3 (the only predecessor of B4)
- B: B2 is not in DF(B1) because B1 doesn't dominate any predecessor of B2
- C: B4 alone is incorrect (see A)
- E: B1 cannot be in its own dominance frontier in this acyclic CFG
- F: Empty set is wrong; B3 clearly belongs
- G: B0 dominates B1, not vice versa
- H: Both B2 and B4 fail the criteria
How many φ-functions for variable x are needed in minimal SSA form?
B0 -> B1,B2
B1(x=1) -> B3
B2(x=2) -> B3
B3(y=x) -> B4,B5
B4(x=3) -> B6
B5(if(..)) -> B3
B6(w=x) -> exit
┌─────┐
│ B0 │
└──┬──┘
│
┌────┴───┐
│ │
┌──▼──┐ ┌──▼──┐
│ B1 │ │ B2 │
│x=1 │ │x=2 │
└──┬──┘ └──┬──┘
│ │
└────┬───┘
│
┌──▼──┐
│ B3 │◄───┐
│y=x │ │
└──┬──┘ │
│ │
┌────┴────┐ │
│ │ │
┌──▼──┐ ┌──▼───┴┐
│ B4 │ │ B5 │
│x=3 │ │if(...)│
└──┬──┘ └───────┘
│
│ ┌─────┐
└───►│ B6 │
│w=x │
└─────┘
flowchart TD
B0["B0"] --> B1
B0 --> B2
B1["B1<br/>x=1"] --> B3
B2["B2<br/>x=2"] --> B3
B3["B3<br/>y=x"] --> B4
B3 --> B5
B4["B4<br/>x=3"] --> B6
B5["B5<br/>if(..)"] --> B3
B6["B6<br/>w=x"] --> Exit["exit"]
- A) 2 φ-functions, at B3 and B6
- B) 3 φ-functions, at B3, B6, and B5
- C) 1 φ-function, at B3
- D) 2 φ-functions at B3 (placed twice due to iteration)
- E) 3 φ-functions, at B3, B5, and B6
- F) 4 φ-functions, at B3, B5, B6, and one more at B3 (iteration effect)
- G) 1 φ-function, at B6 only
- H) 0 φ-functions (reaching definitions analysis suffices)
C) 1 φ-function at B3
Step 1: Identify Definitions
x is defined at: {B1, B2, B4}
Step 2: Compute Dominators
- dom(B3) = {B0, B3} (both B1 and B2 can reach it)
- dom(B4) = {B0, B3, B4}
- dom(B5) = {B0, B3, B5}
- dom(B6) = {B0, B3, B4, B6}
Step 3: Compute Dominance Frontiers
- DF(B1) = {B3}: B1 dominates itself (a predecessor of B3) but does not strictly dominate B3
- DF(B2) = {B3}: B2 dominates itself (a predecessor of B3) but does not strictly dominate B3
- DF(B4) = ∅: B4 strictly dominates B6 (its only successor)
Step 4: Compute Iterated Dominance Frontier
DF⁺({B1, B2, B4}) = DF(B1) ∪ DF(B2) ∪ DF(B4) = {B3} ∪ {B3} ∪ ∅ = {B3}

The φ inserted at B3 is itself a new definition of x, so the iteration also folds in DF(B3) = {B3}, which adds nothing new; the fixpoint is {B3}.
Step 5: φ-Placement
- Place a φ-function at B3: x₃ = φ(x₁, x₂, x₃)
- Note: B3 has three predecessors (B1, B2, B5), so the φ has three arguments
- The third argument comes from the loop backedge (B5 passes the current x₃ through unchanged)
Why B6 Doesn't Need φ:
- B6 has only one predecessor (B4)
- A single predecessor means no merge point, so no φ is needed
- B4 strictly dominates B6
Why Other Options Are Wrong:
- A, B, E, F: B6 doesn't need a φ (single predecessor)
- B, E, F: B5 doesn't define x, so no φ is needed there
- D: You never place multiple φ-functions for the same variable at the same block
- G: Misses the essential join at B3
- H: SSA requires φ-functions at merge points of multiple definitions
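For reference, here is a minimal C sketch (assumed encoding, not from the exam) of the standard worklist algorithm for minimal φ-placement: seed the worklist with the blocks that define x, insert a φ at every block in their dominance frontiers, and re-queue blocks that receive a φ so the iterated frontier is reached. The frontiers are the ones derived above, plus the loop-induced entries DF(B3) = DF(B5) = {B3}.

```c
#include <stdio.h>

enum { B0, B1, B2, B3, B4, B5, B6, N };

int main(void) {
    int df[N][N] = {0};                       /* df[b][y] == 1  <=>  y in DF(b) */
    df[B1][B3] = 1; df[B2][B3] = 1;
    df[B3][B3] = 1; df[B5][B3] = 1;           /* contributions of the back edge B5 -> B3 */

    int defines_x[N] = { [B1] = 1, [B2] = 1, [B4] = 1 };
    int has_phi[N] = {0}, enqueued[N] = {0};
    int worklist[N], wl = 0;

    for (int b = 0; b < N; b++)               /* seed with the original defs of x */
        if (defines_x[b]) { worklist[wl++] = b; enqueued[b] = 1; }

    while (wl > 0) {                          /* iterates up to the fixpoint DF+ */
        int b = worklist[--wl];
        for (int y = 0; y < N; y++) {
            if (!df[b][y] || has_phi[y]) continue;
            has_phi[y] = 1;                   /* insert a phi for x at y */
            if (!enqueued[y]) { worklist[wl++] = y; enqueued[y] = 1; }
        }
    }

    for (int b = 0; b < N; b++)
        if (has_phi[b]) printf("phi for x at B%d\n", b);   /* prints only B3 */
    return 0;
}
```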
Which node is B2 control-dependent on?
B0 -> B1
B1(if(p)) -> B2, B3
B2 -> B4
B3 -> exit
B4 -> exit
┌─────┐
│ B0 │
└──┬──┘
│
┌──▼──┐
│ B1 │
│if(p)│
└──┬──┘
│
┌────┴───┐
│ │
┌──▼──┐ ┌──▼──┐
│ B2 │ │ B3 │
│ ... │ │exit │
└──┬──┘ └─────┘
│
┌──▼──┐
│ B4 │
│exit │
└─────┘
flowchart TD
B0["B0"] --> B1
B1["B1<br/>if(p)"] -->|true| B2
B1 -->|false| B3
B2["B2<br/>..."] --> B4
B3["B3<br/>exit"]
B4["B4<br/>exit"]
- A) B0
- B) B1
- C) B4
- D) B2 is not control-dependent on any node
- E) B0 and B1 (both)
- F) B1 and B2 (B2 depends on itself)
- G) All nodes in the CFG
- H) B3 (the alternate exit path)
B) B1
Definition of Control Dependence:
Node Y is control-dependent on node X iff:
- There exists a path X ⇒ ... ⇒ Y on which Y post-dominates every node after X, AND
- Y does NOT post-dominate X
Step 1: Compute Post-Dominators
Assuming a unified exit node, the post-dominator sets are:
- postdom(B0) = {B0, B1, exit} (every path from B0 passes through B1)
- postdom(B1) = {B1, exit} (neither branch target post-dominates it)
- postdom(B2) = {B2, B4, exit} (must go through B4 to exit)
- postdom(B3) = {B3, exit} (direct exit)
- postdom(B4) = {B4, exit} (direct exit)
Step 2: Check Control Dependence on B1
- Path B1 → B2 → B4: does B2 post-dominate every node after B1 on this path? B2 is on the path and post-dominates itself ✓
- Does B2 post-dominate B1? No! The path B1 → B3 bypasses B2 ✗
- Therefore, B2 is control-dependent on B1 ✓
Intuition:
- B1's decision (if(p)) determines whether B2 executes
- If p is true, B2 runs; if false, B2 is skipped
- This is the essence of control dependence
Why Other Options Are Wrong:
- A: B0 doesn't make a control decision affecting B2
- C: B4 doesn't control B2; B2 leads to B4
- D: B2 clearly depends on B1's branch
- E: B0 doesn't introduce a control dependence
- F: Nodes don't depend on themselves under the standard definition
- G: Control dependence is specific, not universal
- H: B3 is the alternate path but doesn't control B2
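As a cross-check, control dependence can be read off the post-dominator tree with the Ferrante-Ottenstein-Warren construction: for each CFG edge A → B, every node on the post-dominator-tree path from B up to (but excluding) ipdom(A) is control-dependent on A. The minimal C sketch below (assumed encoding; a virtual unified EXIT node is added and the immediate post-dominators are filled in by hand) applies it to this CFG and reports that B2, B3, and B4 are all control-dependent on B1.

```c
#include <stdio.h>

enum { B0, B1, B2, B3, B4, EXIT, N };

int main(void) {
    int edges[][2] = { {B0,B1}, {B1,B2}, {B1,B3}, {B2,B4}, {B3,EXIT}, {B4,EXIT} };
    int ipdom[N]   = { [B0] = B1, [B1] = EXIT, [B2] = B4,
                       [B3] = EXIT, [B4] = EXIT, [EXIT] = -1 };

    for (size_t e = 0; e < sizeof edges / sizeof edges[0]; e++) {
        int a = edges[e][0], b = edges[e][1];
        /* walk up the post-dominator tree from b until ipdom(a) */
        for (int y = b; y != ipdom[a] && y != -1; y = ipdom[y])
            printf("B%d is control-dependent on B%d\n", y, a);
    }
    return 0;   /* prints: B2, B4, and B3 are each control-dependent on B1 */
}
```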
Expression t = a/b is loop-invariant (a, b not modified in loop). Under which condition is hoisting t = a/b before B1 ILLEGAL?
B0 -> B1
B1(if(i<n)) -> B2, exit
B2(t=a/b, c[i]=t, i++) -> B1
┌─────┐
│ B0 │
└──┬──┘
│
┌──────▼──────┐
│ B1 │◄────┐
│ if(i<n) │ │
└──┬─────┬────┘ │
│ │ │
exit ┌──▼───┐ │
│ B2 │ │
│ t=a/b│ │ (may throw div-by-zero)
│c[i]=t│ │
│ i++ │ │
└──┬───┘ │
└──────────┘
flowchart TD
B0["B0"] --> B1
B1["B1<br/>if(i<n)"] -->|true| B2
B1 -->|false| Exit["exit"]
B2["B2<br/>t=a/b<br/>c[i]=t<br/>i++"] --> B1
- A) When b might be zero
- B) When the loop might execute zero iterations (n ≤ 0)
- C) When division has side effects (sets CPU flags)
- D) When a and b are floating-point with NaN handling
- E) B only (must not introduce new exceptions)
- F) A and B together
- G) B, C, and D together
- H) Never illegal: hoisting loop-invariant code is always safe
E) B only (must not introduce new exceptions)
Core Principle:
An optimization is ILLEGAL if it introduces a NEW exception on a path that would not have thrown an exception in the original code.
Analysis:
Original Code Behavior:
- If i < n is false initially (zero iterations), the division t = a/b NEVER executes
- If i < n is true, the loop runs and the division executes
After Hoisting:
- The division t = a/b ALWAYS executes, before the loop test
- Even if the loop runs zero times, the division still happens
Case Analysis:
Case 1: Loop runs, b = 0
- Original: Division executes → exception thrown ✗
- Hoisted: Division executes → exception thrown ✗
- Result: Both throw → No NEW exception → Legal ✓
Case 2: Loop doesn't run (n ≤ 0), b = 0
- Original: Loop skipped, division never executes → no exception ✓
- Hoisted: Division executes before test → exception thrown ✗
- Result: NEW exception introduced → ILLEGAL ✗
Case 3: Loop doesn't run (n ≤ 0), b ≠ 0
- Original: Loop skipped, division never executes → no exception ✓
- Hoisted: Division executes → computes value (unused) → no exception ✓
- Result: No exception in either → Legal ✓ (but semantically questionable)
Why B Only Is Sufficient:
The zero-iteration scenario is the ONLY case where hoisting introduces a new exception. When the loop might not run, you cannot safely move a potentially-throwing operation outside the loop, regardless of whether b is zero or not.
Why Other Options Are Wrong:
- A alone: If the loop always runs and b=0, BOTH versions throw → no new exception
- F: A is redundant; B alone captures the illegality
- C: CPU flags are typically not considered observable side effects
- D: NaN propagates in both versions; no new exception
- G: C and D don't contribute to illegality
- H: Exception semantics clearly matter
Key Takeaway: Zero-trip loops are the critical case for exception-safety in LICM. Always ask: "Could this code path avoid the exception in the original but not in the optimized version?"
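One standard way to make the hoist legal is to guard it with the zero-trip test (loop rotation/inversion), so the division still executes only on paths where the original loop body would have run. A minimal C sketch, with a hypothetical function name `fill`:

```c
/* Guarded hoist: the division is performed at most once and never on the
 * zero-trip path, so no new exception is introduced. */
void fill(int *c, int n, int a, int b) {
    int i = 0;
    if (i < n) {                 /* zero-trip guard */
        int t = a / b;           /* hoisted out of the loop, but inside the guard */
        do {
            c[i] = t;
            i++;
        } while (i < n);
    }
}
```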
Where must φ-functions for variable x be placed?
B0(x=1) -> B1
B1(if(p)) -> B2, B3
B2(x=2) -> B4
B3(y=..) -> B4
B4(z=x)
┌─────┐
│ B0 │ (entry)
│x=1 │
└──┬──┘
│
┌──▼──┐
│ B1 │
│if(p)│
└──┬──┘
│
┌────┴───┐
│ │
┌──▼──┐ ┌──▼──┐
│ B2 │ │ B3 │
│x=2 │ │y=.. │
└──┬──┘ └──┬──┘
│ │
└────┬───┘
│
┌──▼──┐
│ B4 │
│z=x │
└─────┘
flowchart TD
B0["B0<br/>x=1"] --> B1
B1["B1<br/>if(p)"] -->|true| B2
B1 -->|false| B3
B2["B2<br/>x=2"] --> B4
B3["B3<br/>y=.."] --> B4
B4["B4<br/>z=x"]
- A) B1 only (decision point)
- B) B1 and B4 (both join points)
- C) B2 and B3 (definition sites)
- D) No φ needed (B0 dominates all)
- E) B0 and B4 (entry and exit)
- F) All blocks need φ-functions
- G) B4 only (join point after definitions)
- H) B3 and B4 (B3 needs a φ even without defining x)
G) B4 only (join point after definitions)
Step 1: Identify Definitions
x is defined at: {B0, B2}
Step 2: Compute Dominance
- dom(B0) = {B0}
- dom(B1) = {B0, B1}
- dom(B2) = {B0, B1, B2}
- dom(B3) = {B0, B1, B3}
- dom(B4) = {B0, B1, B4} (both B2 and B3 can reach it)
Step 3: Compute Dominance Frontiers
For B0:
- B0 dominates all nodes
- DF(B0) = ∅ (there is no node whose predecessor B0 dominates without also strictly dominating the node itself)
For B2:
- Predecessors of B4 are {B2, B3}
- Does B2 dominate the predecessor B2 of B4? Yes ✓
- Does B2 strictly dominate B4? No (the path B0→B1→B3→B4 bypasses B2) ✗
- Therefore DF(B2) = {B4}
Step 4: Iterated Dominance Frontier
DF⁺({B0, B2}) = DF(B0) ∪ DF(B2) = ∅ ∪ {B4} = {B4}
Step 5: φ-Placement
- Place a φ at B4: x₄ = φ(x₂, x₀)
- The φ merges:
  - From B2: x₂ (the redefined value)
  - From B3: x₀ (the original value from B0, passed through)
SSA Form:
B0: x₀ = 1
B1: if(p) ...
B2: x₂ = 2
B3: y = ...
B4: x₄ = φ(B2: x₂, B3: x₀)
z = x₄
Why Other Options Are Wrong:
- A, B: B1 is not a join point for definitions of x (no φ needed)
- C: φ-functions go at JOIN points, not definition sites
- D: Even though B0 defines x, B2 redefines it, so B4 sees multiple reaching definitions
- E: B0 is the entry block and doesn't need a φ
- F: Only merge points with multiple reaching definitions need φ-functions
- H: B3 doesn't need a φ (it passes x₀ through without merging definitions)
What is the CORRECT strength-reduced form with auxiliary induction variables?
// Original loop
for (int i = 0; i < n; i++) {
a[i * 4] = i * 4 + 10;
b[i * 4] = i * 8;
}

int t1 = 0, t2 = 10;
for (int i = 0; i < n; i++) {
a[???] = ??? + 10
b[???] = ???
}

- A) t1=0; t2=0; ... a[t1]=t1+10; b[t2]=t2; ... t1+=4; t2+=8;
- B) t1=0; t2=0; ... a[t1]=t2+10; b[t1]=t2; ... t1+=4; t2+=8;
- C) t=0; ... a[t]=t+10; b[t]=t*2; ... t+=4;
- D) t1=0; t2=0; ... a[t1]=t1+10; b[t1]=t2; ... t1+=4; t2+=4;
- E) t1=0; t2=10; ... a[t1]=t2; b[t1]=t1*2; ... t1+=4; t2+=4;
- F) Cannot apply strength reduction (the multiplications are necessary)
- G) t=0; ... a[t]=t+10; b[t]=t+t; ... t+=4;
- H) t1=0; t2=0; t3=10; ... a[t1]=t3; b[t1]=t2; ... t1+=4; t2+=8; t3+=4;
H) t1=0; t2=0; t3=10; ... a[t1]=t3; b[t1]=t2; ... t1+=4; t2+=8; t3+=4;
Step 1: Identify Induction Variables
Basic IV: i (loop counter, i++ means increment by 1)
Dependent IVs (expressions that are linear functions of i):
- i * 4 (array index in both statements)
- i * 8 (RHS of b[i*4] = ...)
- i * 4 + 10 (RHS of a[i*4] = ...)
Step 2: Create Auxiliary Variables
For each dependent IV, create an auxiliary variable that:
- is initialized to the expression's value at i = 0
- is updated by adding the stride each iteration

For i * 4 (array indexing):
- Auxiliary: t1
- Initial: t1 = 0 * 4 = 0
- Update: t1 += 4 (since i increments by 1, i*4 increments by 4)

For i * 8:
- Auxiliary: t2
- Initial: t2 = 0 * 8 = 0
- Update: t2 += 8

For i * 4 + 10:
- Auxiliary: t3
- Initial: t3 = 0 * 4 + 10 = 10
- Update: t3 += 4 (the constant +10 doesn't change)
Step 3: Transformed Code
t1 = 0; // tracks i * 4 (for array indexing)
t2 = 0; // tracks i * 8
t3 = 10; // tracks i * 4 + 10
for (i = 0; i < n; i++) {
a[t1] = t3;
b[t1] = t2;
t1 += 4;
t2 += 8;
t3 += 4;
}

Step 4: Verification
Iteration i=0:
- t1=0, t2=0, t3=10
- a[0] = 10 ✓ (original: a[0*4] = 0*4+10 = 10)
- b[0] = 0 ✓ (original: b[0*4] = 0*8 = 0)

Iteration i=1:
- t1=4, t2=8, t3=14
- a[4] = 14 ✓ (original: a[1*4] = 1*4+10 = 14)
- b[4] = 8 ✓ (original: b[1*4] = 1*8 = 8)

Iteration i=2:
- t1=8, t2=16, t3=18
- a[8] = 18 ✓ (original: a[2*4] = 2*4+10 = 18)
- b[8] = 16 ✓ (original: b[2*4] = 2*8 = 16)
Benefits:
- Eliminates the loop-body multiplications (the naive original performs four per iteration: i*4 three times and i*8 once)
- Replaces with 3 additions per iteration
- Multiplications are typically 3-10× slower than additions
Costs:
- 3 additional live variables (t1, t2, t3)
- Increased register pressure
- May not be profitable if it causes register spilling
Why Other Options Are Wrong:
- A: a[t1]=t1+10 is valid, but b[t2]=t2 indexes b with t2 (which tracks i*8) instead of t1 (i*4); using t3 to pre-compute i*4+10 is also more direct
- B: a[t1]=t2+10 uses the wrong variable (t2 tracks i*8, not i*4)
- C: t*2 still contains a multiplication; i*8 needs its own auxiliary variable
- D: t2+=4 is WRONG; t2 should track i*8, so it must be incremented by 8
- E: t1*2 defeats the purpose (a multiplication remains in the loop)
- F: Strength reduction is definitely applicable
- G: b[t]=t+t avoids the multiplication but doesn't track each of the three expressions with its own auxiliary variable, as the question asks
Under which condition is strength reduction UNPROFITABLE?
// Loop with address calculation
for (i = 0; i < 100; i++) {
sum += array[i * stride];
}

- Integer multiplication: 3 cycles latency, 1/cycle throughput
- Integer addition: 1 cycle latency, 2/cycle throughput
- 16 general-purpose registers available
- Current register pressure: 14 registers in use
p = &array[0];
for (i = 0; i < 100; i++) {
sum += *p;
p += stride;
}

- A) stride = 1 (unit stride)
- B) stride = 1024 (large stride)
- C) The loop body is very large (50+ instructions)
- D) When the extra live variable p causes register spilling
- E) When stride is not a compile-time constant
- F) Never unprofitable: strength reduction always improves performance
- G) When the loop has only 2 iterations (n=2)
- H) When multiplication has 1-cycle latency
D) When the extra live variable p causes register spilling
Cost-Benefit Analysis:
Benefits:
- Eliminates multiplication per iteration: ~2-3 cycles saved
- 100 iterations × 2.5 cycles = ~250 cycles saved
Costs:
- Introduces one new live variable (p)
- Current: 14 registers used
- After: 15 registers used
- If total live variables exceed 16 → spilling occurs
Spilling Cost:
- Store to memory: ~5 cycles
- Load from memory: ~5 cycles
- Per iteration overhead: can be 10+ cycles
- 100 iterations × 10 cycles = 1000+ cycles COST
When Spilling Occurs:
If adding p pushes the register allocator over the limit:
- Net result: -750 cycles (1000 cost - 250 benefit)
- Optimization becomes UNPROFITABLE
Why Other Options Are Wrong:
A) stride = 1 (unit stride):
- Still beneficial: pointer increment is cheaper than multiplication
- Many architectures have auto-increment addressing modes
- NOT unprofitable
B) stride = 1024 (large stride):
- Large constant doesn't affect the arithmetic cost
- Addition with large constant still cheaper than multiplication
- NOT unprofitable
C) Loop body is very large:
- Large loop body doesn't change the per-iteration arithmetic cost
- Might even make strength reduction MORE profitable (amortized over more work)
- NOT unprofitable
E) stride not compile-time constant:
- The runtime value can still be used: p += stride works fine
- A variable stride doesn't prevent strength reduction
- NOT unprofitable
F) Never unprofitable:
- FALSE - register spilling can make it unprofitable
G) Only 2 iterations:
- Small iteration count reduces total benefit (2 × 2.5 = 5 cycles)
- But doesn't make it unprofitable unless overhead exceeds 5 cycles
- Spilling is worse
H) Multiplication has 1-cycle latency:
- Would reduce benefit to near zero
- But still wouldn't be unprofitable (no harm)
- Spilling actively COSTS performance
Key Insight: Strength reduction trades computation for storage. It's only profitable when:
- The storage cost (register pressure) doesn't exceed available resources
- The computational savings outweigh any memory access overhead from spilling
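A back-of-the-envelope version of this cost model, using the question's numbers (the per-iteration savings and spill penalty are the rough assumptions stated above, not measurements), is sketched below; bumping `regs_live` to 16 models the case in answer D where adding p forces a spill and flips the verdict.

```c
#include <stdio.h>

int main(void) {
    const int    iterations    = 100;
    const double mul_saved     = 2.5;    /* cycles saved per iteration by removing the multiply */
    const double spill_penalty = 10.0;   /* extra load/store cycles per iteration if p is spilled */
    const int    regs_avail    = 16;

    int regs_live = 14;                  /* current pressure; try 16 to see the spill case */
    double benefit = iterations * mul_saved;
    double cost    = (regs_live + 1 > regs_avail) ? iterations * spill_penalty : 0.0;

    printf("benefit = %.0f cycles, cost = %.0f cycles -> %s\n",
           benefit, cost, benefit > cost ? "profitable" : "unprofitable");
    return 0;
}
```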
Which optimization applies to which expression?
for (i = 0; i < n; i++) {
a[i] = b[i] * factor + offset;
}

Note: factor and offset are loop-invariant (not modified in the loop)
- A) Strength reduction: i; LICM: factor, offset (but not b[i] * factor)
- B) Strength reduction: i; LICM: factor, offset, b[i] * factor
- C) Strength reduction: i, b[i]; LICM: offset
- D) Strength reduction: none; LICM: factor, offset
- E) Strength reduction: i, b[i] * factor, offset; LICM: none
- F) Strength reduction: i, b[i], b[i] * factor; LICM: factor + offset
- G) Strength reduction: i, b[i] * factor; LICM: offset
- H) Both optimizations apply to all expressions
A) Strength reduction: i; LICM: factor, offset (but not b[i] * factor)
Understanding the Optimizations:
Strength Reduction:
- Replaces expensive operations with cheaper ones
- Based on induction variable relationships
- Applies to expressions that are linear functions of the loop counter
Loop Invariant Code Motion (LICM):
- Hoists computations that don't change across iterations
- Applies to expressions that have the same value every iteration
Analysis of Each Expression:
1. i (loop counter):
- Strength Reduction: YES
- Array indexing can use pointer arithmetic
- Instead of computing &a[i] and &b[i] each iteration, increment pointers
- Transform: pa++ and pb++ instead of recomputing &a[i] and &b[i]
2. factor (scalar variable):
- LICM: YES (already invariant, just a load)
- Strength Reduction: NO (not dependent on induction variable)
- Already optimal as a single variable reference
3. offset (scalar variable):
- LICM: YES (already invariant, just a load)
- Strength Reduction: NO (not dependent on induction variable)
- Already optimal as a single variable reference
4. b[i] (array access):
- LICM: NO (depends on i, changes each iteration)
- Strength Reduction: YES (indirectly, via a pointer into b)
- The indexing operation benefits from strength reduction on i
5. b[i] * factor (multiplication):
- LICM: NO (b[i] changes each iteration, so the product changes)
- Strength Reduction: NO (not a linear function of i alone)
- Cannot be turned into an addition without knowing the relationship between b[i] and i
Transformed Code:
// After both optimizations
pa = &a[0];
pb = &b[0];
// factor and offset already available (LICM implicit)
for (i = 0; i < n; i++) {
*pa = *pb * factor + offset;
pa++; // strength reduction for a[i]
pb++; // strength reduction for b[i]
}

Why Other Options Are Wrong:
- B: b[i] * factor is NOT loop-invariant (it depends on i through b[i])
- C: b[i] alone doesn't get strength reduction; the indexing does
- D: Strength reduction DOES apply to the array indexing via i
- E, G: b[i] * factor cannot be strength-reduced (not linear in i)
- F: Cannot hoist factor + offset as a combined value (the individual values are needed)
Key Distinction:
- Strength Reduction: Optimizes operations involving induction variables (loop counters)
- LICM: Optimizes expressions that are constant across loop iterations
How many multiplications does each form require?
Evaluate: 2x³ + 3x² + 4x + 5
Form A: result = 2*x*x*x + 3*x*x + 4*x + 5;
Form B: result = ((2*x + 3)*x + 4)*x + 5;

- A) Form A: 6 muls; Form B: 3 muls
- B) Form A: 4 muls; Form B: 3 muls
- C) Form A: 3 muls; Form B: 3 muls (equal)
- D) Form A: 6 muls; Form B: 4 muls
- E) Form A: 5 muls; Form B: 3 muls
- F) Form A: 7 muls; Form B: 3 muls
- G) Form A: 3 muls; Form B: 2 muls
- H) Depends on compiler optimization level
A) Form A: 6 muls; Form B: 3 muls
Assumption: Counting multiplications for naive evaluation without common subexpression elimination (CSE). Each power is computed independently.
Form A: Naive Evaluation
Breaking down 2*x*x*x + 3*x*x + 4*x + 5:
Term 1: 2*x*x*x
- 2 * x → 1 multiplication
- (2*x) * x → 1 multiplication
- ((2*x)*x) * x → 1 multiplication
- Subtotal: 3 multiplications
Term 2: 3*x*x
- 3 * x → 1 multiplication
- (3*x) * x → 1 multiplication
- Subtotal: 2 multiplications
Term 3: 4*x
- 4 * x → 1 multiplication
- Subtotal: 1 multiplication
Term 4: 5
- No multiplication
Total: 3 + 2 + 1 = 6 multiplications
Form B: Horner's Rule
Breaking down ((2*x + 3)*x + 4)*x + 5:
Step 1: 2*x
- 1 multiplication
Step 2: (2*x + 3)*x
- Add 3 to result (no multiplication)
- Multiply by x → 1 multiplication
Step 3: ((2*x+3)*x + 4)*x
- Add 4 to result (no multiplication)
- Multiply by x → 1 multiplication
Step 4: (((2*x+3)*x+4)*x + 5)
- Add 5 to result (no multiplication)
Total: 1 + 1 + 1 = 3 multiplications
Horner's Rule Algorithm:
For polynomial aₙxⁿ + aₙ₋₁xⁿ⁻¹ + ... + a₁x + a₀:
result = aₙ
for i from n-1 down to 0:
result = result * x + aᵢ
Each iteration: 1 multiplication + 1 addition
Total multiplications: n (where n is the degree)
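As a concrete version of the pseudocode above, here is a small C implementation of Horner evaluation (the coefficient order and names are my own choice): coefficients run from the leading term down to the constant, and each step costs one multiplication and one addition.

```c
#include <stdio.h>

double horner(const double *coeff, int degree, double x) {
    double result = coeff[0];             /* a_n */
    for (int i = 1; i <= degree; i++)
        result = result * x + coeff[i];   /* fold in the next coefficient */
    return result;
}

int main(void) {
    double c[] = { 2, 3, 4, 5 };          /* 2x^3 + 3x^2 + 4x + 5 */
    printf("%g\n", horner(c, 3, 2.0));    /* prints 41 = 16 + 12 + 8 + 5 */
    return 0;
}
```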
Benefits:
- Fewer multiplications: O(n) instead of O(n²)
- Better numerical stability (for floating-point)
- Sequential dependencies (easier for pipelining)
Comparison Table:
| Degree | Naive | Horner's | Savings |
|---|---|---|---|
| 1 | 1 | 1 | 0 |
| 2 | 3 | 2 | 1 |
| 3 | 6 | 3 | 3 |
| 4 | 10 | 4 | 6 |
| 5 | 15 | 5 | 10 |
| 10 | 55 | 10 | 45 |
Why Other Options Are Wrong:
- B-G: Incorrect counting of multiplications
- H: The multiplication count is algorithmic, not dependent on optimization level (unless CSE is applied, which changes the algorithm)
Note on CSE: With common subexpression elimination, naive form could be improved:
temp1 = x * x; // 1 mul
temp2 = temp1 * x; // 1 mul (x³)
result = 2*temp2 + 3*temp1 + 4*x + 5; // 3 muls
// Total: 5 muls

But Horner's rule is still better!
After constant folding and dead code elimination, what code remains?
B0: x1=5, y1=10, if(x1<3) -> B1,B2
B1: z1=x1+y1, w1=z1*2 -> B3
B2: z2=x1-y1, w2=z2*3 -> B3
B3: z3=φ(z1,z2), w3=φ(w1,w2), result=z3+w3
.B0:
x₁ = 5
y₁ = 10
if (x₁ < 3) goto B1 else goto B2
.B1:
z₁ = x₁ + y₁
w₁ = z₁ * 2
goto B3
.B2:
z₂ = x₁ - y₁
w₂ = z₂ * 3
goto B3
.B3:
z₃ = φ(z₁, z₂)
w₃ = φ(w₁, w₂)
result = z₃ + w₃
flowchart TD
B0["B0<br/>x₁=5<br/>y₁=10<br/>if(x₁<3)"] -->|true| B1
B0 -->|false| B2
B1["B1<br/>z₁=x₁+y₁<br/>w₁=z₁*2"] --> B3
B2["B2<br/>z₂=x₁-y₁<br/>w₂=z₂*3"] --> B3
B3["B3<br/>z₃=φ(z₁,z₂)<br/>w₃=φ(w₁,w₂)<br/>result=z₃+w₃"]
- A) B0: if(5<3) ...; B1: ...; B2: z₂=-5; w₂=-15; B3: result=-20
- B) B0: x₁=5; y₁=10; B2: z₂=-5; w₂=-15; B3: result=-20
- C) B0: x₁=5; y₁=10; B3: result=φ(-20, -20)
- D) All blocks remain (cannot eliminate φ-functions)
- E) B0: goto B2; B2: result=-20
- F) B0: result=-20
- G) Only B3 remains with the result computation
- H) B2: z₂=-5; w₂=-15; B3: result=-20
F) B0: result=-20
Step 1: Constant Folding in B0
- x₁ = 5 (constant)
- y₁ = 10 (constant)
- if (5 < 3) → evaluates to FALSE (constant condition)
- Control flow: always takes B2, never B1
Step 2: Mark Unreachable Code
- B1 is unreachable → mark for deletion
- All definitions in B1 are dead
Step 3: Constant Propagation to B2
- z₂ = x₁ - y₁ = 5 - 10 = -5
- w₂ = z₂ * 3 = -5 * 3 = -15
Step 4: Simplify B3 φ-Functions
- z₃ = φ(z₁, z₂), but B1 is unreachable → z₃ = φ(⊥, -5) = -5 (φ with one reachable predecessor)
- w₃ = φ(w₁, w₂), but B1 is unreachable → w₃ = φ(⊥, -15) = -15
- result = z₃ + w₃ = -5 + (-15) = -20
Step 5: Dead Code Elimination
- B1: unreachable → delete
- B2: its values are only used in B3 → fold and eliminate
- B3: the φ-functions collapse and the computation is constant → fold into the result
- Final: just the constant result
Optimization Sequence:
Original:
B0: x₁=5, y₁=10, if(x₁<3) -> B1,B2
B1: [unreachable]
B2: z₂=-5, w₂=-15
B3: z₃=-5, w₃=-15, result=-20
After constant propagation:
B0: if(false) -> B1,B2
B2: z₂=-5, w₂=-15
B3: result=-20
After branch elimination:
B0: goto B2
B2: z₂=-5, w₂=-15
B3: result=-20
After dead code elimination:
B0: result=-20
Final Code:
.B0:
result = -20
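For intuition, the same program written as source-level C (an assumed reconstruction, with hypothetical function names) shows why everything folds away:

```c
int q10_before(void) {
    int x = 5, y = 10, z, w;
    if (x < 3) { z = x + y; w = z * 2; }   /* never taken: 5 < 3 is false */
    else       { z = x - y; w = z * 3; }   /* z = -5, w = -15 */
    return z + w;                           /* always -20 */
}

int q10_after(void) {
    return -20;                             /* result of folding, branch elimination, and DCE */
}
```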
Why Other Options Are Wrong:
- A: Retains unnecessary conditional and blocks
- B: Retains dead variable assignments (x₁, y₁, z₂, w₂)
- C: The φ-function should be eliminated once it collapses to a constant
- D: φ-functions CAN be eliminated when they collapse
- E: Still has an unnecessary goto and a separate block
- G: B3 can be merged into B0
- H: Retains intermediate blocks unnecessarily
Key Techniques:
- Constant Folding: Evaluate constant expressions at compile time
- Constant Propagation: Replace variables with their constant values
- Branch Elimination: Remove branches with constant conditions
- φ-Function Simplification: Collapse φ-functions with one reachable input
- Dead Code Elimination: Remove unreachable and unused code
After value range analysis proves a₂ is always ≤ 14, which code remains?
B0: a1=10, b1=20 -> B1
B1: a2=φ(a1,a3), if(a2>100) -> B2,B3
B2: c1=a2/2 -> B4
B3: a3=a2+1, if(a3<15) -> B1,B4
B4: a4=φ(a2,a3), c2=φ(c1,0), result=a4+c2
B0:
a₁ = 10
b₁ = 20
goto B1
B1:
a₂ = φ(a₁, a₃)
if (a₂ > 100) goto B2 else goto B3
B2:
c₁ = a₂ / 2
goto B4
B3:
a₃ = a₂ + 1
if (a₃ < 15) goto B1 else goto B4
B4:
a₄ = φ(a₂, a₃)
c₂ = φ(c₁, 0)
result = a₄ + c₂
flowchart TD
B0["B0<br/>a₁=10<br/>b₁=20"] --> B1
B1["B1<br/>a₂=φ(a₁,a₃)<br/>if(a₂>100)"] -->|true| B2
B1 -->|false| B3
B2["B2<br/>c₁=a₂/2"] --> B4
B3["B3<br/>a₃=a₂+1<br/>if(a₃<15)"] -->|true| B1
B3 -->|false| B4
B4["B4<br/>a₄=φ(a₂,a₃)<br/>c₂=φ(c₁,0)<br/>result=a₄+c₂"]
- A) All blocks remain unchanged
- B) B0-B1-B3-B4 remain; B2 deleted; c₂=0 in B4
- C) B2 is eliminated; B4: a₄=φ(⊥,a₃); c₂=0; result=a₄
- D) Entire loop eliminated; B0: result=14
- E) B2 eliminated; B3 produces values 11, 12, 13, 14
- F) B0: a₁=10; B1-B3: loop executes; B4: result=a₄
- G) Only B4: result=14 remains
- H) Cannot eliminate B2 (it may be reachable)
B) B0-B1-B3-B4 remain; B2 deleted; c₂=0 in B4
Step 1: Value Range Analysis
Track the range of a₂ through the loop:
- Initial: a₁ = 10 → a₂ = 10 (first iteration)
- Iteration 1: a₃ = 10 + 1 = 11 → a₂ = 11 (second iteration)
- Iteration 2: a₃ = 11 + 1 = 12 → a₂ = 12 (third iteration)
- Iteration 3: a₃ = 12 + 1 = 13 → a₂ = 13 (fourth iteration)
- Iteration 4: a₃ = 13 + 1 = 14 → a₂ = 14 (fifth iteration)
- Iteration 5: a₃ = 14 + 1 = 15 → if (15 < 15) is FALSE → exit to B4
Range of a₂: [10, 14]
Step 2: Branch Analysis
B1: if (a₂ > 100)
- Since a₂ ≤ 14, the condition a₂ > 100 is ALWAYS FALSE
- B2 is never taken → unreachable
Step 3: Dead Code Elimination
- B2 is unreachable → delete
- c₁ is never computed → dead
Step 4: Simplify B4 φ-Functions
Original:
a₄ = φ(B2: a₂, B3: a₃)
c₂ = φ(B2: c₁, B3: 0)
After eliminating B2:
a₄ = φ(B3: a₃) → a₄ = a₃
c₂ = φ(B3: 0) → c₂ = 0
Step 5: Remaining Code
B0:
a₁ = 10
b₁ = 20
goto B1
B1:
a₂ = φ(a₁, a₃)
// if (a₂ > 100) always false, goes to B3
goto B3
B3:
a₃ = a₂ + 1
if (a₃ < 15) goto B1 else goto B4
B4:
a₄ = a₃ // simplified φ
c₂ = 0 // simplified φ
result = a₄ + c₂ = a₄ + 0 = a₄
Why We Can't Eliminate More:
The loop must still execute to compute the final value:
- The loop runs with a₂ going from 10 to 14
- The final value is a₄ = a₃ = 15 (the exit value of a₃), so result = 15
- This cannot be computed without running the loop
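For intuition, here is the same control flow as source-level C (an assumed reconstruction with a hypothetical function name); the `a > 100` test is the branch that value-range analysis kills, while the loop itself still has to run:

```c
int q11(void) {
    int a = 10, c = 0;            /* c = 0 models the phi input along the B3 -> B4 edge */
    for (;;) {
        if (a > 100) {            /* provably false: a is in [10, 14] here */
            c = a / 2;            /* this is B2, the block DCE removes */
            break;
        }
        a = a + 1;
        if (!(a < 15)) break;     /* exits with a == 15 */
    }
    return a + c;                 /* 15 + 0 = 15 */
}
```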
Why Other Options Are Wrong:
- A: B2 is provably unreachable
- C: The notation is awkward; B describes the same outcome more clearly
- D: The loop must still execute to compute the final value (and that value is 15, not 14)
- E: Too vague; doesn't specify the complete structure
- F: Too vague
- G: The loop computation cannot be eliminated
- H: VRP proves B2 is unreachable
Key Insight: Value range propagation enables dead code elimination by proving branches are always/never taken. However, loops that compute final values (not just check conditions) cannot be eliminated entirely.
After constant propagation and dead code elimination, which blocks remain?
B0(x=7,y=3) -> B1
B1(if(x>5)) -> B2,B3
B2(z=x+y) -> B4
B3(z=x*2) -> B4
B4(w=φ(z,z), r=w-y)
┌─────┐
│ B0 │
│x=7 │
│y=3 │
└──┬──┘
│
┌──▼────┐
│ B1 │
│if(x>5)│
└──┬────┘
│
┌────┴────┐
│ │
┌──▼──┐ ┌──▼──┐
│ B2 │ │ B3 │
│z=x+y│ │z=x*2│
└──┬──┘ └──┬──┘
│ │
└────┬───┘
│
┌──▼─────┐
│ B4 │
│w=φ(z,z)│
│r=w-y │
└────────┘
flowchart TD
B0["B0<br/>x=7<br/>y=3"] --> B1
B1["B1<br/>if(x>5)"] -->|true| B2
B1 -->|false| B3
B2["B2<br/>z=x+y"] --> B4
B3["B3<br/>z=x*2"] --> B4
B4["B4<br/>w=φ(z,z)<br/>r=w-y"]
- A) All blocks B0-B4 remain
- B) B0, B3, B4 remain (B2 eliminated)
- C) Only B0 remains, with r=7
- D) B0, B1, B2, B4 remain
- E) B0, B2, B4 remain (B3 eliminated)
- F) B0, B4 remain (B1-B3 eliminated)
- G) Only B4 remains, with r=7
- H) Cannot eliminate any blocks (the φ-function prevents DCE)
E) B0, B2, B4 remain (B3 eliminated)
Step 1: Constant Propagation in B0
- x = 7 (constant)
- y = 3 (constant)
Step 2: Constant Folding in B1
- if (x > 5) → if (7 > 5) → TRUE
- The branch always takes B2, never B3
Step 3: Mark Unreachable Code
- B3 is unreachable → mark for deletion
- The definition z = x*2 in B3 is dead
Step 4: Constant Propagation to B2
z = x + y = 7 + 3 = 10
Step 5: Simplify B4
- w = φ(B2: z, B3: z), but B3 is unreachable → w = φ(10, ⊥) = 10
- r = w - y = 10 - 3 = 7
Step 6: Dead Code Elimination
- B3: unreachable → delete
- B1: the constant branch folds to an unconditional edge, so B1 can be merged away
- The blocks that remain after this folding are B0, B2, and B4
Optimization Sequence:
After constant propagation:
B0: x=7, y=3
B1: if(true) -> B2, B3
B2: z=10
B3: [unreachable]
B4: w=10, r=7
After branch elimination:
B0: x=7, y=3
B2: z=10
B4: w=10, r=7
Remaining Blocks:
- B0: constants defined
- B2: computation (always executed)
- B4: result computation
Why Other Options Are Wrong:
- A: B1 and B3 can be eliminated (constant branch, unreachable code)
- B: Eliminates the wrong branch (B3 should go, not B2)
- C, F, G: Over-aggressive elimination that discards the executed blocks the question expects to remain
- D: B1 can be eliminated by branch folding
- H: φ-functions simplify when one predecessor is unreachable
Key Insight: Constant branches enable elimination of unreachable paths. The remaining blocks form the simplified execution path.
After constant propagation, what is the value of result in B6?
B0(a=4,b=6) -> B1
B1(if(a<5)) -> B2,B3
B2(c=a+b, if(c>8)) -> B4,B5
B3(c=a*b) -> B6
B4(d=10) -> B6
B5(d=20) -> B6
B6(d2=φ(B4:d,B5:d,B3:⊥), result=d2+c)
┌─────┐
│ B0 │
│a=4 │
│b=6 │
└──┬──┘
│
┌──▼────┐
│ B1 │
│if(a<5)│
└──┬────┘
│
┌────┴─────┐
│ │
┌──▼────┐ ┌──▼──┐
│ B2 │ │ B3 │
│c=a+b │ │c=a*b│
│if(c>8)│ └──┬──┘
└──┬────┘ │
│ │
┌────┴───┐ │
│ │ │
┌─▼──┐ ┌─▼──┐ │
│ B4 │ │ B5 │ │
│d=10│ │d=20│ │
└─┬──┘ └─┬──┘ │
│ │ │
└───┬───┘ │
│ │
┌──▼────────────┐ │
│ B6 │◄─┘
│d₂=φ(B4:d,B5:d,│
│ B3:⊥) │
│result=d₂+c │
└───────────────┘
flowchart TD
B0["B0<br/>a=4<br/>b=6"] --> B1
B1["B1<br/>if(a<5)"] -->|true| B2
B1 -->|false| B3
B2["B2<br/>c=a+b<br/>if(c>8)"] -->|true| B4
B2 -->|false| B5
B3["B3<br/>c=a*b"] --> B6
B4["B4<br/>d=10"] --> B6
B5["B5<br/>d=20"] --> B6
B6["B6<br/>d₂=φ(B4:d,B5:d,B3:⊥)<br/>result=d₂+c"]
- A) result = 30
- B) result = 34
- C) result = 20
- D) result = 14
- E) result = ⊤ (overdefined)
- F) result = 28
- G) Cannot determine
- H) result = 24
C) result = 20
Step 1: Constant Propagation in B0
- a = 4 (constant)
- b = 6 (constant)
Step 2: Evaluate B1 Branch
- if (a < 5) → if (4 < 5) → TRUE
- The branch takes B2; B3 is unreachable
Step 3: Constant Propagation to B2
- c = a + b = 4 + 6 = 10
- if (c > 8) → if (10 > 8) → TRUE
- The branch takes B4; B5 is unreachable
Step 4: Constant Propagation to B4
d = 10
Step 5: Simplify B6
- Only B4 reaches B6 (both B3 and B5 are unreachable)
- d₂ = φ(B4: 10, B5: ⊥, B3: ⊥) = 10
- c in B6 comes from B2: c = 10
- result = d₂ + c = 10 + 10 = 20
Execution Trace:
B0: a=4, b=6
↓
B1: if(4<5) → TRUE
↓
B2: c=10, if(10>8) → TRUE
↓
B4: d=10
↓
B6: d₂=10, c=10, result=20
Why Other Options Are Wrong:
- A (30): Would require d=20 and c=10, but B5 (where d=20) is unreachable
- B (34): Would require c=24 from B3 (where c=a*b=24), but B3 is unreachable
- D (14): Incorrect calculation
- E: Not overdefined; the path is fully determinable
- F (28): Would need different values
- G: Fully determinable via constant propagation
- H (24): Would require c=24 from B3, but that path is unreachable
Key Insight: Nested branches with constant conditions eliminate multiple levels of unreachable code. Track which path is actually taken through the entire nested structure.
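A source-level view of this question (an assumed reconstruction with a hypothetical function name; `d = 0` merely stands in for the undefined ⊥ input along the B3 edge) makes the nested folding easy to follow:

```c
int q13(void) {
    int a = 4, b = 6, c, d = 0;
    if (a < 5) {                    /* 4 < 5: always true */
        c = a + b;                  /* c = 10 */
        d = (c > 8) ? 10 : 20;      /* 10 > 8: d = 10 (B4), never 20 (B5) */
    } else {
        c = a * b;                  /* B3: unreachable */
    }
    return d + c;                   /* folds to 10 + 10 = 20 */
}
```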
After constant propagation and DCE, what remains at B4?
B0(x=2,y=8) -> B1
B1(z=x*y, if(z>10)) -> B2,B3
B2(w=z-5) -> B4
B3(w=z+5) -> B4
B4(w2=φ(w1,w3), r=w2*2)
┌─────┐
│ B0 │
│x=2 │
│y=8 │
└──┬──┘
│
┌──▼─────┐
│ B1 │
│z=x*y │
│if(z>10)│
└──┬─────┘
│
┌────┴────┐
│ │
┌──▼──┐ ┌──▼──┐
│ B2 │ │ B3 │
│w=z-5│ │w=z+5│
└──┬──┘ └──┬──┘
│ │
└────┬───┘
│
┌──▼────────┐
│ B4 │
│w₂=φ(w₁,w₃)│
│r=w₂*2 │
└───────────┘
flowchart TD
B0["B0<br/>x=2<br/>y=8"] --> B1
B1["B1<br/>z=x*y<br/>if(z>10)"] -->|true| B2
B1 -->|false| B3
B2["B2<br/>w=z-5"] --> B4
B3["B3<br/>w=z+5"] --> B4
B4["B4<br/>w₂=φ(w₁,w₃)<br/>r=w₂*2"]
- A) w₂=φ(11,21); r=w₂*2
- B) w₂=11; r=22
- C) w₂=21; r=42
- D) r=42
- E) All code remains unchanged
- F) w₂=φ(11,11); r=22
- G) r=22
- H) Cannot determine
G) r=22
Step 1: Constant Propagation in B0
- x = 2 (constant)
- y = 8 (constant)
Step 2: Constant Propagation in B1
- z = x * y = 2 * 8 = 16
- if (z > 10) → if (16 > 10) → TRUE
- The branch takes B2; B3 is unreachable
Step 3: Constant Propagation to B2
w = z - 5 = 16 - 5 = 11
Step 4: Simplify B4
- w₂ = φ(B2: 11, B3: ⊥) = 11 (B3 unreachable)
- r = w₂ * 2 = 11 * 2 = 22
Step 5: Dead Code Elimination
- B3 is unreachable → delete
- w₂ is only used to compute r → it can be folded in
- Final simplified form: r = 22
Optimization Sequence:
After constant propagation:
B0: x=2, y=8
B1: z=16, if(true) -> B2, B3
B2: w=11
B3: [unreachable]
B4: w₂=11, r=22
After DCE and simplification:
B0: (constants eliminated)
B1: (branch eliminated)
B2: (intermediate eliminated)
B4: r=22
Why Other Options Are Wrong:
- A: φ-function should be simplified (only one reachable predecessor)
- B: w₂ can be eliminated since it is used only once
- C: Wrong path (B3 is unreachable)
- D: Wrong path (B3 would give w=21, but it's unreachable)
- E: DCE removes unreachable code and simplifies the rest
- F: A φ-function with a single reachable input should be eliminated
- H: Fully determinable via constant propagation
Key Insight: After constant propagation proves a branch is always taken, the unreachable path can be eliminated, and φ-functions collapse to single values that can be further simplified.
After constant propagation, what is result at B4?
B0(a=6,b=3) -> B1
B1(if(a>b)) -> B2,B3
B2(x=a-b, y=10) -> B4
B3(x=b-a, y=20) -> B4
B4(x2=φ(x1,x3), y2=φ(y1,y3), result=x2+y2)
┌─────┐
│ B0 │
│a=6 │
│b=3 │
└──┬──┘
│
┌──▼────┐
│ B1 │
│if(a>b)│
└──┬────┘
│
┌────┴────┐
│ │
┌──▼──┐ ┌──▼──┐
│ B2 │ │ B3 │
│x=a-b│ │x=b-a│
│y=10 │ │y=20 │
└──┬──┘ └──┬──┘
│ │
└────┬───┘
│
┌──▼─────────┐
│ B4 │
│x₂=φ(x₁,x₃) │
│y₂=φ(y₁,y₃) │
│result=x₂+y₂│
└────────────┘
flowchart TD
B0["B0<br/>a=6<br/>b=3"] --> B1
B1["B1<br/>if(a>b)"] -->|true| B2
B1 -->|false| B3
B2["B2<br/>x=a-b<br/>y=10"] --> B4
B3["B3<br/>x=b-a<br/>y=20"] --> B4
B4["B4<br/>x₂=φ(x₁,x₃)<br/>y₂=φ(y₁,y₃)<br/>result=x₂+y₂"]
- A) result = 13
- B) result = 23
- C) result = 17
- D) result = -17
- E) result = ⊤ (overdefined)
- F) result = 30
- G) result = 10
- H) result = 7
A) result = 13
Step 1: Constant Propagation in B0
- a = 6 (constant)
- b = 3 (constant)
Step 2: Evaluate B1 Branch
- if (a > b) → if (6 > 3) → TRUE
- The branch takes B2; B3 is unreachable
Step 3: Constant Propagation to B2
- x = a - b = 6 - 3 = 3
- y = 10
Step 4: Simplify B4
- x₂ = φ(B2: 3, B3: ⊥) = 3 (B3 unreachable)
- y₂ = φ(B2: 10, B3: ⊥) = 10 (B3 unreachable)
- result = x₂ + y₂ = 3 + 10 = 13
Execution Trace:
B0: a=6, b=3
↓
B1: if(6>3) → TRUE
↓
B2: x=3, y=10
↓
B4: x₂=3, y₂=10, result=13
Verification with Alternative Path (unreachable):
If B3 were reachable:
- x = b - a = 3 - 6 = -3
- y = 20
- result = -3 + 20 = 17 (but this path doesn't execute)
Why Other Options Are Wrong:
- B (23): Would require both branches to be reachable with different logic
- C (17): This would be the result along the B3 path, but B3 is unreachable
- D (-17): Incorrect calculation
- E: Not overdefined; a single path is determinable
- F (30): Would require x=10, y=20, which matches neither path
- G (10): Would require x=0, which doesn't occur
- H (7): Would require different values
Key Insight: When multiple φ-functions exist at the same merge point, all must be simplified consistently based on which predecessors are reachable. All φ-functions collapse to single values when only one path is taken.
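A source-level sketch of this question (an assumed reconstruction with a hypothetical function name) shows both φ inputs at the join collapsing along the same taken edge:

```c
int q15(void) {
    int a = 6, b = 3, x, y;
    if (a > b) { x = a - b; y = 10; }   /* always taken: 6 > 3 */
    else       { x = b - a; y = 20; }   /* unreachable */
    return x + y;                        /* constant-folds to 3 + 10 = 13 */
}
```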
- Dominance Analysis
  - Computing dominance frontiers
  - Iterated dominance frontiers for φ-placement
  - Post-dominators and control dependence
- SSA Form
  - φ-function placement at merge points
  - Minimal SSA construction
  - φ-function simplification
- Loop Optimizations
  - Loop Invariant Code Motion (LICM)
  - Strength Reduction
  - Exception safety in transformations
- Constant Optimizations
  - Constant folding
  - Constant propagation
  - Branch elimination
  - Dead code elimination (DCE)
- Value Analysis
  - Value range propagation
  - Reachability analysis
Pattern 1: Constant Branch Elimination
if (constant_expr) → evaluate → eliminate unreachable path
Pattern 2: φ-Function Simplification
φ(v₁, v₂, ..., vₙ) with only one reachable predecessor → single value
Pattern 3: Exception Safety Check
Can this optimization introduce a NEW exception? → If yes, ILLEGAL
Pattern 4: Profitability Analysis
Benefits (cycles saved) > Costs (register pressure, memory) → PROFITABLE
- Practice Drawing CFGs: Use whichever notation is fastest for you
- Trace Execution Paths: Follow constants through the program
- Check Edge Cases: Zero-iteration loops, unreachable code
- Verify Optimizations: Does it preserve semantics?
- Consider Costs: Not all optimizations are profitable
- Course lecture notes on SSA construction
- Dragon Book: Chapter 9 (Machine-Independent Optimizations)
- "SSA is Functional Programming" by Appel
- Modern Compiler Implementation textbooks
End of Report