Stochastic Fairness Queuing (SFQ) is a packet scheduling algorithm designed to ensure fair access to network resources or application backends. Instead of allocating a dedicated queue for every single flow (which is memory-intensive and doesn't scale), SFQ hashes the flow identifier (like a user ID, IP address, or tenant ID) to assign it to one of a fixed number of queues.
While SFQ prevents a single high-volume flow from monopolizing the entire system, hash collisions can cause an innocent flow to share a queue with a "noisy neighbor." To mitigate this, advanced load balancing strategies like Shuffle Sharding and the Best of Two (Power of Two Choices) are often layered on top.
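To make the collision problem concrete, the following standalone sketch (the queue count of 8 and the `flow-N` ID format are arbitrary demo choices) searches a small set of flow IDs for two that land in the same queue bucket under CRC32 modulo the queue count. With more flows than queues, the pigeonhole principle guarantees it finds one:

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// findCollision returns two distinct flow IDs that hash to the same
// queue bucket, along with that bucket index.
func findCollision(numQueues int) (string, string, uint32, bool) {
	seen := make(map[uint32]string) // bucket -> first flow ID seen there
	// Checking numQueues+1 flows guarantees a repeat bucket (pigeonhole).
	for i := 0; i <= numQueues; i++ {
		flowID := fmt.Sprintf("flow-%d", i)
		bucket := crc32.ChecksumIEEE([]byte(flowID)) % uint32(numQueues)
		if other, ok := seen[bucket]; ok {
			return other, flowID, bucket, true
		}
		seen[bucket] = flowID
	}
	return "", "", 0, false
}

func main() {
	a, b, bucket, _ := findCollision(8)
	fmt.Printf("%s and %s both map to queue %d\n", a, b, bucket)
}
```

Any flow unlucky enough to share a bucket with a heavy hitter inherits that queue's backlog, which is exactly what the strategies below mitigate.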
Here is how you can implement basic SFQ and its enhanced variants in Go. We will focus on the dispatcher logic—the part of the system responsible for routing an incoming request to the appropriate queue.
First, let's define our basic data structures and some helper hash functions that all three implementations will use.
```go
package main

import (
	"fmt"
	"hash/crc32"
	"hash/fnv"
	"math/rand"
)

type Request struct {
	FlowID string // Represents the tenant, user, or IP
	Data   string
}

// A simple queue representation
type Queue struct {
	Items chan Request
}

func NewQueue(capacity int) *Queue {
	return &Queue{
		Items: make(chan Request, capacity),
	}
}

// Helper: Hash function 1 (CRC32)
func hash1(s string) uint32 {
	return crc32.ChecksumIEEE([]byte(s))
}

// Helper: Hash function 2 (FNV-1a)
func hash2(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}
```

In basic SFQ, we use a single hash function to map a flow directly to one specific queue. If the target queue is full, the request is dropped (or answered with a 429 Too Many Requests).
```go
type SFQDispatcher struct {
	Queues []*Queue
}

func (d *SFQDispatcher) Dispatch(req Request) error {
	// Hash the FlowID to find the target queue index
	queueIndex := hash1(req.FlowID) % uint32(len(d.Queues))
	targetQueue := d.Queues[queueIndex]

	select {
	case targetQueue.Items <- req:
		fmt.Printf("SFQ: Flow %s routed to Queue %d\n", req.FlowID, queueIndex)
		return nil
	default:
		return fmt.Errorf("SFQ: Queue %d is full (dropped %s)", queueIndex, req.FlowID)
	}
}
```

Pros: Extremely fast, simple to implement, ensures strict ordering for a specific flow.
Cons: Highly susceptible to hash collisions. If Flow A (noisy) hashes to the same queue as Flow B (quiet), Flow B suffers.
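The `select` with a `default` branch is what turns a full queue into an immediate drop rather than a blocking send. A minimal standalone sketch of that pattern (the capacity of 2 and the string payloads are arbitrary demo choices):

```go
package main

import "fmt"

// tryEnqueue attempts a non-blocking send: it returns false instead of
// blocking when the channel's buffer is full.
func tryEnqueue(q chan string, item string) bool {
	select {
	case q <- item:
		return true
	default:
		return false
	}
}

func main() {
	q := make(chan string, 2) // capacity 2, small on purpose for the demo

	fmt.Println(tryEnqueue(q, "a")) // true
	fmt.Println(tryEnqueue(q, "b")) // true
	fmt.Println(tryEnqueue(q, "c")) // false: buffer full, request dropped
}
```

In the dispatcher above, that `false` path is where you would return the 429 to the caller instead of letting backpressure stall the whole dispatch loop.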
Shuffle sharding isolates flows by assigning them to a specific subset (shard) of queues rather than a single queue. For example, if you have 8 queues and a shard size of 2, Flow A might get queues [1, 5], while Flow B gets [1, 3]. Even if they collide on queue 1, Flow B still has queue 3 to fall back on.
To implement this, we use the FlowID to deterministically seed a pseudo-random permutation of all queue indices, then take the first ShardSize of them as that flow's shard.
```go
type ShuffleShardDispatcher struct {
	Queues    []*Queue
	ShardSize int
}

func (d *ShuffleShardDispatcher) getShard(flowID string) []int {
	// Seed a random number generator with the hash of the FlowID.
	// This ensures the same flow always gets the same shard of queues.
	seed := int64(hash1(flowID))
	rng := rand.New(rand.NewSource(seed))

	// Generate a random permutation of all queue indices
	perm := rng.Perm(len(d.Queues))

	// Return the first 'ShardSize' elements as this flow's shard
	return perm[:d.ShardSize]
}

func (d *ShuffleShardDispatcher) Dispatch(req Request) error {
	shard := d.getShard(req.FlowID)

	// Find the least-loaded queue within this flow's shard
	bestIdx := shard[0]
	minLen := len(d.Queues[bestIdx].Items)
	for _, qIdx := range shard[1:] {
		qLen := len(d.Queues[qIdx].Items)
		if qLen < minLen {
			minLen = qLen
			bestIdx = qIdx
		}
	}

	targetQueue := d.Queues[bestIdx]
	select {
	case targetQueue.Items <- req:
		fmt.Printf("ShuffleSharding: Flow %s routed to Queue %d\n", req.FlowID, bestIdx)
		return nil
	default:
		return fmt.Errorf("ShuffleSharding: Shard full for %s", req.FlowID)
	}
}
```

Pros: Drastically limits the "blast radius" of a noisy neighbor. Fairness is far better than basic SFQ, because a noisy flow must collide with a victim on every queue in its shard, not just one, to starve it completely.
Cons: You lose strict FIFO ordering for a single flow, as requests for the same flow might end up in different queues within the shard.
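A standalone way to quantify that blast-radius claim (the tenant names, 8 queues, shard size 2, and the sample of 1,000 flows are all illustrative choices, not part of the dispatcher above) is to count how many random flows receive exactly the same shard as a given "victim" flow:

```go
package main

import (
	"fmt"
	"hash/crc32"
	"math/rand"
)

// shardFor mirrors the getShard logic in standalone form: a deterministic
// pseudo-random subset of queue indices derived from the flow ID.
func shardFor(flowID string, numQueues, shardSize int) []int {
	seed := int64(crc32.ChecksumIEEE([]byte(flowID)))
	rng := rand.New(rand.NewSource(seed))
	return rng.Perm(numQueues)[:shardSize]
}

// sameSet reports whether two shards contain the same queue indices.
func sameSet(a, b []int) bool {
	m := make(map[int]bool, len(a))
	for _, x := range a {
		m[x] = true
	}
	for _, x := range b {
		if !m[x] {
			return false
		}
	}
	return true
}

func main() {
	const numQueues, shardSize = 8, 2
	victim := shardFor("quiet-tenant", numQueues, shardSize)

	// Only a FULL overlap lets a noisy neighbor starve the victim entirely;
	// partial overlaps leave the victim at least one usable queue.
	fullOverlaps := 0
	for i := 0; i < 1000; i++ {
		s := shardFor(fmt.Sprintf("tenant-%d", i), numQueues, shardSize)
		if sameSet(victim, s) {
			fullOverlaps++
		}
	}
	fmt.Printf("flows fully overlapping the victim's shard: %d / 1000\n", fullOverlaps)
}
```

With 8 queues and shard size 2 there are only 28 possible shards, so a few percent of flows will fully overlap any given victim; with larger queue counts and shard sizes that fraction shrinks combinatorially.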
Instead of assigning a flow to a persistent shard, "Best of Two" dynamically picks two completely random queues (or two deterministic queues using two different hash functions) for every single request and pushes the request to the shorter one.
In a fairness queuing context, we usually use two distinct hash functions on the FlowID to find two deterministic candidate queues for that flow.
```go
type BestOfTwoDispatcher struct {
	Queues []*Queue
}

func (d *BestOfTwoDispatcher) Dispatch(req Request) error {
	// Pick two candidate queues using two different hash functions
	idx1 := hash1(req.FlowID) % uint32(len(d.Queues))
	idx2 := hash2(req.FlowID) % uint32(len(d.Queues))

	// Compare queue lengths and route to the shorter one
	bestIdx := idx1
	if len(d.Queues[idx2].Items) < len(d.Queues[idx1].Items) {
		bestIdx = idx2
	}

	targetQueue := d.Queues[bestIdx]
	select {
	case targetQueue.Items <- req:
		fmt.Printf("BestOfTwo: Flow %s routed to Queue %d\n", req.FlowID, bestIdx)
		return nil
	default:
		return fmt.Errorf("BestOfTwo: Both queues full for %s", req.FlowID)
	}
}
```

Pros: Extremely effective at keeping queue lengths balanced across the entire system. Mathematically proven to exponentially reduce maximum queue lengths compared to picking a single random queue.
Cons: Like shuffle sharding, strict ordering per flow is lost.
| Strategy | Implementation Complexity | Flow Isolation | Maintains Strict Ordering? | Best Use Case |
|---|---|---|---|---|
| Basic SFQ | Low | Poor (Collisions hurt) | Yes | Legacy network packet scheduling where order is critical. |
| Shuffle Sharding | Medium | Excellent | No | Multi-tenant SaaS architectures where blast-radius containment is the top priority. |
| Best of Two | Low | Good | No | High-throughput microservices load balancing where balancing queue depth is the top priority. |