Skip to content

Instantly share code, notes, and snippets.

@MoserMichael
Last active January 21, 2026 04:17
Show Gist options
  • Select an option

  • Save MoserMichael/b18fdb55964ab5bfe1398f56361b309a to your computer and use it in GitHub Desktop.

Select an option

Save MoserMichael/b18fdb55964ab5bfe1398f56361b309a to your computer and use it in GitHub Desktop.
compsci_likbez

learning queuing theory basics

Bikey Bonn Kleiford Seranilla - Queuing Theory

Liz Thompson - Queueing theory (simple)

SREcon24 Americas - System Performance and Queuing Theory - Concepts and Application

The field was established in 1909 by Agner Krarup Erlang, as he was analyzing switched telephone networks. (the programming language Erlang is named after him)

System := Arriving task (kept in RX queue) -> Server (request processing here) -> Departing task (kept in TX queue)

in basic queueing theory there is queue of incoming requests (like RX), but no queue of outgoing requests (TX). why?

says sending/TX is not the primary bottleneck, and is often less problematic then the receiving end/RX. RX is buffering incoming requests, so it can't be separated away from servicing the request, so it is more critical. However for networking systems you can't do that, the TX buffer must be considered...

Basic queueing theory was developed for switched telephone networks, so TX seems to be very important in this case, isn't it?

TX is seen as part of the service.

--

$\lambda$ - rate of arrival of task into system (requests received per second)

$\mu$ - rate of task completion (that's requests completed per second / other relevant time unit)

Now is $\lambda$ < $\mu$ then the system is unstable. If the rate that tasks are completed is smaller than the rate task enter the system, the the queue will grow and grow - and that's impossible.

R - Average time the request is in the system. R = W + S - W - average wait time (in input queue) - S - average service time - Important detail: 1 / average-service-time = average-service-rate . Example: 10ms time to service a request: 1000 / 10 = 100 requests per second. This means it should be number-of-time-unit / average-service-time. Why is it still the inverse? math likes floating point numbers so the observed time is thought to be a fraction of 1. - the inverse of the average service time is $\mu$ (don't forget the number of time units in the divided, in practical terms)

Little's law

Average-Queue-Length = Average-Arrival-Rate * Average-time-request-is-in-the-system

Q = $\lambda$ * R


Capacity utilization - for what percentage of the time is the system busy?

$\rho$ = $\lambda$ / $\mu$

Example: 100 requests arrive per seconds / 120 requests are completed = 0.83 probability that the system is busy.

Since percentage is probability times 100 - this means the system is running for 83% of the time.

$\rho = \lambda * S$

because S is the inverse of $\mu$ - see previous section.


Kendal notation to characterize a queueing system

(a / b / c) : (d / e / f)

  • a: input distribution (distributions by which requests arrive)

    • M : poisson arrival distribution (this assumes a constant rate of $\lambda$)
    • D : deterministic inter arrival distribution - here events arrive at perfectly fixed, regular intervals, with zero randomness
    • Ex : Erlangian / Gamma inter arrival distribution - here arrivals are not completely independent or have a memory component
    • GI : General distribution (like Gaussian Z-distribution with a shape parameter)
  • b: output distribution (distributions by which requests are completed)

  • c: number of servers

  • d: service discipline (first-come-first-server, etc)

  • e: maximum number of requests allowed in the system (finite or infinite-meaning-not-limited)

  • f: calling source (total number of potential customers) - (finite or infinite-meaning-not-limited)

Now for each different Kendal notation there are a set of formulas for determining

  • $L_{s}$ : average number of requests in the system
  • $L_{s}$ : residence time
  • $L_{q}$ : average number of requests in the queue
  • $W_{q}$ : average time a request spends in the queue

The most common class of system is (engineers often assume a constant rate of $\lambda$ for simplicity) (M/M/1) They actually like to talk about (M/M/1):(FIRST-COME-FIRST-SERVE, $\infty$ , $\infty$ ) - the problem of bound queue size and limited is abstracted away.

$L_{s} = \displaystyle \frac{\lambda}{\mu - \lambda}$

$W_{s} = \displaystyle \frac{1}{\mu - \lambda}$

$L_{q} = \displaystyle \frac{\lambda^{2}}{\mu(\mu - \lambda)}$

$W_{q} = \displaystyle \frac{\lambda}{\mu(\mu - \lambda)}$

--

$\rho = \lambda / \mu$

$P_{0} = \displaystyle 1 - \frac{\lambda}{\mu}$ - probability of idle system (no requests queued or in server)

$P_{n&gt;k} = \displaystyle 1 - (\frac{\lambda}{\mu})^{k+1}$ - probability of more than k requests queued or in server.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment