@ispradeep
Created December 2, 2025 18:26
coroutine

But still one thing isn't clear to me: how does async actually work in C without coroutines? Even if I use a thread pool with io_uring(), how do I make sure that while my request is reading a file, that time is used to serve a different request, and then when the file's data arrives, the previous work resumes?

Is it possible to use something like a cache, so that when a function runs it checks its saved state and continues from there? How? Please just clear up my confusion.

ChatGPT said:

You're asking the core question of how async actually works in C WITHOUT coroutines, and this is an excellent question because it reveals how low-level async really is.

Here’s the clear mental model 👇

Async in C does NOT magically pause & resume your function (like Python await, JS promises, Rust async/await, C++ coroutines).

There is no built-in state machine. YOU (or a library) must build that state machine.

Let’s break it down clearly.

✅ 1. C does async with callbacks + state machines, not coroutines

In higher-level languages, await suspends a function. In C, this is impossible without compiler help.

So instead, async systems (libuv, io_uring, epoll) work like this:

Register a request (read file, accept connection, read socket)

Give the kernel a callback or data pointer

Return immediately

Kernel finishes the task in background

Kernel tells you: “Done! Here is your result”

Your callback/state machine continues the work

Example pseudo-process:

submit read(file_fd, buffer) → return immediately

// Meanwhile, event loop handles other reqs

kernel finishes read => puts completion in queue

event loop picks completion => calls your callback or continues your state machine

That's async in C.

✅ 2. So how do you "continue" a function later?

You DO NOT resume a function. You DO NOT pause a stack frame.

Instead, YOU store the state manually.

This is the heart of async in plain C. You store:

which request this belongs to

current step

buffers

what to do next

Example:

typedef struct {
    int  step;
    int  client_fd;
    int  file_fd;
    char buffer[4096];
} RequestState;

Your async code becomes:

void handle_step(RequestState *state) {
    switch (state->step) {
    case 0:
        // step 0: open file and read
        io_uring_submit_read(..., state);
        state->step = 1;
        return;

    case 1:
        // step 1: read complete, now parse & send
        parse_file(state->buffer);
        io_uring_submit_write(..., state);
        state->step = 2;
        return;

    case 2:
        // step 2: write complete, close
        close(state->client_fd);
        free(state);
        return;
    }
}

This is EXACTLY how Node.js/libuv works internally.

This is EXACTLY how io_uring servers work.

This is EXACTLY how nginx works (event loop + state machine).

THIS is async.

✅ 3. io_uring still requires this state machine

Even though io_uring is the most efficient async I/O interface Linux offers, it does NOT resume functions for you.

You must attach context:

struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
io_uring_prep_read(sqe, fd, buf, size, offset);
io_uring_sqe_set_data(sqe, state);  // store your state machine here

Later:

struct io_uring_cqe *cqe;
io_uring_wait_cqe(ring, &cqe);

RequestState *state = io_uring_cqe_get_data(cqe);
handle_step(state);

That's how async works. You pass a pointer to your “state”, and you handle continuation yourself.
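Putting the two snippets together, here is a minimal hedged sketch of the whole loop. It assumes liburing and reuses the RequestState / handle_step example from section 2, whose submit helpers are placeholders that prep an SQE and attach the state pointer; one request, no error handling, and run_event_loop is a made-up name:

#include <liburing.h>
#include <stdlib.h>

/* Minimal hedged sketch. Assumes liburing, plus the RequestState struct and
 * handle_step() shown above (whose submit helpers are placeholders that prep
 * an SQE and attach `state` via io_uring_sqe_set_data()). */
void run_event_loop(int client_fd, int file_fd) {
    struct io_uring ring;
    io_uring_queue_init(256, &ring, 0);               /* 256-entry queues */

    RequestState *state = calloc(1, sizeof *state);
    state->client_fd = client_fd;
    state->file_fd   = file_fd;

    handle_step(state);                               /* step 0: queues the first read */
    io_uring_submit(&ring);                           /* hand the queued work to the kernel */

    for (;;) {                                        /* the event loop */
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);               /* sleep until a completion arrives */
        RequestState *s = io_uring_cqe_get_data(cqe); /* same pointer we attached earlier */
        io_uring_cqe_seen(&ring, cqe);                /* mark the completion consumed */
        handle_step(s);                               /* WE continue the logic, not the kernel */
        io_uring_submit(&ring);                       /* submit whatever the step just queued */
    }
}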

✅ 4. ⚠️ Why doesn't C have coroutines natively?

Because:

C is too low-level

C values predictable ABI + simple execution model

Coroutines require compiler transformations

It breaks compatibility with existing C code

Languages with async/await rewrite your function into a generated state machine.

C requires YOU to write it manually.

✅ 5. Is it possible to implement coroutine-like behavior in plain C?

YES — but YOU must simulate it:

Option A — use a state machine (what everyone does)

Manual, ugly, reliable.

Option B — use fibers (user-level threads)

Stackful coroutines implemented with:

setjmp / longjmp

ucontext (removed from POSIX, but still shipped by glibc and widely used)

custom fiber libraries

Option C — use a coroutine library

Examples:

libaco

libco

libmill

libdill

Coro

These implement coroutines on top of C.

Option D — use C++20 coroutines but write C ABI wrappers

(only if mixing C with C++)
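To make Options A–C concrete, here is a minimal protothreads-style sketch in plain C: a switch plus __LINE__ fakes suspension, and anything that must survive a yield has to live in the context struct, never on the stack. Counter and count_to_three are made-up names for illustration.

#include <stdio.h>

/* A minimal protothreads-style coroutine sketch (the same trick used by the
 * protothreads library): the "resume point" is an int, and switch/__LINE__
 * jumps back to where we left off. */
#define CORO_BEGIN(ctx)  switch ((ctx)->resume_point) { case 0:
#define CORO_YIELD(ctx)  do { (ctx)->resume_point = __LINE__; return; \
                              case __LINE__:; } while (0)
#define CORO_END(ctx)    } (ctx)->resume_point = 0

typedef struct { int resume_point; int i; } Counter;

void count_to_three(Counter *c) {
    CORO_BEGIN(c);
    for (c->i = 1; c->i <= 3; c->i++) {
        printf("%d\n", c->i);
        CORO_YIELD(c);                 /* suspend; the caller gets control back */
    }
    CORO_END(c);
}

int main(void) {
    Counter c = {0};
    for (int k = 0; k < 4; k++)
        count_to_three(&c);            /* each call resumes after the last yield */
    return 0;
}

Each call to count_to_three() resumes just after the last CORO_YIELD; it is the same manual state-machine idea, with the bookkeeping hidden behind macros.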

🔥 FINAL SUMMARY — ONLY THREE SENTENCES

➡️ Async in C does not pause a function — instead, you break your function into states.
➡️ You store those states in a struct and continue them when io_uring/epoll tells you the operation finished.
➡️ This is the core of all async C frameworks: manual state machines + event loop.

If you want

I can show:

a real io_uring HTTP server "state machine" example

how NGINX does its async state logic

how to convert your current blocking C server into a non-blocking state machine

how to build coroutine-like behavior in pure C

Just tell me!

You said: but io_uring() is a Linux kernel API you talk to, right? Previously it was epoll().

ChatGPT said:

Yes — io_uring is a Linux kernel API just like epoll, but the way it works is very different. Let me clear your confusion:

✅ epoll vs io_uring — what they actually do

🔵 epoll (old model)

Epoll can only tell you:

“this FD is ready to read”

“this FD is ready to write”

But epoll does NOT perform I/O itself. You still need to call:

read(fd, buf, size);
write(fd, buf, size);

Each of those is still a blocking syscall per operation (even if it only blocks briefly), and you still need a manual async state machine.
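A hedged sketch of that readiness-based epoll loop (the socket is assumed to be already connected and non-blocking, handle_readable is a made-up callback, error handling omitted):

#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

void handle_readable(int fd, const char *buf, size_t len);   /* hypothetical callback */

/* Hedged sketch of the epoll model: the kernel only reports readiness,
 * and we still issue the read() syscall ourselves for every ready fd. */
void epoll_loop(int sock_fd) {
    int epfd = epoll_create1(0);

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sock_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, sock_fd, &ev);

    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);      /* "these FDs are ready" */
        for (int i = 0; i < n; i++) {
            char buf[4096];
            ssize_t r = read(events[i].data.fd, buf, sizeof buf);  /* I/O is still OUR syscall */
            if (r > 0)
                handle_readable(events[i].data.fd, buf, (size_t)r);
            else
                close(events[i].data.fd);
        }
    }
}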

🔴 io_uring (new model)

io_uring allows you to tell the kernel:

“Please perform this I/O operation for me and notify me when done.”

The kernel performs:

read

write

accept

connect

open

fsync

send/recv

splice (for sendfile-style copies)

statx

etc.

WITHOUT your thread being blocked, and with far fewer syscalls: submissions can be batched, or avoided entirely with SQPOLL. You submit work → kernel does it → kernel notifies you.

This is a major difference.
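One concrete consequence: you can queue many operations and hand them all to the kernel with a single io_uring_submit() call. A hedged sketch, reusing the RequestState struct from earlier (submit_batch is a made-up helper name, error handling omitted):

#include <liburing.h>

/* Hedged sketch: queue one read per request, then submit the whole batch
 * with a single syscall. RequestState is the context struct from section 2. */
void submit_batch(struct io_uring *ring, RequestState **states, int n) {
    for (int i = 0; i < n; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        io_uring_prep_read(sqe, states[i]->file_fd, states[i]->buffer,
                           sizeof states[i]->buffer, 0);
        io_uring_sqe_set_data(sqe, states[i]);      /* attach per-request context */
    }
    io_uring_submit(ring);                          /* one syscall for the whole batch */
}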

🌟 BUT IMPORTANT POINT (your real confusion):

Even though io_uring is advanced…

👉 it still does NOT resume your functions
👉 it still does NOT contain coroutines
👉 you still MUST maintain your own state machine

Because Linux kernel does not know anything about your program structure.

It only does:

async file read

async network operations

async open/write

And returns completion events.

You STILL must decide:

what to do next

what state the request was in

what part of the logic to resume

That's why EVERY async C framework has a state machine.

✅ Why can’t the kernel “pause my function and resume it later”?

Because:

C functions have stack frames

Stack memory lives in user space

Linux kernel is not allowed to manipulate user program stack frames

Kernel cannot store local variables of your function

It cannot resume a stack pointer or instruction pointer safely

That would require:

compiler cooperation

new ABI

breaking 40 years of binary compatibility

This is why coroutines cannot magically exist inside the kernel.

🧠 So how does async REALLY work in C with io_uring?

You do this:

Allocate a context (state struct)

Submit async operation with io_uring_prep_*

Attach your context using io_uring_sqe_set_data()

Return to event loop

When kernel finishes, your completion queue gives back the same context

You continue your logic manually based on stored state

This is EXACTLY how:

nginx

redis

node.js/libuv

haproxy

seastar

high-performance C servers

ALL implement async.

🔥 Short answer: io_uring is kernel async I/O

But async logic (resuming functions) is YOUR responsibility, because C has no built-in coroutines.

🧩 If you want I can show you:

a real example where io_uring processes an HTTP request step-by-step

how to write a fully async state machine in C

how to simulate coroutines in C using macros

how Rust/C++/Go implement async internally vs C

Just say the word.

You said: so we can use a struct to store our function's state before serving another request, and when the data comes back through the kernel it will call that function again and we resume the function where we stored it??? Will it work like this or what??

ChatGPT said:

You got 90% of it correct, and now I’ll help you fully lock in the final 10%.

Here is the exact truth in the simplest possible words:

✅ YES — you store your function state in a struct.

That struct contains:

what step you're in

buffers

file descriptors

offsets

partial results

whatever you need to continue later

Example:

typedef struct {
    int    step;
    int    client_fd;
    int    file_fd;
    char   buffer[4096];
    size_t bytes_read;
} RequestState;

Exactly like you said.

❌ BUT — the kernel does NOT call your function again.

This is the only misunderstanding.

Linux kernel never re-enters your user function.

Instead:

✔️ Your event loop (your code) receives the completion
✔️ You look up the state struct
✔️ YOU call the "next step" function yourself

The kernel just gives you this:

Operation #123 is done. Here is the result.

It does not run your function.

🧠 So the flow is actually like this:

🔹 Step 1 — You begin the request

state->step = 0;
submit_read(..., state);

(return from function now — async)

🔹 Step 2 — Kernel finishes read, puts an event in CQ

Your event loop picks it:

state = io_uring_cqe_get_data(cqe);
handle_next_step(state);

Again: Your event loop calls handle_next_step, not the kernel.
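A hedged sketch of that dispatch loop, draining every completion that is ready in one pass; it uses liburing's io_uring_for_each_cqe / io_uring_cq_advance helpers, and RequestState / handle_next_step are the ones from this example:

#include <liburing.h>

/* Hedged sketch: the event loop, not the kernel, re-enters your logic. */
void drain_completions(struct io_uring *ring) {
    struct io_uring_cqe *cqe;
    unsigned head, count = 0;

    io_uring_for_each_cqe(ring, head, cqe) {             /* walk all ready completions */
        RequestState *state = io_uring_cqe_get_data(cqe);
        if (cqe->res >= 0)
            state->bytes_read = (size_t)cqe->res;        /* result of the finished op */
        handle_next_step(state);                         /* WE call the next step */
        count++;
    }
    io_uring_cq_advance(ring, count);                    /* mark them all as consumed */
}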

🔹 Step 3 — Continue based on state->step

void handle_next_step(RequestState *state) {
    if (state->step == 0) {
        // file read done → now parse or write response
        state->step = 1;
        submit_write(..., state);
    } else if (state->step == 1) {
        // write done → cleanup
        close(state->client_fd);
        free(state);
    }
}

This is EXACTLY how async works in C and Linux.

🔥 Think of it like this:

C cannot pause a function. But YOU can split your function into multiple pieces (states) and chain them.

This is the classic async design:

❌ Wrong mental model

“Kernel pauses and resumes my function.”

✔️ Correct mental model

“I break my function into steps and run each step when kernel says I/O is done.”

🚨 This is EXACTLY how Node.js, Nginx, Redis, libuv, Seastar do async

All of them:

create a struct for state

store progress

submit async operations

return

later, continuation function (“step handler”) runs

They just hide this from you behind frameworks.

You are now learning real async.

🧩 Small analogy

You’re writing a story, but:

You can only write one chapter at a time

When you wait for I/O, you save the chapter number

Later, you continue writing from that chapter

That’s async without coroutines.

✔️ FINAL ANSWER (super short)

Yes. You store your state in a struct. But the kernel NEVER calls your function. Your event loop calls the next part of your function when I/O completes.

If you want I can give you a full real-world io_uring async server example showing step 1 → step 2 → step 3 with complete code.

You said: maybe a compiler with coroutines also does similar things, just at the compiler level.

ChatGPT said:

YES — EXACTLY. You finally reached the correct mental model.

Let me explain it cleanly:

✅ What modern compilers do for coroutines (C++, Rust, Swift, C#, etc.)

When you write this high-level code:

co_await read_file();
co_await write_socket();

It looks like a normal function that “pauses” and continues later.

But that is an illusion created by the compiler.

Under the hood, the compiler:

  1. Breaks your function into states

Just like you would do manually in C:

switch (state) {
case 0: ...
case 1: ...
case 2: ...
}

  2. Allocates a struct to hold all your variables

Just like you would do manually:

struct CoroutineFrame {
    int    state;
    int    fd;
    char   buffer[4096];
    size_t offset;
    // ... all locals and temps ...
};

  3. Stores your local variables in the “frame object”

Because the stack cannot be paused.

  4. Generates code to resume from the correct state

Same as your manual async state machine.

  5. Registers a callback or continuation

So that when I/O finishes, the runtime knows which coroutine frame to resume.

THIS IS EXACTLY WHAT YOU ARE DOING MANUALLY IN C.
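In C terms, the registered "continuation" is nothing more exotic than a frame plus a function pointer that gets called when the I/O finishes. A hedged conceptual sketch (all names made up, not any real library's API):

#include <stddef.h>

/* Hedged conceptual sketch: a "coroutine frame" is just heap state plus a
 * resume hook. */
typedef struct Frame Frame;
struct Frame {
    int    state;                /* which suspension point we stopped at */
    char   buffer[4096];
    size_t offset;
    void (*resume)(Frame *f);    /* the continuation the runtime will call */
};

/* Called by the event loop / runtime when the awaited I/O completes. */
static void on_io_complete(Frame *f) {
    f->resume(f);                /* re-enters the generated state machine */
}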

The difference?

👉 Compiler automatically generates the state machine
👉 You don’t see it
👉 Your code looks like straight-line synchronous code

🔥 In plain C, YOU must do this manually.

This is the difference:

Modern async languages:
compiler builds state machine for you
compiler stores state in a hidden struct
compiler rewrites your function into many pieces
compiler ensures continuation works

C (no coroutines):
you build the state machine
you store state in your own struct
you manually resume the logic later

🧩 Example:

Your async C code (manual):

typedef struct {
    int  step;
    int  fd;
    char buf[4096];
} Request;

void handle(Request *req) {
    switch (req->step) {
    case 0:
        submit_read(req);
        req->step = 1;
        return;
    case 1:
        process(req->buf);
        submit_write(req);
        req->step = 2;
        return;
    case 2:
        cleanup(req);
        return;
    }
}

🟢 Compiler-generated coroutine (C++20)

Your code:

co_await async_read(fd, buf);
process(buf);
co_await async_write(fd, buf);

Compiler turns it into something very similar to:

switch (state) {
case 0:
    async_read(fd, buf, this);
    state = 1;
    return;
case 1:
    process(buf);
    async_write(fd, buf, this);
    state = 2;
    return;
case 2:
    return;
}

YES — this is literally what the compiler does.

🎯 CONCLUSION (the important point):

👉 Async in C (manual) 👉 Async in C++/Rust/JS/C#/Go (automatic)

They BOTH use the same underlying ideas:

state machines

continuation passing

heap-allocated frames

event loops

callbacks from kernel (io_uring/epoll)

The only difference:

💚 Languages with coroutines:

Compiler writes the state machine for you.

❤️ C:

You write the state machine yourself.

If you want, I can show you:

a real C++ coroutine disassembly

how Rust rewrites async functions into state machines

how Node.js event loop is implemented

how to build a tiny coroutine system in C

Just tell me!

You said: so the compiler doesn't do any magic, it's just storing state in memory; either we do it ourselves or the compiler does it for us. Earlier I thought there was something else and maybe assembly code was needed for this.

ChatGPT said:

Exactly — there is zero magic. You finally get the real truth that many people misunderstand.

Let’s break it down cleanly:

✅ 1. Coroutines DO NOT need special CPU instructions

A coroutine is NOT:

a thread

a syscall

a special CPU mode

a kernel feature

or anything in hardware

It’s pure code transformation done by the compiler.

✅ 2. What the compiler actually does

When you write:

co_await do_io();

The compiler rewrites your function into:

a struct holding all the local variables

an integer state (like step)

a switch-case resume point

a manual continuation callback

This is the same thing you would do in C manually.

NO special assembly instructions are required. NO kernel support is required.

It's literally just normal machine instructions.

🧠 3. Coroutines are just:

✔️ A struct for state
✔️ A function turned into a state machine
✔️ A scheduler that calls “resume()” when I/O is complete

Here is what the compiler-generated struct looks like conceptually:

struct CoroutineFrame {
    int    state;
    int    fd;
    char   buffer[4096];
    size_t offset;
    // ...temp variables...
};

Exactly like your manual struct in C.

🔧 4. The compiler-generated state machine is basically this:

switch (frame->state) {
case 0:
    async_read(..., frame);
    frame->state = 1;
    return;  // suspend

case 1:
    process(frame->buffer);
    async_write(..., frame);
    frame->state = 2;
    return;  // suspend

case 2:
    cleanup(frame);
    return;  // done
}

That’s it. This is the heart of coroutines.

You do this manually in C. The compiler does it automatically in C++/Rust/etc.

🔥 5. The illusion of magic

Languages like Rust / C++ / C# / Python hide this:

await read_file()

But under the hood, it becomes:

state machine

callbacks

continuation-passing

jump tables

All done for you.

🟢 6. You do NOT need special assembly to pause/resume

Why?

Because:

stack cannot be paused

coroutine stack is stored in a heap struct (“frame”)

resuming just means calling the function with the frame pointer and jumping to the right case

This is regular code, no magic.

⭐ 7. If you want actual stack-switching (like Go goroutines)

THAT does require:

custom assembly for switching stack pointers

manually adjusting registers

But that’s not what C++/Rust async does.

They use stackless coroutines, which use state machines on the heap.

🧠 Final Knowledge Summary

✔️ Coroutines = compiler-generated state machine
✔️ Same as manual struct + switch-case async logic in C
✔️ No special CPU magic
✔️ No special syscalls
✔️ No assembly (only if you want stack-switching coroutines like Go)

You understood it perfectly.

If you want to go deeper

I can show you:

actual generated assembly for a coroutine (very cool)

how Rust transforms async functions into state machines

difference between stackful vs stackless coroutines

how to implement your own coroutines in C in 20 lines (using macros)

Just tell me!

You said:

7. If you want actual stack-switching (like Go goroutines)

THAT does require:
* custom assembly for switching stack pointers
* manually adjusting registers

But that’s not what C++/Rust async does. They use stackless coroutines, which use state machines on the heap.

Can you tell me what this means?

ChatGPT said:

Great — let me explain this very clearly, because this is a BIG concept people confuse.

There are two kinds of coroutines in the world:

🟩 1. Stackless Coroutines (C++, Rust async/await, JS, Python async)

These coroutines:

do NOT have their own stack

cannot pause at arbitrary places

only pause at await/co_await points

compiler transforms them into a state machine

all state is stored in a heap struct

no assembly needed

super cheap, lightweight

These are the "simple" ones.

🟥 2. Stackful Coroutines (Go goroutines, Lua coroutines, C fibers, Boost.Context)

These coroutines:

have their own independent stack

can pause ANYWHERE, even inside deep nested function calls

need the runtime to switch stacks

require manipulating CPU registers + stack pointers

usually implemented in assembly or very low-level C

These are the "full power" coroutine systems.

🟢 Difference explained simply

Stackless: compiler breaks your function into states

await A();
await B();
await C();

Compiler turns this into:

state 0 → do A
state 1 → do B
state 2 → do C

Nothing fancy. Just switching based on a number inside a struct.

Stackful: runtime swaps the ENTIRE call stack

In stackful coroutines, each coroutine has:

its own stack memory

its own CPU register context

Switching coroutines means:

❗ CPU SP (stack pointer) changes
❗ CPU registers restored
❗ Code resumes at exact instruction

This requires:

✔️ Assembly instructions like mov, jmp, saving registers
✔️ Changing the stack pointer (rsp on x86-64)

This is exactly what Go does internally.

🧠 Why does stack switching need assembly?

Because only assembly can manipulate:

stack pointer (SP/RSP)

instruction pointer (IP/RIP)

callee-saved registers

general registers

Example conceptually (simplified):

save current SP into coroutine1.sp
save registers into coroutine1.regs

load coroutine2.sp into SP
load registers from coroutine2.regs

jmp coroutine2.next_instruction

This cannot be done from pure C.
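The closest portable escape hatch in C is <ucontext.h>, where libc performs the register and stack-pointer swap for you. A minimal hedged sketch (the API is obsolescent in POSIX but still shipped by glibc; fixed 64 KB stack, no error handling, names made up):

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, fiber_ctx;

static void fiber_fn(void) {
    puts("fiber: step 1");
    swapcontext(&fiber_ctx, &main_ctx);   /* pause mid-function: whole stack is saved */
    puts("fiber: step 2");                /* resumes exactly here on the next switch */
}

int main(void) {
    char *stack = malloc(64 * 1024);      /* the fiber's own call stack */

    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp   = stack;
    fiber_ctx.uc_stack.ss_size = 64 * 1024;
    fiber_ctx.uc_link          = &main_ctx;    /* return here when fiber_fn ends */
    makecontext(&fiber_ctx, fiber_fn, 0);

    puts("main: switching to fiber");
    swapcontext(&main_ctx, &fiber_ctx);   /* save main's registers, load the fiber's */
    puts("main: back in main, switching again");
    swapcontext(&main_ctx, &fiber_ctx);
    puts("main: fiber finished");
    free(stack);
    return 0;
}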

🟦 Summary: What your quoted text means

⭐ Stack-switching coroutines (like Go goroutines)

Need:

custom assembly

saving/restoring CPU registers

switching the stack pointer

moving execution to a new stack frame

These are powerful but complex.

⭐ Stackless coroutines (like C++/Rust async)

Do NOT switch stacks.

Instead:

store variables in a heap struct

store a “state number”

compiler generates a big switch-case

event loop calls the coroutine again

resume happens by code, not by stack switching

🎯 The idea in one sentence

Stackless coroutines = compiler-generated state machine (no assembly). Stackful coroutines = runtime swaps the whole stack (requires assembly).

If you want, I can show:

✅ how Go implements goroutines internally
✅ how Lua implements coroutines
✅ how to write your own stackful coroutine system in C
✅ how C++ translates a simple coroutine into a state machine

Just ask!

You said: which one is best, or faster, or whatever??

ChatGPT said:

Here's the real answer — no hype.

There is no “best” coroutine model universally. But each model is best for specific kinds of workloads.

Let’s break it down by:

performance

flexibility

memory usage

complexity

use-cases

This will make everything crystal-clear.

🟩 1. STACKLESS COROUTINES

(C++, Rust async, JS await, Python async, C# async)

✔️ FASTEST for I/O-bound work

Because:

they do not switch stacks

they do not switch CPU registers

they do not manipulate SP/RIP

they generate simple state machines

extremely small overhead

Essentially the same performance as manually writing a state machine in C.

🔥 BEST for:

async networking

handling 1M+ concurrent connections

HTTP servers

database clients

Redis-like event loops

io_uring-based servers

Node.js-style workloads

⚡ Speed characteristics:

extremely cheap to suspend/resume

almost zero memory per coroutine (just a small heap frame)

can scale to millions

❌ NOT good for:

deep recursion

CPU-heavy tasks

pausing anywhere in a function (only at await points)

🟥 2. STACKFUL COROUTINES

(Go goroutines, Lua coroutines, Boost.Context, custom fibers)

✔️ FASTEST for CPU-bound work or complex call stacks

Because:

they can pause ANYWHERE

they keep a separate call stack

no need for compiler transformations

excellent for large, complex flows

🔥 BEST for:

game engines

simulations

AI loop scheduling

scripting systems (Lua)

deeply recursive code

applications with many nested function calls

code that benefits from “real” pausing

⚠️ BUT speed cost:

must save/restore CPU registers

must switch stacks (SP pointer)

must adjust instruction pointer

requires assembly or OS tricks

switching costs ~50–200 ns

uses 2 KB–8 MB per goroutine/fiber stack

So they comfortably scale to hundreds of thousands of coroutines; millions are possible, but each one costs far more memory than a stackless frame.

❌ NOT good for:

extremely high concurrency (millions)

tiny memory environments

ultra-low-latency servers with millions of requests

🏆 Which is “fastest”?

Depends on what "fast" means:

Task                            | Fastest Model | Why
High-concurrency network server | Stackless     | ultra-cheap, tiny memory, good for millions of tasks
Deep call stack coroutines      | Stackful      | can pause anywhere, not restricted to await
CPU-heavy logic                 | Stackful      | better for complex logic
Async I/O (files/sockets)       | Stackless     | matches event-driven model
Game engines                    | Stackful      | pausing anywhere is needed
Node.js-like apps               | Stackless     | state machines fit perfectly
Go-like developer experience    | Stackful      | real goroutines
Manual C async with io_uring    | Stackless     | essentially state machines

🏅 Overall Winner by Category

🟨 For Networking → Stackless

(nginx, Rust async, C++ coroutines, Node.js, libuv, io_uring)

🟧 For Game Engines / Simulation → Stackful

(Godot fibers, Lua coroutines, custom C fibers)

🟩 For HPC / Systems Programming → Stackless or threads

(io_uring + state machines = fastest possible server)

🟥 For “easy to use” developer experience → Stackful

(Go goroutines feel magical)

🎯 Which is faster for REAL servers?

io_uring + stackless coroutines == the fastest approach in wide use today

This is what the fastest C/C++/Rust servers use.

Because:

zero syscalls per operation

no stack switching

pure state-machine + event loop

can handle 1–5 million connections

This is what modern databases, proxies, and load balancers are adopting.

🧠 Final Summary

🔥 Stackless = fastest for I/O concurrency

→ use for servers

🔥 Stackful = fastest for complex logic / CPU flows

→ use for simulations, games, scripting

❌ Neither is universally “best”

They solve different problems.
