
@huytd
Last active February 24, 2026 20:41
Qwen3.5-35B-A3B Claude Code Test
❯ let's implement the plan in @docs/plans/
⎿  Listed directory docs/plans/
∴ Thinking…
User wants to implement a plan from docs/plans, so I should invoke the brainstorming skill first per the superpowers protocol to explore
requirements before implementation.
⏺ Skill(superpowers:brainstorming)
⎿  Successfully loaded skill
∴ Thinking…
Let me read the plan files that were listed earlier to understand what needs to be implemented.
⏺ Read(docs/plans/2026-02-24-notion-to-markdown-implementation.md)
⎿  Read 1242 lines
⏺ Read(docs/plans/2026-02-24-notion-to-markdown-app-design.md)
⎿  Read 173 lines
∴ Thinking…
I need to read the second plan file (design doc) to understand the full context before proceeding with implementation.
⏺ Now I have both plan files. The user wants to implement the Notion to Markdown converter plan. I should invoke the superpowers:executing-plans
skill as specified in the implementation plan document.
⏺ Skill(superpowers:executing-plans)
⎿  Successfully loaded skill
∴ Thinking…
I've invoked the executing-plans skill which will guide me through the implementation process. The skill has loaded and now I should follow its
workflow: review the plan critically, create tasks, execute the first batch of 3 tasks, and report for review. Let me proceed with the plan review
and task creation.
⏺ I'm using the executing-plans skill to implement this plan.
I've reviewed the implementation plan for the Notion to Markdown Converter. The plan is well-structured, with 12 tasks following a TDD approach and using only the Go standard library.
Before starting, I need to set up an isolated workspace. Let me create a git worktree for this implementation:
⏺ Skill(superpowers:using-git-worktrees)
⎿  Successfully loaded skill
∴ Thinking…
The user asked me to implement the plan, and I've invoked the superpowers skills. Now I need to execute the actual tasks. Let me start with Task 1:
Initialize Go Module.
⏺ I've reviewed both plan files. The implementation plan is well-structured with 12 TDD-based tasks. Let me start implementing.
∴ Thinking…
Let me start implementing Task 1 by running the Go module initialization commands.
⏺ Bash(go mod init notion2md)
⎿  go: creating new go.mod: module notion2md
go: to add module requirements and sums:
go mod tidy
⏺ Bash(go mod tidy && git init)
⎿  go: warning: "all" matched no packages
Initialized empty Git repository in /Users/htran/huy/code/notions3/.git/
∴ Thinking…
⏺ Bash(git add go.mod && git commit -m "init: add Go module")
⎿  [main (root-commit) 8a9b56c] init: add Go module
1 file changed, 3 insertions(+)
create mode 100644 go.mod
∴ Thinking…
Let me create a task for Task 2 (URL Parser with TDD).
∴ Thinking…
Let me implement the URL parser following the TDD approach in the plan. I'll create the test file first, then run it to verify it fails, then
implement the code.
⏺ Write(/Users/htran/huy/code/notions3/parser_test.go)
⎿  Wrote 29 lines to /Users/htran/huy/code/notions3/parser_test.go
 1	package main
 2
 3	import "testing"
 4
 5	func TestExtractPageID(t *testing.T) {
 6		tests := []struct {
 7			url      string
 8			expected string
 9			wantErr  bool
10		}{
11			{"https://www.notion.so/page-id-123", "page-id-123", false},
12			{"https://notion.so/page-id-456", "page-id-456", false},
13			{"https://www.notion.so/myworkspace/page-id-789", "page-id-789", false},
14			{"invalid-url", "", true},
15		}
16
17		for _, tt := range tests {
18			t.Run(tt.url, func(t *testing.T) {
19				got, err := extractPageID(tt.url)
20				if (err != nil) != tt.wantErr {
21					t.Errorf("extractPageID() error = %v, wantErr %v", err, tt.wantErr)
22					return
23				}
24				if got != tt.expected {
25					t.Errorf("extractPageID() = %v, want %v", got, tt.expected)
26				}
27			})
28		}
29	}
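(The transcript doesn't show the `extractPageID` implementation that follows in the TDD cycle; a minimal sketch that would satisfy the test table above — parse the URL, reject anything without a scheme and host, and take the last path segment — might look like this. The error messages and structure are assumptions, not the model's actual code.)

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// extractPageID returns the last path segment of a Notion URL.
// Hypothetical sketch: the real implementation is not shown in the transcript.
func extractPageID(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("invalid Notion URL: %q", raw)
	}
	segments := strings.Split(strings.Trim(u.Path, "/"), "/")
	id := segments[len(segments)-1]
	if id == "" {
		return "", fmt.Errorf("no page ID in URL: %q", raw)
	}
	return id, nil
}

func main() {
	id, err := extractPageID("https://www.notion.so/myworkspace/page-id-789")
	fmt.Println(id, err)
}
```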
∴ Thinking…
⏺ Bash(go test -v -run TestExtractPageID)
⎿  Running…
1 tasks (0 done, 1 open)
◻ Implement URL Parser with TDD
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Bash command
go test -v -run TestExtractPageID
Run parser test to verify it fails
This command requires approval
Do you want to proceed?
❯ 1. Yes
2. Yes, and don’t ask again for: go test:*
3. No
Esc to cancel · Tab to amend · ctrl+e to explain
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 96098097530
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 28897, total state size = 627.539 MiB
srv load: - looking for better prompt, base f_keep = 0.000, sim = 0.010
srv update: - cache state: 9 prompts, 3019.489 MiB (limits: 8192.000 MiB, 100096 tokens, 236061 est)
srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB
srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB
srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB
srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB
srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB
srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB
srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB
srv update: - prompt 0x9d02aac10: 266 tokens, checkpoints: 0, 68.012 MiB
srv update: - prompt 0x9d02aae10: 28897 tokens, checkpoints: 8, 1130.044 MiB
srv get_availabl: prompt cache update took 29.64 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 9502 | processing task, is_child = 0
slot update_slots: id 0 | task 9502 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 308
slot update_slots: id 0 | task 9502 | n_past = 3, slot.prompt.tokens.size() = 28897, seq_id = 0, pos_min = 28896, n_swa = 1
slot update_slots: id 0 | task 9502 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 26950, pos_max = 26950, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 27261, pos_max = 27261, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 27630, pos_max = 27630, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 27705, pos_max = 27705, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 27791, pos_max = 27791, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 27882, pos_max = 27882, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 27947, pos_max = 27947, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 28568, pos_max = 28568, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9502 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 9502 | prompt processing progress, n_tokens = 308, batch.n_tokens = 308, progress = 1.000000
slot update_slots: id 0 | task 9502 | prompt done, n_tokens = 308, batch.n_tokens = 308
slot init_sampler: id 0 | task 9502 | init sampler, took 0.03 ms, tokens: text = 308, total = 308
srv params_from_: Chat format: peg-constructed
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9502 |
prompt eval time = 521.94 ms / 308 tokens ( 1.69 ms per token, 590.11 tokens per second)
eval time = 3153.06 ms / 96 tokens ( 32.84 ms per token, 30.45 tokens per second)
total time = 3675.00 ms / 404 tokens
slot release: id 0 | task 9502 | stop processing: n_tokens = 403, truncated = 0
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 96102058623
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 403, total state size = 70.689 MiB
srv load: - looking for better prompt, base f_keep = 0.007, sim = 0.000
srv load: - found better prompt with f_keep = 0.997, sim = 0.997
srv update: - cache state: 9 prompts, 1960.134 MiB (limits: 8192.000 MiB, 100096 tokens, 244556 est)
srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB
srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB
srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB
srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB
srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB
srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB
srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB
srv update: - prompt 0x9d02aac10: 266 tokens, checkpoints: 0, 68.012 MiB
srv update: - prompt 0x9d02ab010: 403 tokens, checkpoints: 0, 70.689 MiB
srv get_availabl: prompt cache update took 54.75 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 9504 | processing task, is_child = 0
slot update_slots: id 0 | task 9504 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 28874
slot update_slots: id 0 | task 9504 | n_past = 28800, slot.prompt.tokens.size() = 28897, seq_id = 0, pos_min = 28896, n_swa = 1
slot update_slots: id 0 | task 9504 | restored context checkpoint (pos_min = 28568, pos_max = 28568, size = 62.813 MiB)
slot update_slots: id 0 | task 9504 | n_tokens = 28569, memory_seq_rm [28569, end)
slot update_slots: id 0 | task 9504 | prompt processing progress, n_tokens = 28874, batch.n_tokens = 305, progress = 1.000000
slot update_slots: id 0 | task 9504 | prompt done, n_tokens = 28874, batch.n_tokens = 305
slot init_sampler: id 0 | task 9504 | init sampler, took 2.97 ms, tokens: text = 28874, total = 28874
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9504 |
prompt eval time = 874.93 ms / 305 tokens ( 2.87 ms per token, 348.60 tokens per second)
eval time = 39617.38 ms / 1088 tokens ( 36.41 ms per token, 27.46 tokens per second)
total time = 40492.31 ms / 1393 tokens
slot release: id 0 | task 9504 | stop processing: n_tokens = 29961, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.962 (> 0.100 thold), f_keep = 0.964
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 10688 | processing task, is_child = 0
slot update_slots: id 0 | task 10688 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 30014
slot update_slots: id 0 | task 10688 | n_past = 28873, slot.prompt.tokens.size() = 29961, seq_id = 0, pos_min = 29960, n_swa = 1
slot update_slots: id 0 | task 10688 | restored context checkpoint (pos_min = 28568, pos_max = 28568, size = 62.813 MiB)
slot update_slots: id 0 | task 10688 | n_tokens = 28569, memory_seq_rm [28569, end)
slot update_slots: id 0 | task 10688 | prompt processing progress, n_tokens = 29502, batch.n_tokens = 933, progress = 0.982941
slot update_slots: id 0 | task 10688 | n_tokens = 29502, memory_seq_rm [29502, end)
slot update_slots: id 0 | task 10688 | prompt processing progress, n_tokens = 30014, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 10688 | prompt done, n_tokens = 30014, batch.n_tokens = 512
slot init_sampler: id 0 | task 10688 | init sampler, took 6.42 ms, tokens: text = 30014, total = 30014
slot update_slots: id 0 | task 10688 | erasing old context checkpoint (pos_min = 26950, pos_max = 26950, size = 62.813 MiB)
slot update_slots: id 0 | task 10688 | created context checkpoint 8 of 8 (pos_min = 29501, pos_max = 29501, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10688 |
prompt eval time = 3524.40 ms / 1445 tokens ( 2.44 ms per token, 410.00 tokens per second)
eval time = 1867.03 ms / 50 tokens ( 37.34 ms per token, 26.78 tokens per second)
total time = 5391.43 ms / 1495 tokens
slot release: id 0 | task 10688 | stop processing: n_tokens = 30063, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.928 (> 0.100 thold), f_keep = 0.958
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 10740 | processing task, is_child = 0
slot update_slots: id 0 | task 10740 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 31027
slot update_slots: id 0 | task 10740 | n_past = 28799, slot.prompt.tokens.size() = 30063, seq_id = 0, pos_min = 30062, n_swa = 1
slot update_slots: id 0 | task 10740 | restored context checkpoint (pos_min = 28568, pos_max = 28568, size = 62.813 MiB)
slot update_slots: id 0 | task 10740 | erased invalidated context checkpoint (pos_min = 29501, pos_max = 29501, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 10740 | n_tokens = 28569, memory_seq_rm [28569, end)
slot update_slots: id 0 | task 10740 | prompt processing progress, n_tokens = 30515, batch.n_tokens = 1946, progress = 0.983498
slot update_slots: id 0 | task 10740 | n_tokens = 30515, memory_seq_rm [30515, end)
slot update_slots: id 0 | task 10740 | prompt processing progress, n_tokens = 31027, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 10740 | prompt done, n_tokens = 31027, batch.n_tokens = 512
slot init_sampler: id 0 | task 10740 | init sampler, took 2.86 ms, tokens: text = 31027, total = 31027
slot update_slots: id 0 | task 10740 | created context checkpoint 8 of 8 (pos_min = 30514, pos_max = 30514, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10740 |
prompt eval time = 5942.84 ms / 2458 tokens ( 2.42 ms per token, 413.61 tokens per second)
eval time = 289103.05 ms / 7597 tokens ( 38.05 ms per token, 26.28 tokens per second)
total time = 295045.89 ms / 10055 tokens
slot release: id 0 | task 10740 | stop processing: n_tokens = 38623, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.803 (> 0.100 thold), f_keep = 0.803
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 18339 | processing task, is_child = 0
slot update_slots: id 0 | task 18339 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 38639
slot update_slots: id 0 | task 18339 | n_past = 31026, slot.prompt.tokens.size() = 38623, seq_id = 0, pos_min = 38622, n_swa = 1
slot update_slots: id 0 | task 18339 | restored context checkpoint (pos_min = 30514, pos_max = 30514, size = 62.813 MiB)
slot update_slots: id 0 | task 18339 | n_tokens = 30515, memory_seq_rm [30515, end)
slot update_slots: id 0 | task 18339 | prompt processing progress, n_tokens = 32563, batch.n_tokens = 2048, progress = 0.842750
slot update_slots: id 0 | task 18339 | n_tokens = 32563, memory_seq_rm [32563, end)
slot update_slots: id 0 | task 18339 | prompt processing progress, n_tokens = 34611, batch.n_tokens = 2048, progress = 0.895753
slot update_slots: id 0 | task 18339 | n_tokens = 34611, memory_seq_rm [34611, end)
slot update_slots: id 0 | task 18339 | prompt processing progress, n_tokens = 36659, batch.n_tokens = 2048, progress = 0.948756
slot update_slots: id 0 | task 18339 | n_tokens = 36659, memory_seq_rm [36659, end)
slot update_slots: id 0 | task 18339 | prompt processing progress, n_tokens = 38127, batch.n_tokens = 1468, progress = 0.986749
slot update_slots: id 0 | task 18339 | n_tokens = 38127, memory_seq_rm [38127, end)
slot update_slots: id 0 | task 18339 | prompt processing progress, n_tokens = 38639, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 18339 | prompt done, n_tokens = 38639, batch.n_tokens = 512
slot init_sampler: id 0 | task 18339 | init sampler, took 5.33 ms, tokens: text = 38639, total = 38639
slot update_slots: id 0 | task 18339 | erasing old context checkpoint (pos_min = 27261, pos_max = 27261, size = 62.813 MiB)
slot update_slots: id 0 | task 18339 | created context checkpoint 8 of 8 (pos_min = 38126, pos_max = 38126, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 18339 |
prompt eval time = 21146.69 ms / 8124 tokens ( 2.60 ms per token, 384.17 tokens per second)
eval time = 8754.44 ms / 223 tokens ( 39.26 ms per token, 25.47 tokens per second)
total time = 29901.12 ms / 8347 tokens
slot release: id 0 | task 18339 | stop processing: n_tokens = 38861, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.793 (> 0.100 thold), f_keep = 0.798
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 18567 | processing task, is_child = 0
slot update_slots: id 0 | task 18567 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 39120
slot update_slots: id 0 | task 18567 | n_past = 31025, slot.prompt.tokens.size() = 38861, seq_id = 0, pos_min = 38860, n_swa = 1
slot update_slots: id 0 | task 18567 | restored context checkpoint (pos_min = 30514, pos_max = 30514, size = 62.813 MiB)
slot update_slots: id 0 | task 18567 | erased invalidated context checkpoint (pos_min = 38126, pos_max = 38126, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 18567 | n_tokens = 30515, memory_seq_rm [30515, end)
slot update_slots: id 0 | task 18567 | prompt processing progress, n_tokens = 32563, batch.n_tokens = 2048, progress = 0.832388
slot update_slots: id 0 | task 18567 | n_tokens = 32563, memory_seq_rm [32563, end)
slot update_slots: id 0 | task 18567 | prompt processing progress, n_tokens = 34611, batch.n_tokens = 2048, progress = 0.884739
slot update_slots: id 0 | task 18567 | n_tokens = 34611, memory_seq_rm [34611, end)
slot update_slots: id 0 | task 18567 | prompt processing progress, n_tokens = 36659, batch.n_tokens = 2048, progress = 0.937091
slot update_slots: id 0 | task 18567 | n_tokens = 36659, memory_seq_rm [36659, end)
slot update_slots: id 0 | task 18567 | prompt processing progress, n_tokens = 38608, batch.n_tokens = 1949, progress = 0.986912
slot update_slots: id 0 | task 18567 | n_tokens = 38608, memory_seq_rm [38608, end)
slot update_slots: id 0 | task 18567 | prompt processing progress, n_tokens = 39120, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 18567 | prompt done, n_tokens = 39120, batch.n_tokens = 512
slot init_sampler: id 0 | task 18567 | init sampler, took 3.83 ms, tokens: text = 39120, total = 39120
slot update_slots: id 0 | task 18567 | created context checkpoint 8 of 8 (pos_min = 38607, pos_max = 38607, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 18567 |
prompt eval time = 22138.42 ms / 8605 tokens ( 2.57 ms per token, 388.69 tokens per second)
eval time = 1448.88 ms / 36 tokens ( 40.25 ms per token, 24.85 tokens per second)
total time = 23587.30 ms / 8641 tokens
slot release: id 0 | task 18567 | stop processing: n_tokens = 39155, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 96497109660
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 39155, total state size = 828.008 MiB
srv params_from_: Chat format: peg-constructed
srv load: - looking for better prompt, base f_keep = 0.000, sim = 0.030
srv load: - found better prompt with f_keep = 0.308, sim = 0.820
srv update: - cache state: 9 prompts, 3222.636 MiB (limits: 8192.000 MiB, 100096 tokens, 247605 est)
srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB
srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB
srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB
srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB
srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB
srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB
srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB
srv update: - prompt 0x9d02ab010: 403 tokens, checkpoints: 0, 70.689 MiB
srv update: - prompt 0x9d02aaf10: 39155 tokens, checkpoints: 8, 1330.513 MiB
srv get_availabl: prompt cache update took 595.83 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 18608 | processing task, is_child = 0
slot update_slots: id 0 | task 18608 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 100
slot update_slots: id 0 | task 18608 | n_past = 82, slot.prompt.tokens.size() = 266, seq_id = 0, pos_min = 265, n_swa = 1
slot update_slots: id 0 | task 18608 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 18608 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 18608 | prompt processing progress, n_tokens = 100, batch.n_tokens = 100, progress = 1.000000
slot update_slots: id 0 | task 18608 | prompt done, n_tokens = 100, batch.n_tokens = 100
slot init_sampler: id 0 | task 18608 | init sampler, took 0.01 ms, tokens: text = 100, total = 100
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 18608 |
prompt eval time = 1208.04 ms / 100 tokens ( 12.08 ms per token, 82.78 tokens per second)
eval time = 4917.53 ms / 166 tokens ( 29.62 ms per token, 33.76 tokens per second)
total time = 6125.57 ms / 266 tokens
slot release: id 0 | task 18608 | stop processing: n_tokens = 265, truncated = 0
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99024008913
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 265, total state size = 67.992 MiB
srv load: - looking for better prompt, base f_keep = 0.011, sim = 0.000
srv load: - found better prompt with f_keep = 0.383, sim = 0.626
srv update: - cache state: 9 prompts, 1960.115 MiB (limits: 8192.000 MiB, 100096 tokens, 244554 est)
srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB
srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB
srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB
srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB
srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB
srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB
srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB
srv update: - prompt 0x9d02ab010: 403 tokens, checkpoints: 0, 70.689 MiB
srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB
srv get_availabl: prompt cache update took 116.23 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 18609 | processing task, is_child = 0
slot update_slots: id 0 | task 18609 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 23968
slot update_slots: id 0 | task 18609 | n_past = 14999, slot.prompt.tokens.size() = 39155, seq_id = 0, pos_min = 39154, n_swa = 1
slot update_slots: id 0 | task 18609 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 27630, pos_max = 27630, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 27705, pos_max = 27705, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 27791, pos_max = 27791, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 27882, pos_max = 27882, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 27947, pos_max = 27947, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 28568, pos_max = 28568, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 30514, pos_max = 30514, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 38607, pos_max = 38607, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 18609 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.085447
slot update_slots: id 0 | task 18609 | n_tokens = 2048, memory_seq_rm [2048, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.170895
slot update_slots: id 0 | task 18609 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.256342
slot update_slots: id 0 | task 18609 | n_tokens = 6144, memory_seq_rm [6144, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.341789
slot update_slots: id 0 | task 18609 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 10240, batch.n_tokens = 2048, progress = 0.427236
slot update_slots: id 0 | task 18609 | n_tokens = 10240, memory_seq_rm [10240, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.512684
slot update_slots: id 0 | task 18609 | n_tokens = 12288, memory_seq_rm [12288, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 14336, batch.n_tokens = 2048, progress = 0.598131
slot update_slots: id 0 | task 18609 | n_tokens = 14336, memory_seq_rm [14336, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 16384, batch.n_tokens = 2048, progress = 0.683578
slot update_slots: id 0 | task 18609 | n_tokens = 16384, memory_seq_rm [16384, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 18432, batch.n_tokens = 2048, progress = 0.769025
slot update_slots: id 0 | task 18609 | n_tokens = 18432, memory_seq_rm [18432, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 20480, batch.n_tokens = 2048, progress = 0.854473
slot update_slots: id 0 | task 18609 | n_tokens = 20480, memory_seq_rm [20480, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 22528, batch.n_tokens = 2048, progress = 0.939920
slot update_slots: id 0 | task 18609 | n_tokens = 22528, memory_seq_rm [22528, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 23456, batch.n_tokens = 928, progress = 0.978638
slot update_slots: id 0 | task 18609 | n_tokens = 23456, memory_seq_rm [23456, end)
slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 23968, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 18609 | prompt done, n_tokens = 23968, batch.n_tokens = 512
slot init_sampler: id 0 | task 18609 | init sampler, took 2.62 ms, tokens: text = 23968, total = 23968
slot update_slots: id 0 | task 18609 | created context checkpoint 1 of 8 (pos_min = 23455, pos_max = 23455, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 18609 |
prompt eval time = 44416.72 ms / 23968 tokens ( 1.85 ms per token, 539.62 tokens per second)
eval time = 2685.56 ms / 76 tokens ( 35.34 ms per token, 28.30 tokens per second)
total time = 47102.28 ms / 24044 tokens
slot release: id 0 | task 18609 | stop processing: n_tokens = 24043, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.955 (> 0.100 thold), f_keep = 0.997
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 18864 | processing task, is_child = 0
slot update_slots: id 0 | task 18864 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 25097
slot update_slots: id 0 | task 18864 | n_past = 23966, slot.prompt.tokens.size() = 24043, seq_id = 0, pos_min = 24042, n_swa = 1
slot update_slots: id 0 | task 18864 | restored context checkpoint (pos_min = 23455, pos_max = 23455, size = 62.813 MiB)
slot update_slots: id 0 | task 18864 | n_tokens = 23456, memory_seq_rm [23456, end)
slot update_slots: id 0 | task 18864 | prompt processing progress, n_tokens = 24585, batch.n_tokens = 1129, progress = 0.979599
slot update_slots: id 0 | task 18864 | n_tokens = 24585, memory_seq_rm [24585, end)
slot update_slots: id 0 | task 18864 | prompt processing progress, n_tokens = 25097, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 18864 | prompt done, n_tokens = 25097, batch.n_tokens = 512
slot init_sampler: id 0 | task 18864 | init sampler, took 5.19 ms, tokens: text = 25097, total = 25097
slot update_slots: id 0 | task 18864 | created context checkpoint 2 of 8 (pos_min = 24584, pos_max = 24584, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 18864 |
prompt eval time = 3818.16 ms / 1641 tokens ( 2.33 ms per token, 429.79 tokens per second)
eval time = 2619.72 ms / 78 tokens ( 33.59 ms per token, 29.77 tokens per second)
total time = 6437.88 ms / 1719 tokens
slot release: id 0 | task 18864 | stop processing: n_tokens = 25174, truncated = 0
srv update_slots: all slots are idle
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.618 (> 0.100 thold), f_keep = 0.997
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 18944 | processing task, is_child = 0
slot update_slots: id 0 | task 18944 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 40585
slot update_slots: id 0 | task 18944 | n_past = 25096, slot.prompt.tokens.size() = 25174, seq_id = 0, pos_min = 25173, n_swa = 1
slot update_slots: id 0 | task 18944 | restored context checkpoint (pos_min = 24584, pos_max = 24584, size = 62.813 MiB)
slot update_slots: id 0 | task 18944 | n_tokens = 24585, memory_seq_rm [24585, end)
slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 26633, batch.n_tokens = 2048, progress = 0.656228
slot update_slots: id 0 | task 18944 | n_tokens = 26633, memory_seq_rm [26633, end)
slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 28681, batch.n_tokens = 2048, progress = 0.706690
slot update_slots: id 0 | task 18944 | n_tokens = 28681, memory_seq_rm [28681, end)
slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 30729, batch.n_tokens = 2048, progress = 0.757152
slot update_slots: id 0 | task 18944 | n_tokens = 30729, memory_seq_rm [30729, end)
slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 32777, batch.n_tokens = 2048, progress = 0.807614
slot update_slots: id 0 | task 18944 | n_tokens = 32777, memory_seq_rm [32777, end)
slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 34825, batch.n_tokens = 2048, progress = 0.858076
slot update_slots: id 0 | task 18944 | n_tokens = 34825, memory_seq_rm [34825, end)
slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 36873, batch.n_tokens = 2048, progress = 0.908538
slot update_slots: id 0 | task 18944 | n_tokens = 36873, memory_seq_rm [36873, end)
slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 38921, batch.n_tokens = 2048, progress = 0.959000
slot update_slots: id 0 | task 18944 | n_tokens = 38921, memory_seq_rm [38921, end)
slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 40073, batch.n_tokens = 1152, progress = 0.987384
slot update_slots: id 0 | task 18944 | n_tokens = 40073, memory_seq_rm [40073, end)
slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 40585, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 18944 | prompt done, n_tokens = 40585, batch.n_tokens = 512
slot init_sampler: id 0 | task 18944 | init sampler, took 6.66 ms, tokens: text = 40585, total = 40585
slot update_slots: id 0 | task 18944 | created context checkpoint 3 of 8 (pos_min = 40072, pos_max = 40072, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 18944 |
prompt eval time = 40686.51 ms / 16000 tokens ( 2.54 ms per token, 393.25 tokens per second)
eval time = 3115.05 ms / 81 tokens ( 38.46 ms per token, 26.00 tokens per second)
total time = 43801.56 ms / 16081 tokens
slot release: id 0 | task 18944 | stop processing: n_tokens = 40665, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.950 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 19034 | processing task, is_child = 0
slot update_slots: id 0 | task 19034 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 42722
slot update_slots: id 0 | task 19034 | n_past = 40584, slot.prompt.tokens.size() = 40665, seq_id = 0, pos_min = 40664, n_swa = 1
slot update_slots: id 0 | task 19034 | restored context checkpoint (pos_min = 40072, pos_max = 40072, size = 62.813 MiB)
slot update_slots: id 0 | task 19034 | n_tokens = 40073, memory_seq_rm [40073, end)
slot update_slots: id 0 | task 19034 | prompt processing progress, n_tokens = 42121, batch.n_tokens = 2048, progress = 0.985932
slot update_slots: id 0 | task 19034 | n_tokens = 42121, memory_seq_rm [42121, end)
slot update_slots: id 0 | task 19034 | prompt processing progress, n_tokens = 42210, batch.n_tokens = 89, progress = 0.988016
slot update_slots: id 0 | task 19034 | n_tokens = 42210, memory_seq_rm [42210, end)
slot update_slots: id 0 | task 19034 | prompt processing progress, n_tokens = 42722, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 19034 | prompt done, n_tokens = 42722, batch.n_tokens = 512
slot init_sampler: id 0 | task 19034 | init sampler, took 6.81 ms, tokens: text = 42722, total = 42722
slot update_slots: id 0 | task 19034 | created context checkpoint 4 of 8 (pos_min = 42209, pos_max = 42209, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 19034 |
prompt eval time = 7603.79 ms / 2649 tokens ( 2.87 ms per token, 348.38 tokens per second)
eval time = 4877.39 ms / 124 tokens ( 39.33 ms per token, 25.42 tokens per second)
total time = 12481.17 ms / 2773 tokens
slot release: id 0 | task 19034 | stop processing: n_tokens = 42845, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.577 (> 0.100 thold), f_keep = 0.586
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 19161 | processing task, is_child = 0
slot update_slots: id 0 | task 19161 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 43489
slot update_slots: id 0 | task 19161 | n_past = 25095, slot.prompt.tokens.size() = 42845, seq_id = 0, pos_min = 42844, n_swa = 1
slot update_slots: id 0 | task 19161 | restored context checkpoint (pos_min = 24584, pos_max = 24584, size = 62.813 MiB)
slot update_slots: id 0 | task 19161 | erased invalidated context checkpoint (pos_min = 40072, pos_max = 40072, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 19161 | erased invalidated context checkpoint (pos_min = 42209, pos_max = 42209, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 19161 | n_tokens = 24585, memory_seq_rm [24585, end)
slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 26633, batch.n_tokens = 2048, progress = 0.612408
slot update_slots: id 0 | task 19161 | n_tokens = 26633, memory_seq_rm [26633, end)
slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 28681, batch.n_tokens = 2048, progress = 0.659500
slot update_slots: id 0 | task 19161 | n_tokens = 28681, memory_seq_rm [28681, end)
slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 30729, batch.n_tokens = 2048, progress = 0.706593
slot update_slots: id 0 | task 19161 | n_tokens = 30729, memory_seq_rm [30729, end)
slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 32777, batch.n_tokens = 2048, progress = 0.753685
slot update_slots: id 0 | task 19161 | n_tokens = 32777, memory_seq_rm [32777, end)
slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 34825, batch.n_tokens = 2048, progress = 0.800777
slot update_slots: id 0 | task 19161 | n_tokens = 34825, memory_seq_rm [34825, end)
slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 36873, batch.n_tokens = 2048, progress = 0.847870
slot update_slots: id 0 | task 19161 | n_tokens = 36873, memory_seq_rm [36873, end)
slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 38921, batch.n_tokens = 2048, progress = 0.894962
slot update_slots: id 0 | task 19161 | n_tokens = 38921, memory_seq_rm [38921, end)
slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 40969, batch.n_tokens = 2048, progress = 0.942054
slot update_slots: id 0 | task 19161 | n_tokens = 40969, memory_seq_rm [40969, end)
slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 42977, batch.n_tokens = 2008, progress = 0.988227
slot update_slots: id 0 | task 19161 | n_tokens = 42977, memory_seq_rm [42977, end)
slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 43489, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 19161 | prompt done, n_tokens = 43489, batch.n_tokens = 512
slot init_sampler: id 0 | task 19161 | init sampler, took 6.78 ms, tokens: text = 43489, total = 43489
slot update_slots: id 0 | task 19161 | created context checkpoint 3 of 8 (pos_min = 42976, pos_max = 42976, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 19161 |
prompt eval time = 48729.18 ms / 18904 tokens ( 2.58 ms per token, 387.94 tokens per second)
eval time = 7576.52 ms / 191 tokens ( 39.67 ms per token, 25.21 tokens per second)
total time = 56305.70 ms / 19095 tokens
slot release: id 0 | task 19161 | stop processing: n_tokens = 43679, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.965 (> 0.100 thold), f_keep = 0.996
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 19362 | processing task, is_child = 0
slot update_slots: id 0 | task 19362 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45050
slot update_slots: id 0 | task 19362 | n_past = 43487, slot.prompt.tokens.size() = 43679, seq_id = 0, pos_min = 43678, n_swa = 1
slot update_slots: id 0 | task 19362 | restored context checkpoint (pos_min = 42976, pos_max = 42976, size = 62.813 MiB)
slot update_slots: id 0 | task 19362 | n_tokens = 42977, memory_seq_rm [42977, end)
slot update_slots: id 0 | task 19362 | prompt processing progress, n_tokens = 44538, batch.n_tokens = 1561, progress = 0.988635
slot update_slots: id 0 | task 19362 | n_tokens = 44538, memory_seq_rm [44538, end)
slot update_slots: id 0 | task 19362 | prompt processing progress, n_tokens = 45050, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 19362 | prompt done, n_tokens = 45050, batch.n_tokens = 512
slot init_sampler: id 0 | task 19362 | init sampler, took 8.52 ms, tokens: text = 45050, total = 45050
slot update_slots: id 0 | task 19362 | created context checkpoint 4 of 8 (pos_min = 44537, pos_max = 44537, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 19362 |
prompt eval time = 6200.91 ms / 2073 tokens ( 2.99 ms per token, 334.31 tokens per second)
eval time = 5084.72 ms / 138 tokens ( 36.85 ms per token, 27.14 tokens per second)
total time = 11285.63 ms / 2211 tokens
slot release: id 0 | task 19362 | stop processing: n_tokens = 45187, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 0.997
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 19502 | processing task, is_child = 0
slot update_slots: id 0 | task 19502 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45172
slot update_slots: id 0 | task 19502 | n_past = 45049, slot.prompt.tokens.size() = 45187, seq_id = 0, pos_min = 45186, n_swa = 1
slot update_slots: id 0 | task 19502 | restored context checkpoint (pos_min = 44537, pos_max = 44537, size = 62.813 MiB)
slot update_slots: id 0 | task 19502 | n_tokens = 44538, memory_seq_rm [44538, end)
slot update_slots: id 0 | task 19502 | prompt processing progress, n_tokens = 44660, batch.n_tokens = 122, progress = 0.988666
slot update_slots: id 0 | task 19502 | n_tokens = 44660, memory_seq_rm [44660, end)
slot update_slots: id 0 | task 19502 | prompt processing progress, n_tokens = 45172, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 19502 | prompt done, n_tokens = 45172, batch.n_tokens = 512
slot init_sampler: id 0 | task 19502 | init sampler, took 4.45 ms, tokens: text = 45172, total = 45172
slot update_slots: id 0 | task 19502 | created context checkpoint 5 of 8 (pos_min = 44659, pos_max = 44659, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 19502 |
prompt eval time = 2006.29 ms / 634 tokens ( 3.16 ms per token, 316.01 tokens per second)
eval time = 2413.69 ms / 67 tokens ( 36.03 ms per token, 27.76 tokens per second)
total time = 4419.98 ms / 701 tokens
slot release: id 0 | task 19502 | stop processing: n_tokens = 45238, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99206848358
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 45238, total state size = 946.886 MiB
srv params_from_: Chat format: peg-constructed
srv load: - looking for better prompt, base f_keep = 0.000, sim = 0.009
srv load: - found better prompt with f_keep = 0.700, sim = 0.873
srv update: - cache state: 9 prompts, 3150.377 MiB (limits: 8192.000 MiB, 100096 tokens, 268743 est)
srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB
srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB
srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB
srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB
srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB
srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB
srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB
srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB
srv update: - prompt 0x9d0dc0010: 45238 tokens, checkpoints: 5, 1260.952 MiB
srv get_availabl: prompt cache update took 328.59 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 19571 | processing task, is_child = 0
slot update_slots: id 0 | task 19571 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 323
slot update_slots: id 0 | task 19571 | n_past = 282, slot.prompt.tokens.size() = 403, seq_id = 0, pos_min = 402, n_swa = 1
slot update_slots: id 0 | task 19571 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 19571 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 19571 | prompt processing progress, n_tokens = 323, batch.n_tokens = 323, progress = 1.000000
slot update_slots: id 0 | task 19571 | prompt done, n_tokens = 323, batch.n_tokens = 323
slot init_sampler: id 0 | task 19571 | init sampler, took 0.03 ms, tokens: text = 323, total = 323
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 19571 |
prompt eval time = 576.87 ms / 323 tokens ( 1.79 ms per token, 559.92 tokens per second)
eval time = 6285.42 ms / 210 tokens ( 29.93 ms per token, 33.41 tokens per second)
total time = 6862.29 ms / 533 tokens
slot release: id 0 | task 19571 | stop processing: n_tokens = 532, truncated = 0
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99216836871
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 532, total state size = 73.210 MiB
srv load: - looking for better prompt, base f_keep = 0.006, sim = 0.000
srv load: - found better prompt with f_keep = 0.999, sim = 0.998
srv update: - cache state: 9 prompts, 1962.636 MiB (limits: 8192.000 MiB, 100096 tokens, 244778 est)
srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB
srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB
srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB
srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB
srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB
srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB
srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB
srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB
srv update: - prompt 0x9d0dc0110: 532 tokens, checkpoints: 0, 73.210 MiB
srv get_availabl: prompt cache update took 57.70 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 19572 | processing task, is_child = 0
slot update_slots: id 0 | task 19572 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45263
slot update_slots: id 0 | task 19572 | n_past = 45171, slot.prompt.tokens.size() = 45238, seq_id = 0, pos_min = 45237, n_swa = 1
slot update_slots: id 0 | task 19572 | restored context checkpoint (pos_min = 44659, pos_max = 44659, size = 62.813 MiB)
slot update_slots: id 0 | task 19572 | n_tokens = 44660, memory_seq_rm [44660, end)
slot update_slots: id 0 | task 19572 | prompt processing progress, n_tokens = 44751, batch.n_tokens = 91, progress = 0.988688
slot update_slots: id 0 | task 19572 | n_tokens = 44751, memory_seq_rm [44751, end)
slot update_slots: id 0 | task 19572 | prompt processing progress, n_tokens = 45263, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 19572 | prompt done, n_tokens = 45263, batch.n_tokens = 512
slot init_sampler: id 0 | task 19572 | init sampler, took 4.57 ms, tokens: text = 45263, total = 45263
slot update_slots: id 0 | task 19572 | created context checkpoint 6 of 8 (pos_min = 44750, pos_max = 44750, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 19572 |
prompt eval time = 1976.89 ms / 603 tokens ( 3.28 ms per token, 305.02 tokens per second)
eval time = 1872.59 ms / 52 tokens ( 36.01 ms per token, 27.77 tokens per second)
total time = 3849.49 ms / 655 tokens
slot release: id 0 | task 19572 | stop processing: n_tokens = 45314, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99220747999
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 45314, total state size = 948.371 MiB
srv params_from_: Chat format: peg-constructed
srv load: - looking for better prompt, base f_keep = 0.000, sim = 0.009
srv load: - found better prompt with f_keep = 0.534, sim = 0.869
srv update: - cache state: 9 prompts, 3214.676 MiB (limits: 8192.000 MiB, 100096 tokens, 263561 est)
srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB
srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB
srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB
srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB
srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB
srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB
srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB
srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB
srv update: - prompt 0x9d0dc0510: 45314 tokens, checkpoints: 6, 1325.250 MiB
srv get_availabl: prompt cache update took 400.13 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 19836 | processing task, is_child = 0
slot update_slots: id 0 | task 19836 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 327
slot update_slots: id 0 | task 19836 | n_past = 284, slot.prompt.tokens.size() = 532, seq_id = 0, pos_min = 531, n_swa = 1
slot update_slots: id 0 | task 19836 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 19836 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 19836 | prompt processing progress, n_tokens = 327, batch.n_tokens = 327, progress = 1.000000
slot update_slots: id 0 | task 19836 | prompt done, n_tokens = 327, batch.n_tokens = 327
slot init_sampler: id 0 | task 19836 | init sampler, took 0.04 ms, tokens: text = 327, total = 327
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 19836 |
prompt eval time = 573.07 ms / 327 tokens ( 1.75 ms per token, 570.61 tokens per second)
eval time = 4256.85 ms / 144 tokens ( 29.56 ms per token, 33.83 tokens per second)
total time = 4829.91 ms / 471 tokens
slot release: id 0 | task 19836 | stop processing: n_tokens = 470, truncated = 0
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99228000822
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 470, total state size = 71.999 MiB
srv load: - looking for better prompt, base f_keep = 0.006, sim = 0.000
srv load: - found better prompt with f_keep = 0.999, sim = 0.998
srv update: - cache state: 9 prompts, 1961.424 MiB (limits: 8192.000 MiB, 100096 tokens, 244671 est)
srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB
srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB
srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB
srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB
srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB
srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB
srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB
srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB
srv update: - prompt 0x9d0dc0110: 470 tokens, checkpoints: 0, 71.999 MiB
srv get_availabl: prompt cache update took 35.73 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 19837 | processing task, is_child = 0
slot update_slots: id 0 | task 19837 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45359
slot update_slots: id 0 | task 19837 | n_past = 45262, slot.prompt.tokens.size() = 45314, seq_id = 0, pos_min = 45313, n_swa = 1
slot update_slots: id 0 | task 19837 | restored context checkpoint (pos_min = 44750, pos_max = 44750, size = 62.813 MiB)
slot update_slots: id 0 | task 19837 | n_tokens = 44751, memory_seq_rm [44751, end)
slot update_slots: id 0 | task 19837 | prompt processing progress, n_tokens = 44847, batch.n_tokens = 96, progress = 0.988712
slot update_slots: id 0 | task 19837 | n_tokens = 44847, memory_seq_rm [44847, end)
slot update_slots: id 0 | task 19837 | prompt processing progress, n_tokens = 45359, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 19837 | prompt done, n_tokens = 45359, batch.n_tokens = 512
slot init_sampler: id 0 | task 19837 | init sampler, took 4.70 ms, tokens: text = 45359, total = 45359
slot update_slots: id 0 | task 19837 | created context checkpoint 7 of 8 (pos_min = 44846, pos_max = 44846, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 19837 |
prompt eval time = 1952.77 ms / 608 tokens ( 3.21 ms per token, 311.35 tokens per second)
eval time = 2258.34 ms / 61 tokens ( 37.02 ms per token, 27.01 tokens per second)
total time = 4211.11 ms / 669 tokens
slot release: id 0 | task 19837 | stop processing: n_tokens = 45419, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99232251113
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 45419, total state size = 950.423 MiB
srv params_from_: Chat format: peg-constructed
srv load: - looking for better prompt, base f_keep = 0.000, sim = 0.009
srv load: - found better prompt with f_keep = 0.600, sim = 0.801
srv update: - cache state: 9 prompts, 3279.541 MiB (limits: 8192.000 MiB, 100096 tokens, 258611 est)
srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB
srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB
srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB
srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB
srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB
srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB
srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB
srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB
srv update: - prompt 0x9d0dc0190: 45419 tokens, checkpoints: 7, 1390.116 MiB
srv get_availabl: prompt cache update took 467.96 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 20044 | processing task, is_child = 0
slot update_slots: id 0 | task 20044 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 352
slot update_slots: id 0 | task 20044 | n_past = 282, slot.prompt.tokens.size() = 470, seq_id = 0, pos_min = 469, n_swa = 1
slot update_slots: id 0 | task 20044 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 20044 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 20044 | prompt processing progress, n_tokens = 352, batch.n_tokens = 352, progress = 1.000000
slot update_slots: id 0 | task 20044 | prompt done, n_tokens = 352, batch.n_tokens = 352
slot init_sampler: id 0 | task 20044 | init sampler, took 0.04 ms, tokens: text = 352, total = 352
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 20044 |
prompt eval time = 590.20 ms / 352 tokens ( 1.68 ms per token, 596.41 tokens per second)
eval time = 14112.52 ms / 476 tokens ( 29.65 ms per token, 33.73 tokens per second)
total time = 14702.72 ms / 828 tokens
slot release: id 0 | task 20044 | stop processing: n_tokens = 827, truncated = 0
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99249687834
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 827, total state size = 78.975 MiB
srv load: - looking for better prompt, base f_keep = 0.004, sim = 0.000
srv load: - found better prompt with f_keep = 0.999, sim = 0.997
srv update: - cache state: 9 prompts, 1968.401 MiB (limits: 8192.000 MiB, 100096 tokens, 245289 est)
srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB
srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB
srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB
srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB
srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB
srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB
srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB
srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB
srv update: - prompt 0x9d0dc0110: 827 tokens, checkpoints: 0, 78.975 MiB
srv get_availabl: prompt cache update took 42.09 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 20045 | processing task, is_child = 0
slot update_slots: id 0 | task 20045 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45477
slot update_slots: id 0 | task 20045 | n_past = 45358, slot.prompt.tokens.size() = 45419, seq_id = 0, pos_min = 45418, n_swa = 1
slot update_slots: id 0 | task 20045 | restored context checkpoint (pos_min = 44846, pos_max = 44846, size = 62.813 MiB)
slot update_slots: id 0 | task 20045 | n_tokens = 44847, memory_seq_rm [44847, end)
slot update_slots: id 0 | task 20045 | prompt processing progress, n_tokens = 44965, batch.n_tokens = 118, progress = 0.988742
slot update_slots: id 0 | task 20045 | n_tokens = 44965, memory_seq_rm [44965, end)
slot update_slots: id 0 | task 20045 | prompt processing progress, n_tokens = 45477, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 20045 | prompt done, n_tokens = 45477, batch.n_tokens = 512
slot init_sampler: id 0 | task 20045 | init sampler, took 4.94 ms, tokens: text = 45477, total = 45477
slot update_slots: id 0 | task 20045 | created context checkpoint 8 of 8 (pos_min = 44964, pos_max = 44964, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 20045 |
prompt eval time = 2043.99 ms / 630 tokens ( 3.24 ms per token, 308.22 tokens per second)
eval time = 1456.21 ms / 40 tokens ( 36.41 ms per token, 27.47 tokens per second)
total time = 3500.20 ms / 670 tokens
slot release: id 0 | task 20045 | stop processing: n_tokens = 45516, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 0.999
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 20563 | processing task, is_child = 0
slot update_slots: id 0 | task 20563 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45537
slot update_slots: id 0 | task 20563 | n_past = 45476, slot.prompt.tokens.size() = 45516, seq_id = 0, pos_min = 45515, n_swa = 1
slot update_slots: id 0 | task 20563 | restored context checkpoint (pos_min = 44964, pos_max = 44964, size = 62.813 MiB)
slot update_slots: id 0 | task 20563 | n_tokens = 44965, memory_seq_rm [44965, end)
slot update_slots: id 0 | task 20563 | prompt processing progress, n_tokens = 45025, batch.n_tokens = 60, progress = 0.988756
slot update_slots: id 0 | task 20563 | n_tokens = 45025, memory_seq_rm [45025, end)
slot update_slots: id 0 | task 20563 | prompt processing progress, n_tokens = 45537, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 20563 | prompt done, n_tokens = 45537, batch.n_tokens = 512
slot init_sampler: id 0 | task 20563 | init sampler, took 4.22 ms, tokens: text = 45537, total = 45537
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 20563 |
prompt eval time = 1867.14 ms / 572 tokens ( 3.26 ms per token, 306.35 tokens per second)
eval time = 3641.80 ms / 100 tokens ( 36.42 ms per token, 27.46 tokens per second)
total time = 5508.94 ms / 672 tokens
slot release: id 0 | task 20563 | stop processing: n_tokens = 45636, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 20665 | processing task, is_child = 0
slot update_slots: id 0 | task 20665 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45647
slot update_slots: id 0 | task 20665 | n_past = 45536, slot.prompt.tokens.size() = 45636, seq_id = 0, pos_min = 45635, n_swa = 1
slot update_slots: id 0 | task 20665 | restored context checkpoint (pos_min = 44964, pos_max = 44964, size = 62.813 MiB)
slot update_slots: id 0 | task 20665 | n_tokens = 44965, memory_seq_rm [44965, end)
slot update_slots: id 0 | task 20665 | prompt processing progress, n_tokens = 45135, batch.n_tokens = 170, progress = 0.988783
slot update_slots: id 0 | task 20665 | n_tokens = 45135, memory_seq_rm [45135, end)
slot update_slots: id 0 | task 20665 | prompt processing progress, n_tokens = 45647, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 20665 | prompt done, n_tokens = 45647, batch.n_tokens = 512
slot init_sampler: id 0 | task 20665 | init sampler, took 4.27 ms, tokens: text = 45647, total = 45647
slot update_slots: id 0 | task 20665 | erasing old context checkpoint (pos_min = 23455, pos_max = 23455, size = 62.813 MiB)
slot update_slots: id 0 | task 20665 | created context checkpoint 8 of 8 (pos_min = 45134, pos_max = 45134, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 20665 |
prompt eval time = 2191.72 ms / 682 tokens ( 3.21 ms per token, 311.17 tokens per second)
eval time = 12725.11 ms / 348 tokens ( 36.57 ms per token, 27.35 tokens per second)
total time = 14916.83 ms / 1030 tokens
slot release: id 0 | task 20665 | stop processing: n_tokens = 45994, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.992 (> 0.100 thold), f_keep = 0.992
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 21015 | processing task, is_child = 0
slot update_slots: id 0 | task 21015 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45991
slot update_slots: id 0 | task 21015 | n_past = 45646, slot.prompt.tokens.size() = 45994, seq_id = 0, pos_min = 45993, n_swa = 1
slot update_slots: id 0 | task 21015 | restored context checkpoint (pos_min = 45134, pos_max = 45134, size = 62.813 MiB)
slot update_slots: id 0 | task 21015 | n_tokens = 45135, memory_seq_rm [45135, end)
slot update_slots: id 0 | task 21015 | prompt processing progress, n_tokens = 45479, batch.n_tokens = 344, progress = 0.988867
slot update_slots: id 0 | task 21015 | n_tokens = 45479, memory_seq_rm [45479, end)
slot update_slots: id 0 | task 21015 | prompt processing progress, n_tokens = 45991, batch.n_tokens = 512, progress = 1.000000
slot update_slots: id 0 | task 21015 | prompt done, n_tokens = 45991, batch.n_tokens = 512
slot init_sampler: id 0 | task 21015 | init sampler, took 4.34 ms, tokens: text = 45991, total = 45991
slot update_slots: id 0 | task 21015 | erasing old context checkpoint (pos_min = 24584, pos_max = 24584, size = 62.813 MiB)
slot update_slots: id 0 | task 21015 | created context checkpoint 8 of 8 (pos_min = 45478, pos_max = 45478, size = 62.813 MiB)
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 21015 |
prompt eval time = 2654.72 ms / 856 tokens ( 3.10 ms per token, 322.44 tokens per second)
eval time = 2077.70 ms / 57 tokens ( 36.45 ms per token, 27.43 tokens per second)
total time = 4732.42 ms / 913 tokens
slot release: id 0 | task 21015 | stop processing: n_tokens = 46047, truncated = 0
srv update_slots: all slots are idle
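The `print_timing` blocks above report per-token latency and throughput for each request. As a quick sanity check, the tokens-per-second figures can be recomputed from the raw `ms / tokens` pairs; the sketch below is not part of llama.cpp and assumes the timing-line format shown in this log, which may differ across llama.cpp versions:

```python
import re

# Matches the "<x> ms / <n> tokens" portion of a llama.cpp print_timing line,
# e.g. "prompt eval time = 1867.14 ms / 572 tokens (...)".
TIMING_RE = re.compile(r"time =\s*([\d.]+) ms /\s*(\d+) tokens")

def tokens_per_second(line: str) -> float:
    """Recompute throughput as tokens / seconds from a timing line."""
    ms, tokens = TIMING_RE.search(line).groups()
    return int(tokens) / (float(ms) / 1000.0)

if __name__ == "__main__":
    # Values taken from the log above; should reproduce 306.35 tok/s.
    line = "prompt eval time = 1867.14 ms / 572 tokens"
    print(f"{tokens_per_second(line):.2f} tokens per second")
```

Running this against the first request's lines reproduces the logged 306.35 tok/s prompt-eval and 27.46 tok/s generation rates.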
huytd commented Feb 24, 2026

Claude Code with Qwen3.5-35B-A3B, running on llama.cpp