Last active
February 24, 2026 20:41
Qwen3.5-35B-A3B Claude Code Test
| ❯ let's implement the plan in @docs/plans/ | |
| ⎿ Listed directory docs/plans/ | |
| ∴ Thinking… | |
| User wants to implement a plan from docs/plans, so I should invoke the brainstorming skill first per the superpowers protocol to explore | |
| requirements before implementation. | |
| ⏺ Skill(superpowers:brainstorming) | |
| ⎿ Successfully loaded skill | |
| ∴ Thinking… | |
| Let me read the plan files that were listed earlier to understand what needs to be implemented. | |
| ⏺ Read(docs/plans/2026-02-24-notion-to-markdown-implementation.md) | |
| ⎿ Read 1242 lines | |
| ⏺ Read(docs/plans/2026-02-24-notion-to-markdown-app-design.md) | |
| ⎿ Read 173 lines | |
| ∴ Thinking… | |
| I need to read the second plan file (design doc) to understand the full context before proceeding with implementation. | |
| ⏺ Now I have both plan files. The user wants to implement the Notion to Markdown converter plan. I should invoke the superpowers:executing-plans | |
| skill as specified in the implementation plan document. | |
| ⏺ Skill(superpowers:executing-plans) | |
| ⎿ Successfully loaded skill | |
| ∴ Thinking… | |
| I've invoked the executing-plans skill which will guide me through the implementation process. The skill has loaded and now I should follow its | |
| workflow: review the plan critically, create tasks, execute the first batch of 3 tasks, and report for review. Let me proceed with the plan review | |
| and task creation. | |
| ⏺ I'm using the executing-plans skill to implement this plan. | |
| I've reviewed the implementation plan for the Notion to Markdown Converter. The plan is well-structured with 12 tasks following TDD approach using | |
| only Go standard library. | |
| Before starting, I need to set up an isolated workspace. Let me create a git worktree for this implementation: | |
| ⏺ Skill(superpowers:using-git-worktrees) | |
| ⎿ Successfully loaded skill | |
| ∴ Thinking… | |
| The user asked me to implement the plan, and I've invoked the superpowers skills. Now I need to execute the actual tasks. Let me start with Task 1: | |
| Initialize Go Module. | |
| ⏺ I've reviewed both plan files. The implementation plan is well-structured with 12 TDD-based tasks. Let me start implementing. | |
| ∴ Thinking… | |
| Let me start implementing Task 1 by running the Go module initialization commands. | |
| ⏺ Bash(go mod init notion2md) | |
| ⎿ go: creating new go.mod: module notion2md | |
| go: to add module requirements and sums: | |
| go mod tidy | |
| ⏺ Bash(go mod tidy && git init) | |
| ⎿ go: warning: "all" matched no packages | |
| Initialized empty Git repository in /Users/htran/huy/code/notions3/.git/ | |
| ∴ Thinking… | |
| ⏺ Bash(git add go.mod && git commit -m "init: add Go module") | |
| ⎿ [main (root-commit) 8a9b56c] init: add Go module | |
| 1 file changed, 3 insertions(+) | |
| create mode 100644 go.mod | |
| ∴ Thinking… | |
| Let me create a task for Task 2 (URL Parser with TDD). | |
| ∴ Thinking… | |
| Let me implement the URL parser following the TDD approach in the plan. I'll create the test file first, then run it to verify it fails, then | |
| implement the code. | |
| ⏺ Write(/Users/htran/huy/code/notions3/parser_test.go) | |
| ⎿ Wrote 29 lines to /Users/htran/huy/code/notions3/parser_test.go | |
| 1 package main | |
| 2 | |
| 3 import "testing" | |
| 4 | |
| 5 func TestExtractPageID(t *testing.T) { | |
| 6 tests := []struct { | |
| 7 url string | |
| 8 expected string | |
| 9 wantErr bool | |
| 10 }{ | |
| 11 {"https://www.notion.so/page-id-123", "page-id-123", false}, | |
| 12 {"https://notion.so/page-id-456", "page-id-456", false}, | |
| 13 {"https://www.notion.so/myworkspace/page-id-789", "page-id-789", false}, | |
| 14 {"invalid-url", "", true}, | |
| 15 } | |
| 16 | |
| 17 for _, tt := range tests { | |
| 18 t.Run(tt.url, func(t *testing.T) { | |
| 19 got, err := extractPageID(tt.url) | |
| 20 if (err != nil) != tt.wantErr { | |
| 21 t.Errorf("extractPageID() error = %v, wantErr %v", err, tt.wantErr) | |
| 22 return | |
| 23 } | |
| 24 if got != tt.expected { | |
| 25 t.Errorf("extractPageID() = %v, want %v", got, tt.expected) | |
| 26 } | |
| 27 }) | |
| 28 } | |
| 29 } | |
| ∴ Thinking… | |
| ⏺ Bash(go test -v -run TestExtractPageID) | |
| ⎿ Running… | |
| 1 tasks (0 done, 1 open) | |
| ◻ Implement URL Parser with TDD | |
| ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| Bash command | |
| go test -v -run TestExtractPageID | |
| Run parser test to verify it fails | |
| This command requires approval | |
| Do you want to proceed? | |
| ❯ 1. Yes | |
| 2. Yes, and don’t ask again for: go test:* | |
| 3. No | |
| Esc to cancel · Tab to amend · ctrl+e to explain |
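The transcript stops before the test is run, so the implementation the session produced is not shown here. As a rough illustration only, the following is a minimal sketch of an `extractPageID` that would satisfy the four cases in the test table above, using only the Go standard library. The host check, the error messages, and the choice to take the last path segment are assumptions made for this sketch, not the plan's actual design.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// extractPageID is a hypothetical implementation matching the test table
// in parser_test.go. It returns the last path segment of a notion.so URL.
func extractPageID(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse url: %w", err)
	}
	// Assumption: only notion.so and www.notion.so are accepted.
	// A bare string such as "invalid-url" parses with an empty host
	// and is rejected here.
	host := strings.TrimPrefix(u.Hostname(), "www.")
	if host != "notion.so" {
		return "", errors.New("not a notion.so URL")
	}
	// Assumption: the page ID is the final path segment, so an optional
	// workspace prefix such as /myworkspace/ is skipped.
	segments := strings.Split(strings.Trim(u.Path, "/"), "/")
	id := segments[len(segments)-1]
	if id == "" {
		return "", errors.New("URL has no page ID segment")
	}
	return id, nil
}

func main() {
	for _, raw := range []string{
		"https://www.notion.so/myworkspace/page-id-789",
		"invalid-url",
	} {
		id, err := extractPageID(raw)
		fmt.Printf("%q -> id=%q err=%v\n", raw, id, err)
	}
}
```

In the TDD flow shown in the transcript, a function like this would be written only after `go test -v -run TestExtractPageID` had first been observed to fail.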
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 96098097530 | |
| srv get_availabl: updating prompt cache | |
| srv prompt_save: - saving prompt with length 28897, total state size = 627.539 MiB | |
| srv load: - looking for better prompt, base f_keep = 0.000, sim = 0.010 | |
| srv update: - cache state: 9 prompts, 3019.489 MiB (limits: 8192.000 MiB, 100096 tokens, 236061 est) | |
| srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB | |
| srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB | |
| srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB | |
| srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB | |
| srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB | |
| srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB | |
| srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB | |
| srv update: - prompt 0x9d02aac10: 266 tokens, checkpoints: 0, 68.012 MiB | |
| srv update: - prompt 0x9d02aae10: 28897 tokens, checkpoints: 8, 1130.044 MiB | |
| srv get_availabl: prompt cache update took 29.64 ms | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 9502 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 9502 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 308 | |
| slot update_slots: id 0 | task 9502 | n_past = 3, slot.prompt.tokens.size() = 28897, seq_id = 0, pos_min = 28896, n_swa = 1 | |
| slot update_slots: id 0 | task 9502 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) | |
| slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 26950, pos_max = 26950, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 27261, pos_max = 27261, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 27630, pos_max = 27630, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 27705, pos_max = 27705, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 27791, pos_max = 27791, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 27882, pos_max = 27882, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 27947, pos_max = 27947, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 9502 | erased invalidated context checkpoint (pos_min = 28568, pos_max = 28568, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 9502 | n_tokens = 0, memory_seq_rm [0, end) | |
| slot update_slots: id 0 | task 9502 | prompt processing progress, n_tokens = 308, batch.n_tokens = 308, progress = 1.000000 | |
| slot update_slots: id 0 | task 9502 | prompt done, n_tokens = 308, batch.n_tokens = 308 | |
| slot init_sampler: id 0 | task 9502 | init sampler, took 0.03 ms, tokens: text = 308, total = 308 | |
| srv params_from_: Chat format: peg-constructed | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 9502 | | |
| prompt eval time = 521.94 ms / 308 tokens ( 1.69 ms per token, 590.11 tokens per second) | |
| eval time = 3153.06 ms / 96 tokens ( 32.84 ms per token, 30.45 tokens per second) | |
| total time = 3675.00 ms / 404 tokens | |
| slot release: id 0 | task 9502 | stop processing: n_tokens = 403, truncated = 0 | |
| slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 96102058623 | |
| srv get_availabl: updating prompt cache | |
| srv prompt_save: - saving prompt with length 403, total state size = 70.689 MiB | |
| srv load: - looking for better prompt, base f_keep = 0.007, sim = 0.000 | |
| srv load: - found better prompt with f_keep = 0.997, sim = 0.997 | |
| srv update: - cache state: 9 prompts, 1960.134 MiB (limits: 8192.000 MiB, 100096 tokens, 244556 est) | |
| srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB | |
| srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB | |
| srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB | |
| srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB | |
| srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB | |
| srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB | |
| srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB | |
| srv update: - prompt 0x9d02aac10: 266 tokens, checkpoints: 0, 68.012 MiB | |
| srv update: - prompt 0x9d02ab010: 403 tokens, checkpoints: 0, 70.689 MiB | |
| srv get_availabl: prompt cache update took 54.75 ms | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 9504 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 9504 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 28874 | |
| slot update_slots: id 0 | task 9504 | n_past = 28800, slot.prompt.tokens.size() = 28897, seq_id = 0, pos_min = 28896, n_swa = 1 | |
| slot update_slots: id 0 | task 9504 | restored context checkpoint (pos_min = 28568, pos_max = 28568, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 9504 | n_tokens = 28569, memory_seq_rm [28569, end) | |
| slot update_slots: id 0 | task 9504 | prompt processing progress, n_tokens = 28874, batch.n_tokens = 305, progress = 1.000000 | |
| slot update_slots: id 0 | task 9504 | prompt done, n_tokens = 28874, batch.n_tokens = 305 | |
| slot init_sampler: id 0 | task 9504 | init sampler, took 2.97 ms, tokens: text = 28874, total = 28874 | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 9504 | | |
| prompt eval time = 874.93 ms / 305 tokens ( 2.87 ms per token, 348.60 tokens per second) | |
| eval time = 39617.38 ms / 1088 tokens ( 36.41 ms per token, 27.46 tokens per second) | |
| total time = 40492.31 ms / 1393 tokens | |
| slot release: id 0 | task 9504 | stop processing: n_tokens = 29961, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.962 (> 0.100 thold), f_keep = 0.964 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 10688 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 10688 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 30014 | |
| slot update_slots: id 0 | task 10688 | n_past = 28873, slot.prompt.tokens.size() = 29961, seq_id = 0, pos_min = 29960, n_swa = 1 | |
| slot update_slots: id 0 | task 10688 | restored context checkpoint (pos_min = 28568, pos_max = 28568, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 10688 | n_tokens = 28569, memory_seq_rm [28569, end) | |
| slot update_slots: id 0 | task 10688 | prompt processing progress, n_tokens = 29502, batch.n_tokens = 933, progress = 0.982941 | |
| slot update_slots: id 0 | task 10688 | n_tokens = 29502, memory_seq_rm [29502, end) | |
| slot update_slots: id 0 | task 10688 | prompt processing progress, n_tokens = 30014, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 10688 | prompt done, n_tokens = 30014, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 10688 | init sampler, took 6.42 ms, tokens: text = 30014, total = 30014 | |
| slot update_slots: id 0 | task 10688 | erasing old context checkpoint (pos_min = 26950, pos_max = 26950, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 10688 | created context checkpoint 8 of 8 (pos_min = 29501, pos_max = 29501, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 10688 | | |
| prompt eval time = 3524.40 ms / 1445 tokens ( 2.44 ms per token, 410.00 tokens per second) | |
| eval time = 1867.03 ms / 50 tokens ( 37.34 ms per token, 26.78 tokens per second) | |
| total time = 5391.43 ms / 1495 tokens | |
| slot release: id 0 | task 10688 | stop processing: n_tokens = 30063, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.928 (> 0.100 thold), f_keep = 0.958 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 10740 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 10740 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 31027 | |
| slot update_slots: id 0 | task 10740 | n_past = 28799, slot.prompt.tokens.size() = 30063, seq_id = 0, pos_min = 30062, n_swa = 1 | |
| slot update_slots: id 0 | task 10740 | restored context checkpoint (pos_min = 28568, pos_max = 28568, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 10740 | erased invalidated context checkpoint (pos_min = 29501, pos_max = 29501, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 10740 | n_tokens = 28569, memory_seq_rm [28569, end) | |
| slot update_slots: id 0 | task 10740 | prompt processing progress, n_tokens = 30515, batch.n_tokens = 1946, progress = 0.983498 | |
| slot update_slots: id 0 | task 10740 | n_tokens = 30515, memory_seq_rm [30515, end) | |
| slot update_slots: id 0 | task 10740 | prompt processing progress, n_tokens = 31027, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 10740 | prompt done, n_tokens = 31027, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 10740 | init sampler, took 2.86 ms, tokens: text = 31027, total = 31027 | |
| slot update_slots: id 0 | task 10740 | created context checkpoint 8 of 8 (pos_min = 30514, pos_max = 30514, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 10740 | | |
| prompt eval time = 5942.84 ms / 2458 tokens ( 2.42 ms per token, 413.61 tokens per second) | |
| eval time = 289103.05 ms / 7597 tokens ( 38.05 ms per token, 26.28 tokens per second) | |
| total time = 295045.89 ms / 10055 tokens | |
| slot release: id 0 | task 10740 | stop processing: n_tokens = 38623, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.803 (> 0.100 thold), f_keep = 0.803 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 18339 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 18339 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 38639 | |
| slot update_slots: id 0 | task 18339 | n_past = 31026, slot.prompt.tokens.size() = 38623, seq_id = 0, pos_min = 38622, n_swa = 1 | |
| slot update_slots: id 0 | task 18339 | restored context checkpoint (pos_min = 30514, pos_max = 30514, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18339 | n_tokens = 30515, memory_seq_rm [30515, end) | |
| slot update_slots: id 0 | task 18339 | prompt processing progress, n_tokens = 32563, batch.n_tokens = 2048, progress = 0.842750 | |
| slot update_slots: id 0 | task 18339 | n_tokens = 32563, memory_seq_rm [32563, end) | |
| slot update_slots: id 0 | task 18339 | prompt processing progress, n_tokens = 34611, batch.n_tokens = 2048, progress = 0.895753 | |
| slot update_slots: id 0 | task 18339 | n_tokens = 34611, memory_seq_rm [34611, end) | |
| slot update_slots: id 0 | task 18339 | prompt processing progress, n_tokens = 36659, batch.n_tokens = 2048, progress = 0.948756 | |
| slot update_slots: id 0 | task 18339 | n_tokens = 36659, memory_seq_rm [36659, end) | |
| slot update_slots: id 0 | task 18339 | prompt processing progress, n_tokens = 38127, batch.n_tokens = 1468, progress = 0.986749 | |
| slot update_slots: id 0 | task 18339 | n_tokens = 38127, memory_seq_rm [38127, end) | |
| slot update_slots: id 0 | task 18339 | prompt processing progress, n_tokens = 38639, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 18339 | prompt done, n_tokens = 38639, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 18339 | init sampler, took 5.33 ms, tokens: text = 38639, total = 38639 | |
| slot update_slots: id 0 | task 18339 | erasing old context checkpoint (pos_min = 27261, pos_max = 27261, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18339 | created context checkpoint 8 of 8 (pos_min = 38126, pos_max = 38126, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 18339 | | |
| prompt eval time = 21146.69 ms / 8124 tokens ( 2.60 ms per token, 384.17 tokens per second) | |
| eval time = 8754.44 ms / 223 tokens ( 39.26 ms per token, 25.47 tokens per second) | |
| total time = 29901.12 ms / 8347 tokens | |
| slot release: id 0 | task 18339 | stop processing: n_tokens = 38861, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.793 (> 0.100 thold), f_keep = 0.798 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 18567 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 18567 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 39120 | |
| slot update_slots: id 0 | task 18567 | n_past = 31025, slot.prompt.tokens.size() = 38861, seq_id = 0, pos_min = 38860, n_swa = 1 | |
| slot update_slots: id 0 | task 18567 | restored context checkpoint (pos_min = 30514, pos_max = 30514, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18567 | erased invalidated context checkpoint (pos_min = 38126, pos_max = 38126, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18567 | n_tokens = 30515, memory_seq_rm [30515, end) | |
| slot update_slots: id 0 | task 18567 | prompt processing progress, n_tokens = 32563, batch.n_tokens = 2048, progress = 0.832388 | |
| slot update_slots: id 0 | task 18567 | n_tokens = 32563, memory_seq_rm [32563, end) | |
| slot update_slots: id 0 | task 18567 | prompt processing progress, n_tokens = 34611, batch.n_tokens = 2048, progress = 0.884739 | |
| slot update_slots: id 0 | task 18567 | n_tokens = 34611, memory_seq_rm [34611, end) | |
| slot update_slots: id 0 | task 18567 | prompt processing progress, n_tokens = 36659, batch.n_tokens = 2048, progress = 0.937091 | |
| slot update_slots: id 0 | task 18567 | n_tokens = 36659, memory_seq_rm [36659, end) | |
| slot update_slots: id 0 | task 18567 | prompt processing progress, n_tokens = 38608, batch.n_tokens = 1949, progress = 0.986912 | |
| slot update_slots: id 0 | task 18567 | n_tokens = 38608, memory_seq_rm [38608, end) | |
| slot update_slots: id 0 | task 18567 | prompt processing progress, n_tokens = 39120, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 18567 | prompt done, n_tokens = 39120, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 18567 | init sampler, took 3.83 ms, tokens: text = 39120, total = 39120 | |
| slot update_slots: id 0 | task 18567 | created context checkpoint 8 of 8 (pos_min = 38607, pos_max = 38607, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 18567 | | |
| prompt eval time = 22138.42 ms / 8605 tokens ( 2.57 ms per token, 388.69 tokens per second) | |
| eval time = 1448.88 ms / 36 tokens ( 40.25 ms per token, 24.85 tokens per second) | |
| total time = 23587.30 ms / 8641 tokens | |
| slot release: id 0 | task 18567 | stop processing: n_tokens = 39155, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 96497109660 | |
| srv get_availabl: updating prompt cache | |
| srv prompt_save: - saving prompt with length 39155, total state size = 828.008 MiB | |
| srv params_from_: Chat format: peg-constructed | |
| srv load: - looking for better prompt, base f_keep = 0.000, sim = 0.030 | |
| srv load: - found better prompt with f_keep = 0.308, sim = 0.820 | |
| srv update: - cache state: 9 prompts, 3222.636 MiB (limits: 8192.000 MiB, 100096 tokens, 247605 est) | |
| srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB | |
| srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB | |
| srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB | |
| srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB | |
| srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB | |
| srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB | |
| srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB | |
| srv update: - prompt 0x9d02ab010: 403 tokens, checkpoints: 0, 70.689 MiB | |
| srv update: - prompt 0x9d02aaf10: 39155 tokens, checkpoints: 8, 1330.513 MiB | |
| srv get_availabl: prompt cache update took 595.83 ms | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 18608 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 18608 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 100 | |
| slot update_slots: id 0 | task 18608 | n_past = 82, slot.prompt.tokens.size() = 266, seq_id = 0, pos_min = 265, n_swa = 1 | |
| slot update_slots: id 0 | task 18608 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) | |
| slot update_slots: id 0 | task 18608 | n_tokens = 0, memory_seq_rm [0, end) | |
| slot update_slots: id 0 | task 18608 | prompt processing progress, n_tokens = 100, batch.n_tokens = 100, progress = 1.000000 | |
| slot update_slots: id 0 | task 18608 | prompt done, n_tokens = 100, batch.n_tokens = 100 | |
| slot init_sampler: id 0 | task 18608 | init sampler, took 0.01 ms, tokens: text = 100, total = 100 | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 18608 | | |
| prompt eval time = 1208.04 ms / 100 tokens ( 12.08 ms per token, 82.78 tokens per second) | |
| eval time = 4917.53 ms / 166 tokens ( 29.62 ms per token, 33.76 tokens per second) | |
| total time = 6125.57 ms / 266 tokens | |
| slot release: id 0 | task 18608 | stop processing: n_tokens = 265, truncated = 0 | |
| slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99024008913 | |
| srv get_availabl: updating prompt cache | |
| srv prompt_save: - saving prompt with length 265, total state size = 67.992 MiB | |
| srv load: - looking for better prompt, base f_keep = 0.011, sim = 0.000 | |
| srv load: - found better prompt with f_keep = 0.383, sim = 0.626 | |
| srv update: - cache state: 9 prompts, 1960.115 MiB (limits: 8192.000 MiB, 100096 tokens, 244554 est) | |
| srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB | |
| srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB | |
| srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB | |
| srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB | |
| srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB | |
| srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB | |
| srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB | |
| srv update: - prompt 0x9d02ab010: 403 tokens, checkpoints: 0, 70.689 MiB | |
| srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB | |
| srv get_availabl: prompt cache update took 116.23 ms | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 18609 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 18609 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 23968 | |
| slot update_slots: id 0 | task 18609 | n_past = 14999, slot.prompt.tokens.size() = 39155, seq_id = 0, pos_min = 39154, n_swa = 1 | |
| slot update_slots: id 0 | task 18609 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) | |
| slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 27630, pos_max = 27630, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 27705, pos_max = 27705, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 27791, pos_max = 27791, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 27882, pos_max = 27882, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 27947, pos_max = 27947, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 28568, pos_max = 28568, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 30514, pos_max = 30514, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18609 | erased invalidated context checkpoint (pos_min = 38607, pos_max = 38607, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18609 | n_tokens = 0, memory_seq_rm [0, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.085447 | |
| slot update_slots: id 0 | task 18609 | n_tokens = 2048, memory_seq_rm [2048, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.170895 | |
| slot update_slots: id 0 | task 18609 | n_tokens = 4096, memory_seq_rm [4096, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.256342 | |
| slot update_slots: id 0 | task 18609 | n_tokens = 6144, memory_seq_rm [6144, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.341789 | |
| slot update_slots: id 0 | task 18609 | n_tokens = 8192, memory_seq_rm [8192, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 10240, batch.n_tokens = 2048, progress = 0.427236 | |
| slot update_slots: id 0 | task 18609 | n_tokens = 10240, memory_seq_rm [10240, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.512684 | |
| slot update_slots: id 0 | task 18609 | n_tokens = 12288, memory_seq_rm [12288, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 14336, batch.n_tokens = 2048, progress = 0.598131 | |
| slot update_slots: id 0 | task 18609 | n_tokens = 14336, memory_seq_rm [14336, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 16384, batch.n_tokens = 2048, progress = 0.683578 | |
| slot update_slots: id 0 | task 18609 | n_tokens = 16384, memory_seq_rm [16384, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 18432, batch.n_tokens = 2048, progress = 0.769025 | |
| slot update_slots: id 0 | task 18609 | n_tokens = 18432, memory_seq_rm [18432, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 20480, batch.n_tokens = 2048, progress = 0.854473 | |
| slot update_slots: id 0 | task 18609 | n_tokens = 20480, memory_seq_rm [20480, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 22528, batch.n_tokens = 2048, progress = 0.939920 | |
| slot update_slots: id 0 | task 18609 | n_tokens = 22528, memory_seq_rm [22528, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 23456, batch.n_tokens = 928, progress = 0.978638 | |
| slot update_slots: id 0 | task 18609 | n_tokens = 23456, memory_seq_rm [23456, end) | |
| slot update_slots: id 0 | task 18609 | prompt processing progress, n_tokens = 23968, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 18609 | prompt done, n_tokens = 23968, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 18609 | init sampler, took 2.62 ms, tokens: text = 23968, total = 23968 | |
| slot update_slots: id 0 | task 18609 | created context checkpoint 1 of 8 (pos_min = 23455, pos_max = 23455, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 18609 | | |
| prompt eval time = 44416.72 ms / 23968 tokens ( 1.85 ms per token, 539.62 tokens per second) | |
| eval time = 2685.56 ms / 76 tokens ( 35.34 ms per token, 28.30 tokens per second) | |
| total time = 47102.28 ms / 24044 tokens | |
| slot release: id 0 | task 18609 | stop processing: n_tokens = 24043, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.955 (> 0.100 thold), f_keep = 0.997 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 18864 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 18864 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 25097 | |
| slot update_slots: id 0 | task 18864 | n_past = 23966, slot.prompt.tokens.size() = 24043, seq_id = 0, pos_min = 24042, n_swa = 1 | |
| slot update_slots: id 0 | task 18864 | restored context checkpoint (pos_min = 23455, pos_max = 23455, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18864 | n_tokens = 23456, memory_seq_rm [23456, end) | |
| slot update_slots: id 0 | task 18864 | prompt processing progress, n_tokens = 24585, batch.n_tokens = 1129, progress = 0.979599 | |
| slot update_slots: id 0 | task 18864 | n_tokens = 24585, memory_seq_rm [24585, end) | |
| slot update_slots: id 0 | task 18864 | prompt processing progress, n_tokens = 25097, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 18864 | prompt done, n_tokens = 25097, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 18864 | init sampler, took 5.19 ms, tokens: text = 25097, total = 25097 | |
| slot update_slots: id 0 | task 18864 | created context checkpoint 2 of 8 (pos_min = 24584, pos_max = 24584, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 18864 | | |
| prompt eval time = 3818.16 ms / 1641 tokens ( 2.33 ms per token, 429.79 tokens per second) | |
| eval time = 2619.72 ms / 78 tokens ( 33.59 ms per token, 29.77 tokens per second) | |
| total time = 6437.88 ms / 1719 tokens | |
| slot release: id 0 | task 18864 | stop processing: n_tokens = 25174, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200 | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.618 (> 0.100 thold), f_keep = 0.997 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 18944 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 18944 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 40585 | |
| slot update_slots: id 0 | task 18944 | n_past = 25096, slot.prompt.tokens.size() = 25174, seq_id = 0, pos_min = 25173, n_swa = 1 | |
| slot update_slots: id 0 | task 18944 | restored context checkpoint (pos_min = 24584, pos_max = 24584, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 18944 | n_tokens = 24585, memory_seq_rm [24585, end) | |
| slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 26633, batch.n_tokens = 2048, progress = 0.656228 | |
| slot update_slots: id 0 | task 18944 | n_tokens = 26633, memory_seq_rm [26633, end) | |
| slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 28681, batch.n_tokens = 2048, progress = 0.706690 | |
| slot update_slots: id 0 | task 18944 | n_tokens = 28681, memory_seq_rm [28681, end) | |
| slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 30729, batch.n_tokens = 2048, progress = 0.757152 | |
| slot update_slots: id 0 | task 18944 | n_tokens = 30729, memory_seq_rm [30729, end) | |
| slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 32777, batch.n_tokens = 2048, progress = 0.807614 | |
| slot update_slots: id 0 | task 18944 | n_tokens = 32777, memory_seq_rm [32777, end) | |
| slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 34825, batch.n_tokens = 2048, progress = 0.858076 | |
| slot update_slots: id 0 | task 18944 | n_tokens = 34825, memory_seq_rm [34825, end) | |
| slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 36873, batch.n_tokens = 2048, progress = 0.908538 | |
| slot update_slots: id 0 | task 18944 | n_tokens = 36873, memory_seq_rm [36873, end) | |
| slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 38921, batch.n_tokens = 2048, progress = 0.959000 | |
| slot update_slots: id 0 | task 18944 | n_tokens = 38921, memory_seq_rm [38921, end) | |
| slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 40073, batch.n_tokens = 1152, progress = 0.987384 | |
| slot update_slots: id 0 | task 18944 | n_tokens = 40073, memory_seq_rm [40073, end) | |
| slot update_slots: id 0 | task 18944 | prompt processing progress, n_tokens = 40585, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 18944 | prompt done, n_tokens = 40585, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 18944 | init sampler, took 6.66 ms, tokens: text = 40585, total = 40585 | |
| slot update_slots: id 0 | task 18944 | created context checkpoint 3 of 8 (pos_min = 40072, pos_max = 40072, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 18944 | | |
| prompt eval time = 40686.51 ms / 16000 tokens ( 2.54 ms per token, 393.25 tokens per second) | |
| eval time = 3115.05 ms / 81 tokens ( 38.46 ms per token, 26.00 tokens per second) | |
| total time = 43801.56 ms / 16081 tokens | |
| slot release: id 0 | task 18944 | stop processing: n_tokens = 40665, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.950 (> 0.100 thold), f_keep = 0.998 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 19034 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 19034 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 42722 | |
| slot update_slots: id 0 | task 19034 | n_past = 40584, slot.prompt.tokens.size() = 40665, seq_id = 0, pos_min = 40664, n_swa = 1 | |
| slot update_slots: id 0 | task 19034 | restored context checkpoint (pos_min = 40072, pos_max = 40072, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 19034 | n_tokens = 40073, memory_seq_rm [40073, end) | |
| slot update_slots: id 0 | task 19034 | prompt processing progress, n_tokens = 42121, batch.n_tokens = 2048, progress = 0.985932 | |
| slot update_slots: id 0 | task 19034 | n_tokens = 42121, memory_seq_rm [42121, end) | |
| slot update_slots: id 0 | task 19034 | prompt processing progress, n_tokens = 42210, batch.n_tokens = 89, progress = 0.988016 | |
| slot update_slots: id 0 | task 19034 | n_tokens = 42210, memory_seq_rm [42210, end) | |
| slot update_slots: id 0 | task 19034 | prompt processing progress, n_tokens = 42722, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 19034 | prompt done, n_tokens = 42722, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 19034 | init sampler, took 6.81 ms, tokens: text = 42722, total = 42722 | |
| slot update_slots: id 0 | task 19034 | created context checkpoint 4 of 8 (pos_min = 42209, pos_max = 42209, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 19034 | | |
| prompt eval time = 7603.79 ms / 2649 tokens ( 2.87 ms per token, 348.38 tokens per second) | |
| eval time = 4877.39 ms / 124 tokens ( 39.33 ms per token, 25.42 tokens per second) | |
| total time = 12481.17 ms / 2773 tokens | |
| slot release: id 0 | task 19034 | stop processing: n_tokens = 42845, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.577 (> 0.100 thold), f_keep = 0.586 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 19161 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 19161 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 43489 | |
| slot update_slots: id 0 | task 19161 | n_past = 25095, slot.prompt.tokens.size() = 42845, seq_id = 0, pos_min = 42844, n_swa = 1 | |
| slot update_slots: id 0 | task 19161 | restored context checkpoint (pos_min = 24584, pos_max = 24584, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 19161 | erased invalidated context checkpoint (pos_min = 40072, pos_max = 40072, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 19161 | erased invalidated context checkpoint (pos_min = 42209, pos_max = 42209, n_swa = 1, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 19161 | n_tokens = 24585, memory_seq_rm [24585, end) | |
| slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 26633, batch.n_tokens = 2048, progress = 0.612408 | |
| slot update_slots: id 0 | task 19161 | n_tokens = 26633, memory_seq_rm [26633, end) | |
| slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 28681, batch.n_tokens = 2048, progress = 0.659500 | |
| slot update_slots: id 0 | task 19161 | n_tokens = 28681, memory_seq_rm [28681, end) | |
| slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 30729, batch.n_tokens = 2048, progress = 0.706593 | |
| slot update_slots: id 0 | task 19161 | n_tokens = 30729, memory_seq_rm [30729, end) | |
| slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 32777, batch.n_tokens = 2048, progress = 0.753685 | |
| slot update_slots: id 0 | task 19161 | n_tokens = 32777, memory_seq_rm [32777, end) | |
| slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 34825, batch.n_tokens = 2048, progress = 0.800777 | |
| slot update_slots: id 0 | task 19161 | n_tokens = 34825, memory_seq_rm [34825, end) | |
| slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 36873, batch.n_tokens = 2048, progress = 0.847870 | |
| slot update_slots: id 0 | task 19161 | n_tokens = 36873, memory_seq_rm [36873, end) | |
| slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 38921, batch.n_tokens = 2048, progress = 0.894962 | |
| slot update_slots: id 0 | task 19161 | n_tokens = 38921, memory_seq_rm [38921, end) | |
| slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 40969, batch.n_tokens = 2048, progress = 0.942054 | |
| slot update_slots: id 0 | task 19161 | n_tokens = 40969, memory_seq_rm [40969, end) | |
| slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 42977, batch.n_tokens = 2008, progress = 0.988227 | |
| slot update_slots: id 0 | task 19161 | n_tokens = 42977, memory_seq_rm [42977, end) | |
| slot update_slots: id 0 | task 19161 | prompt processing progress, n_tokens = 43489, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 19161 | prompt done, n_tokens = 43489, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 19161 | init sampler, took 6.78 ms, tokens: text = 43489, total = 43489 | |
| slot update_slots: id 0 | task 19161 | created context checkpoint 3 of 8 (pos_min = 42976, pos_max = 42976, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 19161 | | |
| prompt eval time = 48729.18 ms / 18904 tokens ( 2.58 ms per token, 387.94 tokens per second) | |
| eval time = 7576.52 ms / 191 tokens ( 39.67 ms per token, 25.21 tokens per second) | |
| total time = 56305.70 ms / 19095 tokens | |
| slot release: id 0 | task 19161 | stop processing: n_tokens = 43679, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.965 (> 0.100 thold), f_keep = 0.996 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 19362 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 19362 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45050 | |
| slot update_slots: id 0 | task 19362 | n_past = 43487, slot.prompt.tokens.size() = 43679, seq_id = 0, pos_min = 43678, n_swa = 1 | |
| slot update_slots: id 0 | task 19362 | restored context checkpoint (pos_min = 42976, pos_max = 42976, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 19362 | n_tokens = 42977, memory_seq_rm [42977, end) | |
| slot update_slots: id 0 | task 19362 | prompt processing progress, n_tokens = 44538, batch.n_tokens = 1561, progress = 0.988635 | |
| slot update_slots: id 0 | task 19362 | n_tokens = 44538, memory_seq_rm [44538, end) | |
| slot update_slots: id 0 | task 19362 | prompt processing progress, n_tokens = 45050, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 19362 | prompt done, n_tokens = 45050, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 19362 | init sampler, took 8.52 ms, tokens: text = 45050, total = 45050 | |
| slot update_slots: id 0 | task 19362 | created context checkpoint 4 of 8 (pos_min = 44537, pos_max = 44537, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 19362 | | |
| prompt eval time = 6200.91 ms / 2073 tokens ( 2.99 ms per token, 334.31 tokens per second) | |
| eval time = 5084.72 ms / 138 tokens ( 36.85 ms per token, 27.14 tokens per second) | |
| total time = 11285.63 ms / 2211 tokens | |
| slot release: id 0 | task 19362 | stop processing: n_tokens = 45187, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 0.997 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 19502 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 19502 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45172 | |
| slot update_slots: id 0 | task 19502 | n_past = 45049, slot.prompt.tokens.size() = 45187, seq_id = 0, pos_min = 45186, n_swa = 1 | |
| slot update_slots: id 0 | task 19502 | restored context checkpoint (pos_min = 44537, pos_max = 44537, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 19502 | n_tokens = 44538, memory_seq_rm [44538, end) | |
| slot update_slots: id 0 | task 19502 | prompt processing progress, n_tokens = 44660, batch.n_tokens = 122, progress = 0.988666 | |
| slot update_slots: id 0 | task 19502 | n_tokens = 44660, memory_seq_rm [44660, end) | |
| slot update_slots: id 0 | task 19502 | prompt processing progress, n_tokens = 45172, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 19502 | prompt done, n_tokens = 45172, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 19502 | init sampler, took 4.45 ms, tokens: text = 45172, total = 45172 | |
| slot update_slots: id 0 | task 19502 | created context checkpoint 5 of 8 (pos_min = 44659, pos_max = 44659, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 19502 | | |
| prompt eval time = 2006.29 ms / 634 tokens ( 3.16 ms per token, 316.01 tokens per second) | |
| eval time = 2413.69 ms / 67 tokens ( 36.03 ms per token, 27.76 tokens per second) | |
| total time = 4419.98 ms / 701 tokens | |
| slot release: id 0 | task 19502 | stop processing: n_tokens = 45238, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99206848358 | |
| srv get_availabl: updating prompt cache | |
| srv prompt_save: - saving prompt with length 45238, total state size = 946.886 MiB | |
| srv params_from_: Chat format: peg-constructed | |
| srv load: - looking for better prompt, base f_keep = 0.000, sim = 0.009 | |
| srv load: - found better prompt with f_keep = 0.700, sim = 0.873 | |
| srv update: - cache state: 9 prompts, 3150.377 MiB (limits: 8192.000 MiB, 100096 tokens, 268743 est) | |
| srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB | |
| srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB | |
| srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB | |
| srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB | |
| srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB | |
| srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB | |
| srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB | |
| srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB | |
| srv update: - prompt 0x9d0dc0010: 45238 tokens, checkpoints: 5, 1260.952 MiB | |
| srv get_availabl: prompt cache update took 328.59 ms | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 19571 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 19571 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 323 | |
| slot update_slots: id 0 | task 19571 | n_past = 282, slot.prompt.tokens.size() = 403, seq_id = 0, pos_min = 402, n_swa = 1 | |
| slot update_slots: id 0 | task 19571 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) | |
| slot update_slots: id 0 | task 19571 | n_tokens = 0, memory_seq_rm [0, end) | |
| slot update_slots: id 0 | task 19571 | prompt processing progress, n_tokens = 323, batch.n_tokens = 323, progress = 1.000000 | |
| slot update_slots: id 0 | task 19571 | prompt done, n_tokens = 323, batch.n_tokens = 323 | |
| slot init_sampler: id 0 | task 19571 | init sampler, took 0.03 ms, tokens: text = 323, total = 323 | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 19571 | | |
| prompt eval time = 576.87 ms / 323 tokens ( 1.79 ms per token, 559.92 tokens per second) | |
| eval time = 6285.42 ms / 210 tokens ( 29.93 ms per token, 33.41 tokens per second) | |
| total time = 6862.29 ms / 533 tokens | |
| slot release: id 0 | task 19571 | stop processing: n_tokens = 532, truncated = 0 | |
| slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99216836871 | |
| srv get_availabl: updating prompt cache | |
| srv prompt_save: - saving prompt with length 532, total state size = 73.210 MiB | |
| srv load: - looking for better prompt, base f_keep = 0.006, sim = 0.000 | |
| srv load: - found better prompt with f_keep = 0.999, sim = 0.998 | |
| srv update: - cache state: 9 prompts, 1962.636 MiB (limits: 8192.000 MiB, 100096 tokens, 244778 est) | |
| srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB | |
| srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB | |
| srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB | |
| srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB | |
| srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB | |
| srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB | |
| srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB | |
| srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB | |
| srv update: - prompt 0x9d0dc0110: 532 tokens, checkpoints: 0, 73.210 MiB | |
| srv get_availabl: prompt cache update took 57.70 ms | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 19572 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 19572 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45263 | |
| slot update_slots: id 0 | task 19572 | n_past = 45171, slot.prompt.tokens.size() = 45238, seq_id = 0, pos_min = 45237, n_swa = 1 | |
| slot update_slots: id 0 | task 19572 | restored context checkpoint (pos_min = 44659, pos_max = 44659, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 19572 | n_tokens = 44660, memory_seq_rm [44660, end) | |
| slot update_slots: id 0 | task 19572 | prompt processing progress, n_tokens = 44751, batch.n_tokens = 91, progress = 0.988688 | |
| slot update_slots: id 0 | task 19572 | n_tokens = 44751, memory_seq_rm [44751, end) | |
| slot update_slots: id 0 | task 19572 | prompt processing progress, n_tokens = 45263, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 19572 | prompt done, n_tokens = 45263, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 19572 | init sampler, took 4.57 ms, tokens: text = 45263, total = 45263 | |
| slot update_slots: id 0 | task 19572 | created context checkpoint 6 of 8 (pos_min = 44750, pos_max = 44750, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 19572 | | |
| prompt eval time = 1976.89 ms / 603 tokens ( 3.28 ms per token, 305.02 tokens per second) | |
| eval time = 1872.59 ms / 52 tokens ( 36.01 ms per token, 27.77 tokens per second) | |
| total time = 3849.49 ms / 655 tokens | |
| slot release: id 0 | task 19572 | stop processing: n_tokens = 45314, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99220747999 | |
| srv get_availabl: updating prompt cache | |
| srv prompt_save: - saving prompt with length 45314, total state size = 948.371 MiB | |
| srv params_from_: Chat format: peg-constructed | |
| srv load: - looking for better prompt, base f_keep = 0.000, sim = 0.009 | |
| srv load: - found better prompt with f_keep = 0.534, sim = 0.869 | |
| srv update: - cache state: 9 prompts, 3214.676 MiB (limits: 8192.000 MiB, 100096 tokens, 263561 est) | |
| srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB | |
| srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB | |
| srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB | |
| srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB | |
| srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB | |
| srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB | |
| srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB | |
| srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB | |
| srv update: - prompt 0x9d0dc0510: 45314 tokens, checkpoints: 6, 1325.250 MiB | |
| srv get_availabl: prompt cache update took 400.13 ms | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 19836 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 19836 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 327 | |
| slot update_slots: id 0 | task 19836 | n_past = 284, slot.prompt.tokens.size() = 532, seq_id = 0, pos_min = 531, n_swa = 1 | |
| slot update_slots: id 0 | task 19836 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) | |
| slot update_slots: id 0 | task 19836 | n_tokens = 0, memory_seq_rm [0, end) | |
| slot update_slots: id 0 | task 19836 | prompt processing progress, n_tokens = 327, batch.n_tokens = 327, progress = 1.000000 | |
| slot update_slots: id 0 | task 19836 | prompt done, n_tokens = 327, batch.n_tokens = 327 | |
| slot init_sampler: id 0 | task 19836 | init sampler, took 0.04 ms, tokens: text = 327, total = 327 | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 19836 | | |
| prompt eval time = 573.07 ms / 327 tokens ( 1.75 ms per token, 570.61 tokens per second) | |
| eval time = 4256.85 ms / 144 tokens ( 29.56 ms per token, 33.83 tokens per second) | |
| total time = 4829.91 ms / 471 tokens | |
| slot release: id 0 | task 19836 | stop processing: n_tokens = 470, truncated = 0 | |
| slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99228000822 | |
| srv get_availabl: updating prompt cache | |
| srv prompt_save: - saving prompt with length 470, total state size = 71.999 MiB | |
| srv load: - looking for better prompt, base f_keep = 0.006, sim = 0.000 | |
| srv load: - found better prompt with f_keep = 0.999, sim = 0.998 | |
| srv update: - cache state: 9 prompts, 1961.424 MiB (limits: 8192.000 MiB, 100096 tokens, 244671 est) | |
| srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB | |
| srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB | |
| srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB | |
| srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB | |
| srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB | |
| srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB | |
| srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB | |
| srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB | |
| srv update: - prompt 0x9d0dc0110: 470 tokens, checkpoints: 0, 71.999 MiB | |
| srv get_availabl: prompt cache update took 35.73 ms | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 19837 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 19837 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45359 | |
| slot update_slots: id 0 | task 19837 | n_past = 45262, slot.prompt.tokens.size() = 45314, seq_id = 0, pos_min = 45313, n_swa = 1 | |
| slot update_slots: id 0 | task 19837 | restored context checkpoint (pos_min = 44750, pos_max = 44750, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 19837 | n_tokens = 44751, memory_seq_rm [44751, end) | |
| slot update_slots: id 0 | task 19837 | prompt processing progress, n_tokens = 44847, batch.n_tokens = 96, progress = 0.988712 | |
| slot update_slots: id 0 | task 19837 | n_tokens = 44847, memory_seq_rm [44847, end) | |
| slot update_slots: id 0 | task 19837 | prompt processing progress, n_tokens = 45359, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 19837 | prompt done, n_tokens = 45359, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 19837 | init sampler, took 4.70 ms, tokens: text = 45359, total = 45359 | |
| slot update_slots: id 0 | task 19837 | created context checkpoint 7 of 8 (pos_min = 44846, pos_max = 44846, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 19837 | | |
| prompt eval time = 1952.77 ms / 608 tokens ( 3.21 ms per token, 311.35 tokens per second) | |
| eval time = 2258.34 ms / 61 tokens ( 37.02 ms per token, 27.01 tokens per second) | |
| total time = 4211.11 ms / 669 tokens | |
| slot release: id 0 | task 19837 | stop processing: n_tokens = 45419, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99232251113 | |
| srv get_availabl: updating prompt cache | |
| srv prompt_save: - saving prompt with length 45419, total state size = 950.423 MiB | |
| srv params_from_: Chat format: peg-constructed | |
| srv load: - looking for better prompt, base f_keep = 0.000, sim = 0.009 | |
| srv load: - found better prompt with f_keep = 0.600, sim = 0.801 | |
| srv update: - cache state: 9 prompts, 3279.541 MiB (limits: 8192.000 MiB, 100096 tokens, 258611 est) | |
| srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB | |
| srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB | |
| srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB | |
| srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB | |
| srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB | |
| srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB | |
| srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB | |
| srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB | |
| srv update: - prompt 0x9d0dc0190: 45419 tokens, checkpoints: 7, 1390.116 MiB | |
| srv get_availabl: prompt cache update took 467.96 ms | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 20044 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 20044 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 352 | |
| slot update_slots: id 0 | task 20044 | n_past = 282, slot.prompt.tokens.size() = 470, seq_id = 0, pos_min = 469, n_swa = 1 | |
| slot update_slots: id 0 | task 20044 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055) | |
| slot update_slots: id 0 | task 20044 | n_tokens = 0, memory_seq_rm [0, end) | |
| slot update_slots: id 0 | task 20044 | prompt processing progress, n_tokens = 352, batch.n_tokens = 352, progress = 1.000000 | |
| slot update_slots: id 0 | task 20044 | prompt done, n_tokens = 352, batch.n_tokens = 352 | |
| slot init_sampler: id 0 | task 20044 | init sampler, took 0.04 ms, tokens: text = 352, total = 352 | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 20044 | | |
| prompt eval time = 590.20 ms / 352 tokens ( 1.68 ms per token, 596.41 tokens per second) | |
| eval time = 14112.52 ms / 476 tokens ( 29.65 ms per token, 33.73 tokens per second) | |
| total time = 14702.72 ms / 828 tokens | |
| slot release: id 0 | task 20044 | stop processing: n_tokens = 827, truncated = 0 | |
| slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 99249687834 | |
| srv get_availabl: updating prompt cache | |
| srv prompt_save: - saving prompt with length 827, total state size = 78.975 MiB | |
| srv load: - looking for better prompt, base f_keep = 0.004, sim = 0.000 | |
| srv load: - found better prompt with f_keep = 0.999, sim = 0.997 | |
| srv update: - cache state: 9 prompts, 1968.401 MiB (limits: 8192.000 MiB, 100096 tokens, 245289 est) | |
| srv update: - prompt 0x9d02aac90: 14064 tokens, checkpoints: 1, 400.475 MiB | |
| srv update: - prompt 0x9d02aa810: 20910 tokens, checkpoints: 1, 534.264 MiB | |
| srv update: - prompt 0x9d02aad10: 2897 tokens, checkpoints: 1, 182.242 MiB | |
| srv update: - prompt 0x9d02aa710: 18682 tokens, checkpoints: 1, 490.723 MiB | |
| srv update: - prompt 0x9d02aa910: 370 tokens, checkpoints: 0, 70.044 MiB | |
| srv update: - prompt 0x9d02a9b90: 338 tokens, checkpoints: 0, 69.419 MiB | |
| srv update: - prompt 0x9d02aab10: 586 tokens, checkpoints: 0, 74.265 MiB | |
| srv update: - prompt 0x9d02aae90: 265 tokens, checkpoints: 0, 67.992 MiB | |
| srv update: - prompt 0x9d0dc0110: 827 tokens, checkpoints: 0, 78.975 MiB | |
| srv get_availabl: prompt cache update took 42.09 ms | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 20045 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 20045 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45477 | |
| slot update_slots: id 0 | task 20045 | n_past = 45358, slot.prompt.tokens.size() = 45419, seq_id = 0, pos_min = 45418, n_swa = 1 | |
| slot update_slots: id 0 | task 20045 | restored context checkpoint (pos_min = 44846, pos_max = 44846, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 20045 | n_tokens = 44847, memory_seq_rm [44847, end) | |
| slot update_slots: id 0 | task 20045 | prompt processing progress, n_tokens = 44965, batch.n_tokens = 118, progress = 0.988742 | |
| slot update_slots: id 0 | task 20045 | n_tokens = 44965, memory_seq_rm [44965, end) | |
| slot update_slots: id 0 | task 20045 | prompt processing progress, n_tokens = 45477, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 20045 | prompt done, n_tokens = 45477, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 20045 | init sampler, took 4.94 ms, tokens: text = 45477, total = 45477 | |
| slot update_slots: id 0 | task 20045 | created context checkpoint 8 of 8 (pos_min = 44964, pos_max = 44964, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 20045 | | |
| prompt eval time = 2043.99 ms / 630 tokens ( 3.24 ms per token, 308.22 tokens per second) | |
| eval time = 1456.21 ms / 40 tokens ( 36.41 ms per token, 27.47 tokens per second) | |
| total time = 3500.20 ms / 670 tokens | |
| slot release: id 0 | task 20045 | stop processing: n_tokens = 45516, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 0.999 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 20563 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 20563 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45537 | |
| slot update_slots: id 0 | task 20563 | n_past = 45476, slot.prompt.tokens.size() = 45516, seq_id = 0, pos_min = 45515, n_swa = 1 | |
| slot update_slots: id 0 | task 20563 | restored context checkpoint (pos_min = 44964, pos_max = 44964, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 20563 | n_tokens = 44965, memory_seq_rm [44965, end) | |
| slot update_slots: id 0 | task 20563 | prompt processing progress, n_tokens = 45025, batch.n_tokens = 60, progress = 0.988756 | |
| slot update_slots: id 0 | task 20563 | n_tokens = 45025, memory_seq_rm [45025, end) | |
| slot update_slots: id 0 | task 20563 | prompt processing progress, n_tokens = 45537, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 20563 | prompt done, n_tokens = 45537, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 20563 | init sampler, took 4.22 ms, tokens: text = 45537, total = 45537 | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 20563 | | |
| prompt eval time = 1867.14 ms / 572 tokens ( 3.26 ms per token, 306.35 tokens per second) | |
| eval time = 3641.80 ms / 100 tokens ( 36.42 ms per token, 27.46 tokens per second) | |
| total time = 5508.94 ms / 672 tokens | |
| slot release: id 0 | task 20563 | stop processing: n_tokens = 45636, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 0.998 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 20665 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 20665 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45647 | |
| slot update_slots: id 0 | task 20665 | n_past = 45536, slot.prompt.tokens.size() = 45636, seq_id = 0, pos_min = 45635, n_swa = 1 | |
| slot update_slots: id 0 | task 20665 | restored context checkpoint (pos_min = 44964, pos_max = 44964, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 20665 | n_tokens = 44965, memory_seq_rm [44965, end) | |
| slot update_slots: id 0 | task 20665 | prompt processing progress, n_tokens = 45135, batch.n_tokens = 170, progress = 0.988783 | |
| slot update_slots: id 0 | task 20665 | n_tokens = 45135, memory_seq_rm [45135, end) | |
| slot update_slots: id 0 | task 20665 | prompt processing progress, n_tokens = 45647, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 20665 | prompt done, n_tokens = 45647, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 20665 | init sampler, took 4.27 ms, tokens: text = 45647, total = 45647 | |
| slot update_slots: id 0 | task 20665 | erasing old context checkpoint (pos_min = 23455, pos_max = 23455, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 20665 | created context checkpoint 8 of 8 (pos_min = 45134, pos_max = 45134, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 20665 | | |
| prompt eval time = 2191.72 ms / 682 tokens ( 3.21 ms per token, 311.17 tokens per second) | |
| eval time = 12725.11 ms / 348 tokens ( 36.57 ms per token, 27.35 tokens per second) | |
| total time = 14916.83 ms / 1030 tokens | |
| slot release: id 0 | task 20665 | stop processing: n_tokens = 45994, truncated = 0 | |
| srv update_slots: all slots are idle | |
| srv params_from_: Chat format: peg-constructed | |
| slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.992 (> 0.100 thold), f_keep = 0.992 | |
| slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist | |
| slot launch_slot_: id 0 | task 21015 | processing task, is_child = 0 | |
| slot update_slots: id 0 | task 21015 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 45991 | |
| slot update_slots: id 0 | task 21015 | n_past = 45646, slot.prompt.tokens.size() = 45994, seq_id = 0, pos_min = 45993, n_swa = 1 | |
| slot update_slots: id 0 | task 21015 | restored context checkpoint (pos_min = 45134, pos_max = 45134, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 21015 | n_tokens = 45135, memory_seq_rm [45135, end) | |
| slot update_slots: id 0 | task 21015 | prompt processing progress, n_tokens = 45479, batch.n_tokens = 344, progress = 0.988867 | |
| slot update_slots: id 0 | task 21015 | n_tokens = 45479, memory_seq_rm [45479, end) | |
| slot update_slots: id 0 | task 21015 | prompt processing progress, n_tokens = 45991, batch.n_tokens = 512, progress = 1.000000 | |
| slot update_slots: id 0 | task 21015 | prompt done, n_tokens = 45991, batch.n_tokens = 512 | |
| slot init_sampler: id 0 | task 21015 | init sampler, took 4.34 ms, tokens: text = 45991, total = 45991 | |
| slot update_slots: id 0 | task 21015 | erasing old context checkpoint (pos_min = 24584, pos_max = 24584, size = 62.813 MiB) | |
| slot update_slots: id 0 | task 21015 | created context checkpoint 8 of 8 (pos_min = 45478, pos_max = 45478, size = 62.813 MiB) | |
| srv log_server_r: done request: POST /v1/messages 127.0.0.1 200 | |
| slot print_timing: id 0 | task 21015 | | |
| prompt eval time = 2654.72 ms / 856 tokens ( 3.10 ms per token, 322.44 tokens per second) | |
| eval time = 2077.70 ms / 57 tokens ( 36.45 ms per token, 27.43 tokens per second) | |
| total time = 4732.42 ms / 913 tokens | |
| slot release: id 0 | task 21015 | stop processing: n_tokens = 46047, truncated = 0 | |
| srv update_slots: all slots are idle |
Claude Code with Qwen3.5-35B-A3B, running on llama.cpp