@huytd
Created March 4, 2026 05:13
Qwen3.5-35B-A3B Q4_K_M Session log and token analysis

Qwen3.5-35B-A3B Inference Session Report

Model: Qwen3.5-35B-A3B-UD-Q4_K_M
Backend: llama.cpp (build 8171, c17dce4f5)
Hardware: NVIDIA GeForce RTX 5060 Ti (16 GB VRAM)
Server: llama-server on http://127.0.0.1:8080
Date: March 3, 2026


1. System Configuration

| Parameter | Value |
| --- | --- |
| GPU | NVIDIA GeForce RTX 5060 Ti |
| Compute Capability | 12.0 (Blackwell) |
| VRAM Available at Load | ~15,094 MiB |
| CPU Threads | 8 (batch: 8, total system: 16) |
| Context Length (n_ctx) | 128,000 tokens |
| Batch Size (n_batch) | 4,096 |
| Flash Attention | Enabled |
| Model Architecture | qwen35moe (MoE + SSM hybrid) |
| Quantization | Q4_K_M (4.58 BPW, 18.48 GiB file) |
| Model Parameters | 34.66 B total (256 experts, 8 active per token) |

Memory Allocation

| Buffer | Size |
| --- | --- |
| CUDA0 model weights | 10,670 MiB |
| CUDA_Host model buffer | 7,921 MiB |
| CPU model buffer | 333 MiB |
| KV Cache (CUDA0) | 1,328 MiB |
| Recurrent State (CUDA0) | 63 MiB |
| CUDA0 compute buffer | 1,988 MiB |
| CUDA_Host compute buffer | 1,032 MiB |

Note: The model did not fully fit in VRAM. llama.cpp's fit algorithm offloaded 20 of 41 layers to CUDA_Host (pinned RAM), with all 41 layers ultimately GPU-accessible through the hybrid layout.
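The memory allocation figures above can be cross-checked against the fit log with a quick sum (a sanity check over the table values, not part of the log itself):

```python
# Sum the CUDA0 (on-GPU) buffers from the memory allocation table, in MiB.
cuda0_buffers = {
    "model weights": 10670,
    "kv cache": 1328,
    "recurrent state": 63,
    "compute buffer": 1988,
}
cuda0_total = sum(cuda0_buffers.values())  # matches the fit log's "14049 MiB used"

vram_free_at_load = 15094
headroom = vram_free_at_load - cuda0_total  # ~1 GiB left, per the fit target
```

The total agrees with `llama_params_fit_impl`'s reported "14049 MiB used, 1044 MiB free" (the 1 MiB difference is rounding in the table values).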


2. Session Summary

| Metric | Value |
| --- | --- |
| Total Requests | 71 |
| Total Output Tokens Generated | 13,410 |
| Total Prompt Tokens Processed (with KV cache) | 124,684 |
| Total Prompt Tokens (hypothetical, no cache) | 3,032,651 |
| Total Tokens Processed (with cache) | 138,094 |
| Total Tokens (hypothetical, no cache) | 3,046,061 |

3. KV Cache Performance

The session involved a long, growing multi-turn conversation. llama.cpp's prompt cache system stored up to 8 rolling checkpoints, allowing new requests to skip re-processing the shared prefix.

| Metric | Value |
| --- | --- |
| Tokens saved by KV cache | 2,907,967 |
| Cache hit rate | 95.9% |
| Compute reduction factor | 24.3× |
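These three metrics follow directly from the Section 2 totals:

```python
# Derive the cache metrics from the session summary totals.
prompt_with_cache = 124_684      # prompt tokens actually processed
prompt_no_cache = 3_032_651      # hypothetical total without a prompt cache

tokens_saved = prompt_no_cache - prompt_with_cache  # 2,907,967
hit_rate = tokens_saved / prompt_no_cache           # ~0.959
reduction = prompt_no_cache / prompt_with_cache     # ~24.3x
```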

The cache hit rate was exceptionally high because each request shared the vast majority of its prefix with the previous one — the conversation context grew incrementally from ~29K to ~53K tokens, adding only hundreds to low thousands of new tokens per turn.

Two cache invalidations occurred during the session: a full one at task 5177 (0.0% hit rate) and a partial one at task 6461 (80.3%). In both cases the LCP (longest common prefix) similarity dropped enough that the cached checkpoints could no longer cover the prompt, and most or all of it had to be reprocessed. The full invalidation is visible as a large prompt eval time (~19–20 seconds), matching the initial cold-start request.
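How a drop in prefix similarity triggers reprocessing can be illustrated with a simplified LCP check (the exact metric llama.cpp computes may differ; this is an assumption for illustration):

```python
def lcp_similarity(cached: list[int], incoming: list[int]) -> float:
    """Fraction of tokens covered by the longest common prefix (simplified)."""
    if not cached or not incoming:
        return 0.0
    n = 0
    for a, b in zip(cached, incoming):
        if a != b:
            break
        n += 1
    return n / max(len(cached), len(incoming))

# A prompt that merely appends to the cached context keeps a high similarity,
# so the shared prefix can be reused from cache...
grown = lcp_similarity(list(range(100)), list(range(110)))  # ~0.91

# ...while a prompt that diverges at the start shares no prefix at all,
# forcing a full reprocess.
swapped = lcp_similarity(list(range(100)), [999] + list(range(1, 110)))  # 0.0
```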


4. Token Throughput

Prompt Evaluation Speed (Prefill)

Prompt evaluation speed varied based on whether cached tokens were reused:

| Scenario | Speed |
| --- | --- |
| Cold start (full 29K prompt) | ~1,484–1,509 tokens/sec |
| Warm cache (incremental new tokens) | ~500–1,100 tokens/sec |
| Very small incremental batches (<100 tokens) | ~100–400 tokens/sec (overhead-dominated) |

Cold-start prefill is fast because the large batch (4,096 tokens) saturates GPU throughput. Small incremental batches are slower per-token due to fixed kernel launch overhead.
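The effect can be sketched with a fixed-cost throughput model (the setup cost and peak rate below are illustrative guesses, not measured values):

```python
def effective_prefill_rate(n_tokens: int, t0_ms: float = 100.0,
                           peak_tps: float = 1500.0) -> float:
    """Observed tokens/sec when every request pays a fixed setup cost.

    t0_ms (kernel launch / scheduling overhead) and peak_tps are assumed
    values chosen to roughly match the ranges observed in this session.
    """
    total_s = t0_ms / 1000.0 + n_tokens / peak_tps
    return n_tokens / total_s

big = effective_prefill_rate(28_872)  # fixed cost amortized, near peak rate
tiny = effective_prefill_rate(50)     # fixed cost dominates, far below peak
```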

Token Generation Speed (Decode)

| Metric | Value |
| --- | --- |
| Average generation speed | ~30–35 tokens/sec |
| Minimum observed | ~29.5 tokens/sec |
| Maximum observed | ~36.1 tokens/sec |

Generation speed was consistent throughout the session, showing no significant degradation even as the KV cache filled up to ~53K tokens. This is expected for a single-sequence, single-slot configuration.


5. Request-by-Request Breakdown

The table below shows every request with its prompt tokens actually processed (with cache), the full context size it was based on, and tokens generated.

| Task ID | Prompt Tokens Processed | Full Context Size | Output Tokens | Cache Hit Rate |
| --- | --- | --- | --- | --- |
| 0 | 28,872 | 28,872 | 64 | 0.0% (cold) |
| 72 | 752 | 29,112 | 35 | 97.4% |
| 109 | 4,067 | 32,667 | 34 | 87.5% |
| 145 | 1,517 | 34,217 | 35 | 95.6% |
| 182 | 1,887 | 36,138 | 34 | 94.8% |
| 218 | 673 | 36,844 | 293 | 98.2% |
| 513 | 852 | 37,184 | 534 | 97.7% |
| 1049 | 1,540 | 38,212 | 1,066 | 96.0% |
| 2117 | 1,905 | 39,605 | 59 | 95.2% |
| 2178 | 2,144 | 41,807 | 1,147 | 94.9% |
| 3327 | 1,693 | 42,988 | 124 | 96.1% |
| 3453 | 802 | 43,278 | 576 | 98.1% |
| 4031 | 1,442 | 44,208 | 183 | 96.7% |
| 4216 | 743 | 44,439 | 119 | 98.3% |
| 4337 | 937 | 44,864 | 35 | 97.9% |
| 4374 | 1,749 | 46,647 | 125 | 96.3% |
| 4501 | 670 | 46,805 | 111 | 98.6% |
| 4614 | 670 | 46,963 | 99 | 98.6% |
| 4715 | 658 | 47,109 | 65 | 98.6% |
| 4782 | 607 | 47,204 | 62 | 98.7% |
| 4846 | 411 | 47,676 | 160 | 99.1% |
| 5007 | 726 | 47,991 | 168 | 98.5% |
| 5177 | 29,006 | 29,006 | 32 | 0.0% (invalidated) |
| 5217 | 1,657 | 30,151 | 31 | 94.5% |
| 5250 | 1,624 | 31,263 | 31 | 94.8% |
| 5283 | 562 | 31,313 | 29 | 98.2% |
| 5314 | 647 | 31,398 | 34 | 97.9% |
| 5350 | 441 | 31,872 | 35 | 98.6% |
| 5386 | 1,749 | 33,655 | 34 | 94.8% |
| 5422 | 1,487 | 35,175 | 34 | 95.8% |
| 5458 | 1,914 | 37,122 | 211 | 94.8% |
| 5671 | 190 | 37,522 | 204 | 99.5% |
| 5876 | 91 | 37,816 | 171 | 99.8% |
| 6048 | 76 | 38,062 | 412 | 99.8% |
| 6461 | 7,564 | 38,450 | 79 | 80.3% (partial invalidation) |
| 6543 | 31 | 38,559 | 54 | 99.9% |
| 6598 | 17 | 38,629 | 667 | 100.0% |
| 7266 | 732 | 39,344 | 85 | 98.1% |
| 7353 | 663 | 39,495 | 38 | 98.3% |
| 7393 | 23 | 39,555 | 31 | 99.9% |
| 7425 | 2,493 | 40,431 | 1,919 | 93.8% |
| 9346 | 2,480 | 42,399 | 77 | 94.2% |
| 9425 | 78 | 42,553 | 88 | 99.8% |
| 9514 | 2,724 | 42,643 | 84 | 93.6% |
| 9600 | 31 | 42,757 | 32 | 99.9% |
| 9633 | 3,080 | 45,211 | 270 | 93.2% |
| 9905 | 792 | 45,491 | 27 | 98.3% |
| 9934 | 57 | 45,574 | 200 | 99.9% |
| 10135 | 304 | 45,821 | 176 | 99.3% |
| 10312 | 649 | 46,166 | 72 | 98.6% |
| 10386 | 646 | 46,883 | 97 | 98.6% |
| 10485 | 956 | 47,327 | 48 | 98.0% |
| 10535 | 23 | 47,397 | 88 | 100.0% |
| 10624 | 32 | 47,516 | 79 | 99.9% |
| 10704 | 159 | 47,643 | 266 | 99.7% |
| 10971 | 615 | 48,099 | 66 | 98.7% |
| 11039 | 168 | 48,332 | 80 | 99.7% |
| 11120 | 33 | 48,444 | 348 | 99.9% |
| 11469 | 414 | 48,825 | 174 | 99.2% |
| 11644 | 991 | 49,402 | 497 | 98.0% |
| 12143 | 1,055 | 49,945 | 48 | 97.9% |
| 12193 | 760 | 50,752 | 539 | 98.5% |
| 12734 | 1,084 | 51,324 | 48 | 97.9% |
| 12784 | 179 | 51,550 | 147 | 99.7% |
| 12932 | 532 | 51,903 | 44 | 99.0% |
| 12978 | 200 | 52,146 | 54 | 99.6% |
| 13033 | 23 | 52,222 | 125 | 100.0% |
| 13159 | 58 | 52,404 | 59 | 99.9% |
| 13219 | 235 | 52,697 | 54 | 99.6% |
| 13274 | 21 | 52,771 | 15 | 100.0% |
| 13290 | 21 | 52,806 | 249 | 100.0% |
| TOTAL | 124,684 | 3,032,651 | 13,410 | 95.9% |
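The per-request cache hit rates follow directly from the two token columns, assuming hit rate = 1 − (tokens processed / full context size), which reproduces the figures shown:

```python
def cache_hit_rate(processed: int, full_context: int) -> float:
    """Share of the prompt served from cache rather than recomputed."""
    return 1.0 - processed / full_context

# Task 72 from the table: only 752 of 29,112 tokens actually processed.
warm = cache_hit_rate(752, 29_112)      # ~0.974, i.e. the 97.4% shown

# Task 0 (cold start): the entire prompt had to be processed.
cold = cache_hit_rate(28_872, 28_872)   # 0.0
```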

6. Context Growth Over Session

The conversation context grew steadily across two distinct conversation threads visible in the log:

Thread 1 (tasks 0 – 5007): ~29K → ~48K tokens
Thread 2 (tasks 5177 – 13290, new session): ~29K → ~53K tokens (final)

The two invalidation events at tasks 5177 (full) and 6461 (partial) indicate either a topic switch with a substantially different system prompt/context, or a client reconnect that caused llama.cpp to determine it could not reuse its cached state at sufficient similarity (LCP similarity dropped to 0.908 and 0.813 respectively, low enough that the cached checkpoints could not cover the new prompt).


7. Notable Observations

  • Blackwell GPU (RTX 5060 Ti): This run used NVIDIA's Blackwell architecture under llama.cpp, with native FP4 support detected (BLACKWELL_NATIVE_FP4 = 1) and compute capability 12.0.
  • Hybrid MoE+SSM architecture: Qwen3.5 uses a novel hybrid of MoE (256 experts, 8 active) and SSM (Mamba-style recurrent) layers, reflected in the separate llama_memory_recurrent allocation (62.81 MiB for recurrent state).
  • Checkpoint eviction strategy: llama.cpp maintained a rolling window of 8 checkpoints and evicted the oldest on overflow, which kept cache reuse high while bounding memory overhead to 8 × 62.8 MiB = ~502 MiB.
  • Generation speed stability: Output token generation stayed consistently at ~30–35 tok/s across the entire session despite the context growing from 29K to 53K tokens, suggesting KV cache access time is not significantly impacting generation at this context length.
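The rolling-checkpoint eviction described above can be sketched with a bounded deque (an illustration of the policy, not llama.cpp's actual data structure):

```python
from collections import deque

MAX_CHECKPOINTS = 8
CHECKPOINT_MIB = 62.8  # recurrent-state size per checkpoint, from the log

# deque(maxlen=N) silently drops the oldest entry when a new one is appended.
checkpoints: deque = deque(maxlen=MAX_CHECKPOINTS)

# Checkpoint positions as they were created during the session (from the log);
# the 9th append evicts the oldest checkpoint (pos 28360).
for pos in [28360, 28600, 32155, 33705, 35626, 36332, 36672, 37700, 39000]:
    checkpoints.append(pos)

overhead_mib = len(checkpoints) * CHECKPOINT_MIB  # bounded at ~502 MiB
```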

Report generated from llama-server log output. Token counts extracted from prompt eval time and eval time timing lines, and full context sizes from init sampler lines.
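The extraction described above can be reproduced with a regex over the timing lines, matching the log format shown below (the field names here are my own; only the line format comes from the log):

```python
import re

# Matches llama-server timing lines such as:
#   prompt eval time = 19456.51 ms / 28872 tokens ( 0.67 ms per token, 1483.92 tokens per second)
#          eval time =  1819.38 ms /    64 tokens (28.43 ms per token,   35.18 tokens per second)
TIMING = re.compile(
    r"(prompt eval|eval) time\s*=\s*([\d.]+) ms\s*/\s*(\d+) tokens"
    r".*?([\d.]+) tokens per second"
)

line = ("prompt eval time =   19456.51 ms / 28872 tokens "
        "(    0.67 ms per token,  1483.92 tokens per second)")
m = TIMING.search(line)
kind = m.group(1)            # "prompt eval" (prefill) or "eval" (decode)
ms = float(m.group(2))       # wall-clock time for the phase
n_tokens = int(m.group(3))   # tokens processed in the phase
tps = float(m.group(4))      # reported throughput
```

Applied to every `print_timing` block, this yields the per-request token counts summed in the tables above.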

qwen3.5-35b % ls
Qwen3.5-35B-A3B-UD-Q2_K_XL.gguf Qwen3.5-35B-A3B-UD-Q4_K_M.gguf run
qwen3.5-35b % ./run Qwen3.5-35B-A3B-UD-Q4_K_M.gguf
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
build: 8171 (c17dce4f5) with GNU 13.3.0 for Linux x86_64
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CUDA : ARCHS = 1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | FA_ALL_QUANTS = 1 | BLACKWELL_NATIVE_FP4 = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Running without SSL
init: using 15 threads for HTTP server
start: binding port with default address family
main: loading model
srv load_model: loading model 'Qwen3.5-35B-A3B-UD-Q4_K_M.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected to use 21953 MiB of device memory vs. 15094 MiB of free device memory
llama_params_fit_impl: cannot meet free memory target of 1024 MiB, need to reduce device memory by 7883 MiB
llama_params_fit_impl: context size set by user to 128000 -> no change
llama_params_fit_impl: with only dense weights in device memory there is a total surplus of 9396 MiB
llama_params_fit_impl: filling dense-only layers back-to-front:
llama_params_fit_impl: - CUDA0 (NVIDIA GeForce RTX 5060 Ti): 41 layers, 5105 MiB used, 9988 MiB free
llama_params_fit_impl: converting dense-only layers to full layers and filling them front-to-back with overflow to next device/system memory:
llama_params_fit_impl: - CUDA0 (NVIDIA GeForce RTX 5060 Ti): 41 layers (20 overflowing), 14049 MiB used, 1044 MiB free
llama_params_fit: successfully fit params to free device memory
llama_params_fit: fitting params to free memory took 6.49 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5060 Ti) (0000:01:00.0) - 15158 MiB free
llama_model_loader: loaded meta data with 52 key-value pairs and 733 tensors from Qwen3.5-35B-A3B-UD-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 5: general.name str = Qwen3.5-35B-A3B
llama_model_loader: - kv 6: general.basename str = Qwen3.5-35B-A3B
llama_model_loader: - kv 7: general.quantized_by str = Unsloth
llama_model_loader: - kv 8: general.size_label str = 35B-A3B
llama_model_loader: - kv 9: general.license str = apache-2.0
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 11: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 12: general.base_model.count u32 = 1
llama_model_loader: - kv 13: general.base_model.0.name str = Qwen3.5 35B A3B
llama_model_loader: - kv 14: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 15: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 16: general.tags arr[str,2] = ["unsloth", "image-text-to-text"]
llama_model_loader: - kv 17: qwen35moe.block_count u32 = 40
llama_model_loader: - kv 18: qwen35moe.context_length u32 = 262144
llama_model_loader: - kv 19: qwen35moe.embedding_length u32 = 2048
llama_model_loader: - kv 20: qwen35moe.attention.head_count u32 = 16
llama_model_loader: - kv 21: qwen35moe.attention.head_count_kv u32 = 2
llama_model_loader: - kv 22: qwen35moe.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 23: qwen35moe.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 24: qwen35moe.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 25: qwen35moe.expert_count u32 = 256
llama_model_loader: - kv 26: qwen35moe.expert_used_count u32 = 8
llama_model_loader: - kv 27: qwen35moe.attention.key_length u32 = 256
llama_model_loader: - kv 28: qwen35moe.attention.value_length u32 = 256
llama_model_loader: - kv 29: qwen35moe.expert_feed_forward_length u32 = 512
llama_model_loader: - kv 30: qwen35moe.expert_shared_feed_forward_length u32 = 512
llama_model_loader: - kv 31: qwen35moe.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 32: qwen35moe.ssm.state_size u32 = 128
llama_model_loader: - kv 33: qwen35moe.ssm.group_count u32 = 16
llama_model_loader: - kv 34: qwen35moe.ssm.time_step_rank u32 = 32
llama_model_loader: - kv 35: qwen35moe.ssm.inner_size u32 = 4096
llama_model_loader: - kv 36: qwen35moe.full_attention_interval u32 = 4
llama_model_loader: - kv 37: qwen35moe.rope.dimension_count u32 = 64
llama_model_loader: - kv 38: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 39: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 40: tokenizer.ggml.tokens arr[str,248320] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 41: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 42: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 43: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 44: tokenizer.ggml.padding_token_id u32 = 248055
llama_model_loader: - kv 45: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - kv 46: general.quantization_version u32 = 2
llama_model_loader: - kv 47: general.file_type u32 = 7
llama_model_loader: - kv 48: quantize.imatrix.file str = Qwen3.5-35B-A3B-GGUF/Qwen_Qwen3.5-35B...
llama_model_loader: - kv 49: quantize.imatrix.dataset str = /training_dir/calibration_datav5.txt
llama_model_loader: - kv 50: quantize.imatrix.entries_count u32 = 510
llama_model_loader: - kv 51: quantize.imatrix.chunks_count u32 = 802
llama_model_loader: - type f32: 301 tensors
llama_model_loader: - type q4_K: 190 tensors
llama_model_loader: - type q5_K: 82 tensors
llama_model_loader: - type q6_K: 80 tensors
llama_model_loader: - type bf16: 80 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 18.48 GiB (4.58 BPW)
load: 0 unused tokens
load: printing all EOG tokens:
load: - 248044 ('<|endoftext|>')
load: - 248046 ('<|im_end|>')
load: - 248063 ('<|fim_pad|>')
load: - 248064 ('<|repo_name|>')
load: - 248065 ('<|file_sep|>')
load: special tokens cache size = 33
load: token to piece cache size = 1.7581 MB
print_info: arch = qwen35moe
print_info: vocab_only = 0
print_info: no_alloc = 0
print_info: n_ctx_train = 262144
print_info: n_embd = 2048
print_info: n_embd_inp = 2048
print_info: n_layer = 40
print_info: n_head = 16
print_info: n_head_kv = 2
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 256
print_info: n_embd_head_v = 256
print_info: n_gqa = 8
print_info: n_embd_k_gqa = 512
print_info: n_embd_v_gqa = 512
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 0
print_info: n_expert = 256
print_info: n_expert_used = 8
print_info: n_expert_groups = 0
print_info: n_group_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 40
print_info: rope scaling = linear
print_info: freq_base_train = 10000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 262144
print_info: rope_yarn_log_mul = 0.0000
print_info: rope_finetuned = unknown
print_info: mrope sections = [11, 11, 10, 0]
print_info: ssm_d_conv = 4
print_info: ssm_d_inner = 4096
print_info: ssm_d_state = 128
print_info: ssm_dt_rank = 32
print_info: ssm_n_group = 16
print_info: ssm_dt_b_c_rms = 0
print_info: model type = ?B
print_info: model params = 34.66 B
print_info: general.name = Qwen3.5-35B-A3B
print_info: vocab type = BPE
print_info: n_vocab = 248320
print_info: n_merges = 247587
print_info: BOS token = 11 ','
print_info: EOS token = 248046 '<|im_end|>'
print_info: EOT token = 248046 '<|im_end|>'
print_info: PAD token = 248055 '<|vision_pad|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 248060 '<|fim_prefix|>'
print_info: FIM SUF token = 248062 '<|fim_suffix|>'
print_info: FIM MID token = 248061 '<|fim_middle|>'
print_info: FIM PAD token = 248063 '<|fim_pad|>'
print_info: FIM REP token = 248064 '<|repo_name|>'
print_info: FIM SEP token = 248065 '<|file_sep|>'
print_info: EOG token = 248044 '<|endoftext|>'
print_info: EOG token = 248046 '<|im_end|>'
print_info: EOG token = 248063 '<|fim_pad|>'
print_info: EOG token = 248064 '<|repo_name|>'
print_info: EOG token = 248065 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 39 repeating layers to GPU
load_tensors: offloaded 41/41 layers to GPU
load_tensors: CPU model buffer size = 333.44 MiB
load_tensors: CUDA0 model buffer size = 10670.30 MiB
load_tensors: CUDA_Host model buffer size = 7920.69 MiB
...................................................................................................
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|im_end|> logit bias = -inf
common_init_result: added <|fim_pad|> logit bias = -inf
common_init_result: added <|repo_name|> logit bias = -inf
common_init_result: added <|file_sep|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 128000
llama_context: n_ctx_seq = 128000
llama_context: n_batch = 4096
llama_context: n_ubatch = 2048
llama_context: causal_attn = 1
llama_context: flash_attn = enabled
llama_context: kv_unified = false
llama_context: freq_base = 10000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_seq (128000) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
llama_context: CUDA_Host output buffer size = 0.95 MiB
llama_kv_cache: CUDA0 KV buffer size = 1328.12 MiB
llama_kv_cache: size = 1328.12 MiB (128000 cells, 10 layers, 1/1 seqs), K (q8_0): 664.06 MiB, V (q8_0): 664.06 MiB
llama_memory_recurrent: CUDA0 RS buffer size = 62.81 MiB
llama_memory_recurrent: size = 62.81 MiB ( 1 cells, 40 layers, 1 seqs), R (f32): 2.81 MiB, S (f32): 60.00 MiB
sched_reserve: reserving ...
sched_reserve: CUDA0 compute buffer size = 1988.01 MiB
sched_reserve: CUDA_Host compute buffer size = 1032.08 MiB
sched_reserve: graph nodes = 20199 (with bs=2048), 4389 (with bs=1)
sched_reserve: graph splits = 58 (with bs=2048), 42 (with bs=1)
sched_reserve: reserve took 1286.57 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
srv load_model: initializing slots, n_slots = 1
common_speculative_is_compat: the target context does not support partial sequence removal
srv load_model: speculative decoding not supported by this context
slot load_model: id 0 | task -1 | new slot, n_ctx = 128000
srv load_model: prompt cache is enabled, size limit: 8192 MiB
srv load_model: use `--cache-ram 0` to disable the prompt cache
srv load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391
init: chat template, example_format: '<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
<think>
</think>
'
srv init: init: chat template, thinking = 1
main: model loaded
main: server is listening on http://127.0.0.1:8080
main: starting the main loop...
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 0 | processing task, is_child = 0
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 28872
slot update_slots: id 0 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 4096, progress = 0.141868
slot update_slots: id 0 | task 0 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 4096, progress = 0.283735
slot update_slots: id 0 | task 0 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 4096, progress = 0.425603
slot update_slots: id 0 | task 0 | n_tokens = 12288, memory_seq_rm [12288, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 16384, batch.n_tokens = 4096, progress = 0.567470
slot update_slots: id 0 | task 0 | n_tokens = 16384, memory_seq_rm [16384, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 20480, batch.n_tokens = 4096, progress = 0.709338
slot update_slots: id 0 | task 0 | n_tokens = 20480, memory_seq_rm [20480, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 24576, batch.n_tokens = 4096, progress = 0.851205
slot update_slots: id 0 | task 0 | n_tokens = 24576, memory_seq_rm [24576, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 28360, batch.n_tokens = 3784, progress = 0.982267
slot update_slots: id 0 | task 0 | n_tokens = 28360, memory_seq_rm [28360, end)
slot init_sampler: id 0 | task 0 | init sampler, took 3.74 ms, tokens: text = 28872, total = 28872
slot update_slots: id 0 | task 0 | created context checkpoint 1 of 8 (pos_min = 28359, pos_max = 28359, n_tokens = 28360, size = 62.813 MiB)
slot update_slots: id 0 | task 0 | prompt processing done, n_tokens = 28872, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 0 |
prompt eval time = 19456.51 ms / 28872 tokens ( 0.67 ms per token, 1483.92 tokens per second)
eval time = 1819.38 ms / 64 tokens ( 28.43 ms per token, 35.18 tokens per second)
total time = 21275.89 ms / 28936 tokens
slot release: id 0 | task 0 | stop processing: n_tokens = 28935, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 72 | processing task, is_child = 0
slot update_slots: id 0 | task 72 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 29112
slot update_slots: id 0 | task 72 | n_past = 28924, slot.prompt.tokens.size() = 28935, seq_id = 0, pos_min = 28934, n_swa = 1
slot update_slots: id 0 | task 72 | restored context checkpoint (pos_min = 28359, pos_max = 28359, n_tokens = 28360, size = 62.813 MiB)
slot update_slots: id 0 | task 72 | n_tokens = 28360, memory_seq_rm [28360, end)
slot update_slots: id 0 | task 72 | prompt processing progress, n_tokens = 28600, batch.n_tokens = 240, progress = 0.982413
slot update_slots: id 0 | task 72 | n_tokens = 28600, memory_seq_rm [28600, end)
slot init_sampler: id 0 | task 72 | init sampler, took 3.55 ms, tokens: text = 29112, total = 29112
slot update_slots: id 0 | task 72 | created context checkpoint 2 of 8 (pos_min = 28599, pos_max = 28599, n_tokens = 28600, size = 62.813 MiB)
slot update_slots: id 0 | task 72 | prompt processing done, n_tokens = 29112, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 72 |
prompt eval time = 1112.74 ms / 752 tokens ( 1.48 ms per token, 675.81 tokens per second)
eval time = 1003.99 ms / 35 tokens ( 28.69 ms per token, 34.86 tokens per second)
total time = 2116.73 ms / 787 tokens
slot release: id 0 | task 72 | stop processing: n_tokens = 29146, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.892 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 109 | processing task, is_child = 0
slot update_slots: id 0 | task 109 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 32667
slot update_slots: id 0 | task 109 | n_past = 29145, slot.prompt.tokens.size() = 29146, seq_id = 0, pos_min = 29145, n_swa = 1
slot update_slots: id 0 | task 109 | restored context checkpoint (pos_min = 28599, pos_max = 28599, n_tokens = 28600, size = 62.813 MiB)
slot update_slots: id 0 | task 109 | n_tokens = 28600, memory_seq_rm [28600, end)
slot update_slots: id 0 | task 109 | prompt processing progress, n_tokens = 32155, batch.n_tokens = 3555, progress = 0.984327
slot update_slots: id 0 | task 109 | n_tokens = 32155, memory_seq_rm [32155, end)
slot init_sampler: id 0 | task 109 | init sampler, took 3.85 ms, tokens: text = 32667, total = 32667
slot update_slots: id 0 | task 109 | created context checkpoint 3 of 8 (pos_min = 32154, pos_max = 32154, n_tokens = 32155, size = 62.813 MiB)
slot update_slots: id 0 | task 109 | prompt processing done, n_tokens = 32667, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 109 |
prompt eval time = 3297.57 ms / 4067 tokens ( 0.81 ms per token, 1233.33 tokens per second)
eval time = 968.07 ms / 34 tokens ( 28.47 ms per token, 35.12 tokens per second)
total time = 4265.64 ms / 4101 tokens
slot release: id 0 | task 109 | stop processing: n_tokens = 32700, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.956 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 145 | processing task, is_child = 0
slot update_slots: id 0 | task 145 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 34217
slot update_slots: id 0 | task 145 | n_tokens = 32700, memory_seq_rm [32700, end)
slot update_slots: id 0 | task 145 | prompt processing progress, n_tokens = 33705, batch.n_tokens = 1005, progress = 0.985037
slot update_slots: id 0 | task 145 | n_tokens = 33705, memory_seq_rm [33705, end)
slot init_sampler: id 0 | task 145 | init sampler, took 4.22 ms, tokens: text = 34217, total = 34217
slot update_slots: id 0 | task 145 | created context checkpoint 4 of 8 (pos_min = 33704, pos_max = 33704, n_tokens = 33705, size = 62.813 MiB)
slot update_slots: id 0 | task 145 | prompt processing done, n_tokens = 34217, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 145 |
prompt eval time = 1630.48 ms / 1517 tokens ( 1.07 ms per token, 930.40 tokens per second)
eval time = 1012.78 ms / 35 tokens ( 28.94 ms per token, 34.56 tokens per second)
total time = 2643.25 ms / 1552 tokens
slot release: id 0 | task 145 | stop processing: n_tokens = 34251, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.948 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 182 | processing task, is_child = 0
slot update_slots: id 0 | task 182 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 36138
slot update_slots: id 0 | task 182 | n_tokens = 34251, memory_seq_rm [34251, end)
slot update_slots: id 0 | task 182 | prompt processing progress, n_tokens = 35626, batch.n_tokens = 1375, progress = 0.985832
slot update_slots: id 0 | task 182 | n_tokens = 35626, memory_seq_rm [35626, end)
slot init_sampler: id 0 | task 182 | init sampler, took 4.34 ms, tokens: text = 36138, total = 36138
slot update_slots: id 0 | task 182 | created context checkpoint 5 of 8 (pos_min = 35625, pos_max = 35625, n_tokens = 35626, size = 62.813 MiB)
slot update_slots: id 0 | task 182 | prompt processing done, n_tokens = 36138, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 182 |
prompt eval time = 1857.75 ms / 1887 tokens ( 0.98 ms per token, 1015.75 tokens per second)
eval time = 966.94 ms / 34 tokens ( 28.44 ms per token, 35.16 tokens per second)
total time = 2824.69 ms / 1921 tokens
slot release: id 0 | task 182 | stop processing: n_tokens = 36171, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.982 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 218 | processing task, is_child = 0
slot update_slots: id 0 | task 218 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 36844
slot update_slots: id 0 | task 218 | n_tokens = 36171, memory_seq_rm [36171, end)
slot update_slots: id 0 | task 218 | prompt processing progress, n_tokens = 36332, batch.n_tokens = 161, progress = 0.986104
slot update_slots: id 0 | task 218 | n_tokens = 36332, memory_seq_rm [36332, end)
slot init_sampler: id 0 | task 218 | init sampler, took 4.45 ms, tokens: text = 36844, total = 36844
slot update_slots: id 0 | task 218 | created context checkpoint 6 of 8 (pos_min = 36331, pos_max = 36331, n_tokens = 36332, size = 62.813 MiB)
slot update_slots: id 0 | task 218 | prompt processing done, n_tokens = 36844, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 218 |
prompt eval time = 1130.78 ms / 673 tokens ( 1.68 ms per token, 595.17 tokens per second)
eval time = 8718.12 ms / 293 tokens ( 29.75 ms per token, 33.61 tokens per second)
total time = 9848.90 ms / 966 tokens
slot release: id 0 | task 218 | stop processing: n_tokens = 37136, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.992 (> 0.100 thold), f_keep = 0.993
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 513 | processing task, is_child = 0
slot update_slots: id 0 | task 513 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 37184
slot update_slots: id 0 | task 513 | n_past = 36872, slot.prompt.tokens.size() = 37136, seq_id = 0, pos_min = 37135, n_swa = 1
slot update_slots: id 0 | task 513 | restored context checkpoint (pos_min = 36331, pos_max = 36331, n_tokens = 36332, size = 62.813 MiB)
slot update_slots: id 0 | task 513 | n_tokens = 36332, memory_seq_rm [36332, end)
slot update_slots: id 0 | task 513 | prompt processing progress, n_tokens = 36672, batch.n_tokens = 340, progress = 0.986231
slot update_slots: id 0 | task 513 | n_tokens = 36672, memory_seq_rm [36672, end)
slot init_sampler: id 0 | task 513 | init sampler, took 4.49 ms, tokens: text = 37184, total = 37184
slot update_slots: id 0 | task 513 | created context checkpoint 7 of 8 (pos_min = 36671, pos_max = 36671, n_tokens = 36672, size = 62.813 MiB)
slot update_slots: id 0 | task 513 | prompt processing done, n_tokens = 37184, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 513 |
prompt eval time = 1427.06 ms / 852 tokens ( 1.67 ms per token, 597.03 tokens per second)
eval time = 16227.11 ms / 534 tokens ( 30.39 ms per token, 32.91 tokens per second)
total time = 17654.17 ms / 1386 tokens
slot release: id 0 | task 513 | stop processing: n_tokens = 37717, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.973 (> 0.100 thold), f_keep = 0.986
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 1049 | processing task, is_child = 0
slot update_slots: id 0 | task 1049 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 38212
slot update_slots: id 0 | task 1049 | n_past = 37194, slot.prompt.tokens.size() = 37717, seq_id = 0, pos_min = 37716, n_swa = 1
slot update_slots: id 0 | task 1049 | restored context checkpoint (pos_min = 36671, pos_max = 36671, n_tokens = 36672, size = 62.813 MiB)
slot update_slots: id 0 | task 1049 | n_tokens = 36672, memory_seq_rm [36672, end)
slot update_slots: id 0 | task 1049 | prompt processing progress, n_tokens = 37700, batch.n_tokens = 1028, progress = 0.986601
slot update_slots: id 0 | task 1049 | n_tokens = 37700, memory_seq_rm [37700, end)
slot init_sampler: id 0 | task 1049 | init sampler, took 4.63 ms, tokens: text = 38212, total = 38212
slot update_slots: id 0 | task 1049 | created context checkpoint 8 of 8 (pos_min = 37699, pos_max = 37699, n_tokens = 37700, size = 62.813 MiB)
slot update_slots: id 0 | task 1049 | prompt processing done, n_tokens = 38212, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 1049 |
prompt eval time = 1723.51 ms / 1540 tokens ( 1.12 ms per token, 893.52 tokens per second)
eval time = 31957.27 ms / 1066 tokens ( 29.98 ms per token, 33.36 tokens per second)
total time = 33680.78 ms / 2606 tokens
slot release: id 0 | task 1049 | stop processing: n_tokens = 39277, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.966 (> 0.100 thold), f_keep = 0.974
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 2117 | processing task, is_child = 0
slot update_slots: id 0 | task 2117 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 39605
slot update_slots: id 0 | task 2117 | n_past = 38239, slot.prompt.tokens.size() = 39277, seq_id = 0, pos_min = 39276, n_swa = 1
slot update_slots: id 0 | task 2117 | restored context checkpoint (pos_min = 37699, pos_max = 37699, n_tokens = 37700, size = 62.813 MiB)
slot update_slots: id 0 | task 2117 | n_tokens = 37700, memory_seq_rm [37700, end)
slot update_slots: id 0 | task 2117 | prompt processing progress, n_tokens = 39093, batch.n_tokens = 1393, progress = 0.987072
slot update_slots: id 0 | task 2117 | n_tokens = 39093, memory_seq_rm [39093, end)
slot init_sampler: id 0 | task 2117 | init sampler, took 4.77 ms, tokens: text = 39605, total = 39605
slot update_slots: id 0 | task 2117 | erasing old context checkpoint (pos_min = 28359, pos_max = 28359, n_tokens = 28360, size = 62.813 MiB)
slot update_slots: id 0 | task 2117 | created context checkpoint 8 of 8 (pos_min = 39092, pos_max = 39092, n_tokens = 39093, size = 62.813 MiB)
slot update_slots: id 0 | task 2117 | prompt processing done, n_tokens = 39605, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 2117 |
prompt eval time = 1990.02 ms / 1905 tokens ( 1.04 ms per token, 957.28 tokens per second)
eval time = 1762.69 ms / 59 tokens ( 29.88 ms per token, 33.47 tokens per second)
total time = 3752.71 ms / 1964 tokens
slot release: id 0 | task 2117 | stop processing: n_tokens = 39663, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.949 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 2178 | processing task, is_child = 0
slot update_slots: id 0 | task 2178 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 41807
slot update_slots: id 0 | task 2178 | n_tokens = 39663, memory_seq_rm [39663, end)
slot update_slots: id 0 | task 2178 | prompt processing progress, n_tokens = 41295, batch.n_tokens = 1632, progress = 0.987753
slot update_slots: id 0 | task 2178 | n_tokens = 41295, memory_seq_rm [41295, end)
slot init_sampler: id 0 | task 2178 | init sampler, took 4.98 ms, tokens: text = 41807, total = 41807
slot update_slots: id 0 | task 2178 | erasing old context checkpoint (pos_min = 28599, pos_max = 28599, n_tokens = 28600, size = 62.813 MiB)
slot update_slots: id 0 | task 2178 | created context checkpoint 8 of 8 (pos_min = 41294, pos_max = 41294, n_tokens = 41295, size = 62.813 MiB)
slot update_slots: id 0 | task 2178 | prompt processing done, n_tokens = 41807, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 2178 |
prompt eval time = 2061.89 ms / 2144 tokens ( 0.96 ms per token, 1039.82 tokens per second)
eval time = 38943.47 ms / 1147 tokens ( 33.95 ms per token, 29.45 tokens per second)
total time = 41005.35 ms / 3291 tokens
slot release: id 0 | task 2178 | stop processing: n_tokens = 42953, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 3327 | processing task, is_child = 0
slot update_slots: id 0 | task 3327 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 42988
slot update_slots: id 0 | task 3327 | n_past = 42943, slot.prompt.tokens.size() = 42953, seq_id = 0, pos_min = 42952, n_swa = 1
slot update_slots: id 0 | task 3327 | restored context checkpoint (pos_min = 41294, pos_max = 41294, n_tokens = 41295, size = 62.813 MiB)
slot update_slots: id 0 | task 3327 | n_tokens = 41295, memory_seq_rm [41295, end)
slot update_slots: id 0 | task 3327 | prompt processing progress, n_tokens = 42476, batch.n_tokens = 1181, progress = 0.988090
slot update_slots: id 0 | task 3327 | n_tokens = 42476, memory_seq_rm [42476, end)
slot init_sampler: id 0 | task 3327 | init sampler, took 5.37 ms, tokens: text = 42988, total = 42988
slot update_slots: id 0 | task 3327 | erasing old context checkpoint (pos_min = 32154, pos_max = 32154, n_tokens = 32155, size = 62.813 MiB)
slot update_slots: id 0 | task 3327 | created context checkpoint 8 of 8 (pos_min = 42475, pos_max = 42475, n_tokens = 42476, size = 62.813 MiB)
slot update_slots: id 0 | task 3327 | prompt processing done, n_tokens = 42988, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 3327 |
prompt eval time = 1846.27 ms / 1693 tokens ( 1.09 ms per token, 916.98 tokens per second)
eval time = 3719.40 ms / 124 tokens ( 30.00 ms per token, 33.34 tokens per second)
total time = 5565.68 ms / 1817 tokens
slot release: id 0 | task 3327 | stop processing: n_tokens = 43111, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 3453 | processing task, is_child = 0
slot update_slots: id 0 | task 3453 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 43278
slot update_slots: id 0 | task 3453 | n_past = 43013, slot.prompt.tokens.size() = 43111, seq_id = 0, pos_min = 43110, n_swa = 1
slot update_slots: id 0 | task 3453 | restored context checkpoint (pos_min = 42475, pos_max = 42475, n_tokens = 42476, size = 62.813 MiB)
slot update_slots: id 0 | task 3453 | n_tokens = 42476, memory_seq_rm [42476, end)
slot update_slots: id 0 | task 3453 | prompt processing progress, n_tokens = 42766, batch.n_tokens = 290, progress = 0.988169
slot update_slots: id 0 | task 3453 | n_tokens = 42766, memory_seq_rm [42766, end)
slot init_sampler: id 0 | task 3453 | init sampler, took 5.19 ms, tokens: text = 43278, total = 43278
slot update_slots: id 0 | task 3453 | erasing old context checkpoint (pos_min = 33704, pos_max = 33704, n_tokens = 33705, size = 62.813 MiB)
slot update_slots: id 0 | task 3453 | created context checkpoint 8 of 8 (pos_min = 42765, pos_max = 42765, n_tokens = 42766, size = 62.813 MiB)
slot update_slots: id 0 | task 3453 | prompt processing done, n_tokens = 43278, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 3453 |
prompt eval time = 1348.00 ms / 802 tokens ( 1.68 ms per token, 594.96 tokens per second)
eval time = 17470.58 ms / 576 tokens ( 30.33 ms per token, 32.97 tokens per second)
total time = 18818.58 ms / 1378 tokens
slot release: id 0 | task 3453 | stop processing: n_tokens = 43853, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.979 (> 0.100 thold), f_keep = 0.987
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4031 | processing task, is_child = 0
slot update_slots: id 0 | task 4031 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 44208
slot update_slots: id 0 | task 4031 | n_past = 43289, slot.prompt.tokens.size() = 43853, seq_id = 0, pos_min = 43852, n_swa = 1
slot update_slots: id 0 | task 4031 | restored context checkpoint (pos_min = 42765, pos_max = 42765, n_tokens = 42766, size = 62.813 MiB)
slot update_slots: id 0 | task 4031 | n_tokens = 42766, memory_seq_rm [42766, end)
slot update_slots: id 0 | task 4031 | prompt processing progress, n_tokens = 43696, batch.n_tokens = 930, progress = 0.988418
slot update_slots: id 0 | task 4031 | n_tokens = 43696, memory_seq_rm [43696, end)
slot init_sampler: id 0 | task 4031 | init sampler, took 5.31 ms, tokens: text = 44208, total = 44208
slot update_slots: id 0 | task 4031 | erasing old context checkpoint (pos_min = 35625, pos_max = 35625, n_tokens = 35626, size = 62.813 MiB)
slot update_slots: id 0 | task 4031 | created context checkpoint 8 of 8 (pos_min = 43695, pos_max = 43695, n_tokens = 43696, size = 62.813 MiB)
slot update_slots: id 0 | task 4031 | prompt processing done, n_tokens = 44208, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4031 |
prompt eval time = 1702.97 ms / 1442 tokens ( 1.18 ms per token, 846.76 tokens per second)
eval time = 5446.43 ms / 183 tokens ( 29.76 ms per token, 33.60 tokens per second)
total time = 7149.40 ms / 1625 tokens
slot release: id 0 | task 4031 | stop processing: n_tokens = 44390, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.995 (> 0.100 thold), f_keep = 0.996
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4216 | processing task, is_child = 0
slot update_slots: id 0 | task 4216 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 44439
slot update_slots: id 0 | task 4216 | n_past = 44231, slot.prompt.tokens.size() = 44390, seq_id = 0, pos_min = 44389, n_swa = 1
slot update_slots: id 0 | task 4216 | restored context checkpoint (pos_min = 43695, pos_max = 43695, n_tokens = 43696, size = 62.813 MiB)
slot update_slots: id 0 | task 4216 | n_tokens = 43696, memory_seq_rm [43696, end)
slot update_slots: id 0 | task 4216 | prompt processing progress, n_tokens = 43927, batch.n_tokens = 231, progress = 0.988479
slot update_slots: id 0 | task 4216 | n_tokens = 43927, memory_seq_rm [43927, end)
slot init_sampler: id 0 | task 4216 | init sampler, took 5.51 ms, tokens: text = 44439, total = 44439
slot update_slots: id 0 | task 4216 | erasing old context checkpoint (pos_min = 36331, pos_max = 36331, n_tokens = 36332, size = 62.813 MiB)
slot update_slots: id 0 | task 4216 | created context checkpoint 8 of 8 (pos_min = 43926, pos_max = 43926, n_tokens = 43927, size = 62.813 MiB)
slot update_slots: id 0 | task 4216 | prompt processing done, n_tokens = 44439, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4216 |
prompt eval time = 1313.63 ms / 743 tokens ( 1.77 ms per token, 565.61 tokens per second)
eval time = 3646.65 ms / 119 tokens ( 30.64 ms per token, 32.63 tokens per second)
total time = 4960.28 ms / 862 tokens
slot release: id 0 | task 4216 | stop processing: n_tokens = 44557, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.991 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4337 | processing task, is_child = 0
slot update_slots: id 0 | task 4337 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 44864
slot update_slots: id 0 | task 4337 | n_past = 44450, slot.prompt.tokens.size() = 44557, seq_id = 0, pos_min = 44556, n_swa = 1
slot update_slots: id 0 | task 4337 | restored context checkpoint (pos_min = 43926, pos_max = 43926, n_tokens = 43927, size = 62.813 MiB)
slot update_slots: id 0 | task 4337 | n_tokens = 43927, memory_seq_rm [43927, end)
slot update_slots: id 0 | task 4337 | prompt processing progress, n_tokens = 44352, batch.n_tokens = 425, progress = 0.988588
slot update_slots: id 0 | task 4337 | n_tokens = 44352, memory_seq_rm [44352, end)
slot init_sampler: id 0 | task 4337 | init sampler, took 5.33 ms, tokens: text = 44864, total = 44864
slot update_slots: id 0 | task 4337 | erasing old context checkpoint (pos_min = 36671, pos_max = 36671, n_tokens = 36672, size = 62.813 MiB)
slot update_slots: id 0 | task 4337 | created context checkpoint 8 of 8 (pos_min = 44351, pos_max = 44351, n_tokens = 44352, size = 62.813 MiB)
slot update_slots: id 0 | task 4337 | prompt processing done, n_tokens = 44864, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4337 |
prompt eval time = 1407.23 ms / 937 tokens ( 1.50 ms per token, 665.85 tokens per second)
eval time = 1069.92 ms / 35 tokens ( 30.57 ms per token, 32.71 tokens per second)
total time = 2477.15 ms / 972 tokens
slot release: id 0 | task 4337 | stop processing: n_tokens = 44898, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.963 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4374 | processing task, is_child = 0
slot update_slots: id 0 | task 4374 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 46647
slot update_slots: id 0 | task 4374 | n_tokens = 44898, memory_seq_rm [44898, end)
slot update_slots: id 0 | task 4374 | prompt processing progress, n_tokens = 46135, batch.n_tokens = 1237, progress = 0.989024
slot update_slots: id 0 | task 4374 | n_tokens = 46135, memory_seq_rm [46135, end)
slot init_sampler: id 0 | task 4374 | init sampler, took 7.33 ms, tokens: text = 46647, total = 46647
slot update_slots: id 0 | task 4374 | erasing old context checkpoint (pos_min = 37699, pos_max = 37699, n_tokens = 37700, size = 62.813 MiB)
slot update_slots: id 0 | task 4374 | created context checkpoint 8 of 8 (pos_min = 46134, pos_max = 46134, n_tokens = 46135, size = 62.813 MiB)
slot update_slots: id 0 | task 4374 | prompt processing done, n_tokens = 46647, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4374 |
prompt eval time = 1863.92 ms / 1749 tokens ( 1.07 ms per token, 938.34 tokens per second)
eval time = 3754.11 ms / 125 tokens ( 30.03 ms per token, 33.30 tokens per second)
total time = 5618.03 ms / 1874 tokens
slot release: id 0 | task 4374 | stop processing: n_tokens = 46771, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4501 | processing task, is_child = 0
slot update_slots: id 0 | task 4501 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 46805
slot update_slots: id 0 | task 4501 | n_past = 46686, slot.prompt.tokens.size() = 46771, seq_id = 0, pos_min = 46770, n_swa = 1
slot update_slots: id 0 | task 4501 | restored context checkpoint (pos_min = 46134, pos_max = 46134, n_tokens = 46135, size = 62.813 MiB)
slot update_slots: id 0 | task 4501 | n_tokens = 46135, memory_seq_rm [46135, end)
slot update_slots: id 0 | task 4501 | prompt processing progress, n_tokens = 46293, batch.n_tokens = 158, progress = 0.989061
slot update_slots: id 0 | task 4501 | n_tokens = 46293, memory_seq_rm [46293, end)
slot init_sampler: id 0 | task 4501 | init sampler, took 5.65 ms, tokens: text = 46805, total = 46805
slot update_slots: id 0 | task 4501 | erasing old context checkpoint (pos_min = 39092, pos_max = 39092, n_tokens = 39093, size = 62.813 MiB)
slot update_slots: id 0 | task 4501 | created context checkpoint 8 of 8 (pos_min = 46292, pos_max = 46292, n_tokens = 46293, size = 62.813 MiB)
slot update_slots: id 0 | task 4501 | prompt processing done, n_tokens = 46805, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4501 |
prompt eval time = 1183.34 ms / 670 tokens ( 1.77 ms per token, 566.19 tokens per second)
eval time = 3352.23 ms / 111 tokens ( 30.20 ms per token, 33.11 tokens per second)
total time = 4535.57 ms / 781 tokens
slot release: id 0 | task 4501 | stop processing: n_tokens = 46915, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4614 | processing task, is_child = 0
slot update_slots: id 0 | task 4614 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 46963
slot update_slots: id 0 | task 4614 | n_past = 46815, slot.prompt.tokens.size() = 46915, seq_id = 0, pos_min = 46914, n_swa = 1
slot update_slots: id 0 | task 4614 | restored context checkpoint (pos_min = 46292, pos_max = 46292, n_tokens = 46293, size = 62.813 MiB)
slot update_slots: id 0 | task 4614 | n_tokens = 46293, memory_seq_rm [46293, end)
slot update_slots: id 0 | task 4614 | prompt processing progress, n_tokens = 46451, batch.n_tokens = 158, progress = 0.989098
slot update_slots: id 0 | task 4614 | n_tokens = 46451, memory_seq_rm [46451, end)
slot init_sampler: id 0 | task 4614 | init sampler, took 6.04 ms, tokens: text = 46963, total = 46963
slot update_slots: id 0 | task 4614 | erasing old context checkpoint (pos_min = 41294, pos_max = 41294, n_tokens = 41295, size = 62.813 MiB)
slot update_slots: id 0 | task 4614 | created context checkpoint 8 of 8 (pos_min = 46450, pos_max = 46450, n_tokens = 46451, size = 62.813 MiB)
slot update_slots: id 0 | task 4614 | prompt processing done, n_tokens = 46963, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4614 |
prompt eval time = 1180.28 ms / 670 tokens ( 1.76 ms per token, 567.66 tokens per second)
eval time = 3070.20 ms / 99 tokens ( 31.01 ms per token, 32.25 tokens per second)
total time = 4250.49 ms / 769 tokens
slot release: id 0 | task 4614 | stop processing: n_tokens = 47061, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4715 | processing task, is_child = 0
slot update_slots: id 0 | task 4715 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47109
slot update_slots: id 0 | task 4715 | n_past = 46973, slot.prompt.tokens.size() = 47061, seq_id = 0, pos_min = 47060, n_swa = 1
slot update_slots: id 0 | task 4715 | restored context checkpoint (pos_min = 46450, pos_max = 46450, n_tokens = 46451, size = 62.813 MiB)
slot update_slots: id 0 | task 4715 | n_tokens = 46451, memory_seq_rm [46451, end)
slot update_slots: id 0 | task 4715 | prompt processing progress, n_tokens = 46597, batch.n_tokens = 146, progress = 0.989132
slot update_slots: id 0 | task 4715 | n_tokens = 46597, memory_seq_rm [46597, end)
slot init_sampler: id 0 | task 4715 | init sampler, took 6.01 ms, tokens: text = 47109, total = 47109
slot update_slots: id 0 | task 4715 | erasing old context checkpoint (pos_min = 42475, pos_max = 42475, n_tokens = 42476, size = 62.813 MiB)
slot update_slots: id 0 | task 4715 | created context checkpoint 8 of 8 (pos_min = 46596, pos_max = 46596, n_tokens = 46597, size = 62.813 MiB)
slot update_slots: id 0 | task 4715 | prompt processing done, n_tokens = 47109, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4715 |
prompt eval time = 1221.52 ms / 658 tokens ( 1.86 ms per token, 538.67 tokens per second)
eval time = 2014.24 ms / 65 tokens ( 30.99 ms per token, 32.27 tokens per second)
total time = 3235.76 ms / 723 tokens
slot release: id 0 | task 4715 | stop processing: n_tokens = 47173, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 0.999
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4782 | processing task, is_child = 0
slot update_slots: id 0 | task 4782 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47204
slot update_slots: id 0 | task 4782 | n_past = 47132, slot.prompt.tokens.size() = 47173, seq_id = 0, pos_min = 47172, n_swa = 1
slot update_slots: id 0 | task 4782 | restored context checkpoint (pos_min = 46596, pos_max = 46596, n_tokens = 46597, size = 62.813 MiB)
slot update_slots: id 0 | task 4782 | n_tokens = 46597, memory_seq_rm [46597, end)
slot update_slots: id 0 | task 4782 | prompt processing progress, n_tokens = 46692, batch.n_tokens = 95, progress = 0.989153
slot update_slots: id 0 | task 4782 | n_tokens = 46692, memory_seq_rm [46692, end)
slot init_sampler: id 0 | task 4782 | init sampler, took 5.79 ms, tokens: text = 47204, total = 47204
slot update_slots: id 0 | task 4782 | erasing old context checkpoint (pos_min = 42765, pos_max = 42765, n_tokens = 42766, size = 62.813 MiB)
slot update_slots: id 0 | task 4782 | created context checkpoint 8 of 8 (pos_min = 46691, pos_max = 46691, n_tokens = 46692, size = 62.813 MiB)
slot update_slots: id 0 | task 4782 | prompt processing done, n_tokens = 47204, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4782 |
prompt eval time = 1131.61 ms / 607 tokens ( 1.86 ms per token, 536.40 tokens per second)
eval time = 1859.83 ms / 62 tokens ( 30.00 ms per token, 33.34 tokens per second)
total time = 2991.44 ms / 669 tokens
slot release: id 0 | task 4782 | stop processing: n_tokens = 47265, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.991 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4846 | processing task, is_child = 0
slot update_slots: id 0 | task 4846 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47676
slot update_slots: id 0 | task 4846 | n_tokens = 47265, memory_seq_rm [47265, end)
slot init_sampler: id 0 | task 4846 | init sampler, took 5.80 ms, tokens: text = 47676, total = 47676
slot update_slots: id 0 | task 4846 | erasing old context checkpoint (pos_min = 43695, pos_max = 43695, n_tokens = 43696, size = 62.813 MiB)
slot update_slots: id 0 | task 4846 | created context checkpoint 8 of 8 (pos_min = 47264, pos_max = 47264, n_tokens = 47265, size = 62.813 MiB)
slot update_slots: id 0 | task 4846 | prompt processing done, n_tokens = 47676, batch.n_tokens = 411
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4846 |
prompt eval time = 720.74 ms / 411 tokens ( 1.75 ms per token, 570.25 tokens per second)
eval time = 4857.59 ms / 160 tokens ( 30.36 ms per token, 32.94 tokens per second)
total time = 5578.33 ms / 571 tokens
slot release: id 0 | task 4846 | stop processing: n_tokens = 47835, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 0.997
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5007 | processing task, is_child = 0
slot update_slots: id 0 | task 5007 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47991
slot update_slots: id 0 | task 5007 | n_past = 47699, slot.prompt.tokens.size() = 47835, seq_id = 0, pos_min = 47834, n_swa = 1
slot update_slots: id 0 | task 5007 | restored context checkpoint (pos_min = 47264, pos_max = 47264, n_tokens = 47265, size = 62.813 MiB)
slot update_slots: id 0 | task 5007 | n_tokens = 47265, memory_seq_rm [47265, end)
slot update_slots: id 0 | task 5007 | prompt processing progress, n_tokens = 47479, batch.n_tokens = 214, progress = 0.989331
slot update_slots: id 0 | task 5007 | n_tokens = 47479, memory_seq_rm [47479, end)
slot init_sampler: id 0 | task 5007 | init sampler, took 5.90 ms, tokens: text = 47991, total = 47991
slot update_slots: id 0 | task 5007 | erasing old context checkpoint (pos_min = 43926, pos_max = 43926, n_tokens = 43927, size = 62.813 MiB)
slot update_slots: id 0 | task 5007 | created context checkpoint 8 of 8 (pos_min = 47478, pos_max = 47478, n_tokens = 47479, size = 62.813 MiB)
slot update_slots: id 0 | task 5007 | prompt processing done, n_tokens = 47991, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5007 |
prompt eval time = 1251.99 ms / 726 tokens ( 1.72 ms per token, 579.87 tokens per second)
eval time = 5086.56 ms / 168 tokens ( 30.28 ms per token, 33.03 tokens per second)
total time = 6338.55 ms / 894 tokens
slot release: id 0 | task 5007 | stop processing: n_tokens = 48158, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.908 (> 0.100 thold), f_keep = 0.547
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5177 | processing task, is_child = 0
slot update_slots: id 0 | task 5177 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 29006
slot update_slots: id 0 | task 5177 | n_past = 26323, slot.prompt.tokens.size() = 48158, seq_id = 0, pos_min = 48157, n_swa = 1
slot update_slots: id 0 | task 5177 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 44351, pos_max = 44351, n_tokens = 44352, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 46134, pos_max = 46134, n_tokens = 46135, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 46292, pos_max = 46292, n_tokens = 46293, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 46450, pos_max = 46450, n_tokens = 46451, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 46596, pos_max = 46596, n_tokens = 46597, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 46691, pos_max = 46691, n_tokens = 46692, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 47264, pos_max = 47264, n_tokens = 47265, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 47478, pos_max = 47478, n_tokens = 47479, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 4096, progress = 0.141212
slot update_slots: id 0 | task 5177 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 4096, progress = 0.282424
slot update_slots: id 0 | task 5177 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 4096, progress = 0.423636
slot update_slots: id 0 | task 5177 | n_tokens = 12288, memory_seq_rm [12288, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 16384, batch.n_tokens = 4096, progress = 0.564849
slot update_slots: id 0 | task 5177 | n_tokens = 16384, memory_seq_rm [16384, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 20480, batch.n_tokens = 4096, progress = 0.706061
slot update_slots: id 0 | task 5177 | n_tokens = 20480, memory_seq_rm [20480, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 24576, batch.n_tokens = 4096, progress = 0.847273
slot update_slots: id 0 | task 5177 | n_tokens = 24576, memory_seq_rm [24576, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 28494, batch.n_tokens = 3918, progress = 0.982349
slot update_slots: id 0 | task 5177 | n_tokens = 28494, memory_seq_rm [28494, end)
slot init_sampler: id 0 | task 5177 | init sampler, took 3.50 ms, tokens: text = 29006, total = 29006
slot update_slots: id 0 | task 5177 | created context checkpoint 1 of 8 (pos_min = 28493, pos_max = 28493, n_tokens = 28494, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | prompt processing done, n_tokens = 29006, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5177 |
prompt eval time = 19228.53 ms / 29006 tokens ( 0.66 ms per token, 1508.49 tokens per second)
eval time = 898.79 ms / 32 tokens ( 28.09 ms per token, 35.60 tokens per second)
total time = 20127.33 ms / 29038 tokens
slot release: id 0 | task 5177 | stop processing: n_tokens = 29037, truncated = 0
srv update_slots: all slots are idle
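The task 5177 request above lost every cached checkpoint, so the full 29,006-token prompt was re-processed in `n_batch`-sized chunks. The `progress` values are simply `n_tokens / task.n_tokens`, and the last 512 tokens land in a separate closing batch. A minimal sketch that reproduces this schedule (the 512-token final reserve is inferred from the log, not taken from llama.cpp's actual code):

```python
def batch_schedule(task_n_tokens: int, n_batch: int = 4096, reserve: int = 512):
    """Yield (n_tokens_done, batch_size, progress) tuples matching the
    'prompt processing progress' log lines. The final `reserve` tokens are
    assumed to be processed in a separate closing batch, as seen above."""
    done = 0
    limit = task_n_tokens - reserve
    while done < limit:
        step = min(n_batch, limit - done)
        done += step
        yield done, step, done / task_n_tokens
```

For task 5177 this yields six 4,096-token batches, then a 3,918-token batch ending at `progress = 0.982349`, exactly as logged.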
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.962 (> 0.100 thold), f_keep = 0.999
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5217 | processing task, is_child = 0
slot update_slots: id 0 | task 5217 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 30151
slot update_slots: id 0 | task 5217 | n_past = 29002, slot.prompt.tokens.size() = 29037, seq_id = 0, pos_min = 29036, n_swa = 1
slot update_slots: id 0 | task 5217 | restored context checkpoint (pos_min = 28493, pos_max = 28493, n_tokens = 28494, size = 62.813 MiB)
slot update_slots: id 0 | task 5217 | n_tokens = 28494, memory_seq_rm [28494, end)
slot update_slots: id 0 | task 5217 | prompt processing progress, n_tokens = 29639, batch.n_tokens = 1145, progress = 0.983019
slot update_slots: id 0 | task 5217 | n_tokens = 29639, memory_seq_rm [29639, end)
slot init_sampler: id 0 | task 5217 | init sampler, took 3.71 ms, tokens: text = 30151, total = 30151
slot update_slots: id 0 | task 5217 | created context checkpoint 2 of 8 (pos_min = 29638, pos_max = 29638, n_tokens = 29639, size = 62.813 MiB)
slot update_slots: id 0 | task 5217 | prompt processing done, n_tokens = 30151, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5217 |
prompt eval time = 1615.85 ms / 1657 tokens ( 0.98 ms per token, 1025.46 tokens per second)
eval time = 857.71 ms / 31 tokens ( 27.67 ms per token, 36.14 tokens per second)
total time = 2473.56 ms / 1688 tokens
slot release: id 0 | task 5217 | stop processing: n_tokens = 30181, truncated = 0
srv update_slots: all slots are idle
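The `selected slot by LCP similarity` lines report two ratios that can be reconstructed from the surrounding numbers: `sim_best` appears to be the longest common prefix divided by the new prompt length, and `f_keep` the same prefix divided by the cached sequence length (for task 5217: 29,002 / 30,151 ≈ 0.962 and 29,002 / 29,037 ≈ 0.999). A sketch under that assumption:

```python
def lcp_len(a, b):
    """Length of the longest common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def slot_similarity(cached, prompt):
    """Reproduce sim_best and f_keep as inferred from the log's numbers:
    sim = lcp / len(prompt), f_keep = lcp / len(cached).
    These formulas are an assumption checked against the log, not code
    lifted from llama.cpp."""
    lcp = lcp_len(cached, prompt)
    return lcp / len(prompt), lcp / len(cached)
```

A slot is reused whenever `sim` clears the 0.100 threshold shown in the log; `f_keep` then tells how much of the cached sequence survives.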
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.964 (> 0.100 thold), f_keep = 0.999
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5250 | processing task, is_child = 0
slot update_slots: id 0 | task 5250 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 31263
slot update_slots: id 0 | task 5250 | n_past = 30147, slot.prompt.tokens.size() = 30181, seq_id = 0, pos_min = 30180, n_swa = 1
slot update_slots: id 0 | task 5250 | restored context checkpoint (pos_min = 29638, pos_max = 29638, n_tokens = 29639, size = 62.813 MiB)
slot update_slots: id 0 | task 5250 | n_tokens = 29639, memory_seq_rm [29639, end)
slot update_slots: id 0 | task 5250 | prompt processing progress, n_tokens = 30751, batch.n_tokens = 1112, progress = 0.983623
slot update_slots: id 0 | task 5250 | n_tokens = 30751, memory_seq_rm [30751, end)
slot init_sampler: id 0 | task 5250 | init sampler, took 3.92 ms, tokens: text = 31263, total = 31263
slot update_slots: id 0 | task 5250 | created context checkpoint 3 of 8 (pos_min = 30750, pos_max = 30750, n_tokens = 30751, size = 62.813 MiB)
slot update_slots: id 0 | task 5250 | prompt processing done, n_tokens = 31263, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5250 |
prompt eval time = 1669.70 ms / 1624 tokens ( 1.03 ms per token, 972.63 tokens per second)
eval time = 914.45 ms / 31 tokens ( 29.50 ms per token, 33.90 tokens per second)
total time = 2584.15 ms / 1655 tokens
slot release: id 0 | task 5250 | stop processing: n_tokens = 31293, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5283 | processing task, is_child = 0
slot update_slots: id 0 | task 5283 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 31313
slot update_slots: id 0 | task 5283 | n_past = 31292, slot.prompt.tokens.size() = 31293, seq_id = 0, pos_min = 31292, n_swa = 1
slot update_slots: id 0 | task 5283 | restored context checkpoint (pos_min = 30750, pos_max = 30750, n_tokens = 30751, size = 62.813 MiB)
slot update_slots: id 0 | task 5283 | n_tokens = 30751, memory_seq_rm [30751, end)
slot update_slots: id 0 | task 5283 | prompt processing progress, n_tokens = 30801, batch.n_tokens = 50, progress = 0.983649
slot update_slots: id 0 | task 5283 | n_tokens = 30801, memory_seq_rm [30801, end)
slot init_sampler: id 0 | task 5283 | init sampler, took 3.84 ms, tokens: text = 31313, total = 31313
slot update_slots: id 0 | task 5283 | prompt processing done, n_tokens = 31313, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5283 |
prompt eval time = 925.94 ms / 562 tokens ( 1.65 ms per token, 606.95 tokens per second)
eval time = 872.50 ms / 29 tokens ( 30.09 ms per token, 33.24 tokens per second)
total time = 1798.44 ms / 591 tokens
slot release: id 0 | task 5283 | stop processing: n_tokens = 31341, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5314 | processing task, is_child = 0
slot update_slots: id 0 | task 5314 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 31398
slot update_slots: id 0 | task 5314 | n_past = 31340, slot.prompt.tokens.size() = 31341, seq_id = 0, pos_min = 31340, n_swa = 1
slot update_slots: id 0 | task 5314 | restored context checkpoint (pos_min = 30750, pos_max = 30750, n_tokens = 30751, size = 62.813 MiB)
slot update_slots: id 0 | task 5314 | n_tokens = 30751, memory_seq_rm [30751, end)
slot update_slots: id 0 | task 5314 | prompt processing progress, n_tokens = 30886, batch.n_tokens = 135, progress = 0.983693
slot update_slots: id 0 | task 5314 | n_tokens = 30886, memory_seq_rm [30886, end)
slot init_sampler: id 0 | task 5314 | init sampler, took 3.96 ms, tokens: text = 31398, total = 31398
slot update_slots: id 0 | task 5314 | created context checkpoint 4 of 8 (pos_min = 30885, pos_max = 30885, n_tokens = 30886, size = 62.813 MiB)
slot update_slots: id 0 | task 5314 | prompt processing done, n_tokens = 31398, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5314 |
prompt eval time = 1096.73 ms / 647 tokens ( 1.70 ms per token, 589.93 tokens per second)
eval time = 969.81 ms / 34 tokens ( 28.52 ms per token, 35.06 tokens per second)
total time = 2066.54 ms / 681 tokens
slot release: id 0 | task 5314 | stop processing: n_tokens = 31431, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.986 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5350 | processing task, is_child = 0
slot update_slots: id 0 | task 5350 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 31872
slot update_slots: id 0 | task 5350 | n_tokens = 31431, memory_seq_rm [31431, end)
slot init_sampler: id 0 | task 5350 | init sampler, took 3.97 ms, tokens: text = 31872, total = 31872
slot update_slots: id 0 | task 5350 | created context checkpoint 5 of 8 (pos_min = 31430, pos_max = 31430, n_tokens = 31431, size = 62.813 MiB)
slot update_slots: id 0 | task 5350 | prompt processing done, n_tokens = 31872, batch.n_tokens = 441
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5350 |
prompt eval time = 636.80 ms / 441 tokens ( 1.44 ms per token, 692.53 tokens per second)
eval time = 1028.71 ms / 35 tokens ( 29.39 ms per token, 34.02 tokens per second)
total time = 1665.51 ms / 476 tokens
slot release: id 0 | task 5350 | stop processing: n_tokens = 31906, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.948 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5386 | processing task, is_child = 0
slot update_slots: id 0 | task 5386 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 33655
slot update_slots: id 0 | task 5386 | n_tokens = 31906, memory_seq_rm [31906, end)
slot update_slots: id 0 | task 5386 | prompt processing progress, n_tokens = 33143, batch.n_tokens = 1237, progress = 0.984787
slot update_slots: id 0 | task 5386 | n_tokens = 33143, memory_seq_rm [33143, end)
slot init_sampler: id 0 | task 5386 | init sampler, took 4.10 ms, tokens: text = 33655, total = 33655
slot update_slots: id 0 | task 5386 | created context checkpoint 6 of 8 (pos_min = 33142, pos_max = 33142, n_tokens = 33143, size = 62.813 MiB)
slot update_slots: id 0 | task 5386 | prompt processing done, n_tokens = 33655, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5386 |
prompt eval time = 1773.15 ms / 1749 tokens ( 1.01 ms per token, 986.38 tokens per second)
eval time = 991.98 ms / 34 tokens ( 29.18 ms per token, 34.28 tokens per second)
total time = 2765.13 ms / 1783 tokens
slot release: id 0 | task 5386 | stop processing: n_tokens = 33688, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.958 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5422 | processing task, is_child = 0
slot update_slots: id 0 | task 5422 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 35175
slot update_slots: id 0 | task 5422 | n_tokens = 33688, memory_seq_rm [33688, end)
slot update_slots: id 0 | task 5422 | prompt processing progress, n_tokens = 34663, batch.n_tokens = 975, progress = 0.985444
slot update_slots: id 0 | task 5422 | n_tokens = 34663, memory_seq_rm [34663, end)
slot init_sampler: id 0 | task 5422 | init sampler, took 4.49 ms, tokens: text = 35175, total = 35175
slot update_slots: id 0 | task 5422 | created context checkpoint 7 of 8 (pos_min = 34662, pos_max = 34662, n_tokens = 34663, size = 62.813 MiB)
slot update_slots: id 0 | task 5422 | prompt processing done, n_tokens = 35175, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5422 |
prompt eval time = 1650.43 ms / 1487 tokens ( 1.11 ms per token, 900.98 tokens per second)
eval time = 1035.42 ms / 34 tokens ( 30.45 ms per token, 32.84 tokens per second)
total time = 2685.85 ms / 1521 tokens
slot release: id 0 | task 5422 | stop processing: n_tokens = 35208, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.948 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5458 | processing task, is_child = 0
slot update_slots: id 0 | task 5458 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 37122
slot update_slots: id 0 | task 5458 | n_tokens = 35208, memory_seq_rm [35208, end)
slot update_slots: id 0 | task 5458 | prompt processing progress, n_tokens = 36610, batch.n_tokens = 1402, progress = 0.986208
slot update_slots: id 0 | task 5458 | n_tokens = 36610, memory_seq_rm [36610, end)
slot init_sampler: id 0 | task 5458 | init sampler, took 4.84 ms, tokens: text = 37122, total = 37122
slot update_slots: id 0 | task 5458 | created context checkpoint 8 of 8 (pos_min = 36609, pos_max = 36609, n_tokens = 36610, size = 62.813 MiB)
slot update_slots: id 0 | task 5458 | prompt processing done, n_tokens = 37122, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5458 |
prompt eval time = 1863.45 ms / 1914 tokens ( 0.97 ms per token, 1027.13 tokens per second)
eval time = 6356.80 ms / 211 tokens ( 30.13 ms per token, 33.19 tokens per second)
total time = 8220.25 ms / 2125 tokens
slot release: id 0 | task 5458 | stop processing: n_tokens = 37332, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.995 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5671 | processing task, is_child = 0
slot update_slots: id 0 | task 5671 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 37522
slot update_slots: id 0 | task 5671 | n_tokens = 37332, memory_seq_rm [37332, end)
slot init_sampler: id 0 | task 5671 | init sampler, took 4.70 ms, tokens: text = 37522, total = 37522
slot update_slots: id 0 | task 5671 | erasing old context checkpoint (pos_min = 28493, pos_max = 28493, n_tokens = 28494, size = 62.813 MiB)
slot update_slots: id 0 | task 5671 | created context checkpoint 8 of 8 (pos_min = 37331, pos_max = 37331, n_tokens = 37332, size = 62.813 MiB)
slot update_slots: id 0 | task 5671 | prompt processing done, n_tokens = 37522, batch.n_tokens = 190
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5671 |
prompt eval time = 583.70 ms / 190 tokens ( 3.07 ms per token, 325.51 tokens per second)
eval time = 6196.76 ms / 204 tokens ( 30.38 ms per token, 32.92 tokens per second)
total time = 6780.47 ms / 394 tokens
slot release: id 0 | task 5671 | stop processing: n_tokens = 37725, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5876 | processing task, is_child = 0
slot update_slots: id 0 | task 5876 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 37816
slot update_slots: id 0 | task 5876 | n_tokens = 37725, memory_seq_rm [37725, end)
slot init_sampler: id 0 | task 5876 | init sampler, took 4.73 ms, tokens: text = 37816, total = 37816
slot update_slots: id 0 | task 5876 | erasing old context checkpoint (pos_min = 29638, pos_max = 29638, n_tokens = 29639, size = 62.813 MiB)
slot update_slots: id 0 | task 5876 | created context checkpoint 8 of 8 (pos_min = 37724, pos_max = 37724, n_tokens = 37725, size = 62.813 MiB)
slot update_slots: id 0 | task 5876 | prompt processing done, n_tokens = 37816, batch.n_tokens = 91
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5876 |
prompt eval time = 418.51 ms / 91 tokens ( 4.60 ms per token, 217.44 tokens per second)
eval time = 5221.88 ms / 171 tokens ( 30.54 ms per token, 32.75 tokens per second)
total time = 5640.39 ms / 262 tokens
slot release: id 0 | task 5876 | stop processing: n_tokens = 37986, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 6048 | processing task, is_child = 0
slot update_slots: id 0 | task 6048 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 38062
slot update_slots: id 0 | task 6048 | n_tokens = 37986, memory_seq_rm [37986, end)
slot init_sampler: id 0 | task 6048 | init sampler, took 4.86 ms, tokens: text = 38062, total = 38062
slot update_slots: id 0 | task 6048 | erasing old context checkpoint (pos_min = 30750, pos_max = 30750, n_tokens = 30751, size = 62.813 MiB)
slot update_slots: id 0 | task 6048 | created context checkpoint 8 of 8 (pos_min = 37985, pos_max = 37985, n_tokens = 37986, size = 62.813 MiB)
slot update_slots: id 0 | task 6048 | prompt processing done, n_tokens = 38062, batch.n_tokens = 76
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 6048 |
prompt eval time = 412.61 ms / 76 tokens ( 5.43 ms per token, 184.19 tokens per second)
eval time = 12535.38 ms / 412 tokens ( 30.43 ms per token, 32.87 tokens per second)
total time = 12947.99 ms / 488 tokens
slot release: id 0 | task 6048 | stop processing: n_tokens = 38473, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.813 (> 0.100 thold), f_keep = 0.812
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 6461 | processing task, is_child = 0
slot update_slots: id 0 | task 6461 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 38450
slot update_slots: id 0 | task 6461 | n_past = 31259, slot.prompt.tokens.size() = 38473, seq_id = 0, pos_min = 38472, n_swa = 1
slot update_slots: id 0 | task 6461 | restored context checkpoint (pos_min = 30885, pos_max = 30885, n_tokens = 30886, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 31430, pos_max = 31430, n_tokens = 31431, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 33142, pos_max = 33142, n_tokens = 33143, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 34662, pos_max = 34662, n_tokens = 34663, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 36609, pos_max = 36609, n_tokens = 36610, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 37331, pos_max = 37331, n_tokens = 37332, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 37724, pos_max = 37724, n_tokens = 37725, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 37985, pos_max = 37985, n_tokens = 37986, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | n_tokens = 30886, memory_seq_rm [30886, end)
slot update_slots: id 0 | task 6461 | prompt processing progress, n_tokens = 34982, batch.n_tokens = 4096, progress = 0.909805
slot update_slots: id 0 | task 6461 | n_tokens = 34982, memory_seq_rm [34982, end)
slot update_slots: id 0 | task 6461 | prompt processing progress, n_tokens = 37938, batch.n_tokens = 2956, progress = 0.986684
slot update_slots: id 0 | task 6461 | n_tokens = 37938, memory_seq_rm [37938, end)
slot init_sampler: id 0 | task 6461 | init sampler, took 5.53 ms, tokens: text = 38450, total = 38450
slot update_slots: id 0 | task 6461 | created context checkpoint 2 of 8 (pos_min = 37937, pos_max = 37937, n_tokens = 37938, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | prompt processing done, n_tokens = 38450, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 6461 |
prompt eval time = 6428.73 ms / 7564 tokens ( 0.85 ms per token, 1176.59 tokens per second)
eval time = 2342.51 ms / 79 tokens ( 29.65 ms per token, 33.72 tokens per second)
total time = 8771.24 ms / 7643 tokens
slot release: id 0 | task 6461 | stop processing: n_tokens = 38528, truncated = 0
srv update_slots: all slots are idle
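Task 6461 shows the checkpoint store's two eviction paths at once: a divergent prompt (`f_keep = 0.812`) restores the newest checkpoint at or before `n_past = 31259` and erases the seven checkpoints past it, while earlier requests (tasks 5671 onward) show the capacity path, where creating a ninth checkpoint erases the oldest. A hypothetical re-implementation of that bookkeeping, not llama.cpp's actual code:

```python
class CheckpointRing:
    """Rolling store of at most `cap` context checkpoints, keyed by
    position. Mirrors the 'created/erasing old/erased invalidated
    context checkpoint' lines in the log above."""
    def __init__(self, cap: int = 8):
        self.cap = cap
        self.points = []  # checkpoint positions, ascending

    def create(self, pos: int):
        if len(self.points) == self.cap:
            self.points.pop(0)  # "erasing old context checkpoint"
        self.points.append(pos)

    def restore(self, n_past: int):
        """Return (restored_pos, erased) for a rollback to n_past:
        keep checkpoints strictly before n_past, erase the rest."""
        keep = [p for p in self.points if p < n_past]
        erased = [p for p in self.points if p >= n_past]  # "erased invalidated"
        self.points = keep
        return (keep[-1] if keep else None), erased
```

Replaying task 6461's state (checkpoints at 30885 through 37985) against `n_past = 31259` restores 30885 and erases the other seven, matching lines in the log.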
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 6543 | processing task, is_child = 0
slot update_slots: id 0 | task 6543 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 38559
slot update_slots: id 0 | task 6543 | n_tokens = 38528, memory_seq_rm [38528, end)
slot init_sampler: id 0 | task 6543 | init sampler, took 4.67 ms, tokens: text = 38559, total = 38559
slot update_slots: id 0 | task 6543 | created context checkpoint 3 of 8 (pos_min = 38527, pos_max = 38527, n_tokens = 38528, size = 62.813 MiB)
slot update_slots: id 0 | task 6543 | prompt processing done, n_tokens = 38559, batch.n_tokens = 31
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 6543 |
prompt eval time = 257.27 ms / 31 tokens ( 8.30 ms per token, 120.50 tokens per second)
eval time = 1591.97 ms / 54 tokens ( 29.48 ms per token, 33.92 tokens per second)
total time = 1849.24 ms / 85 tokens
slot release: id 0 | task 6543 | stop processing: n_tokens = 38612, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 6598 | processing task, is_child = 0
slot update_slots: id 0 | task 6598 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 38629
slot update_slots: id 0 | task 6598 | n_tokens = 38612, memory_seq_rm [38612, end)
slot init_sampler: id 0 | task 6598 | init sampler, took 4.68 ms, tokens: text = 38629, total = 38629
slot update_slots: id 0 | task 6598 | created context checkpoint 4 of 8 (pos_min = 38611, pos_max = 38611, n_tokens = 38612, size = 62.813 MiB)
slot update_slots: id 0 | task 6598 | prompt processing done, n_tokens = 38629, batch.n_tokens = 17
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 6598 |
prompt eval time = 239.91 ms / 17 tokens ( 14.11 ms per token, 70.86 tokens per second)
eval time = 19755.90 ms / 667 tokens ( 29.62 ms per token, 33.76 tokens per second)
total time = 19995.81 ms / 684 tokens
slot release: id 0 | task 6598 | stop processing: n_tokens = 39295, truncated = 0
srv update_slots: all slots are idle
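Prompt-eval throughput collapses as the cache shrinks the uncached suffix: 1,508 t/s at 29,006 new tokens, but only 70.9 t/s at 17. A fixed per-request overhead explains this; an illustrative least-squares fit of `time_ms = overhead + per_token * n` over the five small batches above (my own analysis of the logged numbers, not server output):

```python
def fit_overhead(samples):
    """Ordinary least squares for time_ms = a + b * n_tokens.
    `samples` are (n_tokens, time_ms) pairs taken from the log's
    small prompt-eval batches."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b  # (fixed overhead in ms, marginal ms per token)

# (n_tokens, prompt_eval_ms) from tasks 5671, 5876, 6048, 6543, 6598
samples = [(190, 583.70), (91, 418.51), (76, 412.61), (31, 257.27), (17, 239.91)]
overhead_ms, per_token_ms = fit_overhead(samples)
```

The fit lands near 220 ms of fixed overhead and roughly 2 ms per token, i.e. the tiny cached-prompt requests are dominated by per-request setup, not compute.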
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 7266 | processing task, is_child = 0
slot update_slots: id 0 | task 7266 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 39344
slot update_slots: id 0 | task 7266 | n_past = 39285, slot.prompt.tokens.size() = 39295, seq_id = 0, pos_min = 39294, n_swa = 1
slot update_slots: id 0 | task 7266 | restored context checkpoint (pos_min = 38611, pos_max = 38611, n_tokens = 38612, size = 62.813 MiB)
slot update_slots: id 0 | task 7266 | n_tokens = 38612, memory_seq_rm [38612, end)
slot update_slots: id 0 | task 7266 | prompt processing progress, n_tokens = 38832, batch.n_tokens = 220, progress = 0.986987
slot update_slots: id 0 | task 7266 | n_tokens = 38832, memory_seq_rm [38832, end)
slot init_sampler: id 0 | task 7266 | init sampler, took 4.92 ms, tokens: text = 39344, total = 39344
slot update_slots: id 0 | task 7266 | created context checkpoint 5 of 8 (pos_min = 38831, pos_max = 38831, n_tokens = 38832, size = 62.813 MiB)
slot update_slots: id 0 | task 7266 | prompt processing done, n_tokens = 39344, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 7266 |
prompt eval time = 1210.34 ms / 732 tokens ( 1.65 ms per token, 604.79 tokens per second)
eval time = 2623.48 ms / 85 tokens ( 30.86 ms per token, 32.40 tokens per second)
total time = 3833.82 ms / 817 tokens
slot release: id 0 | task 7266 | stop processing: n_tokens = 39428, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 7353 | processing task, is_child = 0
slot update_slots: id 0 | task 7353 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 39495
slot update_slots: id 0 | task 7353 | n_past = 39358, slot.prompt.tokens.size() = 39428, seq_id = 0, pos_min = 39427, n_swa = 1
slot update_slots: id 0 | task 7353 | restored context checkpoint (pos_min = 38831, pos_max = 38831, n_tokens = 38832, size = 62.813 MiB)
slot update_slots: id 0 | task 7353 | n_tokens = 38832, memory_seq_rm [38832, end)
slot update_slots: id 0 | task 7353 | prompt processing progress, n_tokens = 38983, batch.n_tokens = 151, progress = 0.987036
slot update_slots: id 0 | task 7353 | n_tokens = 38983, memory_seq_rm [38983, end)
slot init_sampler: id 0 | task 7353 | init sampler, took 4.80 ms, tokens: text = 39495, total = 39495
slot update_slots: id 0 | task 7353 | created context checkpoint 6 of 8 (pos_min = 38982, pos_max = 38982, n_tokens = 38983, size = 62.813 MiB)
slot update_slots: id 0 | task 7353 | prompt processing done, n_tokens = 39495, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 7353 |
prompt eval time = 1148.45 ms / 663 tokens ( 1.73 ms per token, 577.30 tokens per second)
eval time = 1128.64 ms / 38 tokens ( 29.70 ms per token, 33.67 tokens per second)
total time = 2277.08 ms / 701 tokens
slot release: id 0 | task 7353 | stop processing: n_tokens = 39532, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 7393 | processing task, is_child = 0
slot update_slots: id 0 | task 7393 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 39555
slot update_slots: id 0 | task 7393 | n_tokens = 39532, memory_seq_rm [39532, end)
slot init_sampler: id 0 | task 7393 | init sampler, took 4.81 ms, tokens: text = 39555, total = 39555
slot update_slots: id 0 | task 7393 | created context checkpoint 7 of 8 (pos_min = 39531, pos_max = 39531, n_tokens = 39532, size = 62.813 MiB)
slot update_slots: id 0 | task 7393 | prompt processing done, n_tokens = 39555, batch.n_tokens = 23
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 7393 |
prompt eval time = 212.09 ms / 23 tokens ( 9.22 ms per token, 108.45 tokens per second)
eval time = 920.92 ms / 31 tokens ( 29.71 ms per token, 33.66 tokens per second)
total time = 1133.01 ms / 54 tokens
slot release: id 0 | task 7393 | stop processing: n_tokens = 39585, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.951 (> 0.100 thold), f_keep = 0.971
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 7425 | processing task, is_child = 0
slot update_slots: id 0 | task 7425 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 40431
slot update_slots: id 0 | task 7425 | n_past = 38446, slot.prompt.tokens.size() = 39585, seq_id = 0, pos_min = 39584, n_swa = 1
slot update_slots: id 0 | task 7425 | restored context checkpoint (pos_min = 37937, pos_max = 37937, n_tokens = 37938, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | erased invalidated context checkpoint (pos_min = 38527, pos_max = 38527, n_tokens = 38528, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | erased invalidated context checkpoint (pos_min = 38611, pos_max = 38611, n_tokens = 38612, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | erased invalidated context checkpoint (pos_min = 38831, pos_max = 38831, n_tokens = 38832, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | erased invalidated context checkpoint (pos_min = 38982, pos_max = 38982, n_tokens = 38983, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | erased invalidated context checkpoint (pos_min = 39531, pos_max = 39531, n_tokens = 39532, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | n_tokens = 37938, memory_seq_rm [37938, end)
slot update_slots: id 0 | task 7425 | prompt processing progress, n_tokens = 39919, batch.n_tokens = 1981, progress = 0.987336
slot update_slots: id 0 | task 7425 | n_tokens = 39919, memory_seq_rm [39919, end)
slot init_sampler: id 0 | task 7425 | init sampler, took 4.93 ms, tokens: text = 40431, total = 40431
slot update_slots: id 0 | task 7425 | created context checkpoint 3 of 8 (pos_min = 39918, pos_max = 39918, n_tokens = 39919, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | prompt processing done, n_tokens = 40431, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 7425 |
prompt eval time = 2276.96 ms / 2493 tokens ( 0.91 ms per token, 1094.88 tokens per second)
eval time = 59593.90 ms / 1919 tokens ( 31.05 ms per token, 32.20 tokens per second)
total time = 61870.86 ms / 4412 tokens
slot release: id 0 | task 7425 | stop processing: n_tokens = 42349, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9346 | processing task, is_child = 0
slot update_slots: id 0 | task 9346 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 42399
slot update_slots: id 0 | task 9346 | n_past = 42339, slot.prompt.tokens.size() = 42349, seq_id = 0, pos_min = 42348, n_swa = 1
slot update_slots: id 0 | task 9346 | restored context checkpoint (pos_min = 39918, pos_max = 39918, n_tokens = 39919, size = 62.813 MiB)
slot update_slots: id 0 | task 9346 | n_tokens = 39919, memory_seq_rm [39919, end)
slot update_slots: id 0 | task 9346 | prompt processing progress, n_tokens = 41887, batch.n_tokens = 1968, progress = 0.987924
slot update_slots: id 0 | task 9346 | n_tokens = 41887, memory_seq_rm [41887, end)
slot init_sampler: id 0 | task 9346 | init sampler, took 5.48 ms, tokens: text = 42399, total = 42399
slot update_slots: id 0 | task 9346 | created context checkpoint 4 of 8 (pos_min = 41886, pos_max = 41886, n_tokens = 41887, size = 62.813 MiB)
slot update_slots: id 0 | task 9346 | prompt processing done, n_tokens = 42399, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9346 |
prompt eval time = 2339.85 ms / 2480 tokens ( 0.94 ms per token, 1059.90 tokens per second)
eval time = 2393.71 ms / 77 tokens ( 31.09 ms per token, 32.17 tokens per second)
total time = 4733.56 ms / 2557 tokens
slot release: id 0 | task 9346 | stop processing: n_tokens = 42475, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9425 | processing task, is_child = 0
slot update_slots: id 0 | task 9425 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 42553
slot update_slots: id 0 | task 9425 | n_tokens = 42475, memory_seq_rm [42475, end)
slot init_sampler: id 0 | task 9425 | init sampler, took 5.25 ms, tokens: text = 42553, total = 42553
slot update_slots: id 0 | task 9425 | created context checkpoint 5 of 8 (pos_min = 42474, pos_max = 42474, n_tokens = 42475, size = 62.813 MiB)
slot update_slots: id 0 | task 9425 | prompt processing done, n_tokens = 42553, batch.n_tokens = 78
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9425 |
prompt eval time = 391.98 ms / 78 tokens ( 5.03 ms per token, 198.99 tokens per second)
eval time = 2859.19 ms / 88 tokens ( 32.49 ms per token, 30.78 tokens per second)
total time = 3251.17 ms / 166 tokens
slot release: id 0 | task 9425 | stop processing: n_tokens = 42640, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.948 (> 0.100 thold), f_keep = 0.948
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9514 | processing task, is_child = 0
slot update_slots: id 0 | task 9514 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 42643
slot update_slots: id 0 | task 9514 | n_past = 40427, slot.prompt.tokens.size() = 42640, seq_id = 0, pos_min = 42639, n_swa = 1
slot update_slots: id 0 | task 9514 | restored context checkpoint (pos_min = 39918, pos_max = 39918, n_tokens = 39919, size = 62.813 MiB)
slot update_slots: id 0 | task 9514 | erased invalidated context checkpoint (pos_min = 41886, pos_max = 41886, n_tokens = 41887, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9514 | erased invalidated context checkpoint (pos_min = 42474, pos_max = 42474, n_tokens = 42475, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9514 | n_tokens = 39919, memory_seq_rm [39919, end)
slot update_slots: id 0 | task 9514 | prompt processing progress, n_tokens = 42131, batch.n_tokens = 2212, progress = 0.987993
slot update_slots: id 0 | task 9514 | n_tokens = 42131, memory_seq_rm [42131, end)
slot init_sampler: id 0 | task 9514 | init sampler, took 5.67 ms, tokens: text = 42643, total = 42643
slot update_slots: id 0 | task 9514 | created context checkpoint 4 of 8 (pos_min = 42130, pos_max = 42130, n_tokens = 42131, size = 62.813 MiB)
slot update_slots: id 0 | task 9514 | prompt processing done, n_tokens = 42643, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9514 |
prompt eval time = 2773.88 ms / 2724 tokens ( 1.02 ms per token, 982.02 tokens per second)
eval time = 2513.32 ms / 84 tokens ( 29.92 ms per token, 33.42 tokens per second)
total time = 5287.20 ms / 2808 tokens
slot release: id 0 | task 9514 | stop processing: n_tokens = 42726, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9600 | processing task, is_child = 0
slot update_slots: id 0 | task 9600 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 42757
slot update_slots: id 0 | task 9600 | n_tokens = 42726, memory_seq_rm [42726, end)
slot init_sampler: id 0 | task 9600 | init sampler, took 5.33 ms, tokens: text = 42757, total = 42757
slot update_slots: id 0 | task 9600 | created context checkpoint 5 of 8 (pos_min = 42725, pos_max = 42725, n_tokens = 42726, size = 62.813 MiB)
slot update_slots: id 0 | task 9600 | prompt processing done, n_tokens = 42757, batch.n_tokens = 31
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9600 |
prompt eval time = 252.44 ms / 31 tokens ( 8.14 ms per token, 122.80 tokens per second)
eval time = 1011.71 ms / 32 tokens ( 31.62 ms per token, 31.63 tokens per second)
total time = 1264.15 ms / 63 tokens
slot release: id 0 | task 9600 | stop processing: n_tokens = 42788, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.943 (> 0.100 thold), f_keep = 0.997
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9633 | processing task, is_child = 0
slot update_slots: id 0 | task 9633 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 45211
slot update_slots: id 0 | task 9633 | n_past = 42639, slot.prompt.tokens.size() = 42788, seq_id = 0, pos_min = 42787, n_swa = 1
slot update_slots: id 0 | task 9633 | restored context checkpoint (pos_min = 42130, pos_max = 42130, n_tokens = 42131, size = 62.813 MiB)
slot update_slots: id 0 | task 9633 | erased invalidated context checkpoint (pos_min = 42725, pos_max = 42725, n_tokens = 42726, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9633 | n_tokens = 42131, memory_seq_rm [42131, end)
slot update_slots: id 0 | task 9633 | prompt processing progress, n_tokens = 44699, batch.n_tokens = 2568, progress = 0.988675
slot update_slots: id 0 | task 9633 | n_tokens = 44699, memory_seq_rm [44699, end)
slot init_sampler: id 0 | task 9633 | init sampler, took 5.50 ms, tokens: text = 45211, total = 45211
slot update_slots: id 0 | task 9633 | created context checkpoint 5 of 8 (pos_min = 44698, pos_max = 44698, n_tokens = 44699, size = 62.813 MiB)
slot update_slots: id 0 | task 9633 | prompt processing done, n_tokens = 45211, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9633 |
prompt eval time = 3182.90 ms / 3080 tokens ( 1.03 ms per token, 967.67 tokens per second)
eval time = 8843.18 ms / 270 tokens ( 32.75 ms per token, 30.53 tokens per second)
total time = 12026.08 ms / 3350 tokens
slot release: id 0 | task 9633 | stop processing: n_tokens = 45480, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 0.994
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9905 | processing task, is_child = 0
slot update_slots: id 0 | task 9905 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 45491
slot update_slots: id 0 | task 9905 | n_past = 45207, slot.prompt.tokens.size() = 45480, seq_id = 0, pos_min = 45479, n_swa = 1
slot update_slots: id 0 | task 9905 | restored context checkpoint (pos_min = 44698, pos_max = 44698, n_tokens = 44699, size = 62.813 MiB)
slot update_slots: id 0 | task 9905 | n_tokens = 44699, memory_seq_rm [44699, end)
slot update_slots: id 0 | task 9905 | prompt processing progress, n_tokens = 44979, batch.n_tokens = 280, progress = 0.988745
slot update_slots: id 0 | task 9905 | n_tokens = 44979, memory_seq_rm [44979, end)
slot init_sampler: id 0 | task 9905 | init sampler, took 5.69 ms, tokens: text = 45491, total = 45491
slot update_slots: id 0 | task 9905 | created context checkpoint 6 of 8 (pos_min = 44978, pos_max = 44978, n_tokens = 44979, size = 62.813 MiB)
slot update_slots: id 0 | task 9905 | prompt processing done, n_tokens = 45491, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9905 |
prompt eval time = 1296.90 ms / 792 tokens ( 1.64 ms per token, 610.69 tokens per second)
eval time = 807.06 ms / 27 tokens ( 29.89 ms per token, 33.45 tokens per second)
total time = 2103.96 ms / 819 tokens
slot release: id 0 | task 9905 | stop processing: n_tokens = 45517, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9934 | processing task, is_child = 0
slot update_slots: id 0 | task 9934 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 45574
slot update_slots: id 0 | task 9934 | n_tokens = 45517, memory_seq_rm [45517, end)
slot init_sampler: id 0 | task 9934 | init sampler, took 5.46 ms, tokens: text = 45574, total = 45574
slot update_slots: id 0 | task 9934 | created context checkpoint 7 of 8 (pos_min = 45516, pos_max = 45516, n_tokens = 45517, size = 62.813 MiB)
slot update_slots: id 0 | task 9934 | prompt processing done, n_tokens = 45574, batch.n_tokens = 57
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9934 |
prompt eval time = 371.27 ms / 57 tokens ( 6.51 ms per token, 153.53 tokens per second)
eval time = 6152.88 ms / 200 tokens ( 30.76 ms per token, 32.51 tokens per second)
total time = 6524.15 ms / 257 tokens
slot release: id 0 | task 9934 | stop processing: n_tokens = 45773, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.995 (> 0.100 thold), f_keep = 0.996
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10135 | processing task, is_child = 0
slot update_slots: id 0 | task 10135 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 45821
slot update_slots: id 0 | task 10135 | n_past = 45584, slot.prompt.tokens.size() = 45773, seq_id = 0, pos_min = 45772, n_swa = 1
slot update_slots: id 0 | task 10135 | restored context checkpoint (pos_min = 45516, pos_max = 45516, n_tokens = 45517, size = 62.813 MiB)
slot update_slots: id 0 | task 10135 | n_tokens = 45517, memory_seq_rm [45517, end)
slot init_sampler: id 0 | task 10135 | init sampler, took 5.66 ms, tokens: text = 45821, total = 45821
slot update_slots: id 0 | task 10135 | prompt processing done, n_tokens = 45821, batch.n_tokens = 304
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10135 |
prompt eval time = 581.29 ms / 304 tokens ( 1.91 ms per token, 522.97 tokens per second)
eval time = 5519.43 ms / 176 tokens ( 31.36 ms per token, 31.89 tokens per second)
total time = 6100.72 ms / 480 tokens
slot release: id 0 | task 10135 | stop processing: n_tokens = 45996, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.993 (> 0.100 thold), f_keep = 0.996
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10312 | processing task, is_child = 0
slot update_slots: id 0 | task 10312 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 46166
slot update_slots: id 0 | task 10312 | n_past = 45831, slot.prompt.tokens.size() = 45996, seq_id = 0, pos_min = 45995, n_swa = 1
slot update_slots: id 0 | task 10312 | restored context checkpoint (pos_min = 45516, pos_max = 45516, n_tokens = 45517, size = 62.813 MiB)
slot update_slots: id 0 | task 10312 | n_tokens = 45517, memory_seq_rm [45517, end)
slot update_slots: id 0 | task 10312 | prompt processing progress, n_tokens = 45654, batch.n_tokens = 137, progress = 0.988910
slot update_slots: id 0 | task 10312 | n_tokens = 45654, memory_seq_rm [45654, end)
slot init_sampler: id 0 | task 10312 | init sampler, took 7.23 ms, tokens: text = 46166, total = 46166
slot update_slots: id 0 | task 10312 | created context checkpoint 8 of 8 (pos_min = 45653, pos_max = 45653, n_tokens = 45654, size = 62.813 MiB)
slot update_slots: id 0 | task 10312 | prompt processing done, n_tokens = 46166, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10312 |
prompt eval time = 1171.62 ms / 649 tokens ( 1.81 ms per token, 553.94 tokens per second)
eval time = 2241.14 ms / 72 tokens ( 31.13 ms per token, 32.13 tokens per second)
total time = 3412.75 ms / 721 tokens
slot release: id 0 | task 10312 | stop processing: n_tokens = 46237, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.986 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10386 | processing task, is_child = 0
slot update_slots: id 0 | task 10386 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 46883
slot update_slots: id 0 | task 10386 | n_tokens = 46237, memory_seq_rm [46237, end)
slot update_slots: id 0 | task 10386 | prompt processing progress, n_tokens = 46371, batch.n_tokens = 134, progress = 0.989079
slot update_slots: id 0 | task 10386 | n_tokens = 46371, memory_seq_rm [46371, end)
slot init_sampler: id 0 | task 10386 | init sampler, took 5.71 ms, tokens: text = 46883, total = 46883
slot update_slots: id 0 | task 10386 | erasing old context checkpoint (pos_min = 30885, pos_max = 30885, n_tokens = 30886, size = 62.813 MiB)
slot update_slots: id 0 | task 10386 | created context checkpoint 8 of 8 (pos_min = 46370, pos_max = 46370, n_tokens = 46371, size = 62.813 MiB)
slot update_slots: id 0 | task 10386 | prompt processing done, n_tokens = 46883, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10386 |
prompt eval time = 1155.07 ms / 646 tokens ( 1.79 ms per token, 559.27 tokens per second)
eval time = 2959.94 ms / 97 tokens ( 30.51 ms per token, 32.77 tokens per second)
total time = 4115.02 ms / 743 tokens
slot release: id 0 | task 10386 | stop processing: n_tokens = 46979, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.992 (> 0.100 thold), f_keep = 0.999
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10485 | processing task, is_child = 0
slot update_slots: id 0 | task 10485 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47327
slot update_slots: id 0 | task 10485 | n_past = 46932, slot.prompt.tokens.size() = 46979, seq_id = 0, pos_min = 46978, n_swa = 1
slot update_slots: id 0 | task 10485 | restored context checkpoint (pos_min = 46370, pos_max = 46370, n_tokens = 46371, size = 62.813 MiB)
slot update_slots: id 0 | task 10485 | n_tokens = 46371, memory_seq_rm [46371, end)
slot update_slots: id 0 | task 10485 | prompt processing progress, n_tokens = 46815, batch.n_tokens = 444, progress = 0.989182
slot update_slots: id 0 | task 10485 | n_tokens = 46815, memory_seq_rm [46815, end)
slot init_sampler: id 0 | task 10485 | init sampler, took 5.84 ms, tokens: text = 47327, total = 47327
slot update_slots: id 0 | task 10485 | erasing old context checkpoint (pos_min = 37937, pos_max = 37937, n_tokens = 37938, size = 62.813 MiB)
slot update_slots: id 0 | task 10485 | created context checkpoint 8 of 8 (pos_min = 46814, pos_max = 46814, n_tokens = 46815, size = 62.813 MiB)
slot update_slots: id 0 | task 10485 | prompt processing done, n_tokens = 47327, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10485 |
prompt eval time = 1384.30 ms / 956 tokens ( 1.45 ms per token, 690.60 tokens per second)
eval time = 1516.38 ms / 48 tokens ( 31.59 ms per token, 31.65 tokens per second)
total time = 2900.68 ms / 1004 tokens
slot release: id 0 | task 10485 | stop processing: n_tokens = 47374, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10535 | processing task, is_child = 0
slot update_slots: id 0 | task 10535 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47397
slot update_slots: id 0 | task 10535 | n_tokens = 47374, memory_seq_rm [47374, end)
slot init_sampler: id 0 | task 10535 | init sampler, took 5.84 ms, tokens: text = 47397, total = 47397
slot update_slots: id 0 | task 10535 | erasing old context checkpoint (pos_min = 39918, pos_max = 39918, n_tokens = 39919, size = 62.813 MiB)
slot update_slots: id 0 | task 10535 | created context checkpoint 8 of 8 (pos_min = 47373, pos_max = 47373, n_tokens = 47374, size = 62.813 MiB)
slot update_slots: id 0 | task 10535 | prompt processing done, n_tokens = 47397, batch.n_tokens = 23
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10535 |
prompt eval time = 228.21 ms / 23 tokens ( 9.92 ms per token, 100.79 tokens per second)
eval time = 2659.06 ms / 88 tokens ( 30.22 ms per token, 33.09 tokens per second)
total time = 2887.27 ms / 111 tokens
slot release: id 0 | task 10535 | stop processing: n_tokens = 47484, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10624 | processing task, is_child = 0
slot update_slots: id 0 | task 10624 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47516
slot update_slots: id 0 | task 10624 | n_tokens = 47484, memory_seq_rm [47484, end)
slot init_sampler: id 0 | task 10624 | init sampler, took 6.04 ms, tokens: text = 47516, total = 47516
slot update_slots: id 0 | task 10624 | erasing old context checkpoint (pos_min = 42130, pos_max = 42130, n_tokens = 42131, size = 62.813 MiB)
slot update_slots: id 0 | task 10624 | created context checkpoint 8 of 8 (pos_min = 47483, pos_max = 47483, n_tokens = 47484, size = 62.813 MiB)
slot update_slots: id 0 | task 10624 | prompt processing done, n_tokens = 47516, batch.n_tokens = 32
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10624 |
prompt eval time = 292.99 ms / 32 tokens ( 9.16 ms per token, 109.22 tokens per second)
eval time = 2433.20 ms / 79 tokens ( 30.80 ms per token, 32.47 tokens per second)
total time = 2726.20 ms / 111 tokens
slot release: id 0 | task 10624 | stop processing: n_tokens = 47594, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 0.999
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10704 | processing task, is_child = 0
slot update_slots: id 0 | task 10704 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47643
slot update_slots: id 0 | task 10704 | n_past = 47526, slot.prompt.tokens.size() = 47594, seq_id = 0, pos_min = 47593, n_swa = 1
slot update_slots: id 0 | task 10704 | restored context checkpoint (pos_min = 47483, pos_max = 47483, n_tokens = 47484, size = 62.813 MiB)
slot update_slots: id 0 | task 10704 | n_tokens = 47484, memory_seq_rm [47484, end)
slot init_sampler: id 0 | task 10704 | init sampler, took 6.00 ms, tokens: text = 47643, total = 47643
slot update_slots: id 0 | task 10704 | prompt processing done, n_tokens = 47643, batch.n_tokens = 159
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10704 |
prompt eval time = 438.83 ms / 159 tokens ( 2.76 ms per token, 362.33 tokens per second)
eval time = 8843.87 ms / 266 tokens ( 33.25 ms per token, 30.08 tokens per second)
total time = 9282.69 ms / 425 tokens
slot release: id 0 | task 10704 | stop processing: n_tokens = 47908, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.991 (> 0.100 thold), f_keep = 0.995
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10971 | processing task, is_child = 0
slot update_slots: id 0 | task 10971 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 48099
slot update_slots: id 0 | task 10971 | n_past = 47670, slot.prompt.tokens.size() = 47908, seq_id = 0, pos_min = 47907, n_swa = 1
slot update_slots: id 0 | task 10971 | restored context checkpoint (pos_min = 47483, pos_max = 47483, n_tokens = 47484, size = 62.813 MiB)
slot update_slots: id 0 | task 10971 | n_tokens = 47484, memory_seq_rm [47484, end)
slot update_slots: id 0 | task 10971 | prompt processing progress, n_tokens = 47587, batch.n_tokens = 103, progress = 0.989355
slot update_slots: id 0 | task 10971 | n_tokens = 47587, memory_seq_rm [47587, end)
slot init_sampler: id 0 | task 10971 | init sampler, took 5.86 ms, tokens: text = 48099, total = 48099
slot update_slots: id 0 | task 10971 | erasing old context checkpoint (pos_min = 44698, pos_max = 44698, n_tokens = 44699, size = 62.813 MiB)
slot update_slots: id 0 | task 10971 | created context checkpoint 8 of 8 (pos_min = 47586, pos_max = 47586, n_tokens = 47587, size = 62.813 MiB)
slot update_slots: id 0 | task 10971 | prompt processing done, n_tokens = 48099, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10971 |
prompt eval time = 1245.32 ms / 615 tokens ( 2.02 ms per token, 493.85 tokens per second)
eval time = 2158.36 ms / 66 tokens ( 32.70 ms per token, 30.58 tokens per second)
total time = 3403.68 ms / 681 tokens
slot release: id 0 | task 10971 | stop processing: n_tokens = 48164, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 11039 | processing task, is_child = 0
slot update_slots: id 0 | task 11039 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 48332
slot update_slots: id 0 | task 11039 | n_tokens = 48164, memory_seq_rm [48164, end)
slot init_sampler: id 0 | task 11039 | init sampler, took 5.99 ms, tokens: text = 48332, total = 48332
slot update_slots: id 0 | task 11039 | erasing old context checkpoint (pos_min = 44978, pos_max = 44978, n_tokens = 44979, size = 62.813 MiB)
slot update_slots: id 0 | task 11039 | created context checkpoint 8 of 8 (pos_min = 48163, pos_max = 48163, n_tokens = 48164, size = 62.813 MiB)
slot update_slots: id 0 | task 11039 | prompt processing done, n_tokens = 48332, batch.n_tokens = 168
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 11039 |
prompt eval time = 520.48 ms / 168 tokens ( 3.10 ms per token, 322.78 tokens per second)
eval time = 2680.36 ms / 80 tokens ( 33.50 ms per token, 29.85 tokens per second)
total time = 3200.84 ms / 248 tokens
slot release: id 0 | task 11039 | stop processing: n_tokens = 48411, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 11120 | processing task, is_child = 0
slot update_slots: id 0 | task 11120 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 48444
slot update_slots: id 0 | task 11120 | n_tokens = 48411, memory_seq_rm [48411, end)
slot init_sampler: id 0 | task 11120 | init sampler, took 6.26 ms, tokens: text = 48444, total = 48444
slot update_slots: id 0 | task 11120 | erasing old context checkpoint (pos_min = 45516, pos_max = 45516, n_tokens = 45517, size = 62.813 MiB)
slot update_slots: id 0 | task 11120 | created context checkpoint 8 of 8 (pos_min = 48410, pos_max = 48410, n_tokens = 48411, size = 62.813 MiB)
slot update_slots: id 0 | task 11120 | prompt processing done, n_tokens = 48444, batch.n_tokens = 33
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 11120 |
prompt eval time = 326.44 ms / 33 tokens ( 9.89 ms per token, 101.09 tokens per second)
eval time = 11539.79 ms / 348 tokens ( 33.16 ms per token, 30.16 tokens per second)
total time = 11866.23 ms / 381 tokens
slot release: id 0 | task 11120 | stop processing: n_tokens = 48791, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.992 (> 0.100 thold), f_keep = 0.993
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 11469 | processing task, is_child = 0
slot update_slots: id 0 | task 11469 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 48825
slot update_slots: id 0 | task 11469 | n_past = 48455, slot.prompt.tokens.size() = 48791, seq_id = 0, pos_min = 48790, n_swa = 1
slot update_slots: id 0 | task 11469 | restored context checkpoint (pos_min = 48410, pos_max = 48410, n_tokens = 48411, size = 62.813 MiB)
slot update_slots: id 0 | task 11469 | n_tokens = 48411, memory_seq_rm [48411, end)
slot init_sampler: id 0 | task 11469 | init sampler, took 5.88 ms, tokens: text = 48825, total = 48825
slot update_slots: id 0 | task 11469 | prompt processing done, n_tokens = 48825, batch.n_tokens = 414
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 11469 |
prompt eval time = 628.66 ms / 414 tokens ( 1.52 ms per token, 658.54 tokens per second)
eval time = 5410.60 ms / 174 tokens ( 31.10 ms per token, 32.16 tokens per second)
total time = 6039.26 ms / 588 tokens
slot release: id 0 | task 11469 | stop processing: n_tokens = 48998, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.989 (> 0.100 thold), f_keep = 0.997
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 11644 | processing task, is_child = 0
slot update_slots: id 0 | task 11644 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 49402
slot update_slots: id 0 | task 11644 | n_past = 48847, slot.prompt.tokens.size() = 48998, seq_id = 0, pos_min = 48997, n_swa = 1
slot update_slots: id 0 | task 11644 | restored context checkpoint (pos_min = 48410, pos_max = 48410, n_tokens = 48411, size = 62.813 MiB)
slot update_slots: id 0 | task 11644 | n_tokens = 48411, memory_seq_rm [48411, end)
slot update_slots: id 0 | task 11644 | prompt processing progress, n_tokens = 48890, batch.n_tokens = 479, progress = 0.989636
slot update_slots: id 0 | task 11644 | n_tokens = 48890, memory_seq_rm [48890, end)
slot init_sampler: id 0 | task 11644 | init sampler, took 6.26 ms, tokens: text = 49402, total = 49402
slot update_slots: id 0 | task 11644 | erasing old context checkpoint (pos_min = 45653, pos_max = 45653, n_tokens = 45654, size = 62.813 MiB)
slot update_slots: id 0 | task 11644 | created context checkpoint 8 of 8 (pos_min = 48889, pos_max = 48889, n_tokens = 48890, size = 62.813 MiB)
slot update_slots: id 0 | task 11644 | prompt processing done, n_tokens = 49402, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 11644 |
prompt eval time = 1413.94 ms / 991 tokens ( 1.43 ms per token, 700.88 tokens per second)
eval time = 15042.03 ms / 497 tokens ( 30.27 ms per token, 33.04 tokens per second)
total time = 16455.97 ms / 1488 tokens
slot release: id 0 | task 11644 | stop processing: n_tokens = 49898, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.990 (> 0.100 thold), f_keep = 0.991
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 12143 | processing task, is_child = 0
slot update_slots: id 0 | task 12143 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 49945
slot update_slots: id 0 | task 12143 | n_past = 49429, slot.prompt.tokens.size() = 49898, seq_id = 0, pos_min = 49897, n_swa = 1
slot update_slots: id 0 | task 12143 | restored context checkpoint (pos_min = 48889, pos_max = 48889, n_tokens = 48890, size = 62.813 MiB)
slot update_slots: id 0 | task 12143 | n_tokens = 48890, memory_seq_rm [48890, end)
slot update_slots: id 0 | task 12143 | prompt processing progress, n_tokens = 49433, batch.n_tokens = 543, progress = 0.989749
slot update_slots: id 0 | task 12143 | n_tokens = 49433, memory_seq_rm [49433, end)
slot init_sampler: id 0 | task 12143 | init sampler, took 6.19 ms, tokens: text = 49945, total = 49945
slot update_slots: id 0 | task 12143 | erasing old context checkpoint (pos_min = 46370, pos_max = 46370, n_tokens = 46371, size = 62.813 MiB)
slot update_slots: id 0 | task 12143 | created context checkpoint 8 of 8 (pos_min = 49432, pos_max = 49432, n_tokens = 49433, size = 62.813 MiB)
slot update_slots: id 0 | task 12143 | prompt processing done, n_tokens = 49945, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 12143 |
prompt eval time = 1421.93 ms / 1055 tokens ( 1.35 ms per token, 741.95 tokens per second)
eval time = 1497.88 ms / 48 tokens ( 31.21 ms per token, 32.05 tokens per second)
total time = 2919.81 ms / 1103 tokens
slot release: id 0 | task 12143 | stop processing: n_tokens = 49992, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.985 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 12193 | processing task, is_child = 0
slot update_slots: id 0 | task 12193 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 50752
slot update_slots: id 0 | task 12193 | n_tokens = 49992, memory_seq_rm [49992, end)
slot update_slots: id 0 | task 12193 | prompt processing progress, n_tokens = 50240, batch.n_tokens = 248, progress = 0.989912
slot update_slots: id 0 | task 12193 | n_tokens = 50240, memory_seq_rm [50240, end)
slot init_sampler: id 0 | task 12193 | init sampler, took 6.17 ms, tokens: text = 50752, total = 50752
slot update_slots: id 0 | task 12193 | erasing old context checkpoint (pos_min = 46814, pos_max = 46814, n_tokens = 46815, size = 62.813 MiB)
slot update_slots: id 0 | task 12193 | created context checkpoint 8 of 8 (pos_min = 50239, pos_max = 50239, n_tokens = 50240, size = 62.813 MiB)
slot update_slots: id 0 | task 12193 | prompt processing done, n_tokens = 50752, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 12193 |
prompt eval time = 1285.53 ms / 760 tokens ( 1.69 ms per token, 591.19 tokens per second)
eval time = 16873.23 ms / 539 tokens ( 31.30 ms per token, 31.94 tokens per second)
total time = 18158.76 ms / 1299 tokens
slot release: id 0 | task 12193 | stop processing: n_tokens = 51290, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.989 (> 0.100 thold), f_keep = 0.990
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 12734 | processing task, is_child = 0
slot update_slots: id 0 | task 12734 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 51324
slot update_slots: id 0 | task 12734 | n_past = 50782, slot.prompt.tokens.size() = 51290, seq_id = 0, pos_min = 51289, n_swa = 1
slot update_slots: id 0 | task 12734 | restored context checkpoint (pos_min = 50239, pos_max = 50239, n_tokens = 50240, size = 62.813 MiB)
slot update_slots: id 0 | task 12734 | n_tokens = 50240, memory_seq_rm [50240, end)
slot update_slots: id 0 | task 12734 | prompt processing progress, n_tokens = 50812, batch.n_tokens = 572, progress = 0.990024
slot update_slots: id 0 | task 12734 | n_tokens = 50812, memory_seq_rm [50812, end)
slot init_sampler: id 0 | task 12734 | init sampler, took 6.21 ms, tokens: text = 51324, total = 51324
slot update_slots: id 0 | task 12734 | erasing old context checkpoint (pos_min = 47373, pos_max = 47373, n_tokens = 47374, size = 62.813 MiB)
slot update_slots: id 0 | task 12734 | created context checkpoint 8 of 8 (pos_min = 50811, pos_max = 50811, n_tokens = 50812, size = 62.813 MiB)
slot update_slots: id 0 | task 12734 | prompt processing done, n_tokens = 51324, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 12734 |
prompt eval time = 1465.16 ms / 1084 tokens ( 1.35 ms per token, 739.85 tokens per second)
eval time = 1433.25 ms / 48 tokens ( 29.86 ms per token, 33.49 tokens per second)
total time = 2898.41 ms / 1132 tokens
slot release: id 0 | task 12734 | stop processing: n_tokens = 51371, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 12784 | processing task, is_child = 0
slot update_slots: id 0 | task 12784 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 51550
slot update_slots: id 0 | task 12784 | n_tokens = 51371, memory_seq_rm [51371, end)
slot init_sampler: id 0 | task 12784 | init sampler, took 6.34 ms, tokens: text = 51550, total = 51550
slot update_slots: id 0 | task 12784 | erasing old context checkpoint (pos_min = 47483, pos_max = 47483, n_tokens = 47484, size = 62.813 MiB)
slot update_slots: id 0 | task 12784 | created context checkpoint 8 of 8 (pos_min = 51370, pos_max = 51370, n_tokens = 51371, size = 62.813 MiB)
slot update_slots: id 0 | task 12784 | prompt processing done, n_tokens = 51550, batch.n_tokens = 179
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 12784 |
prompt eval time = 544.93 ms / 179 tokens ( 3.04 ms per token, 328.48 tokens per second)
eval time = 4531.82 ms / 147 tokens ( 30.83 ms per token, 32.44 tokens per second)
total time = 5076.75 ms / 326 tokens
slot release: id 0 | task 12784 | stop processing: n_tokens = 51696, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 12932 | processing task, is_child = 0
slot update_slots: id 0 | task 12932 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 51903
slot update_slots: id 0 | task 12932 | n_past = 51577, slot.prompt.tokens.size() = 51696, seq_id = 0, pos_min = 51695, n_swa = 1
slot update_slots: id 0 | task 12932 | restored context checkpoint (pos_min = 51370, pos_max = 51370, n_tokens = 51371, size = 62.813 MiB)
slot update_slots: id 0 | task 12932 | n_tokens = 51371, memory_seq_rm [51371, end)
slot update_slots: id 0 | task 12932 | prompt processing progress, n_tokens = 51391, batch.n_tokens = 20, progress = 0.990135
slot update_slots: id 0 | task 12932 | n_tokens = 51391, memory_seq_rm [51391, end)
slot init_sampler: id 0 | task 12932 | init sampler, took 6.74 ms, tokens: text = 51903, total = 51903
slot update_slots: id 0 | task 12932 | prompt processing done, n_tokens = 51903, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 12932 |
prompt eval time = 859.90 ms / 532 tokens ( 1.62 ms per token, 618.68 tokens per second)
eval time = 1353.72 ms / 44 tokens ( 30.77 ms per token, 32.50 tokens per second)
total time = 2213.62 ms / 576 tokens
slot release: id 0 | task 12932 | stop processing: n_tokens = 51946, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.996 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 12978 | processing task, is_child = 0
slot update_slots: id 0 | task 12978 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 52146
slot update_slots: id 0 | task 12978 | n_tokens = 51946, memory_seq_rm [51946, end)
slot init_sampler: id 0 | task 12978 | init sampler, took 6.40 ms, tokens: text = 52146, total = 52146
slot update_slots: id 0 | task 12978 | erasing old context checkpoint (pos_min = 47586, pos_max = 47586, n_tokens = 47587, size = 62.813 MiB)
slot update_slots: id 0 | task 12978 | created context checkpoint 8 of 8 (pos_min = 51945, pos_max = 51945, n_tokens = 51946, size = 62.813 MiB)
slot update_slots: id 0 | task 12978 | prompt processing done, n_tokens = 52146, batch.n_tokens = 200
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 12978 |
prompt eval time = 561.78 ms / 200 tokens ( 2.81 ms per token, 356.01 tokens per second)
eval time = 1659.63 ms / 54 tokens ( 30.73 ms per token, 32.54 tokens per second)
total time = 2221.41 ms / 254 tokens
slot release: id 0 | task 12978 | stop processing: n_tokens = 52199, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 13033 | processing task, is_child = 0
slot update_slots: id 0 | task 13033 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 52222
slot update_slots: id 0 | task 13033 | n_tokens = 52199, memory_seq_rm [52199, end)
slot init_sampler: id 0 | task 13033 | init sampler, took 6.63 ms, tokens: text = 52222, total = 52222
slot update_slots: id 0 | task 13033 | erasing old context checkpoint (pos_min = 48163, pos_max = 48163, n_tokens = 48164, size = 62.813 MiB)
slot update_slots: id 0 | task 13033 | created context checkpoint 8 of 8 (pos_min = 52198, pos_max = 52198, n_tokens = 52199, size = 62.813 MiB)
slot update_slots: id 0 | task 13033 | prompt processing done, n_tokens = 52222, batch.n_tokens = 23
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 13033 |
prompt eval time = 229.13 ms / 23 tokens ( 9.96 ms per token, 100.38 tokens per second)
eval time = 3935.84 ms / 125 tokens ( 31.49 ms per token, 31.76 tokens per second)
total time = 4164.97 ms / 148 tokens
slot release: id 0 | task 13033 | stop processing: n_tokens = 52346, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 13159 | processing task, is_child = 0
slot update_slots: id 0 | task 13159 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 52404
slot update_slots: id 0 | task 13159 | n_tokens = 52346, memory_seq_rm [52346, end)
slot init_sampler: id 0 | task 13159 | init sampler, took 6.36 ms, tokens: text = 52404, total = 52404
slot update_slots: id 0 | task 13159 | erasing old context checkpoint (pos_min = 48410, pos_max = 48410, n_tokens = 48411, size = 62.813 MiB)
slot update_slots: id 0 | task 13159 | created context checkpoint 8 of 8 (pos_min = 52345, pos_max = 52345, n_tokens = 52346, size = 62.813 MiB)
slot update_slots: id 0 | task 13159 | prompt processing done, n_tokens = 52404, batch.n_tokens = 58
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 13159 |
prompt eval time = 362.37 ms / 58 tokens ( 6.25 ms per token, 160.06 tokens per second)
eval time = 1893.70 ms / 59 tokens ( 32.10 ms per token, 31.16 tokens per second)
total time = 2256.07 ms / 117 tokens
slot release: id 0 | task 13159 | stop processing: n_tokens = 52462, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.996 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 13219 | processing task, is_child = 0
slot update_slots: id 0 | task 13219 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 52697
slot update_slots: id 0 | task 13219 | n_tokens = 52462, memory_seq_rm [52462, end)
slot init_sampler: id 0 | task 13219 | init sampler, took 6.27 ms, tokens: text = 52697, total = 52697
slot update_slots: id 0 | task 13219 | erasing old context checkpoint (pos_min = 48889, pos_max = 48889, n_tokens = 48890, size = 62.813 MiB)
slot update_slots: id 0 | task 13219 | created context checkpoint 8 of 8 (pos_min = 52461, pos_max = 52461, n_tokens = 52462, size = 62.813 MiB)
slot update_slots: id 0 | task 13219 | prompt processing done, n_tokens = 52697, batch.n_tokens = 235
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 13219 |
prompt eval time = 596.41 ms / 235 tokens ( 2.54 ms per token, 394.02 tokens per second)
eval time = 1704.99 ms / 54 tokens ( 31.57 ms per token, 31.67 tokens per second)
total time = 2301.40 ms / 289 tokens
slot release: id 0 | task 13219 | stop processing: n_tokens = 52750, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 13274 | processing task, is_child = 0
slot update_slots: id 0 | task 13274 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 52771
slot update_slots: id 0 | task 13274 | n_tokens = 52750, memory_seq_rm [52750, end)
slot init_sampler: id 0 | task 13274 | init sampler, took 6.41 ms, tokens: text = 52771, total = 52771
slot update_slots: id 0 | task 13274 | erasing old context checkpoint (pos_min = 49432, pos_max = 49432, n_tokens = 49433, size = 62.813 MiB)
slot update_slots: id 0 | task 13274 | created context checkpoint 8 of 8 (pos_min = 52749, pos_max = 52749, n_tokens = 52750, size = 62.813 MiB)
slot update_slots: id 0 | task 13274 | prompt processing done, n_tokens = 52771, batch.n_tokens = 21
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 13274 |
prompt eval time = 226.14 ms / 21 tokens ( 10.77 ms per token, 92.86 tokens per second)
eval time = 454.11 ms / 15 tokens ( 30.27 ms per token, 33.03 tokens per second)
total time = 680.25 ms / 36 tokens
slot release: id 0 | task 13274 | stop processing: n_tokens = 52785, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 13290 | processing task, is_child = 0
slot update_slots: id 0 | task 13290 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 52806
slot update_slots: id 0 | task 13290 | n_tokens = 52785, memory_seq_rm [52785, end)
slot init_sampler: id 0 | task 13290 | init sampler, took 6.49 ms, tokens: text = 52806, total = 52806
slot update_slots: id 0 | task 13290 | prompt processing done, n_tokens = 52806, batch.n_tokens = 21
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 13290 |
prompt eval time = 204.55 ms / 21 tokens ( 9.74 ms per token, 102.66 tokens per second)
eval time = 8000.84 ms / 249 tokens ( 32.13 ms per token, 31.12 tokens per second)
total time = 8205.38 ms / 270 tokens
slot release: id 0 | task 13290 | stop processing: n_tokens = 53054, truncated = 0
srv update_slots: all slots are idle
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200