@huytd
Created March 4, 2026 05:13
Qwen3.5-35B-A3B Q4_K_M Session log and token analysis

Qwen3.5-35B-A3B Inference Session Report

Model: Qwen3.5-35B-A3B-UD-Q4_K_M
Backend: llama.cpp (build 8171, c17dce4f5)
Hardware: NVIDIA GeForce RTX 5060 Ti (16 GB VRAM)
Server: llama-server on http://127.0.0.1:8080
Date: March 3, 2026


1. System Configuration

| Parameter | Value |
| --- | --- |
| GPU | NVIDIA GeForce RTX 5060 Ti |
| Compute Capability | 12.0 (Blackwell) |
| VRAM Available at Load | ~15,094 MiB |
| CPU Threads | 8 (batch: 8, total system: 16) |
| Context Length (n_ctx) | 128,000 tokens |
| Batch Size (n_batch) | 4,096 |
| Flash Attention | Enabled |
| Model Architecture | qwen35moe (MoE + SSM hybrid) |
| Quantization | Q4_K_M (4.58 BPW, 18.48 GiB file) |
| Model Parameters | 34.66 B total (256 experts, 8 active per token) |

Memory Allocation

| Buffer | Size |
| --- | --- |
| CUDA0 model weights | 10,670 MiB |
| CUDA_Host model buffer | 7,921 MiB |
| CPU model buffer | 333 MiB |
| KV Cache (CUDA0) | 1,328 MiB |
| Recurrent State (CUDA0) | 63 MiB |
| CUDA0 compute buffer | 1,988 MiB |
| CUDA_Host compute buffer | 1,032 MiB |

Note: The model did not fully fit in VRAM. llama.cpp's fit algorithm offloaded 20 of 41 layers to CUDA_Host (pinned RAM), with all 41 layers ultimately GPU-accessible through the hybrid layout.
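The memory allocation figures above can be cross-checked against the fit log with a quick sum (a sanity check over the table values, not part of the log itself):

```python
# Sum the CUDA0 (on-GPU) buffers from the memory allocation table, in MiB.
cuda0_buffers = {
    "model weights": 10670,
    "kv cache": 1328,
    "recurrent state": 63,
    "compute buffer": 1988,
}
cuda0_total = sum(cuda0_buffers.values())  # matches the fit log's "14049 MiB used"

vram_free_at_load = 15094
headroom = vram_free_at_load - cuda0_total  # ~1 GiB left, per the fit target
```

The total agrees with `llama_params_fit_impl`'s reported "14049 MiB used, 1044 MiB free" (the 1 MiB difference is rounding in the table values).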


2. Session Summary

| Metric | Value |
| --- | --- |
| Total Requests | 71 |
| Total Output Tokens Generated | 13,410 |
| Total Prompt Tokens Processed (with KV cache) | 124,684 |
| Total Prompt Tokens (hypothetical, no cache) | 3,032,651 |
| Total Tokens Processed (with cache) | 138,094 |
| Total Tokens (hypothetical, no cache) | 3,046,061 |

3. KV Cache Performance

The session involved a long, growing multi-turn conversation. llama.cpp's prompt cache system stored up to 8 rolling checkpoints, allowing new requests to skip re-processing the shared prefix.

| Metric | Value |
| --- | --- |
| Tokens saved by KV cache | 2,907,967 |
| Cache hit rate | 95.9% |
| Compute reduction factor | 24.3× |
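These three metrics follow directly from the Section 2 totals:

```python
# Derive the cache metrics from the session summary totals.
prompt_with_cache = 124_684      # prompt tokens actually processed
prompt_no_cache = 3_032_651      # hypothetical total without a prompt cache

tokens_saved = prompt_no_cache - prompt_with_cache  # 2,907,967
hit_rate = tokens_saved / prompt_no_cache           # ~0.959
reduction = prompt_no_cache / prompt_with_cache     # ~24.3x
```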

The cache hit rate was exceptionally high because each request shared the vast majority of its prefix with the previous one — the conversation context grew incrementally from ~29K to ~53K tokens, adding only hundreds to low thousands of new tokens per turn.

Two cache invalidations occurred during the session: a full one at task 5177 (0.0% hit rate) and a partial one at task 6461 (80.3%). In both cases the LCP (longest common prefix) similarity dropped enough that the cached checkpoints could no longer cover the prompt, and most or all of it had to be reprocessed. The full invalidation is visible as a large prompt eval time (~19–20 seconds), matching the initial cold-start request.
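How a drop in prefix similarity triggers reprocessing can be illustrated with a simplified LCP check (the exact metric llama.cpp computes may differ; this is an assumption for illustration):

```python
def lcp_similarity(cached: list[int], incoming: list[int]) -> float:
    """Fraction of tokens covered by the longest common prefix (simplified)."""
    if not cached or not incoming:
        return 0.0
    n = 0
    for a, b in zip(cached, incoming):
        if a != b:
            break
        n += 1
    return n / max(len(cached), len(incoming))

# A prompt that merely appends to the cached context keeps a high similarity,
# so the shared prefix can be reused from cache...
grown = lcp_similarity(list(range(100)), list(range(110)))  # ~0.91

# ...while a prompt that diverges at the start shares no prefix at all,
# forcing a full reprocess.
swapped = lcp_similarity(list(range(100)), [999] + list(range(1, 110)))  # 0.0
```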


4. Token Throughput

Prompt Evaluation Speed (Prefill)

Prompt evaluation speed varied based on whether cached tokens were reused:

| Scenario | Speed |
| --- | --- |
| Cold start (full 29K prompt) | ~1,484–1,509 tokens/sec |
| Warm cache (incremental new tokens) | ~500–1,100 tokens/sec |
| Very small incremental batches (<100 tokens) | ~100–400 tokens/sec (overhead-dominated) |

Cold-start prefill is fast because the large batch (4,096 tokens) saturates GPU throughput. Small incremental batches are slower per-token due to fixed kernel launch overhead.
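The effect can be sketched with a fixed-cost throughput model (the setup cost and peak rate below are illustrative guesses, not measured values):

```python
def effective_prefill_rate(n_tokens: int, t0_ms: float = 100.0,
                           peak_tps: float = 1500.0) -> float:
    """Observed tokens/sec when every request pays a fixed setup cost.

    t0_ms (kernel launch / scheduling overhead) and peak_tps are assumed
    values chosen to roughly match the ranges observed in this session.
    """
    total_s = t0_ms / 1000.0 + n_tokens / peak_tps
    return n_tokens / total_s

big = effective_prefill_rate(28_872)  # fixed cost amortized, near peak rate
tiny = effective_prefill_rate(50)     # fixed cost dominates, far below peak
```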

Token Generation Speed (Decode)

| Metric | Value |
| --- | --- |
| Average generation speed | ~30–35 tokens/sec |
| Minimum observed | ~29.5 tokens/sec |
| Maximum observed | ~36.1 tokens/sec |

Generation speed was consistent throughout the session, showing no significant degradation even as the KV cache filled up to ~53K tokens. This is expected for a single-sequence, single-slot configuration.


5. Request-by-Request Breakdown

The table below shows every request with its prompt tokens actually processed (with cache), the full context size it was based on, and tokens generated.

| Task ID | Prompt Tokens Processed | Full Context Size | Output Tokens | Cache Hit Rate |
| --- | --- | --- | --- | --- |
| 0 | 28,872 | 28,872 | 64 | 0.0% (cold) |
| 72 | 752 | 29,112 | 35 | 97.4% |
| 109 | 4,067 | 32,667 | 34 | 87.5% |
| 145 | 1,517 | 34,217 | 35 | 95.6% |
| 182 | 1,887 | 36,138 | 34 | 94.8% |
| 218 | 673 | 36,844 | 293 | 98.2% |
| 513 | 852 | 37,184 | 534 | 97.7% |
| 1049 | 1,540 | 38,212 | 1,066 | 96.0% |
| 2117 | 1,905 | 39,605 | 59 | 95.2% |
| 2178 | 2,144 | 41,807 | 1,147 | 94.9% |
| 3327 | 1,693 | 42,988 | 124 | 96.1% |
| 3453 | 802 | 43,278 | 576 | 98.1% |
| 4031 | 1,442 | 44,208 | 183 | 96.7% |
| 4216 | 743 | 44,439 | 119 | 98.3% |
| 4337 | 937 | 44,864 | 35 | 97.9% |
| 4374 | 1,749 | 46,647 | 125 | 96.3% |
| 4501 | 670 | 46,805 | 111 | 98.6% |
| 4614 | 670 | 46,963 | 99 | 98.6% |
| 4715 | 658 | 47,109 | 65 | 98.6% |
| 4782 | 607 | 47,204 | 62 | 98.7% |
| 4846 | 411 | 47,676 | 160 | 99.1% |
| 5007 | 726 | 47,991 | 168 | 98.5% |
| 5177 | 29,006 | 29,006 | 32 | 0.0% (invalidated) |
| 5217 | 1,657 | 30,151 | 31 | 94.5% |
| 5250 | 1,624 | 31,263 | 31 | 94.8% |
| 5283 | 562 | 31,313 | 29 | 98.2% |
| 5314 | 647 | 31,398 | 34 | 97.9% |
| 5350 | 441 | 31,872 | 35 | 98.6% |
| 5386 | 1,749 | 33,655 | 34 | 94.8% |
| 5422 | 1,487 | 35,175 | 34 | 95.8% |
| 5458 | 1,914 | 37,122 | 211 | 94.8% |
| 5671 | 190 | 37,522 | 204 | 99.5% |
| 5876 | 91 | 37,816 | 171 | 99.8% |
| 6048 | 76 | 38,062 | 412 | 99.8% |
| 6461 | 7,564 | 38,450 | 79 | 80.3% (partial invalidation) |
| 6543 | 31 | 38,559 | 54 | 99.9% |
| 6598 | 17 | 38,629 | 667 | 100.0% |
| 7266 | 732 | 39,344 | 85 | 98.1% |
| 7353 | 663 | 39,495 | 38 | 98.3% |
| 7393 | 23 | 39,555 | 31 | 99.9% |
| 7425 | 2,493 | 40,431 | 1,919 | 93.8% |
| 9346 | 2,480 | 42,399 | 77 | 94.2% |
| 9425 | 78 | 42,553 | 88 | 99.8% |
| 9514 | 2,724 | 42,643 | 84 | 93.6% |
| 9600 | 31 | 42,757 | 32 | 99.9% |
| 9633 | 3,080 | 45,211 | 270 | 93.2% |
| 9905 | 792 | 45,491 | 27 | 98.3% |
| 9934 | 57 | 45,574 | 200 | 99.9% |
| 10135 | 304 | 45,821 | 176 | 99.3% |
| 10312 | 649 | 46,166 | 72 | 98.6% |
| 10386 | 646 | 46,883 | 97 | 98.6% |
| 10485 | 956 | 47,327 | 48 | 98.0% |
| 10535 | 23 | 47,397 | 88 | 100.0% |
| 10624 | 32 | 47,516 | 79 | 99.9% |
| 10704 | 159 | 47,643 | 266 | 99.7% |
| 10971 | 615 | 48,099 | 66 | 98.7% |
| 11039 | 168 | 48,332 | 80 | 99.7% |
| 11120 | 33 | 48,444 | 348 | 99.9% |
| 11469 | 414 | 48,825 | 174 | 99.2% |
| 11644 | 991 | 49,402 | 497 | 98.0% |
| 12143 | 1,055 | 49,945 | 48 | 97.9% |
| 12193 | 760 | 50,752 | 539 | 98.5% |
| 12734 | 1,084 | 51,324 | 48 | 97.9% |
| 12784 | 179 | 51,550 | 147 | 99.7% |
| 12932 | 532 | 51,903 | 44 | 99.0% |
| 12978 | 200 | 52,146 | 54 | 99.6% |
| 13033 | 23 | 52,222 | 125 | 100.0% |
| 13159 | 58 | 52,404 | 59 | 99.9% |
| 13219 | 235 | 52,697 | 54 | 99.6% |
| 13274 | 21 | 52,771 | 15 | 100.0% |
| 13290 | 21 | 52,806 | 249 | 100.0% |
| TOTAL | 124,684 | 3,032,651 | 13,410 | 95.9% |
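The per-request cache hit rates follow directly from the two token columns, assuming hit rate = 1 − (tokens processed / full context size), which reproduces the figures shown:

```python
def cache_hit_rate(processed: int, full_context: int) -> float:
    """Share of the prompt served from cache rather than recomputed."""
    return 1.0 - processed / full_context

# Task 72 from the table: only 752 of 29,112 tokens actually processed.
warm = cache_hit_rate(752, 29_112)      # ~0.974, i.e. the 97.4% shown

# Task 0 (cold start): the entire prompt had to be processed.
cold = cache_hit_rate(28_872, 28_872)   # 0.0
```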

6. Context Growth Over Session

The conversation context grew steadily across two distinct conversation threads visible in the log:

Thread 1 (tasks 0 – 5007): ~29K → ~48K tokens
Thread 2 (tasks 5177 – 13290, new session): ~29K → ~53K tokens (final)

The two invalidation events at tasks 5177 (full) and 6461 (partial) indicate either a topic switch with a substantially different system prompt/context, or a client reconnect that caused llama.cpp to determine it could not reuse its cached state at sufficient similarity (LCP similarity dropped to 0.908 and 0.813 respectively, low enough that the cached checkpoints could not cover the new prompt).


7. Notable Observations

  • Blackwell GPU (RTX 5060 Ti): This run used NVIDIA's Blackwell architecture under llama.cpp, with native FP4 support detected (BLACKWELL_NATIVE_FP4 = 1) and compute capability 12.0.
  • Hybrid MoE+SSM architecture: Qwen3.5 uses a novel hybrid of MoE (256 experts, 8 active) and SSM (Mamba-style recurrent) layers, reflected in the separate llama_memory_recurrent allocation (62.81 MiB for recurrent state).
  • Checkpoint eviction strategy: llama.cpp maintained a rolling window of 8 checkpoints and evicted the oldest on overflow, which kept cache reuse high while bounding memory overhead to 8 × 62.8 MiB = ~502 MiB.
  • Generation speed stability: Output token generation stayed consistently at ~30–35 tok/s across the entire session despite the context growing from 29K to 53K tokens, suggesting KV cache access time is not significantly impacting generation at this context length.
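The rolling-checkpoint eviction described above can be sketched with a bounded deque (an illustration of the policy, not llama.cpp's actual data structure):

```python
from collections import deque

MAX_CHECKPOINTS = 8
CHECKPOINT_MIB = 62.8  # recurrent-state size per checkpoint, from the log

# deque(maxlen=N) silently drops the oldest entry when a new one is appended.
checkpoints: deque = deque(maxlen=MAX_CHECKPOINTS)

# Checkpoint positions as they were created during the session (from the log);
# the 9th append evicts the oldest checkpoint (pos 28360).
for pos in [28360, 28600, 32155, 33705, 35626, 36332, 36672, 37700, 39000]:
    checkpoints.append(pos)

overhead_mib = len(checkpoints) * CHECKPOINT_MIB  # bounded at ~502 MiB
```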

Report generated from llama-server log output. Token counts extracted from prompt eval time and eval time timing lines, and full context sizes from init sampler lines.
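The extraction described above can be reproduced with a regex over the timing lines, matching the log format shown below (the field names here are my own; only the line format comes from the log):

```python
import re

# Matches llama-server timing lines such as:
#   prompt eval time = 19456.51 ms / 28872 tokens ( 0.67 ms per token, 1483.92 tokens per second)
#          eval time =  1819.38 ms /    64 tokens (28.43 ms per token,   35.18 tokens per second)
TIMING = re.compile(
    r"(prompt eval|eval) time\s*=\s*([\d.]+) ms\s*/\s*(\d+) tokens"
    r".*?([\d.]+) tokens per second"
)

line = ("prompt eval time =   19456.51 ms / 28872 tokens "
        "(    0.67 ms per token,  1483.92 tokens per second)")
m = TIMING.search(line)
kind = m.group(1)            # "prompt eval" (prefill) or "eval" (decode)
ms = float(m.group(2))       # wall-clock time for the phase
n_tokens = int(m.group(3))   # tokens processed in the phase
tps = float(m.group(4))      # reported throughput
```

Applied to every `print_timing` block, this yields the per-request token counts summed in the tables above.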

qwen3.5-35b % ls
Qwen3.5-35B-A3B-UD-Q2_K_XL.gguf Qwen3.5-35B-A3B-UD-Q4_K_M.gguf run
qwen3.5-35b % ./run Qwen3.5-35B-A3B-UD-Q4_K_M.gguf
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
build: 8171 (c17dce4f5) with GNU 13.3.0 for Linux x86_64
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CUDA : ARCHS = 1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | FA_ALL_QUANTS = 1 | BLACKWELL_NATIVE_FP4 = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Running without SSL
init: using 15 threads for HTTP server
start: binding port with default address family
main: loading model
srv load_model: loading model 'Qwen3.5-35B-A3B-UD-Q4_K_M.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected to use 21953 MiB of device memory vs. 15094 MiB of free device memory
llama_params_fit_impl: cannot meet free memory target of 1024 MiB, need to reduce device memory by 7883 MiB
llama_params_fit_impl: context size set by user to 128000 -> no change
llama_params_fit_impl: with only dense weights in device memory there is a total surplus of 9396 MiB
llama_params_fit_impl: filling dense-only layers back-to-front:
llama_params_fit_impl: - CUDA0 (NVIDIA GeForce RTX 5060 Ti): 41 layers, 5105 MiB used, 9988 MiB free
llama_params_fit_impl: converting dense-only layers to full layers and filling them front-to-back with overflow to next device/system memory:
llama_params_fit_impl: - CUDA0 (NVIDIA GeForce RTX 5060 Ti): 41 layers (20 overflowing), 14049 MiB used, 1044 MiB free
llama_params_fit: successfully fit params to free device memory
llama_params_fit: fitting params to free memory took 6.49 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5060 Ti) (0000:01:00.0) - 15158 MiB free
llama_model_loader: loaded meta data with 52 key-value pairs and 733 tensors from Qwen3.5-35B-A3B-UD-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 5: general.name str = Qwen3.5-35B-A3B
llama_model_loader: - kv 6: general.basename str = Qwen3.5-35B-A3B
llama_model_loader: - kv 7: general.quantized_by str = Unsloth
llama_model_loader: - kv 8: general.size_label str = 35B-A3B
llama_model_loader: - kv 9: general.license str = apache-2.0
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 11: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 12: general.base_model.count u32 = 1
llama_model_loader: - kv 13: general.base_model.0.name str = Qwen3.5 35B A3B
llama_model_loader: - kv 14: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 15: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 16: general.tags arr[str,2] = ["unsloth", "image-text-to-text"]
llama_model_loader: - kv 17: qwen35moe.block_count u32 = 40
llama_model_loader: - kv 18: qwen35moe.context_length u32 = 262144
llama_model_loader: - kv 19: qwen35moe.embedding_length u32 = 2048
llama_model_loader: - kv 20: qwen35moe.attention.head_count u32 = 16
llama_model_loader: - kv 21: qwen35moe.attention.head_count_kv u32 = 2
llama_model_loader: - kv 22: qwen35moe.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 23: qwen35moe.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 24: qwen35moe.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 25: qwen35moe.expert_count u32 = 256
llama_model_loader: - kv 26: qwen35moe.expert_used_count u32 = 8
llama_model_loader: - kv 27: qwen35moe.attention.key_length u32 = 256
llama_model_loader: - kv 28: qwen35moe.attention.value_length u32 = 256
llama_model_loader: - kv 29: qwen35moe.expert_feed_forward_length u32 = 512
llama_model_loader: - kv 30: qwen35moe.expert_shared_feed_forward_length u32 = 512
llama_model_loader: - kv 31: qwen35moe.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 32: qwen35moe.ssm.state_size u32 = 128
llama_model_loader: - kv 33: qwen35moe.ssm.group_count u32 = 16
llama_model_loader: - kv 34: qwen35moe.ssm.time_step_rank u32 = 32
llama_model_loader: - kv 35: qwen35moe.ssm.inner_size u32 = 4096
llama_model_loader: - kv 36: qwen35moe.full_attention_interval u32 = 4
llama_model_loader: - kv 37: qwen35moe.rope.dimension_count u32 = 64
llama_model_loader: - kv 38: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 39: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 40: tokenizer.ggml.tokens arr[str,248320] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 41: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 42: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 43: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 44: tokenizer.ggml.padding_token_id u32 = 248055
llama_model_loader: - kv 45: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - kv 46: general.quantization_version u32 = 2
llama_model_loader: - kv 47: general.file_type u32 = 7
llama_model_loader: - kv 48: quantize.imatrix.file str = Qwen3.5-35B-A3B-GGUF/Qwen_Qwen3.5-35B...
llama_model_loader: - kv 49: quantize.imatrix.dataset str = /training_dir/calibration_datav5.txt
llama_model_loader: - kv 50: quantize.imatrix.entries_count u32 = 510
llama_model_loader: - kv 51: quantize.imatrix.chunks_count u32 = 802
llama_model_loader: - type f32: 301 tensors
llama_model_loader: - type q4_K: 190 tensors
llama_model_loader: - type q5_K: 82 tensors
llama_model_loader: - type q6_K: 80 tensors
llama_model_loader: - type bf16: 80 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 18.48 GiB (4.58 BPW)
load: 0 unused tokens
load: printing all EOG tokens:
load: - 248044 ('<|endoftext|>')
load: - 248046 ('<|im_end|>')
load: - 248063 ('<|fim_pad|>')
load: - 248064 ('<|repo_name|>')
load: - 248065 ('<|file_sep|>')
load: special tokens cache size = 33
load: token to piece cache size = 1.7581 MB
print_info: arch = qwen35moe
print_info: vocab_only = 0
print_info: no_alloc = 0
print_info: n_ctx_train = 262144
print_info: n_embd = 2048
print_info: n_embd_inp = 2048
print_info: n_layer = 40
print_info: n_head = 16
print_info: n_head_kv = 2
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 256
print_info: n_embd_head_v = 256
print_info: n_gqa = 8
print_info: n_embd_k_gqa = 512
print_info: n_embd_v_gqa = 512
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 0
print_info: n_expert = 256
print_info: n_expert_used = 8
print_info: n_expert_groups = 0
print_info: n_group_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 40
print_info: rope scaling = linear
print_info: freq_base_train = 10000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 262144
print_info: rope_yarn_log_mul = 0.0000
print_info: rope_finetuned = unknown
print_info: mrope sections = [11, 11, 10, 0]
print_info: ssm_d_conv = 4
print_info: ssm_d_inner = 4096
print_info: ssm_d_state = 128
print_info: ssm_dt_rank = 32
print_info: ssm_n_group = 16
print_info: ssm_dt_b_c_rms = 0
print_info: model type = ?B
print_info: model params = 34.66 B
print_info: general.name = Qwen3.5-35B-A3B
print_info: vocab type = BPE
print_info: n_vocab = 248320
print_info: n_merges = 247587
print_info: BOS token = 11 ','
print_info: EOS token = 248046 '<|im_end|>'
print_info: EOT token = 248046 '<|im_end|>'
print_info: PAD token = 248055 '<|vision_pad|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 248060 '<|fim_prefix|>'
print_info: FIM SUF token = 248062 '<|fim_suffix|>'
print_info: FIM MID token = 248061 '<|fim_middle|>'
print_info: FIM PAD token = 248063 '<|fim_pad|>'
print_info: FIM REP token = 248064 '<|repo_name|>'
print_info: FIM SEP token = 248065 '<|file_sep|>'
print_info: EOG token = 248044 '<|endoftext|>'
print_info: EOG token = 248046 '<|im_end|>'
print_info: EOG token = 248063 '<|fim_pad|>'
print_info: EOG token = 248064 '<|repo_name|>'
print_info: EOG token = 248065 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 39 repeating layers to GPU
load_tensors: offloaded 41/41 layers to GPU
load_tensors: CPU model buffer size = 333.44 MiB
load_tensors: CUDA0 model buffer size = 10670.30 MiB
load_tensors: CUDA_Host model buffer size = 7920.69 MiB
...................................................................................................
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|im_end|> logit bias = -inf
common_init_result: added <|fim_pad|> logit bias = -inf
common_init_result: added <|repo_name|> logit bias = -inf
common_init_result: added <|file_sep|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 128000
llama_context: n_ctx_seq = 128000
llama_context: n_batch = 4096
llama_context: n_ubatch = 2048
llama_context: causal_attn = 1
llama_context: flash_attn = enabled
llama_context: kv_unified = false
llama_context: freq_base = 10000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_seq (128000) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
llama_context: CUDA_Host output buffer size = 0.95 MiB
llama_kv_cache: CUDA0 KV buffer size = 1328.12 MiB
llama_kv_cache: size = 1328.12 MiB (128000 cells, 10 layers, 1/1 seqs), K (q8_0): 664.06 MiB, V (q8_0): 664.06 MiB
llama_memory_recurrent: CUDA0 RS buffer size = 62.81 MiB
llama_memory_recurrent: size = 62.81 MiB ( 1 cells, 40 layers, 1 seqs), R (f32): 2.81 MiB, S (f32): 60.00 MiB
sched_reserve: reserving ...
sched_reserve: CUDA0 compute buffer size = 1988.01 MiB
sched_reserve: CUDA_Host compute buffer size = 1032.08 MiB
sched_reserve: graph nodes = 20199 (with bs=2048), 4389 (with bs=1)
sched_reserve: graph splits = 58 (with bs=2048), 42 (with bs=1)
sched_reserve: reserve took 1286.57 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
srv load_model: initializing slots, n_slots = 1
common_speculative_is_compat: the target context does not support partial sequence removal
srv load_model: speculative decoding not supported by this context
slot load_model: id 0 | task -1 | new slot, n_ctx = 128000
srv load_model: prompt cache is enabled, size limit: 8192 MiB
srv load_model: use `--cache-ram 0` to disable the prompt cache
srv load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391
init: chat template, example_format: '<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
<think>
</think>
'
srv init: init: chat template, thinking = 1
main: model loaded
main: server is listening on http://127.0.0.1:8080
main: starting the main loop...
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 0 | processing task, is_child = 0
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 28872
slot update_slots: id 0 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 4096, progress = 0.141868
slot update_slots: id 0 | task 0 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 4096, progress = 0.283735
slot update_slots: id 0 | task 0 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 4096, progress = 0.425603
slot update_slots: id 0 | task 0 | n_tokens = 12288, memory_seq_rm [12288, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 16384, batch.n_tokens = 4096, progress = 0.567470
slot update_slots: id 0 | task 0 | n_tokens = 16384, memory_seq_rm [16384, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 20480, batch.n_tokens = 4096, progress = 0.709338
slot update_slots: id 0 | task 0 | n_tokens = 20480, memory_seq_rm [20480, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 24576, batch.n_tokens = 4096, progress = 0.851205
slot update_slots: id 0 | task 0 | n_tokens = 24576, memory_seq_rm [24576, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 28360, batch.n_tokens = 3784, progress = 0.982267
slot update_slots: id 0 | task 0 | n_tokens = 28360, memory_seq_rm [28360, end)
slot init_sampler: id 0 | task 0 | init sampler, took 3.74 ms, tokens: text = 28872, total = 28872
slot update_slots: id 0 | task 0 | created context checkpoint 1 of 8 (pos_min = 28359, pos_max = 28359, n_tokens = 28360, size = 62.813 MiB)
slot update_slots: id 0 | task 0 | prompt processing done, n_tokens = 28872, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 0 |
prompt eval time = 19456.51 ms / 28872 tokens ( 0.67 ms per token, 1483.92 tokens per second)
eval time = 1819.38 ms / 64 tokens ( 28.43 ms per token, 35.18 tokens per second)
total time = 21275.89 ms / 28936 tokens
slot release: id 0 | task 0 | stop processing: n_tokens = 28935, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 72 | processing task, is_child = 0
slot update_slots: id 0 | task 72 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 29112
slot update_slots: id 0 | task 72 | n_past = 28924, slot.prompt.tokens.size() = 28935, seq_id = 0, pos_min = 28934, n_swa = 1
slot update_slots: id 0 | task 72 | restored context checkpoint (pos_min = 28359, pos_max = 28359, n_tokens = 28360, size = 62.813 MiB)
slot update_slots: id 0 | task 72 | n_tokens = 28360, memory_seq_rm [28360, end)
slot update_slots: id 0 | task 72 | prompt processing progress, n_tokens = 28600, batch.n_tokens = 240, progress = 0.982413
slot update_slots: id 0 | task 72 | n_tokens = 28600, memory_seq_rm [28600, end)
slot init_sampler: id 0 | task 72 | init sampler, took 3.55 ms, tokens: text = 29112, total = 29112
slot update_slots: id 0 | task 72 | created context checkpoint 2 of 8 (pos_min = 28599, pos_max = 28599, n_tokens = 28600, size = 62.813 MiB)
slot update_slots: id 0 | task 72 | prompt processing done, n_tokens = 29112, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 72 |
prompt eval time = 1112.74 ms / 752 tokens ( 1.48 ms per token, 675.81 tokens per second)
eval time = 1003.99 ms / 35 tokens ( 28.69 ms per token, 34.86 tokens per second)
total time = 2116.73 ms / 787 tokens
slot release: id 0 | task 72 | stop processing: n_tokens = 29146, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.892 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 109 | processing task, is_child = 0
slot update_slots: id 0 | task 109 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 32667
slot update_slots: id 0 | task 109 | n_past = 29145, slot.prompt.tokens.size() = 29146, seq_id = 0, pos_min = 29145, n_swa = 1
slot update_slots: id 0 | task 109 | restored context checkpoint (pos_min = 28599, pos_max = 28599, n_tokens = 28600, size = 62.813 MiB)
slot update_slots: id 0 | task 109 | n_tokens = 28600, memory_seq_rm [28600, end)
slot update_slots: id 0 | task 109 | prompt processing progress, n_tokens = 32155, batch.n_tokens = 3555, progress = 0.984327
slot update_slots: id 0 | task 109 | n_tokens = 32155, memory_seq_rm [32155, end)
slot init_sampler: id 0 | task 109 | init sampler, took 3.85 ms, tokens: text = 32667, total = 32667
slot update_slots: id 0 | task 109 | created context checkpoint 3 of 8 (pos_min = 32154, pos_max = 32154, n_tokens = 32155, size = 62.813 MiB)
slot update_slots: id 0 | task 109 | prompt processing done, n_tokens = 32667, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 109 |
prompt eval time = 3297.57 ms / 4067 tokens ( 0.81 ms per token, 1233.33 tokens per second)
eval time = 968.07 ms / 34 tokens ( 28.47 ms per token, 35.12 tokens per second)
total time = 4265.64 ms / 4101 tokens
slot release: id 0 | task 109 | stop processing: n_tokens = 32700, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.956 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 145 | processing task, is_child = 0
slot update_slots: id 0 | task 145 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 34217
slot update_slots: id 0 | task 145 | n_tokens = 32700, memory_seq_rm [32700, end)
slot update_slots: id 0 | task 145 | prompt processing progress, n_tokens = 33705, batch.n_tokens = 1005, progress = 0.985037
slot update_slots: id 0 | task 145 | n_tokens = 33705, memory_seq_rm [33705, end)
slot init_sampler: id 0 | task 145 | init sampler, took 4.22 ms, tokens: text = 34217, total = 34217
slot update_slots: id 0 | task 145 | created context checkpoint 4 of 8 (pos_min = 33704, pos_max = 33704, n_tokens = 33705, size = 62.813 MiB)
slot update_slots: id 0 | task 145 | prompt processing done, n_tokens = 34217, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 145 |
prompt eval time = 1630.48 ms / 1517 tokens ( 1.07 ms per token, 930.40 tokens per second)
eval time = 1012.78 ms / 35 tokens ( 28.94 ms per token, 34.56 tokens per second)
total time = 2643.25 ms / 1552 tokens
slot release: id 0 | task 145 | stop processing: n_tokens = 34251, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.948 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 182 | processing task, is_child = 0
slot update_slots: id 0 | task 182 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 36138
slot update_slots: id 0 | task 182 | n_tokens = 34251, memory_seq_rm [34251, end)
slot update_slots: id 0 | task 182 | prompt processing progress, n_tokens = 35626, batch.n_tokens = 1375, progress = 0.985832
slot update_slots: id 0 | task 182 | n_tokens = 35626, memory_seq_rm [35626, end)
slot init_sampler: id 0 | task 182 | init sampler, took 4.34 ms, tokens: text = 36138, total = 36138
slot update_slots: id 0 | task 182 | created context checkpoint 5 of 8 (pos_min = 35625, pos_max = 35625, n_tokens = 35626, size = 62.813 MiB)
slot update_slots: id 0 | task 182 | prompt processing done, n_tokens = 36138, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 182 |
prompt eval time = 1857.75 ms / 1887 tokens ( 0.98 ms per token, 1015.75 tokens per second)
eval time = 966.94 ms / 34 tokens ( 28.44 ms per token, 35.16 tokens per second)
total time = 2824.69 ms / 1921 tokens
slot release: id 0 | task 182 | stop processing: n_tokens = 36171, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.982 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 218 | processing task, is_child = 0
slot update_slots: id 0 | task 218 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 36844
slot update_slots: id 0 | task 218 | n_tokens = 36171, memory_seq_rm [36171, end)
slot update_slots: id 0 | task 218 | prompt processing progress, n_tokens = 36332, batch.n_tokens = 161, progress = 0.986104
slot update_slots: id 0 | task 218 | n_tokens = 36332, memory_seq_rm [36332, end)
slot init_sampler: id 0 | task 218 | init sampler, took 4.45 ms, tokens: text = 36844, total = 36844
slot update_slots: id 0 | task 218 | created context checkpoint 6 of 8 (pos_min = 36331, pos_max = 36331, n_tokens = 36332, size = 62.813 MiB)
slot update_slots: id 0 | task 218 | prompt processing done, n_tokens = 36844, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 218 |
prompt eval time = 1130.78 ms / 673 tokens ( 1.68 ms per token, 595.17 tokens per second)
eval time = 8718.12 ms / 293 tokens ( 29.75 ms per token, 33.61 tokens per second)
total time = 9848.90 ms / 966 tokens
slot release: id 0 | task 218 | stop processing: n_tokens = 37136, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.992 (> 0.100 thold), f_keep = 0.993
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 513 | processing task, is_child = 0
slot update_slots: id 0 | task 513 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 37184
slot update_slots: id 0 | task 513 | n_past = 36872, slot.prompt.tokens.size() = 37136, seq_id = 0, pos_min = 37135, n_swa = 1
slot update_slots: id 0 | task 513 | restored context checkpoint (pos_min = 36331, pos_max = 36331, n_tokens = 36332, size = 62.813 MiB)
slot update_slots: id 0 | task 513 | n_tokens = 36332, memory_seq_rm [36332, end)
slot update_slots: id 0 | task 513 | prompt processing progress, n_tokens = 36672, batch.n_tokens = 340, progress = 0.986231
slot update_slots: id 0 | task 513 | n_tokens = 36672, memory_seq_rm [36672, end)
slot init_sampler: id 0 | task 513 | init sampler, took 4.49 ms, tokens: text = 37184, total = 37184
slot update_slots: id 0 | task 513 | created context checkpoint 7 of 8 (pos_min = 36671, pos_max = 36671, n_tokens = 36672, size = 62.813 MiB)
slot update_slots: id 0 | task 513 | prompt processing done, n_tokens = 37184, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 513 |
prompt eval time = 1427.06 ms / 852 tokens ( 1.67 ms per token, 597.03 tokens per second)
eval time = 16227.11 ms / 534 tokens ( 30.39 ms per token, 32.91 tokens per second)
total time = 17654.17 ms / 1386 tokens
slot release: id 0 | task 513 | stop processing: n_tokens = 37717, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.973 (> 0.100 thold), f_keep = 0.986
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 1049 | processing task, is_child = 0
slot update_slots: id 0 | task 1049 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 38212
slot update_slots: id 0 | task 1049 | n_past = 37194, slot.prompt.tokens.size() = 37717, seq_id = 0, pos_min = 37716, n_swa = 1
slot update_slots: id 0 | task 1049 | restored context checkpoint (pos_min = 36671, pos_max = 36671, n_tokens = 36672, size = 62.813 MiB)
slot update_slots: id 0 | task 1049 | n_tokens = 36672, memory_seq_rm [36672, end)
slot update_slots: id 0 | task 1049 | prompt processing progress, n_tokens = 37700, batch.n_tokens = 1028, progress = 0.986601
slot update_slots: id 0 | task 1049 | n_tokens = 37700, memory_seq_rm [37700, end)
slot init_sampler: id 0 | task 1049 | init sampler, took 4.63 ms, tokens: text = 38212, total = 38212
slot update_slots: id 0 | task 1049 | created context checkpoint 8 of 8 (pos_min = 37699, pos_max = 37699, n_tokens = 37700, size = 62.813 MiB)
slot update_slots: id 0 | task 1049 | prompt processing done, n_tokens = 38212, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 1049 |
prompt eval time = 1723.51 ms / 1540 tokens ( 1.12 ms per token, 893.52 tokens per second)
eval time = 31957.27 ms / 1066 tokens ( 29.98 ms per token, 33.36 tokens per second)
total time = 33680.78 ms / 2606 tokens
slot release: id 0 | task 1049 | stop processing: n_tokens = 39277, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.966 (> 0.100 thold), f_keep = 0.974
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 2117 | processing task, is_child = 0
slot update_slots: id 0 | task 2117 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 39605
slot update_slots: id 0 | task 2117 | n_past = 38239, slot.prompt.tokens.size() = 39277, seq_id = 0, pos_min = 39276, n_swa = 1
slot update_slots: id 0 | task 2117 | restored context checkpoint (pos_min = 37699, pos_max = 37699, n_tokens = 37700, size = 62.813 MiB)
slot update_slots: id 0 | task 2117 | n_tokens = 37700, memory_seq_rm [37700, end)
slot update_slots: id 0 | task 2117 | prompt processing progress, n_tokens = 39093, batch.n_tokens = 1393, progress = 0.987072
slot update_slots: id 0 | task 2117 | n_tokens = 39093, memory_seq_rm [39093, end)
slot init_sampler: id 0 | task 2117 | init sampler, took 4.77 ms, tokens: text = 39605, total = 39605
slot update_slots: id 0 | task 2117 | erasing old context checkpoint (pos_min = 28359, pos_max = 28359, n_tokens = 28360, size = 62.813 MiB)
slot update_slots: id 0 | task 2117 | created context checkpoint 8 of 8 (pos_min = 39092, pos_max = 39092, n_tokens = 39093, size = 62.813 MiB)
slot update_slots: id 0 | task 2117 | prompt processing done, n_tokens = 39605, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 2117 |
prompt eval time = 1990.02 ms / 1905 tokens ( 1.04 ms per token, 957.28 tokens per second)
eval time = 1762.69 ms / 59 tokens ( 29.88 ms per token, 33.47 tokens per second)
total time = 3752.71 ms / 1964 tokens
slot release: id 0 | task 2117 | stop processing: n_tokens = 39663, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.949 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 2178 | processing task, is_child = 0
slot update_slots: id 0 | task 2178 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 41807
slot update_slots: id 0 | task 2178 | n_tokens = 39663, memory_seq_rm [39663, end)
slot update_slots: id 0 | task 2178 | prompt processing progress, n_tokens = 41295, batch.n_tokens = 1632, progress = 0.987753
slot update_slots: id 0 | task 2178 | n_tokens = 41295, memory_seq_rm [41295, end)
slot init_sampler: id 0 | task 2178 | init sampler, took 4.98 ms, tokens: text = 41807, total = 41807
slot update_slots: id 0 | task 2178 | erasing old context checkpoint (pos_min = 28599, pos_max = 28599, n_tokens = 28600, size = 62.813 MiB)
slot update_slots: id 0 | task 2178 | created context checkpoint 8 of 8 (pos_min = 41294, pos_max = 41294, n_tokens = 41295, size = 62.813 MiB)
slot update_slots: id 0 | task 2178 | prompt processing done, n_tokens = 41807, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 2178 |
prompt eval time = 2061.89 ms / 2144 tokens ( 0.96 ms per token, 1039.82 tokens per second)
eval time = 38943.47 ms / 1147 tokens ( 33.95 ms per token, 29.45 tokens per second)
total time = 41005.35 ms / 3291 tokens
slot release: id 0 | task 2178 | stop processing: n_tokens = 42953, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 3327 | processing task, is_child = 0
slot update_slots: id 0 | task 3327 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 42988
slot update_slots: id 0 | task 3327 | n_past = 42943, slot.prompt.tokens.size() = 42953, seq_id = 0, pos_min = 42952, n_swa = 1
slot update_slots: id 0 | task 3327 | restored context checkpoint (pos_min = 41294, pos_max = 41294, n_tokens = 41295, size = 62.813 MiB)
slot update_slots: id 0 | task 3327 | n_tokens = 41295, memory_seq_rm [41295, end)
slot update_slots: id 0 | task 3327 | prompt processing progress, n_tokens = 42476, batch.n_tokens = 1181, progress = 0.988090
slot update_slots: id 0 | task 3327 | n_tokens = 42476, memory_seq_rm [42476, end)
slot init_sampler: id 0 | task 3327 | init sampler, took 5.37 ms, tokens: text = 42988, total = 42988
slot update_slots: id 0 | task 3327 | erasing old context checkpoint (pos_min = 32154, pos_max = 32154, n_tokens = 32155, size = 62.813 MiB)
slot update_slots: id 0 | task 3327 | created context checkpoint 8 of 8 (pos_min = 42475, pos_max = 42475, n_tokens = 42476, size = 62.813 MiB)
slot update_slots: id 0 | task 3327 | prompt processing done, n_tokens = 42988, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 3327 |
prompt eval time = 1846.27 ms / 1693 tokens ( 1.09 ms per token, 916.98 tokens per second)
eval time = 3719.40 ms / 124 tokens ( 30.00 ms per token, 33.34 tokens per second)
total time = 5565.68 ms / 1817 tokens
slot release: id 0 | task 3327 | stop processing: n_tokens = 43111, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 3453 | processing task, is_child = 0
slot update_slots: id 0 | task 3453 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 43278
slot update_slots: id 0 | task 3453 | n_past = 43013, slot.prompt.tokens.size() = 43111, seq_id = 0, pos_min = 43110, n_swa = 1
slot update_slots: id 0 | task 3453 | restored context checkpoint (pos_min = 42475, pos_max = 42475, n_tokens = 42476, size = 62.813 MiB)
slot update_slots: id 0 | task 3453 | n_tokens = 42476, memory_seq_rm [42476, end)
slot update_slots: id 0 | task 3453 | prompt processing progress, n_tokens = 42766, batch.n_tokens = 290, progress = 0.988169
slot update_slots: id 0 | task 3453 | n_tokens = 42766, memory_seq_rm [42766, end)
slot init_sampler: id 0 | task 3453 | init sampler, took 5.19 ms, tokens: text = 43278, total = 43278
slot update_slots: id 0 | task 3453 | erasing old context checkpoint (pos_min = 33704, pos_max = 33704, n_tokens = 33705, size = 62.813 MiB)
slot update_slots: id 0 | task 3453 | created context checkpoint 8 of 8 (pos_min = 42765, pos_max = 42765, n_tokens = 42766, size = 62.813 MiB)
slot update_slots: id 0 | task 3453 | prompt processing done, n_tokens = 43278, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 3453 |
prompt eval time = 1348.00 ms / 802 tokens ( 1.68 ms per token, 594.96 tokens per second)
eval time = 17470.58 ms / 576 tokens ( 30.33 ms per token, 32.97 tokens per second)
total time = 18818.58 ms / 1378 tokens
slot release: id 0 | task 3453 | stop processing: n_tokens = 43853, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.979 (> 0.100 thold), f_keep = 0.987
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4031 | processing task, is_child = 0
slot update_slots: id 0 | task 4031 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 44208
slot update_slots: id 0 | task 4031 | n_past = 43289, slot.prompt.tokens.size() = 43853, seq_id = 0, pos_min = 43852, n_swa = 1
slot update_slots: id 0 | task 4031 | restored context checkpoint (pos_min = 42765, pos_max = 42765, n_tokens = 42766, size = 62.813 MiB)
slot update_slots: id 0 | task 4031 | n_tokens = 42766, memory_seq_rm [42766, end)
slot update_slots: id 0 | task 4031 | prompt processing progress, n_tokens = 43696, batch.n_tokens = 930, progress = 0.988418
slot update_slots: id 0 | task 4031 | n_tokens = 43696, memory_seq_rm [43696, end)
slot init_sampler: id 0 | task 4031 | init sampler, took 5.31 ms, tokens: text = 44208, total = 44208
slot update_slots: id 0 | task 4031 | erasing old context checkpoint (pos_min = 35625, pos_max = 35625, n_tokens = 35626, size = 62.813 MiB)
slot update_slots: id 0 | task 4031 | created context checkpoint 8 of 8 (pos_min = 43695, pos_max = 43695, n_tokens = 43696, size = 62.813 MiB)
slot update_slots: id 0 | task 4031 | prompt processing done, n_tokens = 44208, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4031 |
prompt eval time = 1702.97 ms / 1442 tokens ( 1.18 ms per token, 846.76 tokens per second)
eval time = 5446.43 ms / 183 tokens ( 29.76 ms per token, 33.60 tokens per second)
total time = 7149.40 ms / 1625 tokens
slot release: id 0 | task 4031 | stop processing: n_tokens = 44390, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.995 (> 0.100 thold), f_keep = 0.996
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4216 | processing task, is_child = 0
slot update_slots: id 0 | task 4216 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 44439
slot update_slots: id 0 | task 4216 | n_past = 44231, slot.prompt.tokens.size() = 44390, seq_id = 0, pos_min = 44389, n_swa = 1
slot update_slots: id 0 | task 4216 | restored context checkpoint (pos_min = 43695, pos_max = 43695, n_tokens = 43696, size = 62.813 MiB)
slot update_slots: id 0 | task 4216 | n_tokens = 43696, memory_seq_rm [43696, end)
slot update_slots: id 0 | task 4216 | prompt processing progress, n_tokens = 43927, batch.n_tokens = 231, progress = 0.988479
slot update_slots: id 0 | task 4216 | n_tokens = 43927, memory_seq_rm [43927, end)
slot init_sampler: id 0 | task 4216 | init sampler, took 5.51 ms, tokens: text = 44439, total = 44439
slot update_slots: id 0 | task 4216 | erasing old context checkpoint (pos_min = 36331, pos_max = 36331, n_tokens = 36332, size = 62.813 MiB)
slot update_slots: id 0 | task 4216 | created context checkpoint 8 of 8 (pos_min = 43926, pos_max = 43926, n_tokens = 43927, size = 62.813 MiB)
slot update_slots: id 0 | task 4216 | prompt processing done, n_tokens = 44439, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4216 |
prompt eval time = 1313.63 ms / 743 tokens ( 1.77 ms per token, 565.61 tokens per second)
eval time = 3646.65 ms / 119 tokens ( 30.64 ms per token, 32.63 tokens per second)
total time = 4960.28 ms / 862 tokens
slot release: id 0 | task 4216 | stop processing: n_tokens = 44557, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.991 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4337 | processing task, is_child = 0
slot update_slots: id 0 | task 4337 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 44864
slot update_slots: id 0 | task 4337 | n_past = 44450, slot.prompt.tokens.size() = 44557, seq_id = 0, pos_min = 44556, n_swa = 1
slot update_slots: id 0 | task 4337 | restored context checkpoint (pos_min = 43926, pos_max = 43926, n_tokens = 43927, size = 62.813 MiB)
slot update_slots: id 0 | task 4337 | n_tokens = 43927, memory_seq_rm [43927, end)
slot update_slots: id 0 | task 4337 | prompt processing progress, n_tokens = 44352, batch.n_tokens = 425, progress = 0.988588
slot update_slots: id 0 | task 4337 | n_tokens = 44352, memory_seq_rm [44352, end)
slot init_sampler: id 0 | task 4337 | init sampler, took 5.33 ms, tokens: text = 44864, total = 44864
slot update_slots: id 0 | task 4337 | erasing old context checkpoint (pos_min = 36671, pos_max = 36671, n_tokens = 36672, size = 62.813 MiB)
slot update_slots: id 0 | task 4337 | created context checkpoint 8 of 8 (pos_min = 44351, pos_max = 44351, n_tokens = 44352, size = 62.813 MiB)
slot update_slots: id 0 | task 4337 | prompt processing done, n_tokens = 44864, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4337 |
prompt eval time = 1407.23 ms / 937 tokens ( 1.50 ms per token, 665.85 tokens per second)
eval time = 1069.92 ms / 35 tokens ( 30.57 ms per token, 32.71 tokens per second)
total time = 2477.15 ms / 972 tokens
slot release: id 0 | task 4337 | stop processing: n_tokens = 44898, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.963 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4374 | processing task, is_child = 0
slot update_slots: id 0 | task 4374 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 46647
slot update_slots: id 0 | task 4374 | n_tokens = 44898, memory_seq_rm [44898, end)
slot update_slots: id 0 | task 4374 | prompt processing progress, n_tokens = 46135, batch.n_tokens = 1237, progress = 0.989024
slot update_slots: id 0 | task 4374 | n_tokens = 46135, memory_seq_rm [46135, end)
slot init_sampler: id 0 | task 4374 | init sampler, took 7.33 ms, tokens: text = 46647, total = 46647
slot update_slots: id 0 | task 4374 | erasing old context checkpoint (pos_min = 37699, pos_max = 37699, n_tokens = 37700, size = 62.813 MiB)
slot update_slots: id 0 | task 4374 | created context checkpoint 8 of 8 (pos_min = 46134, pos_max = 46134, n_tokens = 46135, size = 62.813 MiB)
slot update_slots: id 0 | task 4374 | prompt processing done, n_tokens = 46647, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4374 |
prompt eval time = 1863.92 ms / 1749 tokens ( 1.07 ms per token, 938.34 tokens per second)
eval time = 3754.11 ms / 125 tokens ( 30.03 ms per token, 33.30 tokens per second)
total time = 5618.03 ms / 1874 tokens
slot release: id 0 | task 4374 | stop processing: n_tokens = 46771, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4501 | processing task, is_child = 0
slot update_slots: id 0 | task 4501 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 46805
slot update_slots: id 0 | task 4501 | n_past = 46686, slot.prompt.tokens.size() = 46771, seq_id = 0, pos_min = 46770, n_swa = 1
slot update_slots: id 0 | task 4501 | restored context checkpoint (pos_min = 46134, pos_max = 46134, n_tokens = 46135, size = 62.813 MiB)
slot update_slots: id 0 | task 4501 | n_tokens = 46135, memory_seq_rm [46135, end)
slot update_slots: id 0 | task 4501 | prompt processing progress, n_tokens = 46293, batch.n_tokens = 158, progress = 0.989061
slot update_slots: id 0 | task 4501 | n_tokens = 46293, memory_seq_rm [46293, end)
slot init_sampler: id 0 | task 4501 | init sampler, took 5.65 ms, tokens: text = 46805, total = 46805
slot update_slots: id 0 | task 4501 | erasing old context checkpoint (pos_min = 39092, pos_max = 39092, n_tokens = 39093, size = 62.813 MiB)
slot update_slots: id 0 | task 4501 | created context checkpoint 8 of 8 (pos_min = 46292, pos_max = 46292, n_tokens = 46293, size = 62.813 MiB)
slot update_slots: id 0 | task 4501 | prompt processing done, n_tokens = 46805, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4501 |
prompt eval time = 1183.34 ms / 670 tokens ( 1.77 ms per token, 566.19 tokens per second)
eval time = 3352.23 ms / 111 tokens ( 30.20 ms per token, 33.11 tokens per second)
total time = 4535.57 ms / 781 tokens
slot release: id 0 | task 4501 | stop processing: n_tokens = 46915, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4614 | processing task, is_child = 0
slot update_slots: id 0 | task 4614 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 46963
slot update_slots: id 0 | task 4614 | n_past = 46815, slot.prompt.tokens.size() = 46915, seq_id = 0, pos_min = 46914, n_swa = 1
slot update_slots: id 0 | task 4614 | restored context checkpoint (pos_min = 46292, pos_max = 46292, n_tokens = 46293, size = 62.813 MiB)
slot update_slots: id 0 | task 4614 | n_tokens = 46293, memory_seq_rm [46293, end)
slot update_slots: id 0 | task 4614 | prompt processing progress, n_tokens = 46451, batch.n_tokens = 158, progress = 0.989098
slot update_slots: id 0 | task 4614 | n_tokens = 46451, memory_seq_rm [46451, end)
slot init_sampler: id 0 | task 4614 | init sampler, took 6.04 ms, tokens: text = 46963, total = 46963
slot update_slots: id 0 | task 4614 | erasing old context checkpoint (pos_min = 41294, pos_max = 41294, n_tokens = 41295, size = 62.813 MiB)
slot update_slots: id 0 | task 4614 | created context checkpoint 8 of 8 (pos_min = 46450, pos_max = 46450, n_tokens = 46451, size = 62.813 MiB)
slot update_slots: id 0 | task 4614 | prompt processing done, n_tokens = 46963, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4614 |
prompt eval time = 1180.28 ms / 670 tokens ( 1.76 ms per token, 567.66 tokens per second)
eval time = 3070.20 ms / 99 tokens ( 31.01 ms per token, 32.25 tokens per second)
total time = 4250.49 ms / 769 tokens
slot release: id 0 | task 4614 | stop processing: n_tokens = 47061, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4715 | processing task, is_child = 0
slot update_slots: id 0 | task 4715 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47109
slot update_slots: id 0 | task 4715 | n_past = 46973, slot.prompt.tokens.size() = 47061, seq_id = 0, pos_min = 47060, n_swa = 1
slot update_slots: id 0 | task 4715 | restored context checkpoint (pos_min = 46450, pos_max = 46450, n_tokens = 46451, size = 62.813 MiB)
slot update_slots: id 0 | task 4715 | n_tokens = 46451, memory_seq_rm [46451, end)
slot update_slots: id 0 | task 4715 | prompt processing progress, n_tokens = 46597, batch.n_tokens = 146, progress = 0.989132
slot update_slots: id 0 | task 4715 | n_tokens = 46597, memory_seq_rm [46597, end)
slot init_sampler: id 0 | task 4715 | init sampler, took 6.01 ms, tokens: text = 47109, total = 47109
slot update_slots: id 0 | task 4715 | erasing old context checkpoint (pos_min = 42475, pos_max = 42475, n_tokens = 42476, size = 62.813 MiB)
slot update_slots: id 0 | task 4715 | created context checkpoint 8 of 8 (pos_min = 46596, pos_max = 46596, n_tokens = 46597, size = 62.813 MiB)
slot update_slots: id 0 | task 4715 | prompt processing done, n_tokens = 47109, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4715 |
prompt eval time = 1221.52 ms / 658 tokens ( 1.86 ms per token, 538.67 tokens per second)
eval time = 2014.24 ms / 65 tokens ( 30.99 ms per token, 32.27 tokens per second)
total time = 3235.76 ms / 723 tokens
slot release: id 0 | task 4715 | stop processing: n_tokens = 47173, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 0.999
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4782 | processing task, is_child = 0
slot update_slots: id 0 | task 4782 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47204
slot update_slots: id 0 | task 4782 | n_past = 47132, slot.prompt.tokens.size() = 47173, seq_id = 0, pos_min = 47172, n_swa = 1
slot update_slots: id 0 | task 4782 | restored context checkpoint (pos_min = 46596, pos_max = 46596, n_tokens = 46597, size = 62.813 MiB)
slot update_slots: id 0 | task 4782 | n_tokens = 46597, memory_seq_rm [46597, end)
slot update_slots: id 0 | task 4782 | prompt processing progress, n_tokens = 46692, batch.n_tokens = 95, progress = 0.989153
slot update_slots: id 0 | task 4782 | n_tokens = 46692, memory_seq_rm [46692, end)
slot init_sampler: id 0 | task 4782 | init sampler, took 5.79 ms, tokens: text = 47204, total = 47204
slot update_slots: id 0 | task 4782 | erasing old context checkpoint (pos_min = 42765, pos_max = 42765, n_tokens = 42766, size = 62.813 MiB)
slot update_slots: id 0 | task 4782 | created context checkpoint 8 of 8 (pos_min = 46691, pos_max = 46691, n_tokens = 46692, size = 62.813 MiB)
slot update_slots: id 0 | task 4782 | prompt processing done, n_tokens = 47204, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4782 |
prompt eval time = 1131.61 ms / 607 tokens ( 1.86 ms per token, 536.40 tokens per second)
eval time = 1859.83 ms / 62 tokens ( 30.00 ms per token, 33.34 tokens per second)
total time = 2991.44 ms / 669 tokens
slot release: id 0 | task 4782 | stop processing: n_tokens = 47265, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.991 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4846 | processing task, is_child = 0
slot update_slots: id 0 | task 4846 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47676
slot update_slots: id 0 | task 4846 | n_tokens = 47265, memory_seq_rm [47265, end)
slot init_sampler: id 0 | task 4846 | init sampler, took 5.80 ms, tokens: text = 47676, total = 47676
slot update_slots: id 0 | task 4846 | erasing old context checkpoint (pos_min = 43695, pos_max = 43695, n_tokens = 43696, size = 62.813 MiB)
slot update_slots: id 0 | task 4846 | created context checkpoint 8 of 8 (pos_min = 47264, pos_max = 47264, n_tokens = 47265, size = 62.813 MiB)
slot update_slots: id 0 | task 4846 | prompt processing done, n_tokens = 47676, batch.n_tokens = 411
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 4846 |
prompt eval time = 720.74 ms / 411 tokens ( 1.75 ms per token, 570.25 tokens per second)
eval time = 4857.59 ms / 160 tokens ( 30.36 ms per token, 32.94 tokens per second)
total time = 5578.33 ms / 571 tokens
slot release: id 0 | task 4846 | stop processing: n_tokens = 47835, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 0.997
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5007 | processing task, is_child = 0
slot update_slots: id 0 | task 5007 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47991
slot update_slots: id 0 | task 5007 | n_past = 47699, slot.prompt.tokens.size() = 47835, seq_id = 0, pos_min = 47834, n_swa = 1
slot update_slots: id 0 | task 5007 | restored context checkpoint (pos_min = 47264, pos_max = 47264, n_tokens = 47265, size = 62.813 MiB)
slot update_slots: id 0 | task 5007 | n_tokens = 47265, memory_seq_rm [47265, end)
slot update_slots: id 0 | task 5007 | prompt processing progress, n_tokens = 47479, batch.n_tokens = 214, progress = 0.989331
slot update_slots: id 0 | task 5007 | n_tokens = 47479, memory_seq_rm [47479, end)
slot init_sampler: id 0 | task 5007 | init sampler, took 5.90 ms, tokens: text = 47991, total = 47991
slot update_slots: id 0 | task 5007 | erasing old context checkpoint (pos_min = 43926, pos_max = 43926, n_tokens = 43927, size = 62.813 MiB)
slot update_slots: id 0 | task 5007 | created context checkpoint 8 of 8 (pos_min = 47478, pos_max = 47478, n_tokens = 47479, size = 62.813 MiB)
slot update_slots: id 0 | task 5007 | prompt processing done, n_tokens = 47991, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5007 |
prompt eval time = 1251.99 ms / 726 tokens ( 1.72 ms per token, 579.87 tokens per second)
eval time = 5086.56 ms / 168 tokens ( 30.28 ms per token, 33.03 tokens per second)
total time = 6338.55 ms / 894 tokens
slot release: id 0 | task 5007 | stop processing: n_tokens = 48158, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.908 (> 0.100 thold), f_keep = 0.547
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5177 | processing task, is_child = 0
slot update_slots: id 0 | task 5177 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 29006
slot update_slots: id 0 | task 5177 | n_past = 26323, slot.prompt.tokens.size() = 48158, seq_id = 0, pos_min = 48157, n_swa = 1
slot update_slots: id 0 | task 5177 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 44351, pos_max = 44351, n_tokens = 44352, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 46134, pos_max = 46134, n_tokens = 46135, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 46292, pos_max = 46292, n_tokens = 46293, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 46450, pos_max = 46450, n_tokens = 46451, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 46596, pos_max = 46596, n_tokens = 46597, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 46691, pos_max = 46691, n_tokens = 46692, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 47264, pos_max = 47264, n_tokens = 47265, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | erased invalidated context checkpoint (pos_min = 47478, pos_max = 47478, n_tokens = 47479, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 4096, progress = 0.141212
slot update_slots: id 0 | task 5177 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 4096, progress = 0.282424
slot update_slots: id 0 | task 5177 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 4096, progress = 0.423636
slot update_slots: id 0 | task 5177 | n_tokens = 12288, memory_seq_rm [12288, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 16384, batch.n_tokens = 4096, progress = 0.564849
slot update_slots: id 0 | task 5177 | n_tokens = 16384, memory_seq_rm [16384, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 20480, batch.n_tokens = 4096, progress = 0.706061
slot update_slots: id 0 | task 5177 | n_tokens = 20480, memory_seq_rm [20480, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 24576, batch.n_tokens = 4096, progress = 0.847273
slot update_slots: id 0 | task 5177 | n_tokens = 24576, memory_seq_rm [24576, end)
slot update_slots: id 0 | task 5177 | prompt processing progress, n_tokens = 28494, batch.n_tokens = 3918, progress = 0.982349
slot update_slots: id 0 | task 5177 | n_tokens = 28494, memory_seq_rm [28494, end)
slot init_sampler: id 0 | task 5177 | init sampler, took 3.50 ms, tokens: text = 29006, total = 29006
slot update_slots: id 0 | task 5177 | created context checkpoint 1 of 8 (pos_min = 28493, pos_max = 28493, n_tokens = 28494, size = 62.813 MiB)
slot update_slots: id 0 | task 5177 | prompt processing done, n_tokens = 29006, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5177 |
prompt eval time = 19228.53 ms / 29006 tokens ( 0.66 ms per token, 1508.49 tokens per second)
eval time = 898.79 ms / 32 tokens ( 28.09 ms per token, 35.60 tokens per second)
total time = 20127.33 ms / 29038 tokens
slot release: id 0 | task 5177 | stop processing: n_tokens = 29037, truncated = 0
srv update_slots: all slots are idle
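The task 5177 request above lost every cached checkpoint, so the full 29,006-token prompt was re-processed in `n_batch`-sized chunks. The `progress` values are simply `n_tokens / task.n_tokens`, and the last 512 tokens land in a separate closing batch. A minimal sketch that reproduces this schedule (the 512-token final reserve is inferred from the log, not taken from llama.cpp's actual code):

```python
def batch_schedule(task_n_tokens: int, n_batch: int = 4096, reserve: int = 512):
    """Yield (n_tokens_done, batch_size, progress) tuples matching the
    'prompt processing progress' log lines. The final `reserve` tokens are
    assumed to be processed in a separate closing batch, as seen above."""
    done = 0
    limit = task_n_tokens - reserve
    while done < limit:
        step = min(n_batch, limit - done)
        done += step
        yield done, step, done / task_n_tokens
```

For task 5177 this yields six 4,096-token batches, then a 3,918-token batch ending at `progress = 0.982349`, exactly as logged.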
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.962 (> 0.100 thold), f_keep = 0.999
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5217 | processing task, is_child = 0
slot update_slots: id 0 | task 5217 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 30151
slot update_slots: id 0 | task 5217 | n_past = 29002, slot.prompt.tokens.size() = 29037, seq_id = 0, pos_min = 29036, n_swa = 1
slot update_slots: id 0 | task 5217 | restored context checkpoint (pos_min = 28493, pos_max = 28493, n_tokens = 28494, size = 62.813 MiB)
slot update_slots: id 0 | task 5217 | n_tokens = 28494, memory_seq_rm [28494, end)
slot update_slots: id 0 | task 5217 | prompt processing progress, n_tokens = 29639, batch.n_tokens = 1145, progress = 0.983019
slot update_slots: id 0 | task 5217 | n_tokens = 29639, memory_seq_rm [29639, end)
slot init_sampler: id 0 | task 5217 | init sampler, took 3.71 ms, tokens: text = 30151, total = 30151
slot update_slots: id 0 | task 5217 | created context checkpoint 2 of 8 (pos_min = 29638, pos_max = 29638, n_tokens = 29639, size = 62.813 MiB)
slot update_slots: id 0 | task 5217 | prompt processing done, n_tokens = 30151, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5217 |
prompt eval time = 1615.85 ms / 1657 tokens ( 0.98 ms per token, 1025.46 tokens per second)
eval time = 857.71 ms / 31 tokens ( 27.67 ms per token, 36.14 tokens per second)
total time = 2473.56 ms / 1688 tokens
slot release: id 0 | task 5217 | stop processing: n_tokens = 30181, truncated = 0
srv update_slots: all slots are idle
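The `selected slot by LCP similarity` lines report two ratios that can be reconstructed from the surrounding numbers: `sim_best` appears to be the longest common prefix divided by the new prompt length, and `f_keep` the same prefix divided by the cached sequence length (for task 5217: 29,002 / 30,151 ≈ 0.962 and 29,002 / 29,037 ≈ 0.999). A sketch under that assumption:

```python
def lcp_len(a, b):
    """Length of the longest common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def slot_similarity(cached, prompt):
    """Reproduce sim_best and f_keep as inferred from the log's numbers:
    sim = lcp / len(prompt), f_keep = lcp / len(cached).
    These formulas are an assumption checked against the log, not code
    lifted from llama.cpp."""
    lcp = lcp_len(cached, prompt)
    return lcp / len(prompt), lcp / len(cached)
```

A slot is reused whenever `sim` clears the 0.100 threshold shown in the log; `f_keep` then tells how much of the cached sequence survives.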
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.964 (> 0.100 thold), f_keep = 0.999
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5250 | processing task, is_child = 0
slot update_slots: id 0 | task 5250 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 31263
slot update_slots: id 0 | task 5250 | n_past = 30147, slot.prompt.tokens.size() = 30181, seq_id = 0, pos_min = 30180, n_swa = 1
slot update_slots: id 0 | task 5250 | restored context checkpoint (pos_min = 29638, pos_max = 29638, n_tokens = 29639, size = 62.813 MiB)
slot update_slots: id 0 | task 5250 | n_tokens = 29639, memory_seq_rm [29639, end)
slot update_slots: id 0 | task 5250 | prompt processing progress, n_tokens = 30751, batch.n_tokens = 1112, progress = 0.983623
slot update_slots: id 0 | task 5250 | n_tokens = 30751, memory_seq_rm [30751, end)
slot init_sampler: id 0 | task 5250 | init sampler, took 3.92 ms, tokens: text = 31263, total = 31263
slot update_slots: id 0 | task 5250 | created context checkpoint 3 of 8 (pos_min = 30750, pos_max = 30750, n_tokens = 30751, size = 62.813 MiB)
slot update_slots: id 0 | task 5250 | prompt processing done, n_tokens = 31263, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5250 |
prompt eval time = 1669.70 ms / 1624 tokens ( 1.03 ms per token, 972.63 tokens per second)
eval time = 914.45 ms / 31 tokens ( 29.50 ms per token, 33.90 tokens per second)
total time = 2584.15 ms / 1655 tokens
slot release: id 0 | task 5250 | stop processing: n_tokens = 31293, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5283 | processing task, is_child = 0
slot update_slots: id 0 | task 5283 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 31313
slot update_slots: id 0 | task 5283 | n_past = 31292, slot.prompt.tokens.size() = 31293, seq_id = 0, pos_min = 31292, n_swa = 1
slot update_slots: id 0 | task 5283 | restored context checkpoint (pos_min = 30750, pos_max = 30750, n_tokens = 30751, size = 62.813 MiB)
slot update_slots: id 0 | task 5283 | n_tokens = 30751, memory_seq_rm [30751, end)
slot update_slots: id 0 | task 5283 | prompt processing progress, n_tokens = 30801, batch.n_tokens = 50, progress = 0.983649
slot update_slots: id 0 | task 5283 | n_tokens = 30801, memory_seq_rm [30801, end)
slot init_sampler: id 0 | task 5283 | init sampler, took 3.84 ms, tokens: text = 31313, total = 31313
slot update_slots: id 0 | task 5283 | prompt processing done, n_tokens = 31313, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5283 |
prompt eval time = 925.94 ms / 562 tokens ( 1.65 ms per token, 606.95 tokens per second)
eval time = 872.50 ms / 29 tokens ( 30.09 ms per token, 33.24 tokens per second)
total time = 1798.44 ms / 591 tokens
slot release: id 0 | task 5283 | stop processing: n_tokens = 31341, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5314 | processing task, is_child = 0
slot update_slots: id 0 | task 5314 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 31398
slot update_slots: id 0 | task 5314 | n_past = 31340, slot.prompt.tokens.size() = 31341, seq_id = 0, pos_min = 31340, n_swa = 1
slot update_slots: id 0 | task 5314 | restored context checkpoint (pos_min = 30750, pos_max = 30750, n_tokens = 30751, size = 62.813 MiB)
slot update_slots: id 0 | task 5314 | n_tokens = 30751, memory_seq_rm [30751, end)
slot update_slots: id 0 | task 5314 | prompt processing progress, n_tokens = 30886, batch.n_tokens = 135, progress = 0.983693
slot update_slots: id 0 | task 5314 | n_tokens = 30886, memory_seq_rm [30886, end)
slot init_sampler: id 0 | task 5314 | init sampler, took 3.96 ms, tokens: text = 31398, total = 31398
slot update_slots: id 0 | task 5314 | created context checkpoint 4 of 8 (pos_min = 30885, pos_max = 30885, n_tokens = 30886, size = 62.813 MiB)
slot update_slots: id 0 | task 5314 | prompt processing done, n_tokens = 31398, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5314 |
prompt eval time = 1096.73 ms / 647 tokens ( 1.70 ms per token, 589.93 tokens per second)
eval time = 969.81 ms / 34 tokens ( 28.52 ms per token, 35.06 tokens per second)
total time = 2066.54 ms / 681 tokens
slot release: id 0 | task 5314 | stop processing: n_tokens = 31431, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.986 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5350 | processing task, is_child = 0
slot update_slots: id 0 | task 5350 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 31872
slot update_slots: id 0 | task 5350 | n_tokens = 31431, memory_seq_rm [31431, end)
slot init_sampler: id 0 | task 5350 | init sampler, took 3.97 ms, tokens: text = 31872, total = 31872
slot update_slots: id 0 | task 5350 | created context checkpoint 5 of 8 (pos_min = 31430, pos_max = 31430, n_tokens = 31431, size = 62.813 MiB)
slot update_slots: id 0 | task 5350 | prompt processing done, n_tokens = 31872, batch.n_tokens = 441
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5350 |
prompt eval time = 636.80 ms / 441 tokens ( 1.44 ms per token, 692.53 tokens per second)
eval time = 1028.71 ms / 35 tokens ( 29.39 ms per token, 34.02 tokens per second)
total time = 1665.51 ms / 476 tokens
slot release: id 0 | task 5350 | stop processing: n_tokens = 31906, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.948 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5386 | processing task, is_child = 0
slot update_slots: id 0 | task 5386 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 33655
slot update_slots: id 0 | task 5386 | n_tokens = 31906, memory_seq_rm [31906, end)
slot update_slots: id 0 | task 5386 | prompt processing progress, n_tokens = 33143, batch.n_tokens = 1237, progress = 0.984787
slot update_slots: id 0 | task 5386 | n_tokens = 33143, memory_seq_rm [33143, end)
slot init_sampler: id 0 | task 5386 | init sampler, took 4.10 ms, tokens: text = 33655, total = 33655
slot update_slots: id 0 | task 5386 | created context checkpoint 6 of 8 (pos_min = 33142, pos_max = 33142, n_tokens = 33143, size = 62.813 MiB)
slot update_slots: id 0 | task 5386 | prompt processing done, n_tokens = 33655, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5386 |
prompt eval time = 1773.15 ms / 1749 tokens ( 1.01 ms per token, 986.38 tokens per second)
eval time = 991.98 ms / 34 tokens ( 29.18 ms per token, 34.28 tokens per second)
total time = 2765.13 ms / 1783 tokens
slot release: id 0 | task 5386 | stop processing: n_tokens = 33688, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.958 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5422 | processing task, is_child = 0
slot update_slots: id 0 | task 5422 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 35175
slot update_slots: id 0 | task 5422 | n_tokens = 33688, memory_seq_rm [33688, end)
slot update_slots: id 0 | task 5422 | prompt processing progress, n_tokens = 34663, batch.n_tokens = 975, progress = 0.985444
slot update_slots: id 0 | task 5422 | n_tokens = 34663, memory_seq_rm [34663, end)
slot init_sampler: id 0 | task 5422 | init sampler, took 4.49 ms, tokens: text = 35175, total = 35175
slot update_slots: id 0 | task 5422 | created context checkpoint 7 of 8 (pos_min = 34662, pos_max = 34662, n_tokens = 34663, size = 62.813 MiB)
slot update_slots: id 0 | task 5422 | prompt processing done, n_tokens = 35175, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5422 |
prompt eval time = 1650.43 ms / 1487 tokens ( 1.11 ms per token, 900.98 tokens per second)
eval time = 1035.42 ms / 34 tokens ( 30.45 ms per token, 32.84 tokens per second)
total time = 2685.85 ms / 1521 tokens
slot release: id 0 | task 5422 | stop processing: n_tokens = 35208, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.948 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5458 | processing task, is_child = 0
slot update_slots: id 0 | task 5458 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 37122
slot update_slots: id 0 | task 5458 | n_tokens = 35208, memory_seq_rm [35208, end)
slot update_slots: id 0 | task 5458 | prompt processing progress, n_tokens = 36610, batch.n_tokens = 1402, progress = 0.986208
slot update_slots: id 0 | task 5458 | n_tokens = 36610, memory_seq_rm [36610, end)
slot init_sampler: id 0 | task 5458 | init sampler, took 4.84 ms, tokens: text = 37122, total = 37122
slot update_slots: id 0 | task 5458 | created context checkpoint 8 of 8 (pos_min = 36609, pos_max = 36609, n_tokens = 36610, size = 62.813 MiB)
slot update_slots: id 0 | task 5458 | prompt processing done, n_tokens = 37122, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5458 |
prompt eval time = 1863.45 ms / 1914 tokens ( 0.97 ms per token, 1027.13 tokens per second)
eval time = 6356.80 ms / 211 tokens ( 30.13 ms per token, 33.19 tokens per second)
total time = 8220.25 ms / 2125 tokens
slot release: id 0 | task 5458 | stop processing: n_tokens = 37332, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.995 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5671 | processing task, is_child = 0
slot update_slots: id 0 | task 5671 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 37522
slot update_slots: id 0 | task 5671 | n_tokens = 37332, memory_seq_rm [37332, end)
slot init_sampler: id 0 | task 5671 | init sampler, took 4.70 ms, tokens: text = 37522, total = 37522
slot update_slots: id 0 | task 5671 | erasing old context checkpoint (pos_min = 28493, pos_max = 28493, n_tokens = 28494, size = 62.813 MiB)
slot update_slots: id 0 | task 5671 | created context checkpoint 8 of 8 (pos_min = 37331, pos_max = 37331, n_tokens = 37332, size = 62.813 MiB)
slot update_slots: id 0 | task 5671 | prompt processing done, n_tokens = 37522, batch.n_tokens = 190
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5671 |
prompt eval time = 583.70 ms / 190 tokens ( 3.07 ms per token, 325.51 tokens per second)
eval time = 6196.76 ms / 204 tokens ( 30.38 ms per token, 32.92 tokens per second)
total time = 6780.47 ms / 394 tokens
slot release: id 0 | task 5671 | stop processing: n_tokens = 37725, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 5876 | processing task, is_child = 0
slot update_slots: id 0 | task 5876 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 37816
slot update_slots: id 0 | task 5876 | n_tokens = 37725, memory_seq_rm [37725, end)
slot init_sampler: id 0 | task 5876 | init sampler, took 4.73 ms, tokens: text = 37816, total = 37816
slot update_slots: id 0 | task 5876 | erasing old context checkpoint (pos_min = 29638, pos_max = 29638, n_tokens = 29639, size = 62.813 MiB)
slot update_slots: id 0 | task 5876 | created context checkpoint 8 of 8 (pos_min = 37724, pos_max = 37724, n_tokens = 37725, size = 62.813 MiB)
slot update_slots: id 0 | task 5876 | prompt processing done, n_tokens = 37816, batch.n_tokens = 91
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 5876 |
prompt eval time = 418.51 ms / 91 tokens ( 4.60 ms per token, 217.44 tokens per second)
eval time = 5221.88 ms / 171 tokens ( 30.54 ms per token, 32.75 tokens per second)
total time = 5640.39 ms / 262 tokens
slot release: id 0 | task 5876 | stop processing: n_tokens = 37986, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 6048 | processing task, is_child = 0
slot update_slots: id 0 | task 6048 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 38062
slot update_slots: id 0 | task 6048 | n_tokens = 37986, memory_seq_rm [37986, end)
slot init_sampler: id 0 | task 6048 | init sampler, took 4.86 ms, tokens: text = 38062, total = 38062
slot update_slots: id 0 | task 6048 | erasing old context checkpoint (pos_min = 30750, pos_max = 30750, n_tokens = 30751, size = 62.813 MiB)
slot update_slots: id 0 | task 6048 | created context checkpoint 8 of 8 (pos_min = 37985, pos_max = 37985, n_tokens = 37986, size = 62.813 MiB)
slot update_slots: id 0 | task 6048 | prompt processing done, n_tokens = 38062, batch.n_tokens = 76
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 6048 |
prompt eval time = 412.61 ms / 76 tokens ( 5.43 ms per token, 184.19 tokens per second)
eval time = 12535.38 ms / 412 tokens ( 30.43 ms per token, 32.87 tokens per second)
total time = 12947.99 ms / 488 tokens
slot release: id 0 | task 6048 | stop processing: n_tokens = 38473, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.813 (> 0.100 thold), f_keep = 0.812
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 6461 | processing task, is_child = 0
slot update_slots: id 0 | task 6461 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 38450
slot update_slots: id 0 | task 6461 | n_past = 31259, slot.prompt.tokens.size() = 38473, seq_id = 0, pos_min = 38472, n_swa = 1
slot update_slots: id 0 | task 6461 | restored context checkpoint (pos_min = 30885, pos_max = 30885, n_tokens = 30886, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 31430, pos_max = 31430, n_tokens = 31431, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 33142, pos_max = 33142, n_tokens = 33143, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 34662, pos_max = 34662, n_tokens = 34663, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 36609, pos_max = 36609, n_tokens = 36610, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 37331, pos_max = 37331, n_tokens = 37332, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 37724, pos_max = 37724, n_tokens = 37725, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | erased invalidated context checkpoint (pos_min = 37985, pos_max = 37985, n_tokens = 37986, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | n_tokens = 30886, memory_seq_rm [30886, end)
slot update_slots: id 0 | task 6461 | prompt processing progress, n_tokens = 34982, batch.n_tokens = 4096, progress = 0.909805
slot update_slots: id 0 | task 6461 | n_tokens = 34982, memory_seq_rm [34982, end)
slot update_slots: id 0 | task 6461 | prompt processing progress, n_tokens = 37938, batch.n_tokens = 2956, progress = 0.986684
slot update_slots: id 0 | task 6461 | n_tokens = 37938, memory_seq_rm [37938, end)
slot init_sampler: id 0 | task 6461 | init sampler, took 5.53 ms, tokens: text = 38450, total = 38450
slot update_slots: id 0 | task 6461 | created context checkpoint 2 of 8 (pos_min = 37937, pos_max = 37937, n_tokens = 37938, size = 62.813 MiB)
slot update_slots: id 0 | task 6461 | prompt processing done, n_tokens = 38450, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 6461 |
prompt eval time = 6428.73 ms / 7564 tokens ( 0.85 ms per token, 1176.59 tokens per second)
eval time = 2342.51 ms / 79 tokens ( 29.65 ms per token, 33.72 tokens per second)
total time = 8771.24 ms / 7643 tokens
slot release: id 0 | task 6461 | stop processing: n_tokens = 38528, truncated = 0
srv update_slots: all slots are idle
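Task 6461 shows the checkpoint store's two eviction paths at once: a divergent prompt (`f_keep = 0.812`) restores the newest checkpoint at or before `n_past = 31259` and erases the seven checkpoints past it, while earlier requests (tasks 5671 onward) show the capacity path, where creating a ninth checkpoint erases the oldest. A hypothetical re-implementation of that bookkeeping, not llama.cpp's actual code:

```python
class CheckpointRing:
    """Rolling store of at most `cap` context checkpoints, keyed by
    position. Mirrors the 'created/erasing old/erased invalidated
    context checkpoint' lines in the log above."""
    def __init__(self, cap: int = 8):
        self.cap = cap
        self.points = []  # checkpoint positions, ascending

    def create(self, pos: int):
        if len(self.points) == self.cap:
            self.points.pop(0)  # "erasing old context checkpoint"
        self.points.append(pos)

    def restore(self, n_past: int):
        """Return (restored_pos, erased) for a rollback to n_past:
        keep checkpoints strictly before n_past, erase the rest."""
        keep = [p for p in self.points if p < n_past]
        erased = [p for p in self.points if p >= n_past]  # "erased invalidated"
        self.points = keep
        return (keep[-1] if keep else None), erased
```

Replaying task 6461's state (checkpoints at 30885 through 37985) against `n_past = 31259` restores 30885 and erases the other seven, matching lines in the log.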
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 6543 | processing task, is_child = 0
slot update_slots: id 0 | task 6543 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 38559
slot update_slots: id 0 | task 6543 | n_tokens = 38528, memory_seq_rm [38528, end)
slot init_sampler: id 0 | task 6543 | init sampler, took 4.67 ms, tokens: text = 38559, total = 38559
slot update_slots: id 0 | task 6543 | created context checkpoint 3 of 8 (pos_min = 38527, pos_max = 38527, n_tokens = 38528, size = 62.813 MiB)
slot update_slots: id 0 | task 6543 | prompt processing done, n_tokens = 38559, batch.n_tokens = 31
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 6543 |
prompt eval time = 257.27 ms / 31 tokens ( 8.30 ms per token, 120.50 tokens per second)
eval time = 1591.97 ms / 54 tokens ( 29.48 ms per token, 33.92 tokens per second)
total time = 1849.24 ms / 85 tokens
slot release: id 0 | task 6543 | stop processing: n_tokens = 38612, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 6598 | processing task, is_child = 0
slot update_slots: id 0 | task 6598 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 38629
slot update_slots: id 0 | task 6598 | n_tokens = 38612, memory_seq_rm [38612, end)
slot init_sampler: id 0 | task 6598 | init sampler, took 4.68 ms, tokens: text = 38629, total = 38629
slot update_slots: id 0 | task 6598 | created context checkpoint 4 of 8 (pos_min = 38611, pos_max = 38611, n_tokens = 38612, size = 62.813 MiB)
slot update_slots: id 0 | task 6598 | prompt processing done, n_tokens = 38629, batch.n_tokens = 17
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 6598 |
prompt eval time = 239.91 ms / 17 tokens ( 14.11 ms per token, 70.86 tokens per second)
eval time = 19755.90 ms / 667 tokens ( 29.62 ms per token, 33.76 tokens per second)
total time = 19995.81 ms / 684 tokens
slot release: id 0 | task 6598 | stop processing: n_tokens = 39295, truncated = 0
srv update_slots: all slots are idle
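Prompt-eval throughput collapses as the cache shrinks the uncached suffix: 1,508 t/s at 29,006 new tokens, but only 70.9 t/s at 17. A fixed per-request overhead explains this; an illustrative least-squares fit of `time_ms = overhead + per_token * n` over the five small batches above (my own analysis of the logged numbers, not server output):

```python
def fit_overhead(samples):
    """Ordinary least squares for time_ms = a + b * n_tokens.
    `samples` are (n_tokens, time_ms) pairs taken from the log's
    small prompt-eval batches."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b  # (fixed overhead in ms, marginal ms per token)

# (n_tokens, prompt_eval_ms) from tasks 5671, 5876, 6048, 6543, 6598
samples = [(190, 583.70), (91, 418.51), (76, 412.61), (31, 257.27), (17, 239.91)]
overhead_ms, per_token_ms = fit_overhead(samples)
```

The fit lands near 220 ms of fixed overhead and roughly 2 ms per token, i.e. the tiny cached-prompt requests are dominated by per-request setup, not compute.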
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 7266 | processing task, is_child = 0
slot update_slots: id 0 | task 7266 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 39344
slot update_slots: id 0 | task 7266 | n_past = 39285, slot.prompt.tokens.size() = 39295, seq_id = 0, pos_min = 39294, n_swa = 1
slot update_slots: id 0 | task 7266 | restored context checkpoint (pos_min = 38611, pos_max = 38611, n_tokens = 38612, size = 62.813 MiB)
slot update_slots: id 0 | task 7266 | n_tokens = 38612, memory_seq_rm [38612, end)
slot update_slots: id 0 | task 7266 | prompt processing progress, n_tokens = 38832, batch.n_tokens = 220, progress = 0.986987
slot update_slots: id 0 | task 7266 | n_tokens = 38832, memory_seq_rm [38832, end)
slot init_sampler: id 0 | task 7266 | init sampler, took 4.92 ms, tokens: text = 39344, total = 39344
slot update_slots: id 0 | task 7266 | created context checkpoint 5 of 8 (pos_min = 38831, pos_max = 38831, n_tokens = 38832, size = 62.813 MiB)
slot update_slots: id 0 | task 7266 | prompt processing done, n_tokens = 39344, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 7266 |
prompt eval time = 1210.34 ms / 732 tokens ( 1.65 ms per token, 604.79 tokens per second)
eval time = 2623.48 ms / 85 tokens ( 30.86 ms per token, 32.40 tokens per second)
total time = 3833.82 ms / 817 tokens
slot release: id 0 | task 7266 | stop processing: n_tokens = 39428, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 7353 | processing task, is_child = 0
slot update_slots: id 0 | task 7353 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 39495
slot update_slots: id 0 | task 7353 | n_past = 39358, slot.prompt.tokens.size() = 39428, seq_id = 0, pos_min = 39427, n_swa = 1
slot update_slots: id 0 | task 7353 | restored context checkpoint (pos_min = 38831, pos_max = 38831, n_tokens = 38832, size = 62.813 MiB)
slot update_slots: id 0 | task 7353 | n_tokens = 38832, memory_seq_rm [38832, end)
slot update_slots: id 0 | task 7353 | prompt processing progress, n_tokens = 38983, batch.n_tokens = 151, progress = 0.987036
slot update_slots: id 0 | task 7353 | n_tokens = 38983, memory_seq_rm [38983, end)
slot init_sampler: id 0 | task 7353 | init sampler, took 4.80 ms, tokens: text = 39495, total = 39495
slot update_slots: id 0 | task 7353 | created context checkpoint 6 of 8 (pos_min = 38982, pos_max = 38982, n_tokens = 38983, size = 62.813 MiB)
slot update_slots: id 0 | task 7353 | prompt processing done, n_tokens = 39495, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 7353 |
prompt eval time = 1148.45 ms / 663 tokens ( 1.73 ms per token, 577.30 tokens per second)
eval time = 1128.64 ms / 38 tokens ( 29.70 ms per token, 33.67 tokens per second)
total time = 2277.08 ms / 701 tokens
slot release: id 0 | task 7353 | stop processing: n_tokens = 39532, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 7393 | processing task, is_child = 0
slot update_slots: id 0 | task 7393 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 39555
slot update_slots: id 0 | task 7393 | n_tokens = 39532, memory_seq_rm [39532, end)
slot init_sampler: id 0 | task 7393 | init sampler, took 4.81 ms, tokens: text = 39555, total = 39555
slot update_slots: id 0 | task 7393 | created context checkpoint 7 of 8 (pos_min = 39531, pos_max = 39531, n_tokens = 39532, size = 62.813 MiB)
slot update_slots: id 0 | task 7393 | prompt processing done, n_tokens = 39555, batch.n_tokens = 23
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 7393 |
prompt eval time = 212.09 ms / 23 tokens ( 9.22 ms per token, 108.45 tokens per second)
eval time = 920.92 ms / 31 tokens ( 29.71 ms per token, 33.66 tokens per second)
total time = 1133.01 ms / 54 tokens
slot release: id 0 | task 7393 | stop processing: n_tokens = 39585, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.951 (> 0.100 thold), f_keep = 0.971
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 7425 | processing task, is_child = 0
slot update_slots: id 0 | task 7425 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 40431
slot update_slots: id 0 | task 7425 | n_past = 38446, slot.prompt.tokens.size() = 39585, seq_id = 0, pos_min = 39584, n_swa = 1
slot update_slots: id 0 | task 7425 | restored context checkpoint (pos_min = 37937, pos_max = 37937, n_tokens = 37938, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | erased invalidated context checkpoint (pos_min = 38527, pos_max = 38527, n_tokens = 38528, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | erased invalidated context checkpoint (pos_min = 38611, pos_max = 38611, n_tokens = 38612, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | erased invalidated context checkpoint (pos_min = 38831, pos_max = 38831, n_tokens = 38832, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | erased invalidated context checkpoint (pos_min = 38982, pos_max = 38982, n_tokens = 38983, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | erased invalidated context checkpoint (pos_min = 39531, pos_max = 39531, n_tokens = 39532, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | n_tokens = 37938, memory_seq_rm [37938, end)
slot update_slots: id 0 | task 7425 | prompt processing progress, n_tokens = 39919, batch.n_tokens = 1981, progress = 0.987336
slot update_slots: id 0 | task 7425 | n_tokens = 39919, memory_seq_rm [39919, end)
slot init_sampler: id 0 | task 7425 | init sampler, took 4.93 ms, tokens: text = 40431, total = 40431
slot update_slots: id 0 | task 7425 | created context checkpoint 3 of 8 (pos_min = 39918, pos_max = 39918, n_tokens = 39919, size = 62.813 MiB)
slot update_slots: id 0 | task 7425 | prompt processing done, n_tokens = 40431, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 7425 |
prompt eval time = 2276.96 ms / 2493 tokens ( 0.91 ms per token, 1094.88 tokens per second)
eval time = 59593.90 ms / 1919 tokens ( 31.05 ms per token, 32.20 tokens per second)
total time = 61870.86 ms / 4412 tokens
slot release: id 0 | task 7425 | stop processing: n_tokens = 42349, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9346 | processing task, is_child = 0
slot update_slots: id 0 | task 9346 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 42399
slot update_slots: id 0 | task 9346 | n_past = 42339, slot.prompt.tokens.size() = 42349, seq_id = 0, pos_min = 42348, n_swa = 1
slot update_slots: id 0 | task 9346 | restored context checkpoint (pos_min = 39918, pos_max = 39918, n_tokens = 39919, size = 62.813 MiB)
slot update_slots: id 0 | task 9346 | n_tokens = 39919, memory_seq_rm [39919, end)
slot update_slots: id 0 | task 9346 | prompt processing progress, n_tokens = 41887, batch.n_tokens = 1968, progress = 0.987924
slot update_slots: id 0 | task 9346 | n_tokens = 41887, memory_seq_rm [41887, end)
slot init_sampler: id 0 | task 9346 | init sampler, took 5.48 ms, tokens: text = 42399, total = 42399
slot update_slots: id 0 | task 9346 | created context checkpoint 4 of 8 (pos_min = 41886, pos_max = 41886, n_tokens = 41887, size = 62.813 MiB)
slot update_slots: id 0 | task 9346 | prompt processing done, n_tokens = 42399, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9346 |
prompt eval time = 2339.85 ms / 2480 tokens ( 0.94 ms per token, 1059.90 tokens per second)
eval time = 2393.71 ms / 77 tokens ( 31.09 ms per token, 32.17 tokens per second)
total time = 4733.56 ms / 2557 tokens
slot release: id 0 | task 9346 | stop processing: n_tokens = 42475, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9425 | processing task, is_child = 0
slot update_slots: id 0 | task 9425 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 42553
slot update_slots: id 0 | task 9425 | n_tokens = 42475, memory_seq_rm [42475, end)
slot init_sampler: id 0 | task 9425 | init sampler, took 5.25 ms, tokens: text = 42553, total = 42553
slot update_slots: id 0 | task 9425 | created context checkpoint 5 of 8 (pos_min = 42474, pos_max = 42474, n_tokens = 42475, size = 62.813 MiB)
slot update_slots: id 0 | task 9425 | prompt processing done, n_tokens = 42553, batch.n_tokens = 78
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9425 |
prompt eval time = 391.98 ms / 78 tokens ( 5.03 ms per token, 198.99 tokens per second)
eval time = 2859.19 ms / 88 tokens ( 32.49 ms per token, 30.78 tokens per second)
total time = 3251.17 ms / 166 tokens
slot release: id 0 | task 9425 | stop processing: n_tokens = 42640, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.948 (> 0.100 thold), f_keep = 0.948
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9514 | processing task, is_child = 0
slot update_slots: id 0 | task 9514 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 42643
slot update_slots: id 0 | task 9514 | n_past = 40427, slot.prompt.tokens.size() = 42640, seq_id = 0, pos_min = 42639, n_swa = 1
slot update_slots: id 0 | task 9514 | restored context checkpoint (pos_min = 39918, pos_max = 39918, n_tokens = 39919, size = 62.813 MiB)
slot update_slots: id 0 | task 9514 | erased invalidated context checkpoint (pos_min = 41886, pos_max = 41886, n_tokens = 41887, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9514 | erased invalidated context checkpoint (pos_min = 42474, pos_max = 42474, n_tokens = 42475, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9514 | n_tokens = 39919, memory_seq_rm [39919, end)
slot update_slots: id 0 | task 9514 | prompt processing progress, n_tokens = 42131, batch.n_tokens = 2212, progress = 0.987993
slot update_slots: id 0 | task 9514 | n_tokens = 42131, memory_seq_rm [42131, end)
slot init_sampler: id 0 | task 9514 | init sampler, took 5.67 ms, tokens: text = 42643, total = 42643
slot update_slots: id 0 | task 9514 | created context checkpoint 4 of 8 (pos_min = 42130, pos_max = 42130, n_tokens = 42131, size = 62.813 MiB)
slot update_slots: id 0 | task 9514 | prompt processing done, n_tokens = 42643, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9514 |
prompt eval time = 2773.88 ms / 2724 tokens ( 1.02 ms per token, 982.02 tokens per second)
eval time = 2513.32 ms / 84 tokens ( 29.92 ms per token, 33.42 tokens per second)
total time = 5287.20 ms / 2808 tokens
slot release: id 0 | task 9514 | stop processing: n_tokens = 42726, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9600 | processing task, is_child = 0
slot update_slots: id 0 | task 9600 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 42757
slot update_slots: id 0 | task 9600 | n_tokens = 42726, memory_seq_rm [42726, end)
slot init_sampler: id 0 | task 9600 | init sampler, took 5.33 ms, tokens: text = 42757, total = 42757
slot update_slots: id 0 | task 9600 | created context checkpoint 5 of 8 (pos_min = 42725, pos_max = 42725, n_tokens = 42726, size = 62.813 MiB)
slot update_slots: id 0 | task 9600 | prompt processing done, n_tokens = 42757, batch.n_tokens = 31
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9600 |
prompt eval time = 252.44 ms / 31 tokens ( 8.14 ms per token, 122.80 tokens per second)
eval time = 1011.71 ms / 32 tokens ( 31.62 ms per token, 31.63 tokens per second)
total time = 1264.15 ms / 63 tokens
slot release: id 0 | task 9600 | stop processing: n_tokens = 42788, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.943 (> 0.100 thold), f_keep = 0.997
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9633 | processing task, is_child = 0
slot update_slots: id 0 | task 9633 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 45211
slot update_slots: id 0 | task 9633 | n_past = 42639, slot.prompt.tokens.size() = 42788, seq_id = 0, pos_min = 42787, n_swa = 1
slot update_slots: id 0 | task 9633 | restored context checkpoint (pos_min = 42130, pos_max = 42130, n_tokens = 42131, size = 62.813 MiB)
slot update_slots: id 0 | task 9633 | erased invalidated context checkpoint (pos_min = 42725, pos_max = 42725, n_tokens = 42726, n_swa = 1, size = 62.813 MiB)
slot update_slots: id 0 | task 9633 | n_tokens = 42131, memory_seq_rm [42131, end)
slot update_slots: id 0 | task 9633 | prompt processing progress, n_tokens = 44699, batch.n_tokens = 2568, progress = 0.988675
slot update_slots: id 0 | task 9633 | n_tokens = 44699, memory_seq_rm [44699, end)
slot init_sampler: id 0 | task 9633 | init sampler, took 5.50 ms, tokens: text = 45211, total = 45211
slot update_slots: id 0 | task 9633 | created context checkpoint 5 of 8 (pos_min = 44698, pos_max = 44698, n_tokens = 44699, size = 62.813 MiB)
slot update_slots: id 0 | task 9633 | prompt processing done, n_tokens = 45211, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9633 |
prompt eval time = 3182.90 ms / 3080 tokens ( 1.03 ms per token, 967.67 tokens per second)
eval time = 8843.18 ms / 270 tokens ( 32.75 ms per token, 30.53 tokens per second)
total time = 12026.08 ms / 3350 tokens
slot release: id 0 | task 9633 | stop processing: n_tokens = 45480, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 0.994
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9905 | processing task, is_child = 0
slot update_slots: id 0 | task 9905 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 45491
slot update_slots: id 0 | task 9905 | n_past = 45207, slot.prompt.tokens.size() = 45480, seq_id = 0, pos_min = 45479, n_swa = 1
slot update_slots: id 0 | task 9905 | restored context checkpoint (pos_min = 44698, pos_max = 44698, n_tokens = 44699, size = 62.813 MiB)
slot update_slots: id 0 | task 9905 | n_tokens = 44699, memory_seq_rm [44699, end)
slot update_slots: id 0 | task 9905 | prompt processing progress, n_tokens = 44979, batch.n_tokens = 280, progress = 0.988745
slot update_slots: id 0 | task 9905 | n_tokens = 44979, memory_seq_rm [44979, end)
slot init_sampler: id 0 | task 9905 | init sampler, took 5.69 ms, tokens: text = 45491, total = 45491
slot update_slots: id 0 | task 9905 | created context checkpoint 6 of 8 (pos_min = 44978, pos_max = 44978, n_tokens = 44979, size = 62.813 MiB)
slot update_slots: id 0 | task 9905 | prompt processing done, n_tokens = 45491, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9905 |
prompt eval time = 1296.90 ms / 792 tokens ( 1.64 ms per token, 610.69 tokens per second)
eval time = 807.06 ms / 27 tokens ( 29.89 ms per token, 33.45 tokens per second)
total time = 2103.96 ms / 819 tokens
slot release: id 0 | task 9905 | stop processing: n_tokens = 45517, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 9934 | processing task, is_child = 0
slot update_slots: id 0 | task 9934 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 45574
slot update_slots: id 0 | task 9934 | n_tokens = 45517, memory_seq_rm [45517, end)
slot init_sampler: id 0 | task 9934 | init sampler, took 5.46 ms, tokens: text = 45574, total = 45574
slot update_slots: id 0 | task 9934 | created context checkpoint 7 of 8 (pos_min = 45516, pos_max = 45516, n_tokens = 45517, size = 62.813 MiB)
slot update_slots: id 0 | task 9934 | prompt processing done, n_tokens = 45574, batch.n_tokens = 57
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 9934 |
prompt eval time = 371.27 ms / 57 tokens ( 6.51 ms per token, 153.53 tokens per second)
eval time = 6152.88 ms / 200 tokens ( 30.76 ms per token, 32.51 tokens per second)
total time = 6524.15 ms / 257 tokens
slot release: id 0 | task 9934 | stop processing: n_tokens = 45773, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.995 (> 0.100 thold), f_keep = 0.996
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10135 | processing task, is_child = 0
slot update_slots: id 0 | task 10135 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 45821
slot update_slots: id 0 | task 10135 | n_past = 45584, slot.prompt.tokens.size() = 45773, seq_id = 0, pos_min = 45772, n_swa = 1
slot update_slots: id 0 | task 10135 | restored context checkpoint (pos_min = 45516, pos_max = 45516, n_tokens = 45517, size = 62.813 MiB)
slot update_slots: id 0 | task 10135 | n_tokens = 45517, memory_seq_rm [45517, end)
slot init_sampler: id 0 | task 10135 | init sampler, took 5.66 ms, tokens: text = 45821, total = 45821
slot update_slots: id 0 | task 10135 | prompt processing done, n_tokens = 45821, batch.n_tokens = 304
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10135 |
prompt eval time = 581.29 ms / 304 tokens ( 1.91 ms per token, 522.97 tokens per second)
eval time = 5519.43 ms / 176 tokens ( 31.36 ms per token, 31.89 tokens per second)
total time = 6100.72 ms / 480 tokens
slot release: id 0 | task 10135 | stop processing: n_tokens = 45996, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.993 (> 0.100 thold), f_keep = 0.996
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10312 | processing task, is_child = 0
slot update_slots: id 0 | task 10312 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 46166
slot update_slots: id 0 | task 10312 | n_past = 45831, slot.prompt.tokens.size() = 45996, seq_id = 0, pos_min = 45995, n_swa = 1
slot update_slots: id 0 | task 10312 | restored context checkpoint (pos_min = 45516, pos_max = 45516, n_tokens = 45517, size = 62.813 MiB)
slot update_slots: id 0 | task 10312 | n_tokens = 45517, memory_seq_rm [45517, end)
slot update_slots: id 0 | task 10312 | prompt processing progress, n_tokens = 45654, batch.n_tokens = 137, progress = 0.988910
slot update_slots: id 0 | task 10312 | n_tokens = 45654, memory_seq_rm [45654, end)
slot init_sampler: id 0 | task 10312 | init sampler, took 7.23 ms, tokens: text = 46166, total = 46166
slot update_slots: id 0 | task 10312 | created context checkpoint 8 of 8 (pos_min = 45653, pos_max = 45653, n_tokens = 45654, size = 62.813 MiB)
slot update_slots: id 0 | task 10312 | prompt processing done, n_tokens = 46166, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10312 |
prompt eval time = 1171.62 ms / 649 tokens ( 1.81 ms per token, 553.94 tokens per second)
eval time = 2241.14 ms / 72 tokens ( 31.13 ms per token, 32.13 tokens per second)
total time = 3412.75 ms / 721 tokens
slot release: id 0 | task 10312 | stop processing: n_tokens = 46237, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.986 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10386 | processing task, is_child = 0
slot update_slots: id 0 | task 10386 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 46883
slot update_slots: id 0 | task 10386 | n_tokens = 46237, memory_seq_rm [46237, end)
slot update_slots: id 0 | task 10386 | prompt processing progress, n_tokens = 46371, batch.n_tokens = 134, progress = 0.989079
slot update_slots: id 0 | task 10386 | n_tokens = 46371, memory_seq_rm [46371, end)
slot init_sampler: id 0 | task 10386 | init sampler, took 5.71 ms, tokens: text = 46883, total = 46883
slot update_slots: id 0 | task 10386 | erasing old context checkpoint (pos_min = 30885, pos_max = 30885, n_tokens = 30886, size = 62.813 MiB)
slot update_slots: id 0 | task 10386 | created context checkpoint 8 of 8 (pos_min = 46370, pos_max = 46370, n_tokens = 46371, size = 62.813 MiB)
slot update_slots: id 0 | task 10386 | prompt processing done, n_tokens = 46883, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10386 |
prompt eval time = 1155.07 ms / 646 tokens ( 1.79 ms per token, 559.27 tokens per second)
eval time = 2959.94 ms / 97 tokens ( 30.51 ms per token, 32.77 tokens per second)
total time = 4115.02 ms / 743 tokens
slot release: id 0 | task 10386 | stop processing: n_tokens = 46979, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.992 (> 0.100 thold), f_keep = 0.999
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10485 | processing task, is_child = 0
slot update_slots: id 0 | task 10485 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47327
slot update_slots: id 0 | task 10485 | n_past = 46932, slot.prompt.tokens.size() = 46979, seq_id = 0, pos_min = 46978, n_swa = 1
slot update_slots: id 0 | task 10485 | restored context checkpoint (pos_min = 46370, pos_max = 46370, n_tokens = 46371, size = 62.813 MiB)
slot update_slots: id 0 | task 10485 | n_tokens = 46371, memory_seq_rm [46371, end)
slot update_slots: id 0 | task 10485 | prompt processing progress, n_tokens = 46815, batch.n_tokens = 444, progress = 0.989182
slot update_slots: id 0 | task 10485 | n_tokens = 46815, memory_seq_rm [46815, end)
slot init_sampler: id 0 | task 10485 | init sampler, took 5.84 ms, tokens: text = 47327, total = 47327
slot update_slots: id 0 | task 10485 | erasing old context checkpoint (pos_min = 37937, pos_max = 37937, n_tokens = 37938, size = 62.813 MiB)
slot update_slots: id 0 | task 10485 | created context checkpoint 8 of 8 (pos_min = 46814, pos_max = 46814, n_tokens = 46815, size = 62.813 MiB)
slot update_slots: id 0 | task 10485 | prompt processing done, n_tokens = 47327, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10485 |
prompt eval time = 1384.30 ms / 956 tokens ( 1.45 ms per token, 690.60 tokens per second)
eval time = 1516.38 ms / 48 tokens ( 31.59 ms per token, 31.65 tokens per second)
total time = 2900.68 ms / 1004 tokens
slot release: id 0 | task 10485 | stop processing: n_tokens = 47374, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10535 | processing task, is_child = 0
slot update_slots: id 0 | task 10535 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47397
slot update_slots: id 0 | task 10535 | n_tokens = 47374, memory_seq_rm [47374, end)
slot init_sampler: id 0 | task 10535 | init sampler, took 5.84 ms, tokens: text = 47397, total = 47397
slot update_slots: id 0 | task 10535 | erasing old context checkpoint (pos_min = 39918, pos_max = 39918, n_tokens = 39919, size = 62.813 MiB)
slot update_slots: id 0 | task 10535 | created context checkpoint 8 of 8 (pos_min = 47373, pos_max = 47373, n_tokens = 47374, size = 62.813 MiB)
slot update_slots: id 0 | task 10535 | prompt processing done, n_tokens = 47397, batch.n_tokens = 23
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10535 |
prompt eval time = 228.21 ms / 23 tokens ( 9.92 ms per token, 100.79 tokens per second)
eval time = 2659.06 ms / 88 tokens ( 30.22 ms per token, 33.09 tokens per second)
total time = 2887.27 ms / 111 tokens
slot release: id 0 | task 10535 | stop processing: n_tokens = 47484, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10624 | processing task, is_child = 0
slot update_slots: id 0 | task 10624 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47516
slot update_slots: id 0 | task 10624 | n_tokens = 47484, memory_seq_rm [47484, end)
slot init_sampler: id 0 | task 10624 | init sampler, took 6.04 ms, tokens: text = 47516, total = 47516
slot update_slots: id 0 | task 10624 | erasing old context checkpoint (pos_min = 42130, pos_max = 42130, n_tokens = 42131, size = 62.813 MiB)
slot update_slots: id 0 | task 10624 | created context checkpoint 8 of 8 (pos_min = 47483, pos_max = 47483, n_tokens = 47484, size = 62.813 MiB)
slot update_slots: id 0 | task 10624 | prompt processing done, n_tokens = 47516, batch.n_tokens = 32
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10624 |
prompt eval time = 292.99 ms / 32 tokens ( 9.16 ms per token, 109.22 tokens per second)
eval time = 2433.20 ms / 79 tokens ( 30.80 ms per token, 32.47 tokens per second)
total time = 2726.20 ms / 111 tokens
slot release: id 0 | task 10624 | stop processing: n_tokens = 47594, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 0.999
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10704 | processing task, is_child = 0
slot update_slots: id 0 | task 10704 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 47643
slot update_slots: id 0 | task 10704 | n_past = 47526, slot.prompt.tokens.size() = 47594, seq_id = 0, pos_min = 47593, n_swa = 1
slot update_slots: id 0 | task 10704 | restored context checkpoint (pos_min = 47483, pos_max = 47483, n_tokens = 47484, size = 62.813 MiB)
slot update_slots: id 0 | task 10704 | n_tokens = 47484, memory_seq_rm [47484, end)
slot init_sampler: id 0 | task 10704 | init sampler, took 6.00 ms, tokens: text = 47643, total = 47643
slot update_slots: id 0 | task 10704 | prompt processing done, n_tokens = 47643, batch.n_tokens = 159
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10704 |
prompt eval time = 438.83 ms / 159 tokens ( 2.76 ms per token, 362.33 tokens per second)
eval time = 8843.87 ms / 266 tokens ( 33.25 ms per token, 30.08 tokens per second)
total time = 9282.69 ms / 425 tokens
slot release: id 0 | task 10704 | stop processing: n_tokens = 47908, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.991 (> 0.100 thold), f_keep = 0.995
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 10971 | processing task, is_child = 0
slot update_slots: id 0 | task 10971 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 48099
slot update_slots: id 0 | task 10971 | n_past = 47670, slot.prompt.tokens.size() = 47908, seq_id = 0, pos_min = 47907, n_swa = 1
slot update_slots: id 0 | task 10971 | restored context checkpoint (pos_min = 47483, pos_max = 47483, n_tokens = 47484, size = 62.813 MiB)
slot update_slots: id 0 | task 10971 | n_tokens = 47484, memory_seq_rm [47484, end)
slot update_slots: id 0 | task 10971 | prompt processing progress, n_tokens = 47587, batch.n_tokens = 103, progress = 0.989355
slot update_slots: id 0 | task 10971 | n_tokens = 47587, memory_seq_rm [47587, end)
slot init_sampler: id 0 | task 10971 | init sampler, took 5.86 ms, tokens: text = 48099, total = 48099
slot update_slots: id 0 | task 10971 | erasing old context checkpoint (pos_min = 44698, pos_max = 44698, n_tokens = 44699, size = 62.813 MiB)
slot update_slots: id 0 | task 10971 | created context checkpoint 8 of 8 (pos_min = 47586, pos_max = 47586, n_tokens = 47587, size = 62.813 MiB)
slot update_slots: id 0 | task 10971 | prompt processing done, n_tokens = 48099, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 10971 |
prompt eval time = 1245.32 ms / 615 tokens ( 2.02 ms per token, 493.85 tokens per second)
eval time = 2158.36 ms / 66 tokens ( 32.70 ms per token, 30.58 tokens per second)
total time = 3403.68 ms / 681 tokens
slot release: id 0 | task 10971 | stop processing: n_tokens = 48164, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 11039 | processing task, is_child = 0
slot update_slots: id 0 | task 11039 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 48332
slot update_slots: id 0 | task 11039 | n_tokens = 48164, memory_seq_rm [48164, end)
slot init_sampler: id 0 | task 11039 | init sampler, took 5.99 ms, tokens: text = 48332, total = 48332
slot update_slots: id 0 | task 11039 | erasing old context checkpoint (pos_min = 44978, pos_max = 44978, n_tokens = 44979, size = 62.813 MiB)
slot update_slots: id 0 | task 11039 | created context checkpoint 8 of 8 (pos_min = 48163, pos_max = 48163, n_tokens = 48164, size = 62.813 MiB)
slot update_slots: id 0 | task 11039 | prompt processing done, n_tokens = 48332, batch.n_tokens = 168
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 11039 |
prompt eval time = 520.48 ms / 168 tokens ( 3.10 ms per token, 322.78 tokens per second)
eval time = 2680.36 ms / 80 tokens ( 33.50 ms per token, 29.85 tokens per second)
total time = 3200.84 ms / 248 tokens
slot release: id 0 | task 11039 | stop processing: n_tokens = 48411, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 11120 | processing task, is_child = 0
slot update_slots: id 0 | task 11120 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 48444
slot update_slots: id 0 | task 11120 | n_tokens = 48411, memory_seq_rm [48411, end)
slot init_sampler: id 0 | task 11120 | init sampler, took 6.26 ms, tokens: text = 48444, total = 48444
slot update_slots: id 0 | task 11120 | erasing old context checkpoint (pos_min = 45516, pos_max = 45516, n_tokens = 45517, size = 62.813 MiB)
slot update_slots: id 0 | task 11120 | created context checkpoint 8 of 8 (pos_min = 48410, pos_max = 48410, n_tokens = 48411, size = 62.813 MiB)
slot update_slots: id 0 | task 11120 | prompt processing done, n_tokens = 48444, batch.n_tokens = 33
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 11120 |
prompt eval time = 326.44 ms / 33 tokens ( 9.89 ms per token, 101.09 tokens per second)
eval time = 11539.79 ms / 348 tokens ( 33.16 ms per token, 30.16 tokens per second)
total time = 11866.23 ms / 381 tokens
slot release: id 0 | task 11120 | stop processing: n_tokens = 48791, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.992 (> 0.100 thold), f_keep = 0.993
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 11469 | processing task, is_child = 0
slot update_slots: id 0 | task 11469 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 48825
slot update_slots: id 0 | task 11469 | n_past = 48455, slot.prompt.tokens.size() = 48791, seq_id = 0, pos_min = 48790, n_swa = 1
slot update_slots: id 0 | task 11469 | restored context checkpoint (pos_min = 48410, pos_max = 48410, n_tokens = 48411, size = 62.813 MiB)
slot update_slots: id 0 | task 11469 | n_tokens = 48411, memory_seq_rm [48411, end)
slot init_sampler: id 0 | task 11469 | init sampler, took 5.88 ms, tokens: text = 48825, total = 48825
slot update_slots: id 0 | task 11469 | prompt processing done, n_tokens = 48825, batch.n_tokens = 414
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 11469 |
prompt eval time = 628.66 ms / 414 tokens ( 1.52 ms per token, 658.54 tokens per second)
eval time = 5410.60 ms / 174 tokens ( 31.10 ms per token, 32.16 tokens per second)
total time = 6039.26 ms / 588 tokens
slot release: id 0 | task 11469 | stop processing: n_tokens = 48998, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.989 (> 0.100 thold), f_keep = 0.997
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 11644 | processing task, is_child = 0
slot update_slots: id 0 | task 11644 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 49402
slot update_slots: id 0 | task 11644 | n_past = 48847, slot.prompt.tokens.size() = 48998, seq_id = 0, pos_min = 48997, n_swa = 1
slot update_slots: id 0 | task 11644 | restored context checkpoint (pos_min = 48410, pos_max = 48410, n_tokens = 48411, size = 62.813 MiB)
slot update_slots: id 0 | task 11644 | n_tokens = 48411, memory_seq_rm [48411, end)
slot update_slots: id 0 | task 11644 | prompt processing progress, n_tokens = 48890, batch.n_tokens = 479, progress = 0.989636
slot update_slots: id 0 | task 11644 | n_tokens = 48890, memory_seq_rm [48890, end)
slot init_sampler: id 0 | task 11644 | init sampler, took 6.26 ms, tokens: text = 49402, total = 49402
slot update_slots: id 0 | task 11644 | erasing old context checkpoint (pos_min = 45653, pos_max = 45653, n_tokens = 45654, size = 62.813 MiB)
slot update_slots: id 0 | task 11644 | created context checkpoint 8 of 8 (pos_min = 48889, pos_max = 48889, n_tokens = 48890, size = 62.813 MiB)
slot update_slots: id 0 | task 11644 | prompt processing done, n_tokens = 49402, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 11644 |
prompt eval time = 1413.94 ms / 991 tokens ( 1.43 ms per token, 700.88 tokens per second)
eval time = 15042.03 ms / 497 tokens ( 30.27 ms per token, 33.04 tokens per second)
total time = 16455.97 ms / 1488 tokens
slot release: id 0 | task 11644 | stop processing: n_tokens = 49898, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.990 (> 0.100 thold), f_keep = 0.991
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 12143 | processing task, is_child = 0
slot update_slots: id 0 | task 12143 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 49945
slot update_slots: id 0 | task 12143 | n_past = 49429, slot.prompt.tokens.size() = 49898, seq_id = 0, pos_min = 49897, n_swa = 1
slot update_slots: id 0 | task 12143 | restored context checkpoint (pos_min = 48889, pos_max = 48889, n_tokens = 48890, size = 62.813 MiB)
slot update_slots: id 0 | task 12143 | n_tokens = 48890, memory_seq_rm [48890, end)
slot update_slots: id 0 | task 12143 | prompt processing progress, n_tokens = 49433, batch.n_tokens = 543, progress = 0.989749
slot update_slots: id 0 | task 12143 | n_tokens = 49433, memory_seq_rm [49433, end)
slot init_sampler: id 0 | task 12143 | init sampler, took 6.19 ms, tokens: text = 49945, total = 49945
slot update_slots: id 0 | task 12143 | erasing old context checkpoint (pos_min = 46370, pos_max = 46370, n_tokens = 46371, size = 62.813 MiB)
slot update_slots: id 0 | task 12143 | created context checkpoint 8 of 8 (pos_min = 49432, pos_max = 49432, n_tokens = 49433, size = 62.813 MiB)
slot update_slots: id 0 | task 12143 | prompt processing done, n_tokens = 49945, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 12143 |
prompt eval time = 1421.93 ms / 1055 tokens ( 1.35 ms per token, 741.95 tokens per second)
eval time = 1497.88 ms / 48 tokens ( 31.21 ms per token, 32.05 tokens per second)
total time = 2919.81 ms / 1103 tokens
slot release: id 0 | task 12143 | stop processing: n_tokens = 49992, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.985 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 12193 | processing task, is_child = 0
slot update_slots: id 0 | task 12193 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 50752
slot update_slots: id 0 | task 12193 | n_tokens = 49992, memory_seq_rm [49992, end)
slot update_slots: id 0 | task 12193 | prompt processing progress, n_tokens = 50240, batch.n_tokens = 248, progress = 0.989912
slot update_slots: id 0 | task 12193 | n_tokens = 50240, memory_seq_rm [50240, end)
slot init_sampler: id 0 | task 12193 | init sampler, took 6.17 ms, tokens: text = 50752, total = 50752
slot update_slots: id 0 | task 12193 | erasing old context checkpoint (pos_min = 46814, pos_max = 46814, n_tokens = 46815, size = 62.813 MiB)
slot update_slots: id 0 | task 12193 | created context checkpoint 8 of 8 (pos_min = 50239, pos_max = 50239, n_tokens = 50240, size = 62.813 MiB)
slot update_slots: id 0 | task 12193 | prompt processing done, n_tokens = 50752, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 12193 |
prompt eval time = 1285.53 ms / 760 tokens ( 1.69 ms per token, 591.19 tokens per second)
eval time = 16873.23 ms / 539 tokens ( 31.30 ms per token, 31.94 tokens per second)
total time = 18158.76 ms / 1299 tokens
slot release: id 0 | task 12193 | stop processing: n_tokens = 51290, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.989 (> 0.100 thold), f_keep = 0.990
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 12734 | processing task, is_child = 0
slot update_slots: id 0 | task 12734 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 51324
slot update_slots: id 0 | task 12734 | n_past = 50782, slot.prompt.tokens.size() = 51290, seq_id = 0, pos_min = 51289, n_swa = 1
slot update_slots: id 0 | task 12734 | restored context checkpoint (pos_min = 50239, pos_max = 50239, n_tokens = 50240, size = 62.813 MiB)
slot update_slots: id 0 | task 12734 | n_tokens = 50240, memory_seq_rm [50240, end)
slot update_slots: id 0 | task 12734 | prompt processing progress, n_tokens = 50812, batch.n_tokens = 572, progress = 0.990024
slot update_slots: id 0 | task 12734 | n_tokens = 50812, memory_seq_rm [50812, end)
slot init_sampler: id 0 | task 12734 | init sampler, took 6.21 ms, tokens: text = 51324, total = 51324
slot update_slots: id 0 | task 12734 | erasing old context checkpoint (pos_min = 47373, pos_max = 47373, n_tokens = 47374, size = 62.813 MiB)
slot update_slots: id 0 | task 12734 | created context checkpoint 8 of 8 (pos_min = 50811, pos_max = 50811, n_tokens = 50812, size = 62.813 MiB)
slot update_slots: id 0 | task 12734 | prompt processing done, n_tokens = 51324, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 12734 |
prompt eval time = 1465.16 ms / 1084 tokens ( 1.35 ms per token, 739.85 tokens per second)
eval time = 1433.25 ms / 48 tokens ( 29.86 ms per token, 33.49 tokens per second)
total time = 2898.41 ms / 1132 tokens
slot release: id 0 | task 12734 | stop processing: n_tokens = 51371, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 12784 | processing task, is_child = 0
slot update_slots: id 0 | task 12784 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 51550
slot update_slots: id 0 | task 12784 | n_tokens = 51371, memory_seq_rm [51371, end)
slot init_sampler: id 0 | task 12784 | init sampler, took 6.34 ms, tokens: text = 51550, total = 51550
slot update_slots: id 0 | task 12784 | erasing old context checkpoint (pos_min = 47483, pos_max = 47483, n_tokens = 47484, size = 62.813 MiB)
slot update_slots: id 0 | task 12784 | created context checkpoint 8 of 8 (pos_min = 51370, pos_max = 51370, n_tokens = 51371, size = 62.813 MiB)
slot update_slots: id 0 | task 12784 | prompt processing done, n_tokens = 51550, batch.n_tokens = 179
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 12784 |
prompt eval time = 544.93 ms / 179 tokens ( 3.04 ms per token, 328.48 tokens per second)
eval time = 4531.82 ms / 147 tokens ( 30.83 ms per token, 32.44 tokens per second)
total time = 5076.75 ms / 326 tokens
slot release: id 0 | task 12784 | stop processing: n_tokens = 51696, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 12932 | processing task, is_child = 0
slot update_slots: id 0 | task 12932 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 51903
slot update_slots: id 0 | task 12932 | n_past = 51577, slot.prompt.tokens.size() = 51696, seq_id = 0, pos_min = 51695, n_swa = 1
slot update_slots: id 0 | task 12932 | restored context checkpoint (pos_min = 51370, pos_max = 51370, n_tokens = 51371, size = 62.813 MiB)
slot update_slots: id 0 | task 12932 | n_tokens = 51371, memory_seq_rm [51371, end)
slot update_slots: id 0 | task 12932 | prompt processing progress, n_tokens = 51391, batch.n_tokens = 20, progress = 0.990135
slot update_slots: id 0 | task 12932 | n_tokens = 51391, memory_seq_rm [51391, end)
slot init_sampler: id 0 | task 12932 | init sampler, took 6.74 ms, tokens: text = 51903, total = 51903
slot update_slots: id 0 | task 12932 | prompt processing done, n_tokens = 51903, batch.n_tokens = 512
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 12932 |
prompt eval time = 859.90 ms / 532 tokens ( 1.62 ms per token, 618.68 tokens per second)
eval time = 1353.72 ms / 44 tokens ( 30.77 ms per token, 32.50 tokens per second)
total time = 2213.62 ms / 576 tokens
slot release: id 0 | task 12932 | stop processing: n_tokens = 51946, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.996 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 12978 | processing task, is_child = 0
slot update_slots: id 0 | task 12978 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 52146
slot update_slots: id 0 | task 12978 | n_tokens = 51946, memory_seq_rm [51946, end)
slot init_sampler: id 0 | task 12978 | init sampler, took 6.40 ms, tokens: text = 52146, total = 52146
slot update_slots: id 0 | task 12978 | erasing old context checkpoint (pos_min = 47586, pos_max = 47586, n_tokens = 47587, size = 62.813 MiB)
slot update_slots: id 0 | task 12978 | created context checkpoint 8 of 8 (pos_min = 51945, pos_max = 51945, n_tokens = 51946, size = 62.813 MiB)
slot update_slots: id 0 | task 12978 | prompt processing done, n_tokens = 52146, batch.n_tokens = 200
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 12978 |
prompt eval time = 561.78 ms / 200 tokens ( 2.81 ms per token, 356.01 tokens per second)
eval time = 1659.63 ms / 54 tokens ( 30.73 ms per token, 32.54 tokens per second)
total time = 2221.41 ms / 254 tokens
slot release: id 0 | task 12978 | stop processing: n_tokens = 52199, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 13033 | processing task, is_child = 0
slot update_slots: id 0 | task 13033 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 52222
slot update_slots: id 0 | task 13033 | n_tokens = 52199, memory_seq_rm [52199, end)
slot init_sampler: id 0 | task 13033 | init sampler, took 6.63 ms, tokens: text = 52222, total = 52222
slot update_slots: id 0 | task 13033 | erasing old context checkpoint (pos_min = 48163, pos_max = 48163, n_tokens = 48164, size = 62.813 MiB)
slot update_slots: id 0 | task 13033 | created context checkpoint 8 of 8 (pos_min = 52198, pos_max = 52198, n_tokens = 52199, size = 62.813 MiB)
slot update_slots: id 0 | task 13033 | prompt processing done, n_tokens = 52222, batch.n_tokens = 23
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 13033 |
prompt eval time = 229.13 ms / 23 tokens ( 9.96 ms per token, 100.38 tokens per second)
eval time = 3935.84 ms / 125 tokens ( 31.49 ms per token, 31.76 tokens per second)
total time = 4164.97 ms / 148 tokens
slot release: id 0 | task 13033 | stop processing: n_tokens = 52346, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 13159 | processing task, is_child = 0
slot update_slots: id 0 | task 13159 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 52404
slot update_slots: id 0 | task 13159 | n_tokens = 52346, memory_seq_rm [52346, end)
slot init_sampler: id 0 | task 13159 | init sampler, took 6.36 ms, tokens: text = 52404, total = 52404
slot update_slots: id 0 | task 13159 | erasing old context checkpoint (pos_min = 48410, pos_max = 48410, n_tokens = 48411, size = 62.813 MiB)
slot update_slots: id 0 | task 13159 | created context checkpoint 8 of 8 (pos_min = 52345, pos_max = 52345, n_tokens = 52346, size = 62.813 MiB)
slot update_slots: id 0 | task 13159 | prompt processing done, n_tokens = 52404, batch.n_tokens = 58
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 13159 |
prompt eval time = 362.37 ms / 58 tokens ( 6.25 ms per token, 160.06 tokens per second)
eval time = 1893.70 ms / 59 tokens ( 32.10 ms per token, 31.16 tokens per second)
total time = 2256.07 ms / 117 tokens
slot release: id 0 | task 13159 | stop processing: n_tokens = 52462, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.996 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 13219 | processing task, is_child = 0
slot update_slots: id 0 | task 13219 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 52697
slot update_slots: id 0 | task 13219 | n_tokens = 52462, memory_seq_rm [52462, end)
slot init_sampler: id 0 | task 13219 | init sampler, took 6.27 ms, tokens: text = 52697, total = 52697
slot update_slots: id 0 | task 13219 | erasing old context checkpoint (pos_min = 48889, pos_max = 48889, n_tokens = 48890, size = 62.813 MiB)
slot update_slots: id 0 | task 13219 | created context checkpoint 8 of 8 (pos_min = 52461, pos_max = 52461, n_tokens = 52462, size = 62.813 MiB)
slot update_slots: id 0 | task 13219 | prompt processing done, n_tokens = 52697, batch.n_tokens = 235
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 13219 |
prompt eval time = 596.41 ms / 235 tokens ( 2.54 ms per token, 394.02 tokens per second)
eval time = 1704.99 ms / 54 tokens ( 31.57 ms per token, 31.67 tokens per second)
total time = 2301.40 ms / 289 tokens
slot release: id 0 | task 13219 | stop processing: n_tokens = 52750, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 13274 | processing task, is_child = 0
slot update_slots: id 0 | task 13274 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 52771
slot update_slots: id 0 | task 13274 | n_tokens = 52750, memory_seq_rm [52750, end)
slot init_sampler: id 0 | task 13274 | init sampler, took 6.41 ms, tokens: text = 52771, total = 52771
slot update_slots: id 0 | task 13274 | erasing old context checkpoint (pos_min = 49432, pos_max = 49432, n_tokens = 49433, size = 62.813 MiB)
slot update_slots: id 0 | task 13274 | created context checkpoint 8 of 8 (pos_min = 52749, pos_max = 52749, n_tokens = 52750, size = 62.813 MiB)
slot update_slots: id 0 | task 13274 | prompt processing done, n_tokens = 52771, batch.n_tokens = 21
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 13274 |
prompt eval time = 226.14 ms / 21 tokens ( 10.77 ms per token, 92.86 tokens per second)
eval time = 454.11 ms / 15 tokens ( 30.27 ms per token, 33.03 tokens per second)
total time = 680.25 ms / 36 tokens
slot release: id 0 | task 13274 | stop processing: n_tokens = 52785, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 13290 | processing task, is_child = 0
slot update_slots: id 0 | task 13290 | new prompt, n_ctx_slot = 128000, n_keep = 0, task.n_tokens = 52806
slot update_slots: id 0 | task 13290 | n_tokens = 52785, memory_seq_rm [52785, end)
slot init_sampler: id 0 | task 13290 | init sampler, took 6.49 ms, tokens: text = 52806, total = 52806
slot update_slots: id 0 | task 13290 | prompt processing done, n_tokens = 52806, batch.n_tokens = 21
srv log_server_r: done request: POST /v1/messages 127.0.0.1 200
slot print_timing: id 0 | task 13290 |
prompt eval time = 204.55 ms / 21 tokens ( 9.74 ms per token, 102.66 tokens per second)
eval time = 8000.84 ms / 249 tokens ( 32.13 ms per token, 31.12 tokens per second)
total time = 8205.38 ms / 270 tokens
slot release: id 0 | task 13290 | stop processing: n_tokens = 53054, truncated = 0
srv update_slots: all slots are idle
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200
srv log_server_r: done request: POST /v1/messages/count_tokens 127.0.0.1 200