| (EngineCore_DP0 pid=467) ERROR 01-23 03:35:58 [core.py:935] Traceback (most recent call last): | |
| (EngineCore_DP0 pid=467) ERROR 01-23 03:35:58 [core.py:935] File "/workspace/vllm/vllm/v1/engine/core.py", line 926, in run_engine_core | |
| (EngineCore_DP0 pid=467) ERROR 01-23 03:35:58 [core.py:935] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) | |
| (EngineCore_DP0 pid=467) ERROR 01-23 03:35:58 [core.py:935] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| (EngineCore_DP0 pid=467) ERROR 01-23 03:35:58 [core.py:935] File "/workspace/vllm/vllm/v1/engine/core.py", line 691, in __init__ | |
| (EngineCore_DP0 pid=467) ERROR 01-23 03:35:58 [core.py:935] super().__init__( | |
| (EngineCore_DP0 pid=467) ERROR 01-23 03:35:58 [core.py:935] File "/workspace/vllm/vllm/v1/engine/core.py", line 105, in __init__ | |
| (EngineCore_DP0 pid=467) ERROR 01-23 03:35:58 [core.py:935] self.model_executor = executor_class(vllm_config) | |
| (EngineCore_DP0 pid=467) ERROR 01-23 03:35:58 [core.py:935] |
| tests/layers/vllm/test_unquantized.py::test_fused_moe[False-silu-False-2-8-128-1024-8-1-True] FAILED | |
| =================================== FAILURES =================================== | |
| ____________ test_fused_moe[False-silu-False-2-8-128-1024-8-1-True] ____________ | |
| use_ep = True, num_devices = 1, num_tokens = 8, intermediate_size = 1024 | |
| hidden_size = 128, num_experts = 8, topk = 2, has_bias = False | |
| activation = 'silu', enable_attn_dp = False | |
| @pytest.mark.parametrize("use_ep", [True, False]) | |
| @pytest.mark.parametrize("num_devices", [1, jax.local_device_count()]) | |
| @pytest.mark.parametrize("num_tokens", [8]) | |
| @pytest.mark.parametrize("intermediate_size", [1024, 2048]) |
| exec ${PAGER:-/usr/bin/less -R} "$0" || exit 1 | |
| Test settings: forge with network access | |
| Host details: itmm4.prod.google.com Linux 6.6.65-smp-1300.170.0.0 x86_64 astoria-genoa-base | |
| executor.INFO: analog/view?storage=borgremote&bns=/bns/it/borg/it/bns/build-forge-executor-tpu/prod-cbf-ghostlite.forge-executor/0&min_time=1764872604000000&ts=1764872614000000 | |
| Test command: | |
| cd /build/work/aef67bf50706fee86777a93cc065340a246c/google3/runfiles/google3 && \ | |
| env - \ | |
| BORG_CELL=it \ | |
| CUSTOM_METRICS_DIR=/build/work/aef67bf50706fee86777a93cc065340a246c/google3/../custom_metrics \ |
| Let's trace the values for my_id = 1 with num_devices = 4: | |
| outer_step | phase | Accumulation Source | left_copy_device | right_copy_device | Device providing the data |
| --- | --- | --- | --- | --- | --- |
| 0 | LEFT | x_ref[left_copy_device, ...] | (1+0+1)%4 = 2 | (1-0-1)%4 = 0 | Device 2 |
| 0 | RIGHT | x_ref[right_copy_device, ...] | (1+0+1)%4 = 2 | (1-0-1)%4 = 0 | Device 0 |
| 1 | LEFT | x_ref[left_copy_device, ...] | (1+1+1)%4 = 3 | (1-1-1)%4 = 3 | Device 3 |
| 1 | RIGHT | x_ref[right_copy_device, ...] | (1+1+1)%4 = 3 | (1-1-1)%4 = 3 | Device 3 |
| 2 | LEFT | x_ref[left_copy_device, ...] | (1+2+1)%4 = 0 | (1-2-1)%4 = 2 | Device 0 |
| 2 | RIGHT | x_ref[right_copy_device, ...] | (1+2+1)%4 = 0 | (1-2-1)%4 = 2 | Device 2 |
| As the trace shows, both *_copy_device values change with each outer_step: left_copy_device advances by one and right_copy_device moves back by one, modulo num_devices. Within each phase, the reduction therefore fetches data from a different device at every step. This systematic progression guarantees that by the end of all steps, each device has accumulated its required portion of the total sum from all other devices.
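
The index arithmetic in the trace can be reproduced with a few lines of plain Python. This is a minimal sketch, not the kernel itself: the function names `copy_devices` and `trace` are hypothetical, and the loop bound of `num_devices - 1` steps is an assumption read off the trace (which shows outer_step 0 through 2 for 4 devices). Only the two modulo formulas come from the text above.

```python
def copy_devices(my_id, outer_step, num_devices):
    """Return (left_copy_device, right_copy_device) for one outer_step."""
    # Python's % always returns a non-negative result for a positive modulus,
    # so (1 - 1 - 1) % 4 is 3, matching the trace.
    left = (my_id + outer_step + 1) % num_devices
    right = (my_id - outer_step - 1) % num_devices
    return left, right


def trace(my_id, num_devices):
    """List (outer_step, phase, source_device) rows for one device."""
    rows = []
    # Assumption: one outer_step per remote device, i.e. num_devices - 1 steps.
    for outer_step in range(num_devices - 1):
        left, right = copy_devices(my_id, outer_step, num_devices)
        rows.append((outer_step, "LEFT", left))
        rows.append((outer_step, "RIGHT", right))
    return rows


for outer_step, phase, device in trace(my_id=1, num_devices=4):
    print(f"outer_step={outer_step} phase={phase:<5} reads from Device {device}")
```

Running it for `my_id = 1` and `num_devices = 4` should reproduce the "Device providing the data" column of the trace.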
| import jax | |
| from jax import export | |
| import jax.numpy as jnp | |
| import pickle | |
| import time | |
| import statistics | |
| with open("/home/xiowei_google_com/new_exports.pkl", "rb") as f: | |
|     data = pickle.load(f)
| import jax | |
| from jax import export | |
| import jax.numpy as jnp | |
| import pickle | |
| import time | |
| import statistics | |
| with open("/home/xiowei_google_com/old_exports.pkl", "rb") as f: | |
|     data = pickle.load(f)
| 1. Start the benchmark server in VS Code as shown in [this gist](https://gist.github.com/vanbasten23/dd4f3cbb314a7b9cf6c003103c23c019). Select the correct Python interpreter. | |
| 2. Then start the vLLM server in the debugger. | |
| 3. Wait until the server is up and running. | |
| 4. Add the breakpoint (remember to turn off dynamo and JAX jit). | |
| 5. Use the [script](https://gist.github.com/vanbasten23/726b28f072993fb7587482672b9c96a9) to send the benchmarking request. Make sure to use the correct conda/Python environment. | |
| 6. Then dump the input and output. | |
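
For step 4, one possible way to turn off dynamo and JAX jit is through environment variables, sketched below. `JAX_DISABLE_JIT` and `TORCHDYNAMO_DISABLE` are the standard switches read by JAX and PyTorch; the helper name `apply_debug_env` is hypothetical, and whether every code path in the vLLM TPU backend honors these switches is an assumption that has not been checked here.

```python
import os

# Assumption: both variables are read when the libraries are first imported,
# so they must be set before `import jax` / `import torch` runs in the server.
DEBUG_ENV = {
    "JAX_DISABLE_JIT": "1",      # JAX: execute jitted functions op by op
    "TORCHDYNAMO_DISABLE": "1",  # PyTorch: skip TorchDynamo tracing
}


def apply_debug_env(env=os.environ):
    """Set the eager-mode switches in `env`; return the keys that changed."""
    changed = [key for key, value in DEBUG_ENV.items() if env.get(key) != value]
    env.update(DEBUG_ENV)
    return changed


changed = apply_debug_env()
print("updated:", changed)
```

The same two variables could instead be placed in the `env` block of the debugger launch configuration, so the server process starts with them already set.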
| ========================= | |
| pip install flatbuffers |
| #!/bin/bash | |
| # Usage: | |
| # bash run_tpu_benchmark_client.sh --model Qwen/Qwen2.5-1.5B-Instruct --tp 1 | |
| LONGOPTS=model:,tp:,profile | |
| # Parse arguments | |
| PARSED=$(getopt --options=$OPTIONS --longoptions=$LONGOPTS --name "$0" -- "$@") | |
| if [[ $? -ne 0 ]]; then | |
| exit 2 |
| { | |
| "name": "newjax_benchmark_server", | |
| "type": "debugpy", | |
| "request": "launch", | |
| "program": "/home/xiowei_google_com/miniconda3/envs/vllm_newjax/bin/vllm", | |
| "console": "integratedTerminal", | |
| "justMyCode": false, | |
| "env": { | |
| "MODEL_IMPL_TYPE": "vllm", | |
| "TPU_BACKEND_TYPE": "jax", |