@stevenkolawole
Created December 30, 2023 08:04
MistralAttention's CUDA Error
python my_main.py --model mistralai/Mistral-7B-v0.1 --save out --masks_per_iter 100
torch 1.10.1
transformers 4.36.1
accelerate 0.25.0
# of gpus: 1
Namespace(model='mistralai/Mistral-7B-v0.1', seed=0, nsamples=14, sparsity_ratio=0.5, prune_frac=0.1, bsz=14, mlp_attn_ratio=1.0, prune_method='magnitude', cache_dir='llm_weights', use_variant=False, save='out', save_model=None, masks_per_iter=100, tol=0.02, sm_reg_weight='[1e2, 1e-4, 0]', sm_lr_factor='[100, 10, 1, 0.1]', sm_reg_type='l1', sm_lin_model_type='global', sm_bsz='[32, 64, 128]', sm_nepochs=50, wandb_project_name='Prune-No-Backward')
wandb: Currently logged in as: skolawol (shapelyprune). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /home/skolawol/workspace/my_wanda/wandb/run-20231230_041046-fegweqcc
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run nsamp=14_sp=0.5_pfrac=0.1_bsz=14_ma_ratio=1.0_mpi=100_Lin.regtype=l1_pmethod=magnitude_mlp_attn_ratio=1.0_Lin.regweight=100.0-0.0001-0_Lin.lr=100-10-1-0.1_Lin.bsz=32-64-128_Lin.nepochs=50_Lin.type=global_name=Prune-No-Backward
wandb: ⭐️ View project at https://wandb.ai/shapelyprune/Prune-No-Backward
wandb: 🚀 View run at https://wandb.ai/shapelyprune/Prune-No-Backward/runs/fegweqcc
loading llm model mistralai/Mistral-7B-v0.1
Loading checkpoint shards: 100%|███████████████████████████████████████████| 2/2 [00:14<00:00, 7.41s/it]
Some weights of MistralForCausalLM were not initialized from the model checkpoint at mistralai/Mistral-7B-v0.1 and are newly initialized: ['model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.17.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq', 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
evaluating on wikitext2
nsamples 163
sample 0
sample 50
sample 100
sample 150
nsamples 128
sample 0
sample 50
sample 100
Gathering statistics for pruning
nsamples 14
sample 0
nsamples 14
sample 0
[v1]Iter : 0 PPL = 7.170496463775635
nsamples 14
sample 0
[v2]Iter : 0 PPL = 6.812166690826416
nsamples 14
sample 0
[v1]Iter : 1 PPL = 6.358439922332764
nsamples 14
sample 0
[v2]Iter : 1 PPL = 7.749149799346924
nsamples 14
sample 0
[v1]Iter : 2 PPL = 6.306626796722412
nsamples 14
sample 0
[v2]Iter : 2 PPL = 8.290532112121582
nsamples 14
sample 0
[v1]Iter : 3 PPL = 6.415926933288574
nsamples 14
sample 0
[v2]Iter : 3 PPL = 14.613344192504883
nsamples 14
sample 0
[v1]Iter : 4 PPL = 6.352543830871582
nsamples 14
sample 0
[v2]Iter : 4 PPL = 8.251137733459473
nsamples 14
sample 0
[v1]Iter : 5 PPL = 6.292605400085449
nsamples 14
sample 0
[v2]Iter : 5 PPL = 7.797684192657471
nsamples 14
sample 0
[v1]Iter : 6 PPL = 6.309180736541748
nsamples 14
sample 0
[v2]Iter : 6 PPL = 7.813832759857178
nsamples 14
sample 0
[v1]Iter : 7 PPL = 7.047357559204102
nsamples 14
sample 0
[v2]Iter : 7 PPL = 6.724523067474365
nsamples 14
sample 0
[v1]Iter : 8 PPL = 7.289963245391846
nsamples 14
sample 0
[v2]Iter : 8 PPL = 7.090325832366943
nsamples 14
sample 0
[v1]Iter : 9 PPL = 7.569590091705322
nsamples 14
sample 0
[v2]Iter : 9 PPL = 6.6845245361328125
nsamples 14
sample 0
[v1]Iter : 10 PPL = 6.966268062591553
nsamples 14
sample 0
[v2]Iter : 10 PPL = 7.4620490074157715
nsamples 14
sample 0
[v1]Iter : 11 PPL = 7.363272190093994
nsamples 14
sample 0
[v2]Iter : 11 PPL = 6.755545139312744
nsamples 14
sample 0
[v1]Iter : 12 PPL = 6.282954692840576
nsamples 14
sample 0
[v2]Iter : 12 PPL = 7.682135581970215
nsamples 14
sample 0
[v1]Iter : 13 PPL = 7.435348033905029
nsamples 14
sample 0
[v2]Iter : 13 PPL = 7.827632904052734
nsamples 14
sample 0
[v1]Iter : 14 PPL = 7.147010803222656
nsamples 14
sample 0
[v2]Iter : 14 PPL = 6.789153099060059
nsamples 14
sample 0
[v1]Iter : 15 PPL = 7.13185977935791
nsamples 14
sample 0
[v2]Iter : 15 PPL = 6.772646903991699
nsamples 14
sample 0
[v1]Iter : 16 PPL = 6.397947311401367
nsamples 14
sample 0
[v2]Iter : 16 PPL = 7.833366394042969
nsamples 14
sample 0
[v1]Iter : 17 PPL = 6.276535987854004
nsamples 14
sample 0
[v2]Iter : 17 PPL = 7.884089946746826
nsamples 14
sample 0
[v1]Iter : 18 PPL = 7.197674751281738
nsamples 14
sample 0
[v2]Iter : 18 PPL = 6.597900390625
nsamples 14
sample 0
[v1]Iter : 19 PPL = 7.330356597900391
nsamples 14
sample 0
[v2]Iter : 19 PPL = 6.686861515045166
nsamples 14
sample 0
[v1]Iter : 20 PPL = 6.290648937225342
nsamples 14
sample 0
[v2]Iter : 20 PPL = 7.843414306640625
nsamples 14
sample 0
[v1]Iter : 21 PPL = 6.8761725425720215
nsamples 14
sample 0
[v2]Iter : 21 PPL = 7.712000370025635
nsamples 14
sample 0
[v1]Iter : 22 PPL = 6.402265548706055
nsamples 14
sample 0
[v2]Iter : 22 PPL = 7.8440470695495605
nsamples 14
sample 0
[v1]Iter : 23 PPL = 7.224035739898682
nsamples 14
sample 0
[v2]Iter : 23 PPL = 6.7227911949157715
nsamples 14
sample 0
[v1]Iter : 24 PPL = 6.4420695304870605
nsamples 14
sample 0
[v2]Iter : 24 PPL = 7.521695613861084
nsamples 14
sample 0
[v1]Iter : 25 PPL = 6.274460792541504
nsamples 14
sample 0
[v2]Iter : 25 PPL = 7.820735931396484
nsamples 14
sample 0
[v1]Iter : 26 PPL = 6.4466729164123535
nsamples 14
sample 0
[v2]Iter : 26 PPL = 7.354723930358887
nsamples 14
sample 0
[v1]Iter : 27 PPL = 7.381236553192139
nsamples 14
sample 0
[v2]Iter : 27 PPL = 6.672240257263184
nsamples 14
sample 0
[v1]Iter : 28 PPL = 6.975872039794922
nsamples 14
sample 0
[v2]Iter : 28 PPL = 7.308136463165283
nsamples 14
sample 0
[v1]Iter : 29 PPL = 6.268823146820068
nsamples 14
sample 0
[v2]Iter : 29 PPL = 7.9021100997924805
nsamples 14
sample 0
[v1]Iter : 30 PPL = 6.270018100738525
nsamples 14
sample 0
[v2]Iter : 30 PPL = 8.025629997253418
nsamples 14
sample 0
[v1]Iter : 31 PPL = 6.234014511108398
nsamples 14
sample 0
[v2]Iter : 31 PPL = 7.933075904846191
nsamples 14
sample 0
[v1]Iter : 32 PPL = 7.042508125305176
nsamples 14
sample 0
[v2]Iter : 32 PPL = 6.848394393920898
nsamples 14
sample 0
[v1]Iter : 33 PPL = 6.2845778465271
nsamples 14
sample 0
[v2]Iter : 33 PPL = 7.776439666748047
nsamples 14
sample 0
[v1]Iter : 34 PPL = 7.355691909790039
nsamples 14
sample 0
[v2]Iter : 34 PPL = 6.726131916046143
nsamples 14
sample 0
[v1]Iter : 35 PPL = 7.127555847167969
nsamples 14
sample 0
[v2]Iter : 35 PPL = 7.14017915725708
nsamples 14
sample 0
[v1]Iter : 36 PPL = 7.26483678817749
nsamples 14
sample 0
[v2]Iter : 36 PPL = 6.679401874542236
nsamples 14
sample 0
[v1]Iter : 37 PPL = 7.134992599487305
nsamples 14
sample 0
[v2]Iter : 37 PPL = 6.707190036773682
nsamples 14
sample 0
[v1]Iter : 38 PPL = 7.502450466156006
nsamples 14
sample 0
[v2]Iter : 38 PPL = 6.622519016265869
nsamples 14
sample 0
[v1]Iter : 39 PPL = 6.360657691955566
nsamples 14
sample 0
[v2]Iter : 39 PPL = 7.755874156951904
nsamples 14
sample 0
[v1]Iter : 40 PPL = 7.178237438201904
nsamples 14
sample 0
[v2]Iter : 40 PPL = 6.691136360168457
nsamples 14
sample 0
[v1]Iter : 41 PPL = 6.235278129577637
nsamples 14
sample 0
[v2]Iter : 41 PPL = 8.105535507202148
nsamples 14
sample 0
[v1]Iter : 42 PPL = 6.289116382598877
nsamples 14
sample 0
[v2]Iter : 42 PPL = 7.69081449508667
nsamples 14
sample 0
[v1]Iter : 43 PPL = 6.388136863708496
nsamples 14
sample 0
[v2]Iter : 43 PPL = 8.014190673828125
nsamples 14
sample 0
[v1]Iter : 44 PPL = 7.44135856628418
nsamples 14
sample 0
[v2]Iter : 44 PPL = 6.70253849029541
nsamples 14
sample 0
[v1]Iter : 45 PPL = 7.3979387283325195
nsamples 14
sample 0
[v2]Iter : 45 PPL = 6.733610153198242
nsamples 14
sample 0
[v1]Iter : 46 PPL = 6.276342391967773
nsamples 14
sample 0
[v2]Iter : 46 PPL = 7.7677507400512695
nsamples 14
sample 0
[v1]Iter : 47 PPL = 7.16223669052124
nsamples 14
sample 0
[v2]Iter : 47 PPL = 6.719339370727539
nsamples 14
sample 0
[v1]Iter : 48 PPL = 6.5343122482299805
nsamples 14
sample 0
[v2]Iter : 48 PPL = 7.516190052032471
nsamples 14
sample 0
[v1]Iter : 49 PPL = 6.595152378082275
nsamples 14
sample 0
[v2]Iter : 49 PPL = 7.790565013885498
/home/skolawol/miniconda3/envs/pruneenv_old/lib/python3.9/site-packages/plotly/matplotlylib/renderer.py:609: UserWarning:
I found a path object that I don't think is part of a bar chart. Ignoring.
Prune model
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [16,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [17,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [92,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/home/skolawol/workspace/my_wanda/my_main.py", line 563, in <module>
    main()
  File "/home/skolawol/workspace/my_wanda/my_main.py", line 541, in main
    prune_model(args, model, mask_info, tokenizer) # Do some stuffs here :)
  File "/home/skolawol/workspace/my_wanda/my_main.py", line 454, in prune_model
    prune_attn(mask_, module)
  File "/home/skolawol/workspace/my_wanda/my_main.py", line 416, in prune_attn
    new_k_proj = (prune_linear_layer(module.k_proj, updated_indices)).half()
  File "/home/skolawol/miniconda3/envs/pruneenv_old/lib/python3.9/site-packages/transformers/pytorch_utils.py", line 69, in prune_linear_layer
    W = layer.weight.index_select(dim, index).clone().detach()
RuntimeError: CUDA error: device-side assert triggered
wandb:
wandb: Run history:
wandb: SysStats/pruneruntime ▁
wandb: SysStats/scoreruntime ▁
wandb:
wandb: Run summary:
wandb: SysStats/pruneruntime 0.05156
wandb: SysStats/scoreruntime 4195.80715
wandb:
wandb: 🚀 View run nsamp=14_sp=0.5_pfrac=0.1_bsz=14_ma_ratio=1.0_mpi=100_Lin.regtype=l1_pmethod=magnitude_mlp_attn_ratio=1.0_Lin.regweight=100.0-0.0001-0_Lin.lr=100-10-1-0.1_Lin.bsz=32-64-128_Lin.nepochs=50_Lin.type=global_name=Prune-No-Backward at: https://wandb.ai/shapelyprune/Prune-No-Backward/runs/fegweqcc
wandb: ️⚡ View job at https://wandb.ai/shapelyprune/Prune-No-Backward/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjEyNjMwMTMyMA==/version_details/v2
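The assertion `srcIndex < srcSelectDimSize` means that `index_select` inside `prune_linear_layer` received an index at or beyond the size of the selected dimension of `k_proj.weight`. One plausible cause, assuming `updated_indices` was computed from the 32 query heads and then reused for `k_proj`: Mistral-7B-v0.1 uses grouped-query attention with 8 key/value heads of dimension 128, so `k_proj` and `v_proj` have 1024 output rows while `q_proj` has 4096. The sketch below uses hypothetical helpers (`head_rows`, `kv_heads_for`, `check_rows`, `projection_rows`; they are not part of `my_main.py` or transformers) to size the kept indices per projection and to fail with a readable error on the host. This matters because a device-side assert leaves the CUDA context unusable for the rest of the process. Query head h is assumed to read key/value head h // 4, mirroring how `repeat_kv` expands key/value heads.

```python
# Hypothetical helpers (not from my_main.py or transformers): build the row
# indices to keep for each attention projection and validate them before they
# reach index_select, where an out-of-range index triggers the CUDA assert.

# Mistral-7B-v0.1 attention shapes (from its config): 32 query heads,
# 8 key/value heads, head_dim 128, so q_proj has 4096 output rows and
# k_proj / v_proj have 1024.
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
N_REP = NUM_HEADS // NUM_KV_HEADS  # query heads sharing one key/value head


def head_rows(heads, head_dim=HEAD_DIM):
    """Expand head ids into the output-row indices of a projection."""
    return [h * head_dim + d for h in sorted(heads) for d in range(head_dim)]


def kv_heads_for(query_heads, n_rep=N_REP):
    """Key/value heads still needed by the kept query heads (head h uses kv head h // n_rep)."""
    return sorted({h // n_rep for h in query_heads})


def check_rows(rows, out_features, name):
    """Raise a readable error on the CPU instead of a device-side assert."""
    bad = [r for r in rows if not 0 <= r < out_features]
    if bad:
        raise ValueError(
            f"{name}: {len(bad)} indices outside [0, {out_features}), e.g. {bad[0]}"
        )
    return rows


def projection_rows(keep_query_heads):
    """Per-projection row indices to keep, each sized for its own projection."""
    q_rows = check_rows(head_rows(keep_query_heads), NUM_HEADS * HEAD_DIM, "q_proj")
    kv_rows = check_rows(
        head_rows(kv_heads_for(keep_query_heads)), NUM_KV_HEADS * HEAD_DIM, "k_proj/v_proj"
    )
    return q_rows, kv_rows


if __name__ == "__main__":
    # Drop one whole group of N_REP query heads (4..7), which frees kv head 1.
    keep = [h for h in range(NUM_HEADS) if h not in (4, 5, 6, 7)]
    q_rows, kv_rows = projection_rows(keep)
    print(len(q_rows), len(kv_rows))  # 3584 896

    # Reusing the query-head rows for k_proj reproduces the bug, but on the CPU side:
    try:
        check_rows(q_rows, NUM_KV_HEADS * HEAD_DIM, "k_proj")
    except ValueError as err:
        print(err)
```

To apply the result, something like `prune_linear_layer(module.k_proj, torch.tensor(kv_rows))` would be used for `k_proj` and `v_proj`, `torch.tensor(q_rows)` for `q_proj`, and the same query rows with `dim=1` for `o_proj`. The attention module's head-count attributes (`num_heads`, `num_key_value_heads`, and the `hidden_size` used to reshape the output) would likely need updating as well. Dropping whole groups of four query heads, as in the example, keeps `num_key_value_groups` consistent; dropping partial groups leaves head counts that `repeat_kv` cannot expand evenly. Rerunning with `CUDA_LAUNCH_BLOCKING=1` also helps confirm which kernel raised the assert.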