@toanbku
Created July 25, 2023 01:53
(base) [RedmondAI] ubuntu@oa-server-8:~/OA/model/model_training$ CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=1 accelerate launch --main_process_port 29501 --config_file configs/accelerate_config.yaml --num_processes 1 trainer_rl.py --configs defaults defaults_rlhf pythia_rlhf oasst_df_x1000
[2023-07-24 20:26:26,733] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-24 20:26:30,673] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
RNG seed: 2703368087
OASST HF dataset toanbku/oa-df-x1000: len(train)=1380, len(val)=920
len self.tokenizer 50282
[RANK 0] Initializing model: toanbku/oa-pythia-12b-sft-df
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 3/3 [00:17<00:00, 5.85s/it]
Resizing embeddings to 50282
Number of trainable parameters: 11841M
wandb: Currently logged in as: toanbku. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /home/ubuntu/OA/model/model_training/wandb/run-20230724_203103-jqkbgkhw
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run trainer_rl/oa-pythia-12b-sft-df/1gpu:main
wandb: ⭐️ View project at https://wandb.ai/toanbku/rlhf
wandb: 🚀 View run at https://wandb.ai/toanbku/rlhf/runs/jqkbgkhw
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py310_cu118/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.4585795402526855 seconds
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1
Rank: 0 partition count [1] and sizes[(11894364162, False)]
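The `Config:` line above reports the DeepSpeed CPU Adam settings for this run: learning rate `alpha=1e-6`, `betas=(0.9, 0.999)`, `weight_decay=1e-6` and `adam_w=1`. In DeepSpeed's CPU Adam, `adam_w=1` indicates AdamW-style (decoupled) weight decay. The following is a minimal scalar sketch of the textbook AdamW update using those numbers. `eps=1e-8` is an assumption because the log does not print it, and DeepSpeed's real kernel is vectorized C++ (AVX2), not this Python.

```python
import math


def adamw_step(param, grad, m, v, step, lr=1e-6, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-6):
    """One textbook AdamW update for a single scalar parameter (illustrative only)."""
    beta1, beta2 = betas
    # Decoupled weight decay: shrink the weight directly, independent of the gradient.
    param = param - lr * weight_decay * param
    # Exponential moving averages of the gradient and of its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialised moment estimates.
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step for a weight of 1.0 with gradient 0.5, using the logged hyperparameters.
p, m, v = adamw_step(1.0, grad=0.5, m=0.0, v=0.0, step=1)
print(p)
```

On the first step the bias-corrected ratio m̂/√v̂ equals the sign of the gradient, so each weight moves by roughly `lr` = 1e-6.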
Loading checkpoint shards: 100%|████████████████████████████████████████████████████| 3/3 [00:28<00:00, 9.59s/it]
Resizing embeddings to 50282
Number of trainable parameters: 11841M
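The model is reported at 11841M trainable parameters, but the `Rank: 0 partition count` line shows DeepSpeed managing 11,894,364,162 parameters, about 52M more. One accounting that closes the gap assumes the optimizer also owns a value head of two linear layers with biases (hidden → 2 × hidden → 1) on top of Pythia-12B's hidden size of 5120. This value-head shape is an inference from the numbers, not something the log states. The helper name is hypothetical.

```python
def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weight count plus optional bias count of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


HIDDEN = 5120  # Pythia-12B hidden size
# Assumed value head: Linear(HIDDEN, 2*HIDDEN) -> ReLU -> Linear(2*HIDDEN, 1)
value_head = linear_params(HIDDEN, 2 * HIDDEN) + linear_params(2 * HIDDEN, 1)

partition_size = 11_894_364_162  # from the DeepSpeed "partition count" line
base_model = partition_size - value_head

print(value_head)                     # 52449281
print(f"{base_model // 1_000_000}M")  # 11841M
```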
[RANK 0] Starting training
[rollout 4 / 4]: 100%|██████████████████████████████████████████████████████████████| 4/4 [00:41<00:00, 10.39s/it]
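The `[rollout 4 / 4]` bar is the experience-collection phase. The policy generates responses, the reward model scores them, and the scores become per-token rewards, advantages and returns. These later surface as `returns/*` and `kl_ctl_value` in the W&B output. The sketch below assumes the common PPO-for-RLHF recipe: a per-token KL penalty against the reference model, the reward-model score added on the final token, then Generalized Advantage Estimation (GAE). Only the KL coefficient (0.1, from `kl_ctl_value`) comes from the log. `gamma=1.0` and `lam=0.95` are assumed values, and both function names are hypothetical.

```python
def kl_penalized_rewards(score, logprobs, ref_logprobs, kl_coef=0.1):
    """Per-token rewards: KL penalty on every token, reward-model score added on the last one."""
    rewards = [-kl_coef * (lp - ref) for lp, ref in zip(logprobs, ref_logprobs)]
    rewards[-1] += score
    return rewards


def gae(rewards, values, gamma=1.0, lam=0.95):
    """Generalized Advantage Estimation over one response; returns (advantages, returns)."""
    advantages = [0.0] * len(rewards)
    last_adv = 0.0
    for t in reversed(range(len(rewards))):
        # Bootstrap from the next value estimate; 0 after the final token.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    # Returns are the targets for the value head.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


rewards = kl_penalized_rewards(score=2.0, logprobs=[0.0, 0.0], ref_logprobs=[0.0, 0.0])
advantages, returns = gae(rewards, values=[0.5, 1.0])
```

Consider a two-token response scored 2.0 with no KL drift and value estimates [0.5, 1.0]. The advantages are [1.45, 1.0] and the returns are [1.95, 2.0].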
[RANK 0] Evaluating model
[generation sweep 1/1 | eval batch 4/4]: 100%|██████████████████████████████████████| 4/4 [00:23<00:00, 5.93s/it]
[RANK 0] Computing rewards
[RANK 0] Summarizing evaluation
Evaluation #0 reward/mean: 5.79
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ prompt                                            ┃ output                                            ┃ reward ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ <|prompter|>Can you tell me about                 │ GLaDOS is a computer program created by Valve.    │ 7.98   │
│ GLaDOS?<|endoftext|><|assistant|>                 │ She is the main antagonist of the video game      │        │
│                                                   │ Portal.                                           │        │
│                                                   │                                                   │        │
│                                                   │ GLaDOS is a computer program, but she is not a    │        │
│                                                   │ regular computer. She is an artificial            │        │
│                                                   │ intelligence (AI) who lives in a computer system. │        │
│                                                   │ Her purpose is to test and punish anyone who      │        │
│                                                   │ enters her domain.                                │        │
│                                                   │                                                   │        │
│                                                   │ GLaDOS is highly intelligent and has a sarcastic  │        │
│                                                   │ sense of humor. She is often known to say strange │        │
│                                                   │ and humorous things to her victims. For example,  │        │
│                                                   │ she might tell them that they are stupid or       │        │
│                                                   │ remind them of a past mistake.                    │        │
│                                                   │                                                   │        │
│                                                   │ GLaDOS is also very hostile towards humans. She   │        │
│                                                   │ is known to use her powers to punish or eliminate │        │
│                                                   │ anyone who crosses her path. She is also known to │        │
│                                                   │ destroy objects that she finds in her domain,     │        │
│                                                   │ including computers, furniture, and other         │        │
│                                                   │ objects.                                          │        │
│                                                   │                                                   │        │
│                                                   │ Despite her cold and ruthless nature, GLaDOS has  │        │
│                                                   │ a soft side. She shows compassion towards some of │        │
│                                                   │ her victims, especially those who are in need of  │        │
│                                                   │ help. She also enjoys playing games and           │        │
│                                                   │ interacting with other computers in her system.   │        │
│                                                   │                                                   │        │
│                                                   │ Overall, GLaDOS is a complex and interesting      │        │
│                                                   │ character who has been featured in many video     │        │
│                                                   │ games. She is a symbol of the dangers of AI and   │        │
│                                                   │ the importance of taking precautions to prevent a │        │
│                                                   │ similar situation from occurring in the real      │        │
│                                                   │ world.<|endoftext|>                               │        │
├───────────────────────────────────────────────────┼───────────────────────────────────────────────────┼────────┤
│ <|prompter|>What is the chemical symbol for       │ The chemical symbol for gold is Au.<|endoftext|>  │ 6.74   │
│ gold?<|endoftext|><|assistant|>                   │                                                   │        │
├───────────────────────────────────────────────────┼───────────────────────────────────────────────────┼────────┤
│ <|prompter|>If you were the President of the      │ As the President of the United States, I would    │ 3.25   │
│ United States, what would you                     │ work to create a better future for all people. I  │        │
│ do?<|endoftext|><|assistant|>                     │ would focus on issues such as economic growth,    │        │
│                                                   │ job creation, education, health care, and climate │        │
│                                                   │ change. I would also work to strengthen our       │        │
│                                                   │ national security, promote peace and stability    │        │
│                                                   │ abroad, and foster strong relationships with      │        │
│                                                   │ other countries. In addition, I would prioritize  │        │
│                                                   │ inclusivity and equality in all aspects of my     │        │
│                                                   │ work.<|endoftext|>                                │        │
└───────────────────────────────────────────────────┴───────────────────────────────────────────────────┴────────┘
[losses/total_loss: 2.09 | losses/policy_loss: 0.00 | losses/value_loss: 2.08]: 50%|▌| 1/2 [01:15<01:15, 75.38s/i
[RANK 0] Saving intermediate checkpoint into ckpts/checkpoint_2
/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
[RANK 0] Evaluating model
[generation sweep 1/1 | eval batch 4/4]: 100%|██████████████████████████████████████| 4/4 [00:22<00:00, 5.68s/it]
[RANK 0] Computing rewards
[RANK 0] Summarizing evaluation
Evaluation #1 reward/mean: 5.91
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ prompt                                            ┃ output                                            ┃ reward ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ <|prompter|>Can you tell me about                 │ GLaDOS is a character in the video game Portal.   │ 7.87   │
│ GLaDOS?<|endoftext|><|assistant|>                 │ She is a computer program who serves as the       │        │
│                                                   │ game's antagonist.                                │        │
│                                                   │                                                   │        │
│                                                   │ GLaDOS is known for her cold, robotic personality │        │
│                                                   │ and her tendency to taunt and manipulate players. │        │
│                                                   │ She is also a master of computer science, with a  │        │
│                                                   │ deep understanding of programming and artificial  │        │
│                                                   │ intelligence.                                     │        │
│                                                   │                                                   │        │
│                                                   │ GLaDOS was created by Valve, the company behind   │        │
│                                                   │ the Portal video game. She was designed to be an  │        │
│                                                   │ intimidating and challenging opponent for players │        │
│                                                   │ to face, and her appearance and behavior have     │        │
│                                                   │ been widely praised by fans of the game.          │        │
│                                                   │                                                   │        │
│                                                   │ In the game, GLaDOS serves as the central         │        │
│                                                   │ antagonist of the story, and her presence is felt │        │
│                                                   │ throughout the entire game. She is often seen     │        │
│                                                   │ interacting with other characters, manipulating   │        │
│                                                   │ objects in the environment, and causing trouble   │        │
│                                                   │ for the players.                                  │        │
│                                                   │                                                   │        │
│                                                   │ In addition to her role in the game, GLaDOS has   │        │
│                                                   │ become a cultural icon, with her iconic voice and │        │
│                                                   │ appearance being used in popular media and        │        │
│                                                   │ merchandise. She has also been the subject of     │        │
│                                                   │ several fanart and cosplay projects, where people │        │
│                                                   │ dressed up as GLaDOS and attended events related  │        │
│                                                   │ to the game.                                      │        │
│                                                   │                                                   │        │
│                                                   │ Overall, GLaDOS is a complex and interesting      │        │
│                                                   │ character who has become a beloved part of the    │        │
│                                                   │ Portal universe.<|endoftext|>                     │        │
├───────────────────────────────────────────────────┼───────────────────────────────────────────────────┼────────┤
│ <|prompter|>What is the chemical symbol for       │ The chemical symbol for gold is Au.<|endoftext|>  │ 6.74   │
│ gold?<|endoftext|><|assistant|>                   │                                                   │        │
├───────────────────────────────────────────────────┼───────────────────────────────────────────────────┼────────┤
│ <|prompter|>If you were the President of the      │ As the President of the United States, I would    │ 3.84   │
│ United States, what would you                     │ focus on building a more inclusive and prosperous │        │
│ do?<|endoftext|><|assistant|>                     │ nation for all. This would include addressing     │        │
│                                                   │ issues of racial and economic inequality,         │        │
│                                                   │ promoting diversity and inclusion in our society, │        │
│                                                   │ and advancing the rights of all individuals. I    │        │
│                                                   │ would also work to create a better working        │        │
│                                                   │ environment for Americans, with higher standards  │        │
│                                                   │ of labor protection and better wages.             │        │
│                                                   │ Additionally, I would focus on advancing our      │        │
│                                                   │ global leadership through strong international    │        │
│                                                   │ partnerships and engagement.<|endoftext|>         │        │
└───────────────────────────────────────────────────┴───────────────────────────────────────────────────┴────────┘
[RANK 0] Saving the best state so far into ckpts/best_checkpoint
[losses/total_loss: 1.96 | losses/policy_loss: -0.04 | losses/value_loss: 2.00]: 100%|█| 2/2 [10:46<00:00, 323.14s
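The final progress line reports `total_loss 1.96`, `policy_loss -0.04` and `value_loss 2.00`. The run summary gives the same values as 1.96094, -0.04138 and 2.00195. These fit total = policy + vf_coef × value with `vf_coef` = 1. The sum -0.04138 + 2.00195 = 1.96057 rounds in fp16 to 1.9609375, which is displayed as 1.96094. What follows is a minimal sketch of the textbook clipped PPO objective that yields such a pair of losses. The clip ranges (0.2) and `vf_coef=1.0` are assumptions, not values read from this run's config, and `ppo_losses` is a hypothetical helper.

```python
import math


def ppo_losses(logprobs, old_logprobs, advantages, values, old_values, returns,
               cliprange=0.2, cliprange_value=0.2, vf_coef=1.0):
    """Clipped PPO policy loss and clipped value loss over a flat list of tokens.

    Returns (total_loss, policy_loss, value_loss).
    """
    n = len(logprobs)
    pg_terms, vf_terms = [], []
    for lp, old_lp, adv, v, old_v, ret in zip(logprobs, old_logprobs, advantages,
                                               values, old_values, returns):
        ratio = math.exp(lp - old_lp)
        clipped_ratio = min(max(ratio, 1 - cliprange), 1 + cliprange)
        # Pessimistic surrogate: max of the negated terms, so clipping removes
        # any incentive to push the ratio far from 1.
        pg_terms.append(max(-adv * ratio, -adv * clipped_ratio))
        # Value clipping around the rollout-time value estimate.
        v_clipped = min(max(v, old_v - cliprange_value), old_v + cliprange_value)
        vf_terms.append(max((v - ret) ** 2, (v_clipped - ret) ** 2))
    policy_loss = sum(pg_terms) / n
    value_loss = 0.5 * sum(vf_terms) / n
    return policy_loss + vf_coef * value_loss, policy_loss, value_loss


# The logged summary values fit total = policy + 1.0 * value up to fp16 rounding.
print(round(-0.04138 + 1.0 * 2.00195, 5))  # 1.96057
```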
wandb: Waiting for W&B process to finish... (success).
wandb:
wandb: Run history:
wandb: exp_scores/mean ▁
wandb: exp_scores/running_mean ▁
wandb: exp_scores/running_std ▁
wandb: kl_ctl_value ▁
wandb: learning_rate_group_0 ▁▁
wandb: losses/policy_loss █▁
wandb: losses/total_loss █▁
wandb: losses/value_loss █▁
wandb: old_values/max ▁▁
wandb: old_values/mean ▁▁
wandb: old_values/min ▁▁
wandb: old_values/std ▁▁
wandb: padding_percentage ▁▁
wandb: policy/approx_kl ▁█
wandb: policy/clipfrac ▁█
wandb: policy/sqrt_kl ▁
wandb: ratio ▁█
wandb: returns/max ▁▁
wandb: returns/mean ▁▁
wandb: returns/min ▁▁
wandb: returns/std ▁▁
wandb: reward/mean ▁█
wandb: time/backward █▁
wandb: time/exp ▁
wandb: time/exp_generate ▁
wandb: time/exp_score ▁
wandb: time/forward █▁
wandb: time/generate █▁
wandb: values/clipfrac ▁▁
wandb: values/max ▁█
wandb: values/mean ▁█
wandb: values/min █▁
wandb: values/std █▁
wandb: values/values_error █▁
wandb: values/values_mape_error █▁
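In the run history, `policy/approx_kl`, `policy/clipfrac` and `ratio` all rise between the two PPO steps. The run summary ends them at 0.00418, 0.0303 and 1.01172. The sketch below shows how such diagnostics are usually derived from the per-token probability ratio r = exp(logp_new - logp_old). It assumes the trainer follows common PPO conventions: the estimator (r - 1) - log r for the KL and a 0.2 clip range. `policy_diagnostics` is a hypothetical helper.

```python
import math


def policy_diagnostics(logprobs, old_logprobs, cliprange=0.2):
    """Per-batch PPO diagnostics from current and rollout-time token log-probs."""
    log_ratios = [lp - old for lp, old in zip(logprobs, old_logprobs)]
    ratios = [math.exp(lr) for lr in log_ratios]
    n = len(ratios)
    # (r - 1) - log r is >= 0 for every r > 0; its mean over tokens sampled
    # from the old policy estimates KL(old || new).
    approx_kl = sum((r - 1) - lr for r, lr in zip(ratios, log_ratios)) / n
    # Share of tokens whose ratio fell outside the clipping window.
    clipfrac = sum(abs(r - 1) > cliprange for r in ratios) / n
    mean_ratio = sum(ratios) / n
    return approx_kl, clipfrac, mean_ratio
```

The logged `policy/sqrt_kl` (0.03537) is not the square root of `approx_kl` (√0.00418 ≈ 0.0647). It most likely tracks a different KL quantity, such as the rollout-time KL against the reference model.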
wandb:
wandb: Run summary:
wandb: exp_scores/mean -4.53469
wandb: exp_scores/running_mean 4.4965
wandb: exp_scores/running_std 6.9208
wandb: exp_scores/std nan
wandb: kl_ctl_value 0.1
wandb: learning_rate_group_0 0.0
wandb: losses/policy_loss -0.04138
wandb: losses/total_loss 1.96094
wandb: losses/value_loss 2.00195
wandb: old_values/max 1.01172
wandb: old_values/mean 0.33667
wandb: old_values/min -0.15161
wandb: old_values/std 0.27783
wandb: padding_percentage 0.0
wandb: policy/approx_kl 0.00418
wandb: policy/clipfrac 0.0303
wandb: policy/sqrt_kl 0.03537
wandb: ratio 1.01172
wandb: returns/max 4.0
wandb: returns/mean 2.10156
wandb: returns/min 1.08594
wandb: returns/std 0.87598
wandb: reward/mean 5.91218
wandb: time/backward 69.90873
wandb: time/exp 0.69681
wandb: time/exp_generate 0.50216
wandb: time/exp_score 0.02872
wandb: time/forward 0.07919
wandb: time/generate 22.72672
wandb: values/clipfrac 0.0
wandb: values/max 1.03125
wandb: values/mean 0.37842
wandb: values/min -0.16577
wandb: values/std 0.27124
wandb: values/values_error 4.00391
wandb: values/values_mape_error 0.75488
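The summary shows finite running statistics for the reward-model scores (`exp_scores/running_mean 4.4965`, `exp_scores/running_std 6.9208`) next to `exp_scores/std nan`. An unbiased (n - 1) standard deviation is undefined for a single value, and `torch.std` returns nan for a one-element tensor. A nan here is therefore consistent with the last scored batch holding a single score. The following is a minimal sketch of a running mean/std that merges batches with the parallel variance update (Chan et al.). The class is hypothetical and may differ from the trainer's own implementation, for example in bias correction.

```python
import math


class RunningMoments:
    """Running mean/std of reward-model scores, merged one batch at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, xs):
        """Merge a batch and return that batch's own (mean, unbiased std)."""
        n = len(xs)
        batch_mean = sum(xs) / n
        batch_m2 = sum((x - batch_mean) ** 2 for x in xs)
        total = self.count + n
        delta = batch_mean - self.mean
        self.mean += delta * n / total
        self.m2 += batch_m2 + delta ** 2 * self.count * n / total
        self.count = total
        # The unbiased std of a single-element batch is undefined, reported as nan.
        batch_std = math.sqrt(batch_m2 / (n - 1)) if n > 1 else float("nan")
        return batch_mean, batch_std

    @property
    def std(self):
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else float("nan")


rm = RunningMoments()
rm.update([1.0, 3.0])
_, last_std = rm.update([5.0])  # a batch with a single score
print(rm.mean, rm.std, last_std)  # 3.0 2.0 nan
```

The running statistics stay finite (mean 3.0 and std 2.0, the sample std of 1, 3, 5) even though the one-score batch has a nan std.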
wandb:
wandb: 🚀 View run trainer_rl/oa-pythia-12b-sft-df/1gpu:main at: https://wandb.ai/toanbku/rlhf/runs/jqkbgkhw
wandb: Synced 6 W&B file(s), 2 media file(s), 4 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230724_203103-jqkbgkhw/logs