Created
July 25, 2023 01:53
| (base) [RedmondAI] ubuntu@oa-server-8:~/OA/model/model_training$ CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=1 accelerate launch --main_process_port 29501 --config_file configs/accelerate_config.yaml --num_processes 1 trainer_rl.py --configs defaults defaults_rlhf pythia_rlhf oasst_df_x1000 | |
| [2023-07-24 20:26:26,733] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) | |
| [2023-07-24 20:26:30,673] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) | |
| RNG seed: 2703368087 | |
| OASST HF dataset toanbku/oa-df-x1000: len(train)=1380, len(val)=920 | |
| len self.tokenizer 50282 | |
| [RANK 0] Initializing model: toanbku/oa-pythia-12b-sft-df | |
| Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 3/3 [00:17<00:00, 5.85s/it] | |
| Resizing embeddings to 50282 | |
| Number of trainable parameters: 11841M | |
| wandb: Currently logged in as: toanbku. Use `wandb login --relogin` to force relogin | |
| wandb: Tracking run with wandb version 0.15.5 | |
| wandb: Run data is saved locally in /home/ubuntu/OA/model/model_training/wandb/run-20230724_203103-jqkbgkhw | |
| wandb: Run `wandb offline` to turn off syncing. | |
| wandb: Syncing run trainer_rl/oa-pythia-12b-sft-df/1gpu:main | |
| wandb: ⭐️ View project at https://wandb.ai/toanbku/rlhf | |
| wandb: 🚀 View run at https://wandb.ai/toanbku/rlhf/runs/jqkbgkhw | |
| Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root... | |
| Detected CUDA files, patching ldflags | |
| Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py310_cu118/cpu_adam/build.ninja... | |
| Building extension module cpu_adam... | |
| Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) | |
| ninja: no work to do. | |
| Loading extension module cpu_adam... | |
| Time to load cpu_adam op: 2.4585795402526855 seconds | |
| Adam Optimizer #0 is created with AVX2 arithmetic capability. | |
| Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1 | |
| Rank: 0 partition count [1] and sizes[(11894364162, False)] | |
| Loading checkpoint shards: 100%|████████████████████████████████████████████████████| 3/3 [00:28<00:00, 9.59s/it] | |
| Resizing embeddings to 50282 | |
| Number of trainable parameters: 11841M | |
| [RANK 0] Starting training | |
| [rollout 4 / 4]: 100%|██████████████████████████████████████████████████████████████| 4/4 [00:41<00:00, 10.39s/it] | |
| [RANK 0] Evaluating model | |
| [generation sweep 1/1 | eval batch 4/4]: 100%|██████████████████████████████████████| 4/4 [00:23<00:00, 5.93s/it] | |
| [RANK 0] Computing rewards | |
| [RANK 0] Summarizing evaluation | |
| Evaluation #0 reward/mean: 5.79 | |
| ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓ | |
| ┃ prompt ┃ output ┃ reward ┃ | |
| ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩ | |
| │ <|prompter|>Can you tell me about │ GLaDOS is a computer program created by Valve. │ 7.98 │ | |
| │ GLaDOS?<|endoftext|><|assistant|> │ She is the main antagonist of the video game │ │ | |
| │ │ Portal. │ │ | |
| │ │ │ │ | |
| │ │ GLaDOS is a computer program, but she is not a │ │ | |
| │ │ regular computer. She is an artificial │ │ | |
| │ │ intelligence (AI) who lives in a computer system. │ │ | |
| │ │ Her purpose is to test and punish anyone who │ │ | |
| │ │ enters her domain. │ │ | |
| │ │ │ │ | |
| │ │ GLaDOS is highly intelligent and has a sarcastic │ │ | |
| │ │ sense of humor. She is often known to say strange │ │ | |
| │ │ and humorous things to her victims. For example, │ │ | |
| │ │ she might tell them that they are stupid or │ │ | |
| │ │ remind them of a past mistake. │ │ | |
| │ │ │ │ | |
| │ │ GLaDOS is also very hostile towards humans. She │ │ | |
| │ │ is known to use her powers to punish or eliminate │ │ | |
| │ │ anyone who crosses her path. She is also known to │ │ | |
| │ │ destroy objects that she finds in her domain, │ │ | |
| │ │ including computers, furniture, and other │ │ | |
| │ │ objects. │ │ | |
| │ │ │ │ | |
| │ │ Despite her cold and ruthless nature, GLaDOS has │ │ | |
| │ │ a soft side. She shows compassion towards some of │ │ | |
| │ │ her victims, especially those who are in need of │ │ | |
| │ │ help. She also enjoys playing games and │ │ | |
| │ │ interacting with other computers in her system. │ │ | |
| │ │ │ │ | |
| │ │ Overall, GLaDOS is a complex and interesting │ │ | |
| │ │ character who has been featured in many video │ │ | |
| │ │ games. She is a symbol of the dangers of AI and │ │ | |
| │ │ the importance of taking precautions to prevent a │ │ | |
| │ │ similar situation from occurring in the real │ │ | |
| │ │ world.<|endoftext|> │ │ | |
| ├───────────────────────────────────────────────────┼───────────────────────────────────────────────────┼────────┤ | |
| │ <|prompter|>What is the chemical symbol for │ The chemical symbol for gold is Au.<|endoftext|> │ 6.74 │ | |
| │ gold?<|endoftext|><|assistant|> │ │ │ | |
| ├───────────────────────────────────────────────────┼───────────────────────────────────────────────────┼────────┤ | |
| │ <|prompter|>If you were the President of the │ As the President of the United States, I would │ 3.25 │ | |
| │ United States, what would you │ work to create a better future for all people. I │ │ | |
| │ do?<|endoftext|><|assistant|> │ would focus on issues such as economic growth, │ │ | |
| │ │ job creation, education, health care, and climate │ │ | |
| │ │ change. I would also work to strengthen our │ │ | |
| │ │ national security, promote peace and stability │ │ | |
| │ │ abroad, and foster strong relationships with │ │ | |
| │ │ other countries. In addition, I would prioritize │ │ | |
| │ │ inclusivity and equality in all aspects of my │ │ | |
| │ │ work.<|endoftext|> │ │ | |
| └───────────────────────────────────────────────────┴───────────────────────────────────────────────────┴────────┘ | |
| [losses/total_loss: 2.09 | losses/policy_loss: 0.00 | losses/value_loss: 2.08]: 50%|▌| 1/2 [01:15<01:15, 75.38s/i | |
| [RANK 0] Saving intermediate checkpoint into ckpts/checkpoint_2 | |
| /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. | |
| warnings.warn( | |
| [RANK 0] Evaluating model | |
| [generation sweep 1/1 | eval batch 4/4]: 100%|██████████████████████████████████████| 4/4 [00:22<00:00, 5.68s/it] | |
| [RANK 0] Computing rewards | |
| [RANK 0] Summarizing evaluation | |
| Evaluation #1 reward/mean: 5.91 | |
| ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓ | |
| ┃ prompt ┃ output ┃ reward ┃ | |
| ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩ | |
| │ <|prompter|>Can you tell me about │ GLaDOS is a character in the video game Portal. │ 7.87 │ | |
| │ GLaDOS?<|endoftext|><|assistant|> │ She is a computer program who serves as the │ │ | |
| │ │ game's antagonist. │ │ | |
| │ │ │ │ | |
| │ │ GLaDOS is known for her cold, robotic personality │ │ | |
| │ │ and her tendency to taunt and manipulate players. │ │ | |
| │ │ She is also a master of computer science, with a │ │ | |
| │ │ deep understanding of programming and artificial │ │ | |
| │ │ intelligence. │ │ | |
| │ │ │ │ | |
| │ │ GLaDOS was created by Valve, the company behind │ │ | |
| │ │ the Portal video game. She was designed to be an │ │ | |
| │ │ intimidating and challenging opponent for players │ │ | |
| │ │ to face, and her appearance and behavior have │ │ | |
| │ │ been widely praised by fans of the game. │ │ | |
| │ │ │ │ | |
| │ │ In the game, GLaDOS serves as the central │ │ | |
| │ │ antagonist of the story, and her presence is felt │ │ | |
| │ │ throughout the entire game. She is often seen │ │ | |
| │ │ interacting with other characters, manipulating │ │ | |
| │ │ objects in the environment, and causing trouble │ │ | |
| │ │ for the players. │ │ | |
| │ │ │ │ | |
| │ │ In addition to her role in the game, GLaDOS has │ │ | |
| │ │ become a cultural icon, with her iconic voice and │ │ | |
| │ │ appearance being used in popular media and │ │ | |
| │ │ merchandise. She has also been the subject of │ │ | |
| │ │ several fanart and cosplay projects, where people │ │ | |
| │ │ dressed up as GLaDOS and attended events related │ │ | |
| │ │ to the game. │ │ | |
| │ │ │ │ | |
| │ │ Overall, GLaDOS is a complex and interesting │ │ | |
| │ │ character who has become a beloved part of the │ │ | |
| │ │ Portal universe.<|endoftext|> │ │ | |
| ├───────────────────────────────────────────────────┼───────────────────────────────────────────────────┼────────┤ | |
| │ <|prompter|>What is the chemical symbol for │ The chemical symbol for gold is Au.<|endoftext|> │ 6.74 │ | |
| │ gold?<|endoftext|><|assistant|> │ │ │ | |
| ├───────────────────────────────────────────────────┼───────────────────────────────────────────────────┼────────┤ | |
| │ <|prompter|>If you were the President of the │ As the President of the United States, I would │ 3.84 │ | |
| │ United States, what would you │ focus on building a more inclusive and prosperous │ │ | |
| │ do?<|endoftext|><|assistant|> │ nation for all. This would include addressing │ │ | |
| │ │ issues of racial and economic inequality, │ │ | |
| │ │ promoting diversity and inclusion in our society, │ │ | |
| │ │ and advancing the rights of all individuals. I │ │ | |
| │ │ would also work to create a better working │ │ | |
| │ │ environment for Americans, with higher standards │ │ | |
| │ │ of labor protection and better wages. │ │ | |
| │ │ Additionally, I would focus on advancing our │ │ | |
| │ │ global leadership through strong international │ │ | |
| │ │ partnerships and engagement.<|endoftext|> │ │ | |
| └───────────────────────────────────────────────────┴───────────────────────────────────────────────────┴────────┘ | |
| [RANK 0] Saving the best state so far into ckpts/best_checkpoint | |
| [losses/total_loss: 1.96 | losses/policy_loss: -0.04 | losses/value_loss: 2.00]: 100%|█| 2/2 [10:46<00:00, 323.14s | |
| wandb: Waiting for W&B process to finish... (success). | |
| wandb: | |
| wandb: Run history: | |
| wandb: exp_scores/mean ▁ | |
| wandb: exp_scores/running_mean ▁ | |
| wandb: exp_scores/running_std ▁ | |
| wandb: kl_ctl_value ▁ | |
| wandb: learning_rate_group_0 ▁▁ | |
| wandb: losses/policy_loss █▁ | |
| wandb: losses/total_loss █▁ | |
| wandb: losses/value_loss █▁ | |
| wandb: old_values/max ▁▁ | |
| wandb: old_values/mean ▁▁ | |
| wandb: old_values/min ▁▁ | |
| wandb: old_values/std ▁▁ | |
| wandb: padding_percentage ▁▁ | |
| wandb: policy/approx_kl ▁█ | |
| wandb: policy/clipfrac ▁█ | |
| wandb: policy/sqrt_kl ▁ | |
| wandb: ratio ▁█ | |
| wandb: returns/max ▁▁ | |
| wandb: returns/mean ▁▁ | |
| wandb: returns/min ▁▁ | |
| wandb: returns/std ▁▁ | |
| wandb: reward/mean ▁█ | |
| wandb: time/backward █▁ | |
| wandb: time/exp ▁ | |
| wandb: time/exp_generate ▁ | |
| wandb: time/exp_score ▁ | |
| wandb: time/forward █▁ | |
| wandb: time/generate █▁ | |
| wandb: values/clipfrac ▁▁ | |
| wandb: values/max ▁█ | |
| wandb: values/mean ▁█ | |
| wandb: values/min █▁ | |
| wandb: values/std █▁ | |
| wandb: values/values_error █▁ | |
| wandb: values/values_mape_error █▁ | |
| wandb: | |
| wandb: Run summary: | |
| wandb: exp_scores/mean -4.53469 | |
| wandb: exp_scores/running_mean 4.4965 | |
| wandb: exp_scores/running_std 6.9208 | |
| wandb: exp_scores/std nan | |
| wandb: kl_ctl_value 0.1 | |
| wandb: learning_rate_group_0 0.0 | |
| wandb: losses/policy_loss -0.04138 | |
| wandb: losses/total_loss 1.96094 | |
| wandb: losses/value_loss 2.00195 | |
| wandb: old_values/max 1.01172 | |
| wandb: old_values/mean 0.33667 | |
| wandb: old_values/min -0.15161 | |
| wandb: old_values/std 0.27783 | |
| wandb: padding_percentage 0.0 | |
| wandb: policy/approx_kl 0.00418 | |
| wandb: policy/clipfrac 0.0303 | |
| wandb: policy/sqrt_kl 0.03537 | |
| wandb: ratio 1.01172 | |
| wandb: returns/max 4.0 | |
| wandb: returns/mean 2.10156 | |
| wandb: returns/min 1.08594 | |
| wandb: returns/std 0.87598 | |
| wandb: reward/mean 5.91218 | |
| wandb: time/backward 69.90873 | |
| wandb: time/exp 0.69681 | |
| wandb: time/exp_generate 0.50216 | |
| wandb: time/exp_score 0.02872 | |
| wandb: time/forward 0.07919 | |
| wandb: time/generate 22.72672 | |
| wandb: values/clipfrac 0.0 | |
| wandb: values/max 1.03125 | |
| wandb: values/mean 0.37842 | |
| wandb: values/min -0.16577 | |
| wandb: values/std 0.27124 | |
| wandb: values/values_error 4.00391 | |
| wandb: values/values_mape_error 0.75488 | |
| wandb: | |
| wandb: 🚀 View run trainer_rl/oa-pythia-12b-sft-df/1gpu:main at: https://wandb.ai/toanbku/rlhf/runs/jqkbgkhw | |
| wandb: Synced 6 W&B file(s), 2 media file(s), 4 artifact file(s) and 0 other file(s) | |
| wandb: Find logs at: ./wandb/run-20230724_203103-jqkbgkhw/logs |