The training script is as follows:

```shell
torchrun --nproc_per_node=4 \
    ../train.py --train_args_file ../train_args/dpo/full/deepseek-dpo-full.json
```
When execution reaches this line:

```python
blender.loadranker("llm-blender/PairRM", device=Accelerator().device)  # load PairRM
```

the following error is raised:
```
oad_fuser()
[rank0]: Traceback (most recent call last):
[rank0]:   File "/lpai/code/firefly/shells/../train.py", line 923, in <module>
[rank0]:     main()
[rank0]:   File "/lpai/code/firefly/shells/../train.py", line 865, in main
[rank0]:     trainer = init_components(args, training_args)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/lpai/code/firefly/shells/../train.py", line 816, in init_components
[rank0]:     judge=PairRMJudge()
[rank0]:           ^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/trl/trainer/judges.py", line 167, in __init__
[rank0]:     self.blender.loadranker("llm-blender/PairRM", device=Accelerator().device)
[rank0]:                                                          ^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/accelerate/accelerator.py", line 292, in __init__
[rank0]:     deepspeed_plugins = AcceleratorState().deepspeed_plugins
[rank0]:                         ^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/accelerate/state.py", line 887, in __init__
[rank0]:     raise ValueError(
[rank0]: ValueError: Please make sure to properly initialize your accelerator via `accelerator = Accelerator()` before using any functionality from the `accelerate` library.
```
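The `ValueError` comes from `PairRMJudge.__init__` constructing a fresh `Accelerator()` before the training run's accelerator state (with its DeepSpeed plugin) has been set up. One possible workaround is to pick the device explicitly instead of going through `Accelerator()`. A minimal sketch, assuming a `torchrun` launch (which exports `LOCAL_RANK` for each worker); `pick_device` is a hypothetical helper, not a TRL or accelerate API:

```python
import os

def pick_device() -> str:
    # Hypothetical helper: derive the per-worker CUDA device from the
    # LOCAL_RANK environment variable that torchrun sets, instead of
    # calling Accelerator().device (which fails here because the
    # DeepSpeed plugin state is not initialized yet).
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    return f"cuda:{local_rank}"
```

The ranker could then be loaded with `blender.loadranker("llm-blender/PairRM", device=pick_device())`, sidestepping the premature `Accelerator()` construction.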
```
[rank0]: size mismatch for pretrained_model.encoder.layer.23.output.dense.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: size mismatch for pretrained_model.encoder.layer.23.output.dense.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: size mismatch for pretrained_model.encoder.layer.23.output.LayerNorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: size mismatch for pretrained_model.encoder.layer.23.output.LayerNorm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: size mismatch for pretrained_model.encoder.rel_embeddings.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: size mismatch for pretrained_model.encoder.LayerNorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
[rank0]: size mismatch for pretrained_model.encoder.LayerNorm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
```