Reflections in model training (May, 2025)
Warmup
If you have a good pretrained model and want to re-initialize part of its parameters (for example, when replacing the text tokenizer of a pretrained language model), consider training with a larger batch size and a longer warmup.
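A minimal sketch of what "a longer warmup" means in practice, assuming a plain linear warmup to a constant peak learning rate (decay is omitted). The function name `warmup_lr`, the peak rate, and the step counts are hypothetical values chosen for illustration, not settings from this post.

```python
def warmup_lr(step: int, peak_lr: float, warmup_steps: int) -> float:
    """Linear warmup: ramp the learning rate from near 0 up to peak_lr over
    warmup_steps optimizer steps, then hold it constant (no decay here)."""
    if warmup_steps <= 0:
        return peak_lr
    if step < warmup_steps:
        # (step + 1) so the very first update uses a small but nonzero LR.
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


# Hypothetical settings: same peak LR, short vs. long warmup.
PEAK_LR = 3e-4
short = [warmup_lr(s, PEAK_LR, warmup_steps=100) for s in range(2000)]
long_ = [warmup_lr(s, PEAK_LR, warmup_steps=1000) for s in range(2000)]

# At step 99 the short schedule is already at the peak,
# while the long schedule is still at one tenth of it.
print(short[99], long_[99])
```

One common rationale for the longer ramp is that freshly initialized weights produce large, noisy gradients early on, and keeping the learning rate small for longer limits how far those updates push the pretrained weights. Batch size is a separate knob and is not modeled in this sketch.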