Warmup

If you have a good pretrained model and want to re-initialize some of its parameters (for example, when replacing the text tokenizer of a pretrained language model), consider using a larger batch size and more warmup steps.
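
As a rough illustration of what "more warmup steps" means for the learning rate, here is a minimal sketch of a linear warmup schedule. The linear shape, the function name `warmup_lr`, and all numeric values are assumptions made for this example, not settings taken from this note.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Ramp the learning rate linearly up to base_lr, then hold it constant."""
    if warmup_steps <= 0:
        # No warmup requested: use the base learning rate from the first step.
        return base_lr
    # Fraction of the warmup completed so far, capped at 1.0.
    scale = min(1.0, (step + 1) / warmup_steps)
    return base_lr * scale


# Hypothetical values, chosen only for illustration.
BASE_LR = 1e-4
SHORT_WARMUP = 1_000   # e.g. a schedule used when nothing is re-initialized
LONG_WARMUP = 10_000   # a longer schedule for partially re-initialized models

for step in (0, 999, 4_999, 9_999):
    short = warmup_lr(step, BASE_LR, SHORT_WARMUP)
    long_ = warmup_lr(step, BASE_LR, LONG_WARMUP)
    print(f"step={step:>5}  short warmup lr={short:.2e}  long warmup lr={long_:.2e}")
```

With the longer schedule, the learning rate stays small for many more steps before reaching its full value. How many steps are appropriate depends on the model and on how much of it was re-initialized.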