Grokking is really intriguing, and not only in model training. I realized that I have been eating only vegetables for dinner for over two weeks. It is boring, and it leaves me hungry and unhappy at night. But I dislike having to pick from a variety of foods every day, so eating the same thing for a long time is less stressful and, on balance, makes me happier. Then I noticed that I seem to have suddenly lost some weight. That’s weird.

So, I think I am grokking. A few days ago, I trained MiniCPM-V-2.0 on a hard dataset, and I observed that the loss kept decreasing, but only slowly, while the accuracy stayed at 0. That was weird too. But after about an hour (about 50 training steps), the grokking happened: the loss dropped sharply and the accuracy shot up to about 70%.
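One thing I can at least sketch is how the loss can keep falling while the accuracy sits at 0. This is only a toy with made-up numbers, not my MiniCPM-V-2.0 run, and it may not be what actually happened there. It assumes accuracy is counted by argmax over class scores: the correct class can gain probability for several steps (so cross-entropy drops) before it finally overtakes a wrong class (so accuracy flips).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_correct(logits, target):
    """Return (cross-entropy loss, whether argmax equals target)."""
    probs = softmax(logits)
    loss = -math.log(probs[target])
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return loss, predicted == target

# Hypothetical toy: 3 classes, the correct class is 0.
# A wrong class (index 1) holds a fixed score of 2.0, while the
# correct class's score creeps up by 0.6 per "training step".
history = []
for step in range(7):
    logits = [0.6 * step, 2.0, 0.0]
    history.append(loss_and_correct(logits, target=0))

for step, (loss, correct) in enumerate(history):
    print(f"step {step}: loss={loss:.3f} correct={correct}")
```

In this toy the loss goes down at every step, but the prediction only becomes correct once the score of class 0 passes 2.0. From the outside that looks like nothing is happening, and then a sudden jump.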

I think there is some similarity between eating vegetables and grokking in model training. Nothing can be observed in the short term, but eventually it works. The mechanism is hard to explain, and I haven’t found an answer yet.