Speech generation with reinforcement learning (RL) is one possible way to realize speech autoregression. The results in https://arxiv.org/abs/2411.17607 suggest that pure speech autoregression is feasible but not yet as good as text autoregression. With RL, it may be possible to raise speech conversation scores to the level of pure text.