Probing Consciousness with Multimodal Long-Context Large Language Models

In my Cognitive Psychology course at CUHK(SZ), for which I received a grade of C+, I was first introduced to the concept of perception. It’s a fascinating property found in myriad entities. For instance, after pretraining, a Vision Transformer can “perceive” a cat within its embeddings. The property also exists in many creatures (insects, cats, dogs, humans).

I frequently muse about the essence of human consciousness, which could be defined as the brain’s perception of its own existence. Unlike perception, it’s a property absent in many entities. Take a bottle, for instance: can it perceive its existence? This form of consciousness also varies among beings: it seems comparable across mammals, yet distinct in insects. I’m curious: if we were to replicate such a brain structure using VISOR, extract circuits using deep learning, and then program these networks to process signals, what would the outcome be?

On a parallel note, imagine a large language model paired with vision or environmental encoders and external devices. If this model had an extensive context capacity (say, 1,000,000+ tokens) to retain daily information, and if it were trained to embed these memories into its parameters, what might become possible? The objective function should probably resemble a measure of uncertainty, because neurons prefer predictable stimuli to random ones.
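
To make the idea of an uncertainty-like objective a little more concrete, here is a minimal sketch in Python. It is only one possible reading: I am assuming that "uncertainty" can be approximated by average surprisal (the negative log-probability of each incoming symbol), and the predictor is a toy online bigram counter rather than a language model. The names `mean_surprisal`, `predictable`, and `noisy` are made up for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def mean_surprisal(stimulus, alphabet):
    """Average surprisal, in bits, of each symbol given the previous one.

    Assumption: a toy online bigram predictor with add-one smoothing
    stands in for the model. Each symbol is scored BEFORE the counts
    are updated, so the predictor is never scored on data it has
    already seen at that step.
    """
    counts = defaultdict(Counter)
    total_bits = 0.0
    for prev, nxt in zip(stimulus, stimulus[1:]):
        seen = counts[prev]
        # Smoothed probability the predictor assigned to the symbol that arrived.
        p = (seen[nxt] + 1) / (sum(seen.values()) + len(alphabet))
        total_bits += -math.log2(p)
        seen[nxt] += 1  # learn from the observation
    return total_bits / (len(stimulus) - 1)


alphabet = "ABCD"
predictable = "AB" * 200                                     # strictly alternating stimulus
noisy = "".join(random.Random(0).choices(alphabet, k=400))   # uniformly random stimulus

print("predictable:", mean_surprisal(predictable, alphabet))
print("random:     ", mean_surprisal(noisy, alphabet))
```

Under these assumptions, the alternating stimulus yields a much lower average surprisal than the random one, so a system trained to minimize this quantity would, in effect, favor predictable input.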
