Sleep, learning and memory: optimal inference in the prefrontal cortex

Adrien Peyrache, New York University

Cortical neurons collectively code, compute and store information. The recent inference-by-sampling hypothesis proposes that cortical population activity at a given time is a sample from an underlying probability distribution, which can be reconstructed by integrating over samples. A key prediction is that the distributions of “spontaneous” activity, as during sleep (representing the prior), and of evoked activity (representing the posterior) converge over repeated experience. This convergence represents the updating of an internal model to match the relevant statistics of the external world. Just such a convergence has been observed in small populations from visual cortex over development. It is unknown to what extent this hypothesis is a general computational principle for cortex: whether it can be observed during learning, in higher-order cortices, or during ongoing behavior.
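
One way to make the convergence prediction concrete is to compare empirical distributions over binary population "words". The following is a minimal sketch on synthetic data, not the analysis used in this work: the function names, the independent-Bernoulli surrogate neurons, the firing probabilities, the pseudocount smoothing, and the choice of Kullback-Leibler divergence as the distance measure are all illustrative assumptions.

```python
import numpy as np

def word_distribution(binary_activity, pseudocount=1e-3):
    """Empirical distribution over binary population words.

    binary_activity: (n_bins, n_neurons) array of 0/1 values, one row per
    time bin. The pseudocount (an assumption of this sketch) keeps every
    word's probability non-zero so the divergence below stays finite.
    """
    binary_activity = np.asarray(binary_activity, dtype=np.int64)
    n_neurons = binary_activity.shape[1]
    # Encode each row (one population word) as a single integer code.
    weights = 2 ** np.arange(n_neurons)
    codes = binary_activity @ weights
    counts = np.bincount(codes, minlength=2 ** n_neurons).astype(float)
    counts += pseudocount
    return counts / counts.sum()

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return float(np.sum(p * np.log2(p / q)))

def simulate(rng, firing_probs, n_bins=5000):
    """Hypothetical surrogate data: independent Bernoulli neurons."""
    probs = np.asarray(firing_probs)
    return (rng.random((n_bins, probs.size)) < probs).astype(int)

rng = np.random.default_rng(0)

# Made-up firing probabilities: early in learning, task ("evoked") and
# sleep ("spontaneous") statistics differ; late in learning they are close.
sleep_early = simulate(rng, [0.20, 0.30, 0.10, 0.25, 0.15])
task_early = simulate(rng, [0.60, 0.10, 0.50, 0.05, 0.40])
sleep_late = simulate(rng, [0.24, 0.30, 0.12, 0.21, 0.15])
task_late = simulate(rng, [0.25, 0.30, 0.12, 0.20, 0.15])

kl_early = kl_divergence(word_distribution(task_early),
                         word_distribution(sleep_early))
kl_late = kl_divergence(word_distribution(task_late),
                        word_distribution(sleep_late))

print(f"early-session divergence: {kl_early:.3f} bits")
print(f"late-session divergence:  {kl_late:.3f} bits")
```

Under these assumptions, a smaller divergence in the late session than in the early session is what convergence of the two distributions would look like. With real recordings, the number of possible words grows exponentially with population size, so this direct counting approach is only practical for small populations.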

To address these issues, we analyzed population activity from the prefrontal cortex (PFC) of rats learning rules in a Y-maze task. The PFC is necessary for learning new rules or strategies, and changes in PFC neuron firing times correlate with successful rule learning, suggesting the PFC plays a role in building an internal model of a task. We hypothesized that PFC population activity on each trial samples from the posterior distribution over (unknown) task parameters, and that “spontaneous” activity during sleep, occurring in the absence of task-related stimuli and behavior, samples from the prior distribution. In this way, we sought to isolate changes in population activity solely due to rule learning, assuming that prior experience allowed learning of the basic task parameters, including stimuli and reward locations.

We find that moment-to-moment samples of population activity converge over the course of task learning, consistent with an inference-by-sampling computation underpinning task performance. Previous work observed fine structure in stimulus-evoked population activity patterns; here, in a behavioral task, we have identified clues to what such patterns encode: in this case, the decision rule. Our analyses suggest that the PFC constructs a probabilistic internal model of task parameters, which is sampled online to guide behavior.