Our environment is replete with audiovisual (AV) signals that we often need to attend to. Visual and auditory inputs can jointly serve as a basis for selection, but it remains unclear what hierarchical effects arise when initial selection criteria are unimodal or involve uncertainty.
We investigated the effects of visuospatial selection on auditory processing using electroencephalography (EEG). With temporal response function (TRF) models of the auditory EEG time series, we examined the neural encoding of tone pips probabilistically associated with spatially attended visual changes (‘flips’). AV precision (temporal uncertainty) was manipulated while participants sustained goal-driven visuospatial selective attention. We further investigated the roles of unimodal (visuospatial and auditory) uncertainty.
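For reference, a minimal sketch of the standard forward TRF formulation may help (this assumes the common ridge-regularized lagged linear regression approach used in mTRF-style analyses; the specific estimator employed in the study is not detailed in this summary). The EEG response $r_n(t)$ at channel $n$ is modeled as a lagged linear transformation of a stimulus feature $s(t)$:
$$ r_n(t) = \sum_{\tau = \tau_{\min}}^{\tau_{\max}} w_n(\tau)\, s(t - \tau) + \varepsilon_n(t), \qquad \hat{w} = \left(S^{\top} S + \lambda I\right)^{-1} S^{\top} r, $$
where $w_n(\tau)$ is the TRF weight at lag $\tau$, $S$ is the design matrix of time-lagged stimulus features, $\lambda$ is the ridge regularization parameter, and $\varepsilon_n(t)$ is residual noise.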
TRF estimates showed that AV precision determined cross-modal modulations, but so did visuospatial uncertainty, which enabled visual priming of tones when these were relevant for auditory segregation. Auditory uncertainty, in addition, determined the susceptibility of early tone encoding to change by incoming visual update processing.
Sensory uncertainty is a key factor in computational proposals of attention in which precision weighting acts as the primary mechanism for selection. These findings provide a hierarchical account of the roles of unimodal and cross-modal sources of uncertainty in the neural encoding of sound dynamics during a multimodal attention task.