Natural language evokes widespread BOLD responses in the human brain, and many of these responses are selective for particular concepts. Here we use voxelwise encoding models combined with novel computational methods to probe several aspects of concept-specific responses. First, are concept representations grounded in sensory modalities, or are they purely amodal? Using visually grounded word embedding spaces, we find that not only are representations of concrete words (e.g. apple) grounded in visual properties, but so are representations of closely related abstract words (e.g. education). Second, is it sufficient to model how the brain responds to single words, or should we also consider longer phrases? Using new machine learning techniques, we find that phrase-based models are significantly and substantially better at predicting BOLD responses in nearly every area of the brain. Finally, we use these phrase-based models to characterize which concepts are represented or processed in each brain area, with some surprising results.
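The voxelwise encoding approach named above can be sketched as a regularized linear regression from stimulus features (e.g. word embeddings) to per-voxel BOLD responses. The sketch below uses ridge regression on synthetic data; all dimensions, the regularization strength, and the data-generating process are illustrative assumptions, not the paper's actual features or fMRI recordings.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical sizes: 200 fMRI timepoints, 50-dim word embeddings,
# 100 voxels (all purely illustrative).
n_time, n_dim, n_vox = 200, 50, 100

# Synthetic stimulus features: one embedding vector per timepoint.
X = rng.standard_normal((n_time, n_dim))

# Synthetic BOLD responses from a random linear map plus noise, so a
# linear encoding model should be able to recover the mapping.
W_true = rng.standard_normal((n_dim, n_vox))
Y = X @ W_true + 0.5 * rng.standard_normal((n_time, n_vox))

# Hold out the last 50 timepoints; fit one ridge model per voxel
# (sklearn's Ridge fits all voxels jointly for a shared alpha).
X_tr, X_te = X[:150], X[150:]
Y_tr, Y_te = Y[:150], Y[150:]
model = Ridge(alpha=1.0).fit(X_tr, Y_tr)
Y_hat = model.predict(X_te)

# Evaluate each voxel separately: Pearson correlation between
# predicted and actual held-out responses.
def voxelwise_corr(a, b):
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

r = voxelwise_corr(Y_hat, Y_te)
print(r.shape)  # one prediction score per voxel
```

In practice, a phrase-based variant of this sketch would simply swap the per-word embedding rows of `X` for features computed over multi-word contexts; the regression machinery stays the same.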