How does the brain learn to see? Newborn babies can hardly recognize anything by sight, yet by adulthood we can make exquisitely fine judgments about an object’s 3D shape, judge whether it is soft or hard, fragile or durable, and anticipate how it is likely to feel if we touch it. Somehow, by looking at and interacting with lots of ‘Stuff,’ we learn to recognize it.

Meanwhile, computer vision has made astounding progress in the past decade through innovations in building and training deep neural networks (DNNs). However, most state-of-the-art achievements rely on supervised learning, which requires huge labelled training datasets. As models of biological vision, these networks make the fundamentally implausible assumption that the brain has access to the ground-truth state of the physical world and simply learns a correspondence between sensory data and those true states. I am interested in an alternative, more ecologically plausible kind of deep learning – unsupervised learning – in which DNNs learn the statistical structure of images without any additional scene information.

I will talk about two projects in which I use unsupervised deep learning, combined with large computer-rendered stimulus sets, as a framework for understanding how brains learn rich scene representations without ground-truth world information. Strikingly, the unsupervised models not only learn to represent physical properties such as shape, illumination, and material, but also exhibit patterns of errors and ‘visual illusions’ similar to those of human observers. I am excited by the possibility that unsupervised learning principles may account for a large number of perceptual dimensions, in vision and beyond.
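To make the supervised/unsupervised distinction concrete, here is a minimal toy sketch (not code from the talk, and far simpler than the DNNs it describes): a tiny linear autoencoder trained purely by minimizing reconstruction error on unlabelled synthetic “images.” It never sees labels or ground-truth scene parameters, yet it recovers the low-dimensional statistical structure that generated the data. All dimensions, names, and the data-generation scheme are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic unlabelled data: 500 "images" of 16 pixels whose variation is
# driven by 3 hidden latent factors (a crude stand-in for scene properties
# such as shape, illumination, and material), plus a little pixel noise.
latents = rng.normal(size=(500, 3))        # hidden causes (never shown to the model)
mixing = rng.normal(size=(3, 16))          # how causes map to pixels
images = latents @ mixing + 0.05 * rng.normal(size=(500, 16))

# Linear autoencoder: compress 16 pixels to a 3-dim code and reconstruct.
# Training signal is reconstruction error alone -- no labels, no ground truth.
W_enc = 0.01 * rng.normal(size=(16, 3))
W_dec = 0.01 * rng.normal(size=(3, 16))
lr = 0.01
for _ in range(2000):
    codes = images @ W_enc                 # encode
    recon = codes @ W_dec                  # decode
    err = recon - images                   # reconstruction residual
    W_dec -= lr * (codes.T @ err) / len(images)
    W_enc -= lr * (images.T @ (err @ W_dec.T)) / len(images)

baseline_mse = np.mean(images ** 2)        # error of an all-zero reconstruction
final_mse = np.mean((images @ W_enc @ W_dec - images) ** 2)
print(f"baseline MSE: {baseline_mse:.3f}, trained MSE: {final_mse:.4f}")
```

After training, reconstruction error drops close to the noise floor: the 3-dimensional code has captured the latent structure of the images without ever being told what that structure was. The same logic, scaled up to deep nonlinear networks and rendered scenes, is the premise of the projects described above.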