Learning Visually Grounded Intelligence With Language