How Neural Networks Learn: A Probabilistic Viewpoint