What are Pre-training Methods of Vision Language Models?