Salesforce BLIP: Revolutionizing Image Captioning

Vision Transformers (ViT) in Image Captioning Using Pretrained ViT Models