External Tags
api server, bakllava, batching, chat/completions, CLIP, cuda 11.8, deep learning, fastapi, gpu memory, image+text, langchain, llava, Model Deployment, multimodal inference, offline inference, openai-compatible api, pagedattention, python, streamlit, Tutorial, Vicuna, vllm