You can now deploy any ML model, RAG, or Agent as an MCP server. And it takes just 10 lines of code. Here's a breakdown, with code (100% private):
Connecting AI models to different apps usually means writing custom code for each one. For instance, to use a model in a Slack bot and in a dashboard, you'd typically write separate integration code for each app. Let's simplify this with MCP.

We'll use @LightningAI's LitServe, a popular open-source serving engine for AI models built on FastAPI. It integrates MCP via a dedicated /mcp endpoint, which means any AI model, RAG pipeline, or agent can be deployed as an MCP server, accessible by any MCP client.

Here's the structure of the server code (a sketch follows below):

- InputRequest defines the input schema.
- setup loads the model, agent, RAG pipeline, etc.
- decode_request prepares the input.
- predict runs the inference logic.
- encode_response sends the response back.
- The main guard runs the LitServe MCP API.

Next, run the script (uv run server.py) to make the model available as an MCP server.

Finally, add the server to Claude Desktop's config (an example follows below the code). With that, your model is available as an MCP server in Claude Desktop, and you can interact with it straight from the chat.

I like LitServe because:

- It's 2x faster than FastAPI.
- It gives full control over inference.
- We can serve any model (LLM, vision, audio, multimodal).
- We can compose agents, RAG & pipelines in one file.
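Here's a minimal sketch of the server described above. The toy model, the class names, and the query field are illustrative assumptions, not from the original post; the mcp=MCP(...) hook follows LitServe's documented MCP integration, though exact signatures may vary across versions.

```python
# server.py -- minimal sketch of a LitServe MCP server (assumed names throughout).
from pydantic import BaseModel

import litserve as ls
from litserve.mcp import MCP


class InputRequest(BaseModel):
    # Input schema; MCP can use this to describe the tool's inputs.
    query: str


class InferenceAPI(ls.LitAPI):
    def setup(self, device):
        # Load the model, agent, or RAG pipeline once at startup.
        # A stand-in "model" keeps the sketch dependency-free.
        self.model = lambda text: text.upper()

    def decode_request(self, request: InputRequest):
        # Prepare the validated input for inference.
        return request.query

    def predict(self, x):
        # Run the inference logic.
        return self.model(x)

    def encode_response(self, output):
        # Send the response back to the client.
        return {"output": output}


if __name__ == "__main__":
    # mcp=MCP(...) exposes the API at the /mcp endpoint as well.
    api = InferenceAPI(mcp=MCP(description="Runs inference on a text query"))
    server = ls.LitServer(api)
    server.run(port=8000)
```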
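And here's the kind of entry you'd add to claude_desktop_config.json. This is a sketch with assumed values: it presumes the server runs locally on port 8000 and uses mcp-remote (a separate npm package) to bridge Claude Desktop to an HTTP MCP endpoint; the server name "litserve-model" is made up.

```json
{
  "mcpServers": {
    "litserve-model": {
      "command": "npx",
      "args": ["mcp-remote", "http://127.0.0.1:8000/mcp"]
    }
  }
}
```

Restart Claude Desktop after editing the config; the server should then show up in its tools list.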

