Beyond Training Data: The Evolution of Model Extensibility

msj's blog


Remember when chatbots felt like glorified decision trees? They could recognize what you were asking about, but ask "Why is my bill so high?" and you'd hit a wall. That's because a model, by its very nature, only knows what it was trained on. To answer real-world questions, it needs to reach out.

This challenge, extending a model's capabilities so it can pull in information from the external world, is a persistent one in AI, and watching its solutions evolve has been fascinating. We've gone from tightly coupled, vendor-specific integrations to open, standardized protocols, and we're finally seeing a future where AI can truly "plug and play."

Let's take a stroll down memory lane to see how model extensibility has matured.

The Early Days: Vendor-Locked & Language-Specific

Back in my RASA days, building a chatbot that could access customer account information was a revelation. The solution, however, was deeply integrated with the RASA Python SDK. When a customer asked about their bill, RASA's core would trigger a "custom action." This action, defined in Python and served by a separate Flask or FastAPI server, would then fetch the necessary data and respond.

It worked! The model could "call an action" and get an answer, assuming it understood the user's intent. The downside? This was a Python-only, vendor-locked ecosystem. Great if you were committed to RASA, not so great if you ever wanted to switch frameworks. It felt a bit like the early days of CORBA or XML-RPC, powerful for their time, but often proprietary and complex.
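To make the pattern concrete, here is a minimal sketch of how such a custom-action setup is shaped: the NLU core resolves an intent to an action name, posts it to an action server, and the action fetches data and replies. This deliberately does not use the real rasa_sdk API; all class and function names here are illustrative.

```python
# Sketch of the custom-action pattern (illustrative names, not rasa_sdk).

class CollectingDispatcher:
    """Collects the messages an action wants to send back to the user."""
    def __init__(self):
        self.messages = []

    def utter_message(self, text):
        self.messages.append(text)


class ActionCheckBill:
    """The 'custom action' the core triggers for a billing intent."""
    def name(self):
        return "action_check_bill"

    def run(self, dispatcher, tracker):
        # A real action would call a billing API here, using identifiers
        # pulled from the conversation tracker.
        account_id = tracker.get("account_id")
        dispatcher.utter_message(f"Account {account_id} owes $42.00.")


# The action server routes an incoming webhook call to the right action:
ACTIONS = {a.name(): a for a in [ActionCheckBill()]}

def handle_webhook(payload):
    dispatcher = CollectingDispatcher()
    ACTIONS[payload["next_action"]].run(dispatcher, payload["tracker"])
    return {"responses": dispatcher.messages}

print(handle_webhook({"next_action": "action_check_bill",
                      "tracker": {"account_id": "A-123"}}))
```

The key coupling to notice: both sides of this contract (the action names, the tracker shape, the dispatcher) live inside one framework's SDK, which is exactly the lock-in described below.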

A Step Forward: JSON Schema & Single-Vendor APIs

Fast forward a few years, and the same problem resurfaced, but with a more polished solution: OpenAI Functions. This was a significant leap. Instead of being tied to a specific SDK, you could describe functions the model could call using standardized JSON Schema.

The concept was similar: the model decides when and how to call these functions based on the conversation. Your server, implemented in any language that supports JSON and REST, would then execute the actual function. This was a definite improvement. You gained language flexibility and a more universal way to describe capabilities. However, it still largely operated within the confines of a single vendor's ecosystem (OpenAI). Better, but still not truly open.
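A rough sketch of what this looks like in practice: you describe the function in JSON Schema, the model replies with a function name plus JSON-encoded arguments, and your own code dispatches the call. The `get_bill` function and the registry here are hypothetical, invented for illustration.

```python
import json

# A tool description in the JSON Schema style used by function calling.
# `get_bill` is a hypothetical function, not a real API.
bill_tool = {
    "name": "get_bill",
    "description": "Look up the current balance for a customer account.",
    "parameters": {
        "type": "object",
        "properties": {
            "account_id": {
                "type": "string",
                "description": "The customer's account ID.",
            },
        },
        "required": ["account_id"],
    },
}

# The model's side of the exchange is just a name and JSON arguments;
# your server does the actual execution:
def dispatch(call, registry):
    args = json.loads(call["arguments"])
    return registry[call["name"]](**args)

registry = {"get_bill": lambda account_id: f"Balance for {account_id}: $42.00"}
print(dispatch({"name": "get_bill",
                "arguments": '{"account_id": "A-123"}'}, registry))
```

Because both the description and the call are plain JSON, the executing server can be written in any language, which is precisely the flexibility gained over the SDK-bound approach.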

The Future is Now: The Model Context Protocol (MCP)

This brings us to the exciting development of the Model Context Protocol (MCP). Imagine a "USB standard for AI tools." That's essentially what MCP aims to be.

Instead of ad-hoc function definitions within prompts or vendor-specific API calls, MCP defines a client-server architecture where tools expose their capabilities using a standardized protocol. This is a game-changer for interoperability.
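The core of that client-server architecture can be sketched in a few lines. MCP runs over JSON-RPC, and the spec defines methods like "tools/list" (discover what a server offers) and "tools/call" (invoke a tool). The server below is a toy, and the `get_bill` tool is hypothetical; it only illustrates the shape of the exchange, not a spec-complete implementation.

```python
import json

# Toy MCP-style server: advertise tools, then execute calls, all over a
# standardized JSON-RPC interface rather than vendor-specific APIs.
TOOLS = [{
    "name": "get_bill",  # hypothetical tool for illustration
    "description": "Look up a customer's current balance.",
    "inputSchema": {
        "type": "object",
        "properties": {"account_id": {"type": "string"}},
        "required": ["account_id"],
    },
}]

def handle_request(req):
    if req["method"] == "tools/list":
        result = {"tools": TOOLS}
    elif req["method"] == "tools/call":
        params = req["params"]
        # A real server would dispatch on params["name"]; one tool here.
        text = f"Balance for {params['arguments']['account_id']}: $42.00"
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return {"jsonrpc": "2.0", "id": req["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": req["id"], "result": result}

listing = handle_request({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
print(json.dumps(listing, indent=2))
```

The discovery step is what makes this "plug and play": any MCP-aware client can ask any server what it offers at runtime, with no prompt-embedded function definitions and no per-vendor glue code.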

Here's why MCP is so compelling:

- Open and vendor-neutral: tools aren't tied to one model provider's API or one framework's SDK.
- Language-agnostic: a tool server can be written in any language that speaks the protocol.
- Discoverable: clients can ask a server what tools it offers, rather than hard-coding function definitions.
- Reusable: the same tool server works with any MCP-aware client.

From tightly coupled, vendor-specific integrations to a "USB standard for AI tools," the evolution of model extensibility is a testament to the AI community's drive for open, interoperable, and truly intelligent systems. MCP represents a significant stride towards a future where AI models can seamlessly connect to the vast ocean of external data and tools, unlocking unprecedented capabilities.
