Introduction to Custom LLM Providers

Welcome back, intrepid data explorer! In previous chapters, we’ve seen how LangExtract brilliantly orchestrates Large Language Models (LLMs) to extract structured information from unstructured text. We’ve used its default integrations, which are fantastic for getting started. But what if your needs are a bit more unique?

Perhaps you’re working with a highly specialized, fine-tuned LLM running on your company’s private cloud. Maybe you want to experiment with a bleeding-edge open-source model that just got released on Hugging Face, or you need to integrate with a less common commercial LLM API. This is where the power of LangExtract’s custom LLM provider interface shines!

In this chapter, we’ll dive deep into extending LangExtract’s capabilities by integrating your very own custom LLM providers. You’ll learn the underlying architecture, how to write a custom provider, and how to seamlessly plug it into your extraction workflows. By the end, you’ll have the confidence to connect LangExtract to virtually any LLM you can imagine, giving you unparalleled flexibility and control over your data extraction pipelines.

To get the most out of this chapter, you should be comfortable with:

  • Basic LangExtract installation and usage (pip install langextract).
  • Defining extraction schemas.
  • Understanding how to use the lx.extract function.

Let’s unlock a new level of customization!

The Power of Custom LLM Integrations

LangExtract is designed with modularity in mind. It doesn’t hardcode specific LLMs; instead, it uses an abstraction layer that allows different “providers” to be swapped in. This is a powerful concept, so let’s explore why:

Why Bother with Custom Providers?

  1. Flexibility and Choice: The LLM landscape is constantly evolving. New models emerge, and existing ones get updated. Custom providers ensure you’re not locked into a specific set of models or APIs.
  2. Cost Optimization: You might have access to cheaper LLM APIs, or perhaps you want to leverage open-source models that can be run on your own infrastructure, potentially saving significant costs.
  3. Data Privacy and Security: For sensitive data, running LLMs locally or on a private, compliant cloud environment might be a strict requirement. Custom providers make this possible.
  4. Performance and Specialization: You might have a fine-tuned LLM that excels at your specific extraction task. Integrating this specialized model can lead to much higher accuracy and efficiency than general-purpose models.
  5. Offline Capabilities: For environments with limited or no internet access, custom providers can enable LangExtract to work with locally deployed models.

LangExtract’s Provider Interface: The LLMProvider Class

At the heart of custom LLM integration is the langextract.llm.LLMProvider abstract base class. This class defines the blueprint for any LLM provider that LangExtract can use. When you create a custom provider, you’ll inherit from this class and implement its core logic.

Think of it like this: LangExtract knows what it needs from an LLM (take a prompt, return a response), but it doesn’t care how that response is generated. The LLMProvider is the “how.”

graph TD
    A["Your LangExtract Code"] --> B{"lx.extract(..., llm_provider=...)"}
    B --> C["LangExtract Core Logic"]
    C --> D["LLMProvider Interface"]
    D --> E["Your Custom LLM Provider"]
    E --> F["Your Chosen LLM"]
    F --> E
    E --> D
    D --> C
    C --> B
    B --> A

Figure 13.1: Flow of a Custom LLM Provider in LangExtract

As you can see, your custom provider acts as the bridge between LangExtract’s needs and your chosen LLM.

Step-by-Step Implementation: Building a Custom LLM Provider

Let’s roll up our sleeves and build a custom LLM provider. We’ll start with a “mock” provider to understand the structure, then move towards a more realistic (though simplified) integration with a hypothetical external API.

Step 1: Understanding the LLMProvider Base Class

The LLMProvider class requires you to implement a crucial method: _call_llm(self, prompt: str, max_tokens: int, temperature: float) -> str.

  • self: The instance of your provider.
  • prompt: The carefully crafted text prompt that LangExtract generates for the LLM based on your schema and input.
  • max_tokens: The maximum number of tokens the LLM should generate for its response. This helps control response length and cost.
  • temperature: A parameter controlling the randomness of the LLM’s output. Higher values produce more varied, creative responses; lower values make the output more deterministic.
  • -> str: The method must return a string. LangExtract expects this string to contain a JSON object with the extracted data, even if it’s just a placeholder for now.

Let’s begin by importing what we need:

# custom_providers.py (or directly in your script)
import json
import os
import logging
from typing import Dict, Any

from langextract.llm import LLMProvider
from langextract import extract, Schema, Field

# Configure basic logging for better visibility
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

Explanation:

  • json, os, logging: Standard Python libraries we might need for handling JSON, environment variables, or debugging.
  • LLMProvider from langextract.llm: This is the base class we’ll inherit from.
  • extract, Schema, Field from langextract: These are core LangExtract components we’ll use to test our provider.
  • logging.basicConfig: A simple setup to see log messages, which will be helpful for debugging our custom provider.

Step 2: Creating a Simple Mock LLM Provider

To grasp the mechanics without external dependencies, let’s create a MockLLMProvider. This provider won’t call any real LLM; instead, it will return a predefined JSON string. This is invaluable for testing your LangExtract setup without incurring API costs or waiting for real LLM responses.

# custom_providers.py (continued)

class MockLLMProvider(LLMProvider):
    """
    A mock LLM provider that returns a fixed JSON string.
    Useful for testing LangExtract's workflow without a real LLM.
    """
    def __init__(self, mock_response: Dict[str, Any]):
        super().__init__()
        self.mock_response = json.dumps(mock_response)
        logging.info("MockLLMProvider initialized with fixed response.")

    def _call_llm(self, prompt: str, max_tokens: int, temperature: float) -> str:
        """
        Simulates an LLM call by returning the predefined mock response.
        """
        logging.info(f"MockLLMProvider received prompt (truncated): {prompt[:200]}...")
        logging.info("MockLLMProvider returning fixed response.")
        return self.mock_response

# Let's define a schema to test our mock provider
movie_schema = Schema(
    name="MovieInfo",
    description="Extract details about a movie.",
    fields=[
        Field(name="title", type=str, description="The title of the movie."),
        Field(name="director", type=str, description="The director of the movie."),
        Field(name="year", type=int, description="The release year of the movie."),
    ]
)

# Example usage with the MockLLMProvider
if __name__ == "__main__":
    text_to_extract = "The movie 'Inception' was directed by Christopher Nolan and released in 2010."

    # Define the mock response that matches our schema
    mock_data = {
        "title": "Inception",
        "director": "Christopher Nolan",
        "year": 2010
    }
    mock_llm = MockLLMProvider(mock_response=mock_data)

    logging.info("\n--- Testing MockLLMProvider ---")
    extracted_data = extract(
        text=text_to_extract,
        schema=movie_schema,
        llm_provider=mock_llm # <-- THIS is how you pass your custom provider!
    )

    print("\nExtracted Data (Mock):")
    print(extracted_data.to_dict())
    print(f"Extraction successful: {extracted_data.successful}")

Explanation:

  1. MockLLMProvider(LLMProvider): We define our class and inherit from LLMProvider. This is crucial.
  2. __init__(self, mock_response): The constructor takes a dictionary (mock_response) which will be our fixed output. We store its JSON string representation. We also call super().__init__() to properly initialize the base class.
  3. _call_llm(...): This is where the magic (or lack thereof, for a mock) happens. It simply returns self.mock_response. Notice we log the incoming prompt; this is a great debugging technique to see what LangExtract is sending to your LLM.
  4. extract(..., llm_provider=mock_llm): This is the key line! When calling extract, you pass an instance of your custom provider to the llm_provider argument. LangExtract will then use your provider instead of its default.

Run this script. You should see the logs indicating the prompt being received by the MockLLMProvider and then the mock data being “extracted” by LangExtract. This confirms your custom provider is correctly integrated!

Step 3: Integrating with a Hypothetical External LLM API

Now, let’s get a bit more realistic. Imagine you have access to a custom LLM endpoint (e.g., a locally hosted model, or a proprietary API) that you interact with via a simple HTTP request. For this example, we’ll simulate this with a placeholder CustomAPIClient.

Important Note on LLM API Integration (2026-01-05): Most commercial and open-source LLMs (like OpenAI, Anthropic, Google Gemini, Hugging Face models via TGI or vLLM) provide Python SDKs or well-defined REST APIs. When implementing a real custom provider, you would use these SDKs or the requests library to interact with your chosen LLM. For instance:

  • OpenAI: openai.chat.completions.create(...)
  • Anthropic: anthropic.messages.create(...)
  • Hugging Face (via transformers or inference-sdk): Load the model and tokenizer, then generate.

In each case, the _call_llm method would encapsulate the provider-specific API call.

Let’s create a CustomAPIClient placeholder and then integrate it.

# custom_providers.py (continued)
import requests # We'll pretend to use this for our hypothetical API

class CustomAPIClient:
    """
    A hypothetical client for a custom LLM API.
    In a real scenario, this would handle authentication, network calls, etc.
    """
    def __init__(self, api_key: str, api_endpoint: str):
        self.api_key = api_key
        self.api_endpoint = api_endpoint
        logging.info(f"CustomAPIClient initialized for endpoint: {api_endpoint}")

    def generate_text(self, prompt: str, max_tokens: int, temperature: float) -> str:
        """
        Simulates making an API call to a custom LLM endpoint.
        Returns a JSON string as expected by LangExtract.
        """
        logging.info(f"CustomAPIClient sending request to {self.api_endpoint}...")
        logging.debug(f"Prompt sent to custom API (truncated): {prompt[:500]}...")
        
        # --- In a real scenario, you'd make an actual HTTP request here ---
        # try:
        #     headers = {"Authorization": f"Bearer {self.api_key}", "Content-Type": "application/json"}
        #     payload = {
        #         "prompt": prompt,
        #         "max_tokens": max_tokens,
        #         "temperature": temperature,
        #         # Add other LLM-specific parameters
        #     }
        #     response = requests.post(self.api_endpoint, headers=headers, json=payload, timeout=60)
        #     response.raise_for_status() # Raise an exception for HTTP errors
        #     llm_output = response.json().get("generated_text", "{}") # Adjust based on actual API response structure
        #     return llm_output
        # except requests.exceptions.RequestException as e:
        #     logging.error(f"Error calling custom LLM API: {e}")
        #     return json.dumps({"error": f"API call failed: {e}"})
        # -----------------------------------------------------------------
        
        # For demonstration, we'll return a hardcoded response like our mock
        # In a real scenario, the LLM would interpret the prompt and return actual JSON.
        # We'll simulate a successful extraction for a different movie.
        simulated_llm_response = {
            "title": "The Matrix",
            "director": "The Wachowskis",
            "year": 1999
        }
        return json.dumps(simulated_llm_response)


class MyCustomLLMProvider(LLMProvider):
    """
    An LLM provider that integrates with our hypothetical CustomAPIClient.
    """
    def __init__(self, api_key: str, api_endpoint: str):
        super().__init__()
        self.client = CustomAPIClient(api_key, api_endpoint)
        logging.info("MyCustomLLMProvider initialized.")

    def _call_llm(self, prompt: str, max_tokens: int, temperature: float) -> str:
        """
        Calls the custom API client to generate text.
        """
        return self.client.generate_text(prompt, max_tokens, temperature)

# Example usage with MyCustomLLMProvider
if __name__ == "__main__":
    # Read configuration from environment variables, falling back to
    # placeholders (CustomAPIClient is mocked, so real values aren't needed here).
    api_key = os.getenv("CUSTOM_LLM_API_KEY", "your_api_key_here")
    api_endpoint = os.getenv("CUSTOM_LLM_ENDPOINT", "https://api.mycustomllm.com/v1/generate")

    text_to_extract_2 = "Tell me about the classic film 'The Matrix' from 1999."

    # Instantiate our custom provider
    custom_llm_provider = MyCustomLLMProvider(
        api_key=api_key,
        api_endpoint=api_endpoint
    )

    logging.info("\n--- Testing MyCustomLLMProvider ---")
    extracted_data_2 = extract(
        text=text_to_extract_2,
        schema=movie_schema, # Using the same movie_schema
        llm_provider=custom_llm_provider,
        max_tokens=200,      # Pass these parameters to LangExtract, which then passes to _call_llm
        temperature=0.7
    )

    print("\nExtracted Data (Custom API):")
    print(extracted_data_2.to_dict())
    print(f"Extraction successful: {extracted_data_2.successful}")

    # You can also run the mock test here again if you uncomment the first part
    # text_to_extract = "The movie 'Inception' was directed by Christopher Nolan and released in 2010."
    # mock_data = {"title": "Inception", "director": "Christopher Nolan", "year": 2010}
    # mock_llm = MockLLMProvider(mock_response=mock_data)
    # logging.info("\n--- Testing MockLLMProvider Again ---")
    # extracted_data = extract(text=text_to_extract, schema=movie_schema, llm_provider=mock_llm)
    # print("\nExtracted Data (Mock):")
    # print(extracted_data.to_dict())

Explanation:

  1. CustomAPIClient: This class represents your interaction layer with the actual LLM API. It holds the api_key and api_endpoint. Its generate_text method would contain the requests.post call to the real API. For our example, it still returns a hardcoded JSON string to keep the focus on the LangExtract integration rather than complex API handling.
  2. MyCustomLLMProvider(LLMProvider): This is our actual custom provider.
    • In its __init__, it creates an instance of CustomAPIClient, passing any necessary credentials or configuration.
    • Its _call_llm method simply delegates the actual LLM call to self.client.generate_text, ensuring that the prompt, max_tokens, and temperature parameters are passed along.
  3. Environment Variables: It’s a best practice to handle sensitive information like API keys using environment variables. We use os.getenv with a default, but in production, these should be properly set.
  4. extract(..., llm_provider=custom_llm_provider, max_tokens=..., temperature=...): Again, we pass our custom provider instance. Notice that max_tokens and temperature are passed directly to extract. LangExtract intelligently forwards these values to your _call_llm method.

Running this script will demonstrate how MyCustomLLMProvider is used, and you’ll see the logs from both MyCustomLLMProvider and CustomAPIClient, even though the LLM response itself is simulated.

Version Information and Best Practices (2026-01-05)

As of early 2026, langextract is an actively developed library by Google. While specific minor versions might change rapidly, the core LLMProvider interface is designed for stability.

To verify your langextract version:

pip show langextract

This will display details about your installed version. Always refer to the official LangExtract GitHub repository for the most up-to-date documentation on the LLMProvider interface and any new features or deprecations.
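If you’d rather check from Python (for example, in a startup sanity check), the standard library’s importlib.metadata can query any installed package’s version. The helper below is a small convenience of our own, not part of LangExtract:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def get_installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    # Prints the installed langextract version, or None if it isn't installed.
    print(get_installed_version("langextract"))
```

This is handy for logging the exact library version alongside your extraction runs, which makes debugging across environments much easier.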

Modern Best Practices for Custom Providers:

  • Error Handling: Implement robust try-except blocks within your _call_llm method to catch network errors, API errors, or malformed responses from your LLM. LangExtract can handle exceptions raised by _call_llm, but a graceful fallback (e.g., returning an empty or error-indicating JSON) is often better.
  • Asynchronous Calls: For high-throughput scenarios, consider making your _call_llm method asynchronous if your underlying LLM client supports it. LangExtract itself can often handle parallel processing of chunks, but individual LLM calls can also be optimized.
  • Configuration: Design your provider’s __init__ method to accept necessary configurations (API keys, endpoints, model names, timeouts) either directly or from environment variables, making it flexible and secure.
  • Logging: Use Python’s logging module extensively within your provider to trace prompts, responses, and errors. This is invaluable for debugging.
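To make the error-handling advice concrete, here is a minimal sketch of a retry-then-fallback wrapper. FlakyClient is a hypothetical stand-in for a real API client (nothing here is LangExtract API); the pattern is what matters: retry transient failures a few times, then return a valid, error-indicating JSON string instead of raising.

```python
import json
import logging
import time

class FlakyClient:
    """Hypothetical stand-in for a real API client that sometimes raises."""
    def __init__(self):
        self.calls = 0

    def generate_text(self, prompt: str) -> str:
        self.calls += 1
        if self.calls < 3:  # fail the first two calls to exercise the retry loop
            raise ConnectionError("simulated transient network error")
        return json.dumps({"title": "The Matrix", "director": "The Wachowskis", "year": 1999})

def call_with_retry(client, prompt: str, retries: int = 3, backoff: float = 0.1) -> str:
    """Try the client up to `retries` times; on total failure, return error JSON."""
    for attempt in range(1, retries + 1):
        try:
            return client.generate_text(prompt)
        except Exception as exc:
            logging.warning("LLM call failed (attempt %d/%d): %s", attempt, retries, exc)
            time.sleep(backoff * attempt)  # simple linear backoff between attempts
    return json.dumps({"error": "LLM API failed after retries"})

if __name__ == "__main__":
    print(call_with_retry(FlakyClient(), "some prompt"))
```

Your _call_llm could delegate to call_with_retry(self.client, prompt) so that LangExtract always receives parseable JSON, even when every attempt fails.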

Mini-Challenge: Adding a Fallback Mechanism

LLM API calls fail from time to time, so a good custom provider should be resilient.

Challenge: Modify MyCustomLLMProvider to include a simple retry mechanism or a fallback to a default response if the CustomAPIClient.generate_text method (which, remember, would be a real API call in production) encounters an error. For this challenge, simulate an error by randomly failing 20% of the time in CustomAPIClient.generate_text. If it fails, return a default “error” JSON.

Hint:

  1. Inside CustomAPIClient.generate_text, use random.random() < 0.2 (after adding import random) to simulate a failure.
  2. If it “fails,” log an error and return a JSON string like {"error": "LLM API failed", "details": "simulated error"}.
  3. The _call_llm method in MyCustomLLMProvider should then handle this (or simply pass it through, but the challenge is to make CustomAPIClient robust).

What to Observe/Learn: You’ll see how LangExtract processes the (potentially erroneous) JSON returned by your provider. This highlights the importance of your provider returning valid JSON, even for error states, so LangExtract can still process it.

Common Pitfalls & Troubleshooting

Even with careful planning, things can go wrong. Here are some common issues when building custom LLM providers and how to address them:

  1. Incorrect _call_llm Return Type:

    • Pitfall: Your _call_llm method does not return a str, or the string it returns is not valid JSON. LangExtract expects a JSON string.
    • Troubleshooting: Always ensure your LLM’s raw output is converted to a JSON string using json.dumps() before returning it. If the LLM returns plain text, you’ll need to parse it into a dictionary first. Check the extracted_data.successful flag and extracted_data.errors list in the ExtractionResult object for clues.
  2. API Key/Endpoint Misconfiguration:

    • Pitfall: Incorrect API keys, wrong endpoint URLs, or network issues preventing your CustomAPIClient from connecting to the actual LLM.
    • Troubleshooting:
      • Double-check environment variables and hardcoded values.
      • Test your CustomAPIClient (or the underlying LLM SDK) independently of LangExtract to ensure it can make successful calls.
      • Use logging.DEBUG in your provider to print out the exact URL and headers being sent.
      • Check network connectivity and firewall rules.
  3. Prompt Formatting Differences:

    • Pitfall: LangExtract generates prompts in a specific format (e.g., including instructions, schema, and text). Your custom LLM might be sensitive to this format, or it might require additional system messages or specific delimiters.
    • Troubleshooting: Log the prompt argument received by your _call_llm method and examine it carefully. If your LLM requires a different format, you can preprocess the prompt string within _call_llm before sending it on. In general, though, LangExtract’s prompts are designed to work well with instruction-following LLMs.
  4. Dependency Issues:

    • Pitfall: Your custom provider (e.g., CustomAPIClient) requires external libraries (like requests, openai, anthropic, transformers) that are not installed in your environment.
    • Troubleshooting: Always install all necessary dependencies for your custom provider using pip. For example, pip install requests openai.
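As a concrete illustration of pitfall 3, any prompt preprocessing can live entirely inside your _call_llm. The chat-style template below is purely hypothetical; substitute whatever wrapper your particular model actually expects:

```python
DEFAULT_SYSTEM_MESSAGE = "You are a precise extraction engine. Respond with JSON only."

def wrap_prompt(prompt: str, system_message: str = DEFAULT_SYSTEM_MESSAGE) -> str:
    """Wrap LangExtract's prompt in a (hypothetical) chat template before
    sending it to a model that expects chat-formatted input."""
    return f"<|system|>\n{system_message}\n<|user|>\n{prompt}\n<|assistant|>\n"
```

Inside _call_llm you would then call self.client.generate_text(wrap_prompt(prompt), ...) instead of passing the prompt through unchanged.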

By systematically checking these areas, you can quickly diagnose and fix issues with your custom LLM integrations.
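A common real-world instance of pitfall 1 is an LLM wrapping its JSON in Markdown code fences or adding commentary around it. A small, defensive cleanup step inside your provider can rescue such responses. This helper is a sketch of our own (not part of LangExtract) that extracts the first JSON object it can find and re-serializes it:

```python
import json
import re

def coerce_to_json_string(raw: str) -> str:
    """Best-effort: extract a JSON object from raw LLM output and return it
    as a clean JSON string. Falls back to an error object if nothing parses."""
    # Strip Markdown code fences like ```json ... ```
    cleaned = re.sub(r"```(?:json)?", "", raw).strip()
    # Try the whole string first, then the outermost {...} span.
    for candidate in (cleaned, cleaned[cleaned.find("{"): cleaned.rfind("}") + 1]):
        try:
            return json.dumps(json.loads(candidate))
        except (json.JSONDecodeError, ValueError):
            continue
    return json.dumps({"error": "LLM returned non-JSON output", "raw": raw[:200]})
```

Running your LLM’s raw output through a function like this before returning from _call_llm means LangExtract always receives a parseable JSON string, even when the model misbehaves.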

Summary

Phew! You’ve just unlocked a huge level of power and flexibility in LangExtract. Let’s recap what we’ve learned:

  • Why Custom Providers? They offer unmatched flexibility, cost savings, security, and performance benefits by allowing you to use any LLM with LangExtract.
  • The LLMProvider Interface: All custom providers must inherit from langextract.llm.LLMProvider and implement the _call_llm method.
  • _call_llm’s Role: This method is the bridge, taking LangExtract’s prompt and parameters, calling your chosen LLM, and returning the LLM’s response as a JSON string.
  • Incremental Development: We started with a simple mock provider to understand the structure, then moved to a more realistic (though simulated) integration with an external API.
  • Key Parameters: prompt, max_tokens, and temperature are passed to your _call_llm method, giving you control over the LLM’s generation.
  • Robustness: Implementing error handling and logging within your custom provider is crucial for reliable extraction.

You are now equipped to integrate LangExtract with virtually any LLM, giving you incredible control over your structured data extraction workflows. This opens up possibilities for using specialized models, optimizing costs, and meeting stringent compliance requirements.

What’s Next? In the next chapter, we’ll explore advanced strategies for handling extremely long documents, using techniques like multi-pass extraction and intelligent chunking, which often go hand-in-hand with optimizing your LLM calls, especially when using custom providers.

