xtrllm - Extract Structured Data using LLMs

A lightweight Python framework for portable, versioned, reusable LLM extraction tasks.

xtrllm separates two things every other library conflates:

  • The engine: prompt → structured output → log (stable, ships with the package)
  • The task: schema + prompt strategy + edge case handling (yours, lives in your repo)
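
To make the split concrete, here is a minimal, standard-library-only sketch of the idea. It is not xtrllm's actual implementation: the names `run`, `fake_model`, `handle_edge_cases`, and the dataclass-based schema are all hypothetical, and the model call is a stub so the example runs offline.

```python
import json
import logging
from dataclasses import dataclass
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("engine_sketch")


@dataclass
class ActorLabel:
    """Task-side schema: the shape the output must have."""
    value: str


class ClassifyActor:
    """Task side (hypothetical): schema, prompt strategy, edge-case handling."""
    schema = ActorLabel

    def build_prompt(self, text: str) -> str:
        return f"Classify this actor: {text.strip()}"

    def handle_edge_cases(self, data: dict) -> dict:
        # Example edge case: normalise label casing and surrounding whitespace.
        if "value" in data:
            data["value"] = str(data["value"]).strip().lower()
        return data


def run(task, text: str, call_model: Callable[[str], str]):
    """Engine side (hypothetical): prompt -> structured output -> log."""
    prompt = task.build_prompt(text)
    raw = call_model(prompt)  # the model is expected to return JSON text
    data = task.handle_edge_cases(json.loads(raw))
    result = task.schema(**data)  # raises TypeError if the fields do not match
    log.info("prompt=%r result=%r", prompt, result)
    return result


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call; always returns the same JSON payload.
    return '{"value": " Institution "}'


result = run(ClassifyActor(), "  European Commission ", fake_model)
print(result.value)  # institution
```

The point of the sketch is that `run` knows nothing about actors or labels, so the same engine function can serve any task object that provides a schema, a prompt builder, and its own edge-case handling.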

🛠️ Installation

Install the latest stable release directly from PyPI using pip:

pip install xtrllm

🗜️ Basic Usage

An xtrllm task is defined in a single Python file, then loaded and run through an LLMXtractor. The output is validated against the task's schema before it is returned.

# eulex/tasks/classify_actor.py
from pydantic import BaseModel
from xtrllm.core.base import BaseTask

# Schema the LLM output is validated against
class ActorLabelSchema(BaseModel):
    value: str

# The task bundles a name, its output schema, and its prompts
class ClassifyActorTask(BaseTask):
    name = "classify_actor"
    schema = ActorLabelSchema
    system_prompt = "Classify legal and institutional actors."

    def build_prompt(self, text: str) -> str:
        # Build the user prompt for a single input text
        return f"Classify this actor: {text}"

Once the task is defined, load it and run it:

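from xtrllm import load_tasks, LLMXtractor

# Load the task definitions in eulex/tasks under the "eulex" namespace
load_tasks("eulex/tasks", namespace="eulex")

# Bind the loaded task to a model
extractor = LLMXtractor(task="eulex/classify_actor", model="gpt-4.1-mini")

# Run the extraction; the result follows ActorLabelSchema
result = extractor("European Commission")

print(result.value)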
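
To illustrate what validating output against the schema means in practice, the sketch below uses Pydantic on its own, without xtrllm. It assumes Pydantic v2 (`model_validate_json`), and the JSON strings are made-up stand-ins for model responses. How xtrllm performs this step internally is not shown here.

```python
from pydantic import BaseModel, ValidationError


class ActorLabelSchema(BaseModel):
    value: str


# A well-formed response parses into a typed object.
ok = ActorLabelSchema.model_validate_json('{"value": "EU institution"}')
print(ok.value)

# A response missing the required "value" field is rejected.
try:
    ActorLabelSchema.model_validate_json('{"label": "EU institution"}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

A stricter schema, for example one that restricts `value` to a fixed set of labels, would reject more malformed responses at the same point.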

🎓 Citation

If you use this framework in academic research, please cite:

@misc{mandujano2026xtrllm,
  author       = {Mauricio Mandujano Manríquez},
  title        = {`xtrllm` - Extract Structured Data using LLMs},
  year         = {2026},
  howpublished = {\url{https://github.com/mauriciomm7/xtrllm}},
  note         = {GitHub repository}
}

🙏 Acknowledgments

This project stands on the shoulders of excellent open-source tools and services:

  • 🎨 Extraction Logo
    Icon made by Freepik from Flaticon.

  • ⚙️ CI/CD Automation: GitHub Actions
    Powered by GitHub Actions, enabling automated build, test, and deployment pipelines directly within the GitHub ecosystem.

  • 🤖 LLM CLI & Python Library
    llm by Simon Willison: a CLI tool and Python library for interacting with OpenAI, Anthropic Claude, Google Gemini, Meta Llama, and local language model APIs.

  • 🧩 Pydantic
    Built with Pydantic: data validation and settings management using Python type annotations, enabling robust schema enforcement and structured data handling.

📄 License

This project is licensed under the MIT License.