xtrllm - Extract Structured Data using LLMs
A lightweight Python framework for portable, versioned, reusable LLM extraction tasks.
xtrllm separates two things every other library conflates:
- The engine: prompt → structured output → log (stable, ships with the package)
- The task: schema + prompt strategy + edge case handling (yours, lives in your repo)
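To make the split concrete, here is a minimal standard-library sketch of the idea. It is not xtrllm's implementation: `ToyTask`, `run`, and `fake_model` are hypothetical names invented for illustration, and the "model" is a stub that returns a fixed JSON string.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyTask:
    """Task side: a name, the keys the output must contain, and a prompt template."""
    name: str
    required_keys: tuple
    template: str

    def build_prompt(self, text: str) -> str:
        return self.template.format(text=text)


def run(task: ToyTask, text: str, model: Callable[[str], str], log: list) -> dict:
    """Engine side: prompt -> structured output -> log, the same for every task."""
    prompt = task.build_prompt(text)
    raw = model(prompt)  # any callable that returns a JSON string
    data = json.loads(raw)
    missing = [key for key in task.required_keys if key not in data]
    if missing:
        raise ValueError(f"{task.name}: missing keys {missing}")
    log.append({"task": task.name, "prompt": prompt, "output": data})
    return data


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call; always returns the same label.
    return json.dumps({"value": "institution"})


log = []
task = ToyTask(
    name="classify_actor",
    required_keys=("value",),
    template="Classify this actor: {text}",
)
result = run(task, "European Commission", fake_model, log)
print(result["value"])
```

Swapping in a different `ToyTask` changes the prompt and the expected keys while `run` stays untouched, which is the separation the two bullets describe.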
Installation
Install the latest stable release from PyPI with pip:

    pip install xtrllm
Basic Usage
xtrllm tasks can be defined in one Python file, then loaded and run through an LLMXtractor, with the output validated against the schema before it is returned.
    # eulex/tasks/classify_actor.py
    from pydantic import BaseModel
    from xtrllm.core.base import BaseTask

    class ActorLabelSchema(BaseModel):
        value: str

    class ClassifyActorTask(BaseTask):
        name = "classify_actor"
        schema = ActorLabelSchema
        system_prompt = "Classify legal and institutional actors."

        def build_prompt(self, text: str) -> str:
            return f"Classify this actor: {text}"
Once the task is defined, load it and run it through an extractor:
    import xtrllm
    from xtrllm import load_tasks, LLMXtractor

    load_tasks("eulex/tasks", namespace="eulex")
    extractor = LLMXtractor(task="eulex/classify_actor", model="gpt-4.1-mini")
    result = extractor("European Commission")
    print(result.value)
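To illustrate what validating output against the schema means, the sketch below uses Pydantic on its own, with no xtrllm calls. It assumes the model replies with a JSON string; `parse_output` and the sample replies are hypothetical, and how xtrllm performs this step internally may differ.

```python
import json

from pydantic import BaseModel, ValidationError


class ActorLabelSchema(BaseModel):
    value: str


def parse_output(raw: str) -> ActorLabelSchema:
    """Parse a raw JSON reply and validate it against the schema."""
    return ActorLabelSchema(**json.loads(raw))


# A well-formed reply becomes a typed object.
good = parse_output('{"value": "EU institution"}')
print(good.value)

# A reply with the wrong key is rejected instead of being passed downstream.
try:
    parse_output('{"label": "EU institution"}')
except ValidationError as exc:
    rejected = True
    print("rejected:", len(exc.errors()), "validation error(s)")
else:
    rejected = False
```

Because the reply is validated before use, a malformed reply fails at the point of extraction rather than later in an analysis pipeline.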
Citation
If you use this framework in academic research, please cite:
    @misc{mandujano2026xtrllm,
      author       = {Mauricio Mandujano Manríquez},
      title        = {`xtrllm` - Extract Structured Data using LLMs},
      year         = {2026},
      howpublished = {\url{https://github.com/mauriciomm7/xtrllm}},
      note         = {GitHub repository}
    }
Acknowledgments
This project stands on the shoulders of excellent open-source tools and services:
- CI/CD Automation (GitHub Actions)
  Powered by GitHub Actions, enabling automated build, test, and deployment pipelines directly within the GitHub ecosystem.
- LLM CLI & Python Library
  llm by Simon Willison: a CLI tool and Python library for interacting with OpenAI, Anthropic Claude, Google Gemini, Meta Llama, and local language model APIs.
- Pydantic
  Built with Pydantic: data validation and settings management using Python type annotations, enabling robust schema enforcement and structured data handling.
License
This project is licensed under the MIT License.