hub / github.com/dottxt-ai/outlines

github.com/dottxt-ai/outlines @1.3.1 sqlite

repository ↗ · DeepWiki ↗ · release 1.3.1 ↗

1,873 symbols 6,994 edges 122 files 454 documented · 24%

README

🗒️ Structured outputs for LLMs 🗒️

Made with ❤👷️ by the team at .txt

Trusted by NVIDIA, Cohere, HuggingFace, vLLM, etc.

[![PyPI Version][pypi-version-badge]][pypi] [![Downloads][downloads-badge]][pypistats] [![Stars][stars-badge]][stars]

[![Discord][discord-badge]][discord] [![Blog][dottxt-blog-badge]][dottxt-blog] [![Twitter][twitter-badge]][twitter]

The .txt API is currently in early access. Request access here →

🚀 Building the future of structured generation

We're working with select partners to develop new interfaces to structured generation.

Need XML, FHIR, custom schemas or grammars? Let's talk.

Audit your schema: share one schema, we show you what breaks under generation, the constraints that fix it, and compliance rates before and after. Sign up here.

Why Outlines?
Quickstart
Real-World Examples
🙋‍♂️ Customer Support Triage
📦 E-commerce Product Categorization
📊 Parse Event Details with Incomplete Data
🗂️ Categorize Documents into Predefined Types
📅 Schedule a Meeting with Function Calling
📝 Dynamically Generate Prompts with Re-usable Templates
They Use Outlines
Model Integrations
Core Features
Other Features
About .txt
Community

Why Outlines?

LLMs are powerful but their outputs are unpredictable. Most solutions attempt to fix bad outputs after generation using parsing, regex, or fragile code that breaks easily.

Outlines guarantees structured outputs during generation — directly from any LLM.

Works with any model - Same code runs across OpenAI, Ollama, vLLM, and more
Simple integration - Just pass your desired output type: model(prompt, output_type)
Guaranteed valid structure - No more parsing headaches or broken JSON
Provider independence - Switch models without changing code

The Outlines Philosophy

Outlines follows a simple pattern that mirrors Python's own type system. Simply specify the desired output type, and Outlines will ensure your data matches that structure exactly:

For a yes/no response, use Literal["Yes", "No"]
For numerical values, use int
For complex objects, define a structure with a Pydantic model

Quickstart

Getting started with outlines is simple:

1. Install outlines

pip install outlines

2. Connect to your preferred model

import outlines
from transformers import AutoTokenizer, AutoModelForCausalLM


MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"),
    AutoTokenizer.from_pretrained(MODEL_NAME)
)

3. Start with simple structured outputs

from typing import Literal
from pydantic import BaseModel


# Simple classification
sentiment = model(
    "Analyze: 'This product completely changed my life!'",
    Literal["Positive", "Negative", "Neutral"]
)
print(sentiment)  # "Positive"

# Extract specific types
temperature = model("What's the boiling point of water in Celsius?", int)
print(temperature)  # 100

4. Create complex structures

from pydantic import BaseModel
from enum import Enum

class Rating(Enum):
    poor = 1
    fair = 2
    good = 3
    excellent = 4

class ProductReview(BaseModel):
    rating: Rating
    pros: list[str]
    cons: list[str]
    summary: str

review = model(
    "Review: The XPS 13 has great battery life and a stunning display, but it runs hot and the webcam is poor quality.",
    ProductReview,
    max_new_tokens=200,
)

review = ProductReview.model_validate_json(review)
print(f"Rating: {review.rating.name}")  # "Rating: good"
print(f"Pros: {review.pros}")           # "Pros: ['great battery life', 'stunning display']"
print(f"Summary: {review.summary}")     # "Summary: Good laptop with great display but thermal issues"

Real-world examples

Here are production-ready examples showing how Outlines solves common problems:

🙋‍♂️ Customer Support Triage

This example shows how to convert a free-form customer email into a structured service ticket. By parsing attributes like priority, category, and escalation flags, the code enables automated routing and handling of support issues.

import outlines
from enum import Enum
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForCausalLM
from typing import List


MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"),
    AutoTokenizer.from_pretrained(MODEL_NAME)
)


def alert_manager(ticket):
    print("Alert!", ticket)


class TicketPriority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    urgent = "urgent"

class ServiceTicket(BaseModel):
    priority: TicketPriority
    category: str
    requires_manager: bool
    summary: str
    action_items: List[str]


customer_email = """
Subject: URGENT - Cannot access my account after payment

I paid for the premium plan 3 hours ago and still can't access any features.
I've tried logging out and back in multiple times. This is unacceptable as I
have a client presentation in an hour and need the analytics dashboard.
Please fix this immediately or refund my payment.
"""

prompt = f"""
<|im_start|>user
Analyze this customer email:

{customer_email}
<|im_end|>
<|im_start|>assistant
"""

ticket = model(
    prompt,
    ServiceTicket,
    max_new_tokens=500
)

# Use structured data to route the ticket
ticket = ServiceTicket.model_validate_json(ticket)
if ticket.priority == "urgent" or ticket.requires_manager:
    alert_manager(ticket)

📦 E-commerce product categorization

This use case demonstrates how outlines can transform product descriptions into structured categorization data (e.g., main category, sub-category, and attributes) to streamline tasks such as inventory management. Each product description is processed automatically, reducing manual categorization overhead.

import outlines
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForCausalLM
from typing import List, Optional


MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"),
    AutoTokenizer.from_pretrained(MODEL_NAME)
)


def update_inventory(product, category, sub_category):
    print(f"Updated {product.split(',')[0]} in category {category}/{sub_category}")


class ProductCategory(BaseModel):
    main_category: str
    sub_category: str
    attributes: List[str]
    brand_match: Optional[str]

# Process product descriptions in batches
product_descriptions = [
    "Apple iPhone 15 Pro Max 256GB Titanium, 6.7-inch Super Retina XDR display with ProMotion",
    "Organic Cotton T-Shirt, Men's Medium, Navy Blue, 100% Sustainable Materials",
    "KitchenAid Stand Mixer, 5 Quart, Red, 10-Speed Settings with Dough Hook Attachment"
]

template = outlines.Template.from_string("""
<|im_start|>user
Categorize this product:

{{ description }}
<|im_end|>
<|im_start|>assistant
""")

# Get structured categorization for all products
categories = model(
    [template(description=desc) for desc in product_descriptions],
    ProductCategory,
    max_new_tokens=200
)

# Use categorization for inventory management
categories = [
    ProductCategory.model_validate_json(category) for category in categories
]
for product, category in zip(product_descriptions, categories):
    update_inventory(product, category.main_category, category.sub_category)

📊 Parse event details with incomplete data

This example uses outlines to parse event descriptions into structured information (like event name, date, location, type, and topics), even handling cases where the data is incomplete. It leverages union types to return either structured event data or a fallback “I don’t know” answer, ensuring robust extraction in varying scenarios.

import outlines
from typing import Union, List, Literal
from pydantic import BaseModel
from enum import Enum
from transformers import AutoTokenizer, AutoModelForCausalLM


MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"),
    AutoTokenizer.from_pretrained(MODEL_NAME)
)

class EventType(str, Enum):
    conference = "conference"
    webinar = "webinar"
    workshop = "workshop"
    meetup = "meetup"
    other = "other"


class EventInfo(BaseModel):
    """Structured information about a tech event"""
    name: str
    date: str
    location: str
    event_type: EventType
    topics: List[str]
    registration_required: bool

# Create a union type that can either be a structured EventInfo or "I don't know"
EventResponse = Union[EventInfo, Literal["I don't know"]]

# Sample event descriptions
event_descriptions = [
    # Complete information
    """
    Join us for DevCon 2023, the premier developer conference happening on November 15-17, 2023
    at the San Francisco Convention Center. Topics include AI/ML, cloud infrastructure, and web3.
    Registration is required.
    """,

    # Insufficient information
    """
    Tech event next week. More details coming soon!
    """
]

# Process events
results = []
for description in event_descriptions:
    prompt = f"""
<|im_start>system
You are a helpful assistant
<|im_end|>
<|im_start>user
Extract structured information about this tech event:

{description}

If there is enough information, return a JSON object with the following fields:

- name: The name of the event
- date: The date where the event is taking place
- location: Where the event is taking place
- event_type: either 'conference', 'webinar', 'workshop', 'meetup' or 'other'
- topics: a list of topics of the conference
- registration_required: a boolean that indicates whether registration is required

If the information available does not allow you to fill this JSON, and only then, answer 'I don't know'.
<|im_end|>
<|im_start|>assistant
"""
    # Union type allows the model to return structured data or "I don't know"
    result = model(prompt, EventResponse, max_new_tokens=200)
    results.append(result)

# Display results
for i, result in enumerate(results):
    print(f"Event {i+1}:")
    if isinstance(result, str):
        print(f"  {result}")
    else:
        # It's an EventInfo object
        print(f"  Name: {result.name}")
        print(f"  Type: {result.event_type}")
        print(f"  Date: {result.date}")
        print(f"  Topics: {', '.join(result.topics)}")
    print()

# Use structured data in downstream processing
structured_count = sum(1 for r in results if isinstance(r, EventInfo))
print(f"Successfully extracted data for {structured_count} of {len(results)} events")

🗂️ Categorize documents into predefined types

In this case, outlines classifies documents into predefined categories (e.g., “Financial Report,” “Legal Contract”) using a literal type specification. The resulting classifications are displayed in both a table format and through a category distribution summary, illustrating how structured outputs can simplify content management.

```python import outlines from typing import Literal, List import pandas as pd from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct" model = outlines.from_transformers( AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"), AutoTokenizer.from_pretrained(MODEL_NAME) )

Define classification categories using Literal

DocumentCategory = Literal[ "Financial Report", "Legal Contract", "Technical Documentation", "Marketing Material", "Personal Correspondence" ]

Sample documents to classify

documents = [ "Q3 Financial Summary: Revenue increased by 15% year-over-year to $12.4M. EBITDA margin improved to 23% compared to 19% in Q3 last year. Operating expenses...",

"This agreement is made between Party A and Party B, hereinafter referred to as 'the Parties', on this day of...",

"The API accepts POST requests with JSON payloads. Required parameters include 'user_id' and 'transaction_type'. The endpoint returns a 200 status code on success."

]

template = outlines.Template.from_string(""" <|im_start|>user Classify the following document into exactly one category among the following categories: - Financial Report - Legal Contract - Technical Documentation - Ma

Core symbols most depended-on inside this repo

format_output_type

called by 105

src/outlines/models/tgi.py

format_input

called by 90

src/outlines/models/tgi.py

python_types_to_terms

called by 59

src/outlines/types/dsl.py

to_regex

called by 53

src/outlines/types/dsl.py

get

called by 51

src/outlines/caching.py

append

called by 45

src/outlines/inputs.py

batch

called by 37

src/outlines/models/base.py

normalize_provider_errors

called by 36

src/outlines/exceptions.py

Shape

Function 983

Method 622

Class 226

Route 42

Languages

Python100%

Modules by API surface

tests/test_exceptions.py173 symbols

src/outlines/types/dsl.py100 symbols

tests/types/test_dsl.py69 symbols

tests/models/test_ollama.py49 symbols

tests/types/test_types_utils.py47 symbols

tests/models/test_provider_exceptions.py42 symbols

tests/models/test_mistral.py40 symbols

src/outlines/models/transformers.py40 symbols

tests/models/test_openai.py38 symbols

tests/test_utils/mock_lmstudio_client.py29 symbols

tests/models/test_lmstudio.py28 symbols

tests/models/test_gemini.py27 symbols

Dependencies from manifests, versioned

accelerate0.27.2 · 1×

bentoml1.2.11 · 1×

cloudpickle1×

datasets2.18.0 · 1×

diskcache1×

genson1×

jinja21×

jsonpath_ng1×

jsonschema1×

outlines0.0.37 · 1×

outlines_core0.2.14 · 1×

pillow1×

For agents

$ claude mcp add outlines \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact