github.com/alecthomas/participle @v2.1.4 sqlite

1,003 symbols · 2,288 edges · 82 files · 192 documented (19%) · 12 cross-repo links

README

A dead simple parser package for Go

V2
Introduction
Tutorial
Tag syntax
Overview
Grammar syntax
Capturing
- Capturing boolean value
"Union" types
Custom parsing
Lexing
Options
Examples
Performance
Concurrency
Error reporting
Comments
Limitations
EBNF
Syntax/Railroad Diagrams

V2

This is version 2 of Participle.

It can be installed with:

$ go get github.com/alecthomas/participle/v2@latest

The latest version from v0 can be installed via:

$ go get github.com/alecthomas/participle@latest

Introduction

The goal of this package is to provide a simple, idiomatic and elegant way of defining parsers in Go.

Participle's method of defining grammars should be familiar to any Go programmer who has used the encoding/json package: struct field tags define what and how input is mapped to those same fields. This is not unusual for Go encoders, but is unusual for a parser.

Tutorial

A tutorial is available, walking through the creation of an .ini parser.

Tag syntax

Participle supports two forms of struct tag grammar syntax.

The easiest to read is when the grammar uses the entire struct tag content, eg.

Field string `@Ident @("," Ident)*`

However, this does not coexist well with other tags such as JSON, and may cause issues with linters. If this is a problem, you can use the parser:"..." tag format instead. In this form single quotes can be used to quote literals, making the tags somewhat easier to write, eg.

Field string `parser:"@Ident (',' Ident)*" json:"field"`

Overview

A grammar is an annotated Go structure used both to define the parser grammar and to serve as the AST output by the parser. As an example, the following is the final INI parser from the tutorial.

```go
type INI struct {
  Properties []*Property `@@*`
  Sections   []*Section  `@@*`
}

type Section struct {
  Identifier string      `"[" @Ident "]"`
  Properties []*Property `@@*`
}

type Property struct {
  Key   string `@Ident "="`
  Value *Value `@@`
}

type Value struct {
  String *string  `  @String`
  Float  *float64 `| @Float`
  Int    *int     `| @Int`
}
```

Note: Participle also supports named struct tags (eg. Hello string `parser:"@Ident"`).

A parser is constructed from a grammar and a lexer:

parser, err := participle.Build[INI]()

Once constructed, the parser is applied to input to produce an AST:

ast, err := parser.ParseString("", "size = 10")
// ast == &INI{
//   Properties: []*Property{
//     {Key: "size", Value: &Value{Int: &10}},
//   },
// }
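
Because the grammar structs double as the AST, the result can be consumed with ordinary Go code. The following is a minimal sketch: it builds the AST shown above by hand instead of calling the parser, so it compiles with the standard library alone. The `lookup` helper is hypothetical and not part of Participle.

```go
package main

import "fmt"

// Grammar structs from the overview. The tags are only read by Participle,
// so this file compiles without importing it.
type INI struct {
	Properties []*Property `@@*`
	Sections   []*Section  `@@*`
}

type Section struct {
	Identifier string      `"[" @Ident "]"`
	Properties []*Property `@@*`
}

type Property struct {
	Key   string `@Ident "="`
	Value *Value `@@`
}

type Value struct {
	String *string  `  @String`
	Float  *float64 `| @Float`
	Int    *int     `| @Int`
}

// lookup is a hypothetical helper: it returns the value of the first
// top-level property with the given key.
func lookup(ini *INI, key string) (*Value, bool) {
	for _, p := range ini.Properties {
		if p.Key == key {
			return p.Value, true
		}
	}
	return nil, false
}

func main() {
	ten := 10
	// Hand-built equivalent of the AST for "size = 10".
	ast := &INI{Properties: []*Property{{Key: "size", Value: &Value{Int: &ten}}}}
	if v, ok := lookup(ast, "size"); ok && v.Int != nil {
		fmt.Println("size =", *v.Int)
	}
}
```

Only one of the pointer fields in `Value` is expected to be non-nil for a given property, which is why the caller checks `v.Int != nil` before dereferencing.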

Grammar syntax

Participle grammars are defined as tagged Go structures. Participle will first look for tags in the form parser:"...". It will then fall back to using the entire tag body.

The grammar format is:

@<expr> Capture expression into the field.
@@ Recursively capture using the field's own type.
<identifier> Match named lexer token.
( ... ) Group.
"..." or '...' Match the literal (note that the lexer must emit tokens matching this literal exactly).
"...":<identifier> Match the literal, specifying the exact lexer token type to match.
<expr> <expr> ... Match expressions.
<expr> | <expr> | ... Match one of the alternatives. Each alternative is tried in order, with backtracking.
~<expr> Match any token that is not the start of the expression (eg: @~";" matches anything but the ; character into the field).
(?= ... ) Positive lookahead group - requires the contents to match further input, without consuming it.
(?! ... ) Negative lookahead group - requires the contents not to match further input, without consuming it.

The following modifiers can be used after any expression:

* Expression can match zero or more times.
+ Expression must match one or more times.
? Expression can match zero or one times.
! Require a non-empty match (this is useful with a sequence of optional matches eg. ("a"? "b"? "c"?)!).

Notes:

Each struct is a single production, with each field applied in sequence.
@<expr> is the mechanism for capturing matches into the field.
If a struct field is not keyed with "parser", the entire struct tag will be used as the grammar fragment. This allows the grammar syntax to remain clear and simple to maintain.

Capturing

Prefixing any expression in the grammar with @ will capture matching values for that expression into the corresponding field.

For example:

// The grammar definition.
type Grammar struct {
  Hello string `@Ident`
}

// The source text to parse.
source := "world"

// After parsing, the resulting AST.
result == &Grammar{
  Hello: "world",
}

For slice and string fields, each instance of @ will accumulate into the field (including repeated patterns). Accumulation into other types is not supported.

For integer and floating point types, a successful capture will be parsed with strconv.ParseInt() and strconv.ParseFloat() respectively.

A successful capture match into a bool field will set the field to true.
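
The capture rules above can be summarised in a small model. This is a simplified sketch, not Participle's implementation: the `capture` function is hypothetical, and it assumes that strings accumulate by concatenation and that integers are parsed in base 10.

```go
package main

import (
	"fmt"
	"strconv"
)

// capture is a simplified, hypothetical model of how captured token text
// could be stored into a field, following the rules described above.
func capture(dst any, tokens ...string) error {
	for _, tok := range tokens {
		switch d := dst.(type) {
		case *string:
			*d += tok // assumption: strings accumulate by concatenation
		case *[]string:
			*d = append(*d, tok) // slices accumulate by appending
		case *int64:
			n, err := strconv.ParseInt(tok, 10, 64) // base 10 is an assumption
			if err != nil {
				return err
			}
			*d = n
		case *float64:
			f, err := strconv.ParseFloat(tok, 64)
			if err != nil {
				return err
			}
			*d = f
		case *bool:
			*d = true // any successful match sets the flag
		default:
			return fmt.Errorf("unsupported field type %T", dst)
		}
	}
	return nil
}

func main() {
	var name string
	var parts []string
	var size int64
	var optional bool
	_ = capture(&name, "a", ",", "b")
	_ = capture(&parts, "a", "b")
	_ = capture(&size, "10")
	_ = capture(&optional, "?")
	fmt.Println(name, parts, size, optional)
}
```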

Tokens can also be captured directly into fields of type lexer.Token and []lexer.Token.

Custom control of how values are captured into fields can be achieved by a field type implementing the Capture interface (Capture(values []string) error).

Additionally, any field implementing the encoding.TextUnmarshaler interface will be capturable too. One caveat is that UnmarshalText() will be called once for each captured token, so eg. @(Ident Ident Ident) will result in three calls to UnmarshalText().
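
Since UnmarshalText() is invoked once per captured token, an implementation usually needs to accumulate rather than overwrite. A minimal sketch, using a hypothetical `Words` type and simulating the three calls by hand instead of running a parser:

```go
package main

import (
	"encoding"
	"fmt"
)

// Words is a hypothetical field type that collects one entry per captured
// token. Because UnmarshalText is invoked per token, it appends.
type Words []string

func (w *Words) UnmarshalText(text []byte) error {
	*w = append(*w, string(text))
	return nil
}

// Compile-time check that *Words satisfies encoding.TextUnmarshaler.
var _ encoding.TextUnmarshaler = (*Words)(nil)

func main() {
	var w Words
	// Simulates the three calls expected for @(Ident Ident Ident).
	for _, tok := range []string{"a", "b", "c"} {
		if err := w.UnmarshalText([]byte(tok)); err != nil {
			panic(err)
		}
	}
	fmt.Println(len(w), w)
}
```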

Capturing boolean value

By default, a boolean field is used to indicate that a match occurred, which turns out to be much more useful and common in Participle than parsing true or false literals. For example, parsing a variable declaration with a trailing optional syntax:

type Var struct {
  Name string `"var" @Ident`
  Type string `":" @Ident`
  Optional bool `@"?"?`
}

In practice this gives more useful ASTs. If bool were to be parsed literally then you'd need to have some alternate type for Optional such as string or a custom type.

To capture literal boolean values such as true or false, implement the Capture interface like so:

type Boolean bool

func (b *Boolean) Capture(values []string) error {
    *b = values[0] == "true"
    return nil
}

type Value struct {
    Float  *float64 `  @Float`
    Int    *int     `| @Int`
    String *string  `| @String`
    Bool   *Boolean `| @("true" | "false")`
}
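
The Capture method above indexes values[0] without checking its input. A slightly more defensive variant might look like the following sketch; the validation rules (exactly one value, only "true" or "false" accepted) are assumptions chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

type Boolean bool

// Capture validates its input instead of indexing values[0] blindly.
func (b *Boolean) Capture(values []string) error {
	if len(values) != 1 {
		return errors.New("expected exactly one captured value")
	}
	switch values[0] {
	case "true":
		*b = true
	case "false":
		*b = false
	default:
		return fmt.Errorf("invalid boolean literal %q", values[0])
	}
	return nil
}

func main() {
	var b Boolean
	err := b.Capture([]string{"true"})
	fmt.Println(err, b)
	err = b.Capture([]string{"yes"})
	fmt.Println(err)
}
```

With the grammar `@("true" | "false")` the default branch should be unreachable, but the check keeps the type safe if the grammar is later widened.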

"Union" types

A very common pattern in parsers is "union" types, an example of which is shown above in the Value type. A common way of expressing this in Go is via a sealed interface, with each member of the union implementing this interface.

For example, the Value type could be expressed this way:

type Value interface { value() }

type Float struct { Value float64 `@Float` }
func (f Float) value() {}

type Int struct { Value int `@Int` }
func (f Int) value() {}

type String struct { Value string `@String` }
func (f String) value() {}

type Bool struct { Value Boolean `@("true" | "false")` }
func (f Bool) value() {}

Thanks to the efforts of Jacob Ryan McCollum, Participle now supports this pattern. Simply construct your parser with the Union[T](member...T) option, eg.

parser := participle.MustBuild[AST](participle.Union[Value](Float{}, Int{}, String{}, Bool{}))

Custom parsers may also be defined for union types with the ParseTypeWith option.
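
Code consuming such a union typically uses a type switch. The sketch below is independent of the parser and assumes the members are stored as values, as they are registered in the Union option above; `describe` is a hypothetical consumer, and the grammar tags are omitted for brevity.

```go
package main

import "fmt"

// Sealed interface: only types in this package can implement value().
type Value interface{ value() }

type Float struct{ Value float64 }
type Int struct{ Value int }
type String struct{ Value string }

func (Float) value()  {}
func (Int) value()    {}
func (String) value() {}

// describe is a hypothetical consumer of the union.
func describe(v Value) string {
	switch v := v.(type) {
	case Float:
		return fmt.Sprintf("float %g", v.Value)
	case Int:
		return fmt.Sprintf("int %d", v.Value)
	case String:
		return fmt.Sprintf("string %q", v.Value)
	default:
		return "unknown"
	}
}

func main() {
	for _, v := range []Value{Float{1.5}, Int{10}, String{"hi"}} {
		fmt.Println(describe(v))
	}
}
```

The default branch is what a sealed interface cannot remove: Go does not check type switches for exhaustiveness, so a newly added member would silently fall through to it.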

Custom parsing

There are three ways of defining custom parsers for nodes in the grammar:

Implement the Capture interface.
Implement the Parseable interface.
Use the ParseTypeWith option to specify a custom parser for union interface types.

Lexing

Participle relies on distinct lexing and parsing phases. The lexer takes raw bytes and produces tokens which the parser consumes. The parser transforms these tokens into Go values.

The default lexer, if one is not explicitly configured, is based on the Go text/scanner package and thus produces tokens for C/Go-like source code. This is surprisingly useful, but if you do require more control over lexing the included stateful participle/lexer lexer should cover most other cases. If that in turn is not flexible enough, you can implement your own lexer.

Configure your parser with a lexer using the participle.Lexer() option.

To use your own Lexer you will need to implement two interfaces: Definition (and optionally StringsDefinition and BytesDefinition) and Lexer.

Stateful lexer

In addition to the default lexer, Participle includes an optional stateful/modal lexer which provides powerful yet convenient construction of most lexers. (Notably, indentation based lexers cannot be expressed using the stateful lexer -- for discussion of how these lexers can be implemented, see #20).

It is sometimes the case that a simple lexer cannot fully express the tokens required by a parser. The canonical example of this is interpolated strings within a larger language. eg.

let a = "hello ${name + ", ${last + "!"}"}"

This is impossible to tokenise with a normal lexer due to the arbitrarily deep nesting of expressions. To support this case Participle's lexer is now stateful by default.

The lexer is a state machine defined by a map of rules keyed by the state name. Each rule within the state includes the name of the produced token, the regex to match, and an optional operation to apply when the rule matches.

As a convenience, any Rule starting with a lowercase letter will be elided from output, though it is recommended to use participle.Elide() instead, as it better integrates with the parser.

Lexing starts in the Root group. Each rule is matched in order, with the first successful match producing a lexeme. If the matching rule has an associated Action it will be executed.

A state change can be introduced with the Action Push(state). Pop() will return to the previous state.

To reuse rules from another state, use Include(state).

A special named rule Return() can also be used as the final rule in a state to always return to the previous state.

As a special case, regexes containing backrefs in the form \N (where N is a digit) will match the corresponding capture group from the immediate parent group. This can be used to parse, among other things, heredocs. See the tests for an example of this, among others.

Example stateful lexer

Here's a cut down example of the string interpolation described above. Refer to the stateful example for the corresponding parser.

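// Assumes: import "github.com/alecthomas/participle/v2/lexer"
// The variable is named def so that it does not clash with the lexer package.
var def = lexer.MustStateful(lexer.Rules{
    "Root": {
        {`String`, `"`, lexer.Push("String")},
    },
    "String": {
        {"Escaped", `\\.`, nil},
        {"StringEnd", `"`, lexer.Pop()},
        {"Expr", `\${`, lexer.Push("Expr")},
        {"Char", `[^$"\\]+`, nil},
    },
    "Expr": {
        lexer.Include("Root"),
        {`whitespace`, `\s+`, nil},
        {`Oper`, `[-+/*%]`, nil},
        {"Ident", `\w+`, nil},
        {"ExprEnd", `}`, lexer.Pop()},
    },
})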
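
To make the push/pop behaviour concrete, here is a hand-rolled model of the same rule set using only the standard library. It is a sketch of the mechanism described above (ordered rules per state, a state stack, elision of lowercase rule names), not Participle's lexer; the `rule`, `token` and `lex` names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
)

// rule is a hypothetical stand-in for a lexer rule: a token name, an
// anchored pattern, and an optional state change.
type rule struct {
	name string
	re   *regexp.Regexp
	push string // state to enter after a match, if non-empty
	pop  bool   // return to the previous state after a match
}

func r(name, pattern, push string, pop bool) rule {
	return rule{name, regexp.MustCompile(`^(?:` + pattern + `)`), push, pop}
}

var states = map[string][]rule{
	"Root": {
		r("String", `"`, "String", false),
	},
	"String": {
		r("Escaped", `\\.`, "", false),
		r("StringEnd", `"`, "", true),
		r("Expr", `\$\{`, "Expr", false),
		r("Char", `[^$"\\]+`, "", false),
	},
	"Expr": {
		r("String", `"`, "String", false), // stands in for Include("Root")
		r("whitespace", `\s+`, "", false),
		r("Oper", `[-+/*%]`, "", false),
		r("Ident", `\w+`, "", false),
		r("ExprEnd", `\}`, "", true),
	},
}

type token struct{ typ, val string }

// lex tries the rules of the state on top of the stack in order; the first
// match produces a token and may push or pop a state.
func lex(input string) ([]token, error) {
	stack := []string{"Root"}
	var out []token
	for pos := 0; pos < len(input); {
		state := stack[len(stack)-1]
		matched := false
		for _, rl := range states[state] {
			m := rl.re.FindString(input[pos:])
			if m == "" {
				continue
			}
			if !unicode.IsLower(rune(rl.name[0])) { // lowercase names are elided
				out = append(out, token{rl.name, m})
			}
			pos += len(m)
			switch {
			case rl.push != "":
				stack = append(stack, rl.push)
			case rl.pop:
				stack = stack[:len(stack)-1]
			}
			matched = true
			break
		}
		if !matched {
			return nil, fmt.Errorf("no rule in state %q matches at offset %d", state, pos)
		}
	}
	return out, nil
}

func main() {
	toks, err := lex(`"hello ${name + "!"}"`)
	if err != nil {
		panic(err)
	}
	for _, t := range toks {
		fmt.Printf("%s %q\n", t.typ, t.val)
	}
}
```

The nested string inside the expression is handled by pushing the String state a second time, which is the part a single flat rule list cannot express.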

Example simple/non-stateful lexer

Other than the default and stateful lexers, it's easy to define your own stateless lexer using the lexer.MustSimple() and lexer.NewSimple() functions. These functions accept a slice of lexer.SimpleRule{} objects consisting of a key and a regex-style pattern.

Note: The stateful lexer replaces the old regex lexer.

For example, the lexer for a form of BASIC:

var basicLexer = lexer.MustSimple([]lexer.SimpleRule{
    {"Comment", `(?i)rem[^\n]*`},
    {"String", `"(\\"|[^"])*"`},
    {"Number", `[-+]?(\d*\.)?\d+`},
    {"Ident", `[a-zA-Z_]\w*`},
    {"Punct", `[-[!@#$%^&*()+_={}\|:;"'<,>.?/]|]`},
    {"EOL", `[\n\r]+`},
    {"whitespace", `[ \t]+`},
})
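
Rule order matters in a lexer of this kind. The sketch below models it with the standard regexp package, assuming first-match-wins semantics as described for the stateful lexer; it is not the library's implementation, and `simpleRule`, `compile` and `lex` are hypothetical names. It uses a subset of the BASIC rules above.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
)

type simpleRule struct {
	name string
	re   *regexp.Regexp
}

type token struct{ typ, val string }

// compile anchors each pattern so it can only match at the current offset.
func compile(pairs [][2]string) []simpleRule {
	rules := make([]simpleRule, 0, len(pairs))
	for _, p := range pairs {
		rules = append(rules, simpleRule{p[0], regexp.MustCompile(`^(?:` + p[1] + `)`)})
	}
	return rules
}

// A subset of the BASIC rules above.
var basic = compile([][2]string{
	{"Comment", `(?i)rem[^\n]*`},
	{"String", `"(\\"|[^"])*"`},
	{"Number", `[-+]?(\d*\.)?\d+`},
	{"Ident", `[a-zA-Z_]\w*`},
	{"EOL", `[\n\r]+`},
	{"whitespace", `[ \t]+`},
})

// lex applies the rules in order at each offset; the first match wins and
// rules whose name starts with a lowercase letter are dropped from output.
func lex(rules []simpleRule, input string) ([]token, error) {
	var out []token
	for pos := 0; pos < len(input); {
		matched := false
		for _, rl := range rules {
			m := rl.re.FindString(input[pos:])
			if m == "" {
				continue
			}
			if !unicode.IsLower(rune(rl.name[0])) {
				out = append(out, token{rl.name, m})
			}
			pos += len(m)
			matched = true
			break
		}
		if !matched {
			return nil, fmt.Errorf("unexpected input at offset %d", pos)
		}
	}
	return out, nil
}

func main() {
	toks, err := lex(basic, `10 PRINT "hi"`)
	if err != nil {
		panic(err)
	}
	for _, t := range toks {
		fmt.Printf("%s %q\n", t.typ, t.val)
	}
}
```

Under these semantics an identifier such as REMARK would be lexed as a Comment, because the Comment rule is listed before Ident and matches first.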

Experimental - code generation

Par

Extension points: exported contracts (how you extend this code)

Parseable (Interface)

The Parseable interface can be implemented by any element in the grammar to provide custom parsing. [16 implementers]

api.go

Lexer (Interface)

A Lexer returns tokens from a source. [9 implementers]

lexer/api.go

Node (Interface)

A Node in the EBNF grammar. [6 implementers]

ebnf/ebnf.go

ExprPrecAll (Interface)

(no doc) [8 implementers]

_examples/expr3/main.go

Evaluatable (Interface)

(no doc) [10 implementers]

_examples/basic/eval.go

Expr (Interface)

(no doc) [6 implementers]

_examples/expr4/main.go

Error (Interface)

Error represents an error while parsing. The format of an Error is in the form "[<filename>:][<line>:<pos>:] <message>". [3 implementers]

error.go

Action (Interface)

An Action is applied when a rule matches. [3 implementers]

lexer/stateful.go

Core symbols most depended-on inside this repo

Shape

Struct 362

Function 294

Method 293

Interface 25

TypeAlias 25

FuncType 4

Languages

Go 99%

TypeScript 1%

Modules by API surface

parser_test.go 183 symbols

nodes.go 61 symbols

_examples/expr3/main.go 47 symbols

lexer/stateful.go 38 symbols

lookahead_test.go 30 symbols

_examples/protobuf/main.go 28 symbols

_examples/sql/main.go 26 symbols

lexer/api.go 25 symbols

_examples/microc/main.go 25 symbols

ebnf/ebnf.go 24 symbols

_examples/expr/main.go 22 symbols

grammar.go 21 symbols

Used by 12 indexed graphs (manifest dependencies, hub-wide)

github.com/cilium/cilium

github.com/grafana/tempo

github.com/influxdata/telegraf

github.com/istio/istio

github.com/jaegertracing/jaeger

github.com/juicedata/juicefs

github.com/lima-vm/lima

github.com/mikefarah/yq

github.com/ory/hydra

github.com/ory/kratos

… +2 more

Dependencies (from manifests, versioned)

github.com/alecthomas/assert/v2 v2.11.0 · 1×

github.com/alecthomas/go-thrift v0.0.3 · 1×

github.com/alecthomas/kong v1.6.1 · 1×

github.com/alecthomas/participle/v2 v2.1.1 · 1×

github.com/alecthomas/repr v0.4.0 · 1×

github.com/hexops/gotextdiff v1.0.3 · 1×

For agents

$ claude mcp add participle \
  -- python -m otcore.mcp_server <graph>
