Runtime Model
Big Picture
OptParse is organized around two related but distinct abstractions:
- ValueParser{T} converts a single raw string token into a value of type T
- Parser{T,S,p,P} consumes command-line structure and eventually produces a value of type T
The high-level split in the source tree is:
- src/core/
  - parsing context
  - parse result types
  - structured errors and error rendering
- src/parsers/valueparsers/
  - string-to-value parsers such as str, integer, choice, flt, uuid, path
- src/parsers/primitives/
  - leaf parser families such as gate, flag, option, arg, command
- src/parsers/constructors/
  - combinators that combine child parsers such as object, or, sequence, concat, combine
- src/parsers/modifiers/
  - wrappers that transform parser behavior, such as default, optional, multiple
- src/display/
  - pretty-printing of parser values
The central entrypoints live in src/OptParse.jl:
- tryargparse(parser, argv)
- argparse(parser, argv)
- normalize_argv(argv)
Parser{T,S,p,P} Type Parameters
The wrapped public parser type is:
Parser{T,S,p,P}
The parameters mean:
- T
  - the final value type returned by complete
  - equivalently, the success type of ParseResult{T}
- S
  - the parser-family-specific state type threaded through Context{S}
  - this is the type consumed by complete(p, state::S)
- p
  - the parser priority as a compile-time integer parameter
- P
  - parser-family-specific extra type information
  - for leaf families this is often Nothing
  - for wrappers and constructors this is often the child parser type or tuple of child parser types
The invariants are:
- T is the semantic output type of the parser
- S is the only state shape that parser family should interact with directly
- p is stable for a given parser family instance and drives constructor scheduling
- P should stay concrete so that parser-family-specific code can remain inferable
Helper functions expose the same information in code:
tval(parser_or_type)
tstate(parser_or_type)
priority(parser_or_type)
ptypes(parser_or_type)
When adding a parser family, think of S and p as part of the parser contract, not as incidental implementation details.
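The idea of reading type parameters back out through helper functions can be sketched in isolation. The following is a minimal illustration only: ToyParser and the toy_* accessors are hypothetical names invented here, not OptParse's definitions, and the real tval, tstate, priority and ptypes may be implemented differently.

```julia
# Hypothetical stand-in for a four-parameter parser type; not OptParse's definition.
struct ToyParser{T,S,p,P}
    payload::P
end

# Accessors on the type: each one pattern-matches a prefix of the parameter list.
toy_tval(::Type{<:ToyParser{T}}) where {T} = T
toy_tstate(::Type{<:ToyParser{T,S}}) where {T,S} = S
toy_priority(::Type{<:ToyParser{T,S,p}}) where {T,S,p} = p

# Accessors on an instance simply forward to the type-level methods.
toy_tval(x::ToyParser) = toy_tval(typeof(x))
toy_tstate(x::ToyParser) = toy_tstate(typeof(x))
toy_priority(x::ToyParser) = toy_priority(typeof(x))

# A leaf-like parser: produces Int, uses Bool state, priority 10, no extra payload.
leaf = ToyParser{Int,Bool,10,Nothing}(nothing)
```

Because p is a type parameter rather than a field, toy_priority(leaf) is known from the type alone, which is one plausible reason to treat the priority as part of the parser contract.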
Parse Model
OptParse parsing is split into two phases.
parse
The parse phase incrementally walks the parser tree while consuming command-line tokens and updating parser-local state.
Each parser family implements a method shaped like:
parse(p::SomeParser{T,S}, ctx::Context{S})::InnerParseResult{S}
The result can be:
- InnerParseSuccess{S} with:
  - a Consumed view of the consumed tokens
  - a next Context{S}
  - counts_as_match::Bool
- InnerParseFailure with:
  - an integer “consumed count” used for choosing better failures
  - a ParseError
complete
The complete phase collapses the final parser state into the returned value:
complete(p::SomeParser{T,S}, state::S)::ParseResult{T}
Typical complete responsibilities:
- turn successful parser-local state into the final user-facing value
- enforce completion-time invariants, such as:
  - “required flag was never matched”
  - “multiple matched fewer than min times”
  - “one child parser failed to complete inside a constructor”
- add parser-specific error context when resurfacing child failures
Why the split exists
This split is the key to the package design:
- parse focuses on token flow and structural matching
- complete focuses on final validity and value extraction
That separation keeps combinators composable and makes the “parse, don't validate” principle fit naturally into the implementation.
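To make the split concrete, here is a deliberately tiny two-phase sketch for a required flag. Everything in it (ToyFlag, toy_parse, toy_complete, the tuple-shaped results) is a hypothetical simplification made up for illustration; OptParse's real parse and complete work on Context{S}, InnerParseResult{S} and ParseResult{T} as described above.

```julia
# Hypothetical leaf parser: a required flag identified by its name.
struct ToyFlag
    name::String
end

# parse phase: look at one token and return (tokens consumed, new state).
# The state here is just a Bool meaning "has the flag been seen yet?".
function toy_parse(p::ToyFlag, tokens::Vector{String}, pos::Int, seen::Bool)
    if pos <= length(tokens) && tokens[pos] == p.name
        return (1, true)   # structural match: consume the token, update state
    end
    return (0, seen)       # no match: consume nothing, state is unchanged
end

# complete phase: collapse the final state into a value or an error.
function toy_complete(p::ToyFlag, seen::Bool)
    return seen ? (:ok, true) : (:error, "required flag $(p.name) was never matched")
end

flag = ToyFlag("--verbose")
n, state = toy_parse(flag, ["--verbose"], 1, false)
result = toy_complete(flag, state)
```

Note that toy_parse never reports "missing flag" on its own; that check only happens in toy_complete, once no more tokens can arrive.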
Parser State
Parser state is intentionally a parser-family implementation detail.
A parser family should only ever interact with its own state shape. For example:
- ArgGate works with GateState
- ArgOption works with OptionState{T}
- ArgCommand works with CommandState{PState}
- ModMultiple works with MultipleState{S}
This is why tight parse / complete signatures matter so much: they enforce the rule that a parser family only operates on its own state.
In practice, state shape should usually mirror the macro-state the parser can be in. Typical examples:
- a gate or required flag has states like:
  - not matched yet
  - matched successfully
  - failed to complete because it never matched
- a command has states like:
  - command token not matched yet
  - command matched, inner parser not started yet
  - command matched, inner parser started and has child state
- a repetition has states like:
  - zero repetitions matched
  - one or more repetitions matched, each with its own child state snapshot
This does not mean every parser needs an explicit enum for those macro-states. It means the chosen state representation should make those states obvious and easy to reason about.
Good examples already in the codebase are:
const GateState = ParseResult{Bool}
const OptionState{T} = ParseResult{T}
const CommandState{S} = Option{Option{S}}
const MultipleState{S} = Vector{S}
Those are implementation details, but they encode the parser family’s conceptual state machine.
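As an illustration of how a nested optional can encode the three command macro-states, the sketch below uses Base's Some and nothing. This is an assumption made purely for the example: the Option type used by OptParse is not shown in this document and may behave differently, and the toy_* functions are hypothetical.

```julia
# Hypothetical encoding of the three command macro-states with Base's Some/nothing:
#   nothing        -> command token not matched yet
#   Some(nothing)  -> command matched, inner parser not started yet
#   Some(Some(s))  -> command matched, inner parser started with child state s
toy_command_phase(::Nothing) = :not_matched
toy_command_phase(s::Some) = s.value === nothing ? :matched_not_started : :matched_started

# Transitions return a new value instead of mutating the old one.
toy_match_command(::Nothing) = Some(nothing)
toy_start_inner(::Some{Nothing}, child_state) = Some(Some(child_state))

s0 = nothing
s1 = toy_match_command(s0)
s2 = toy_start_inner(s1, 0)
```

The transition functions only accept the state they are valid for, so an out-of-order transition such as starting the inner parser before the command matched fails at dispatch rather than silently producing a bad state.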
Wrapped Unions And Why State Signatures Matter
The public Parser wrapper is a @wrapped union over all parser families. Likewise, ValueParser is a wrapped union over all value parser families.
This design keeps the public surface simple while still allowing family-specific concrete implementations underneath.
One important consequence is that parser families must constrain their parse and complete signatures to their real state invariants.
For example:
function parse(p::ArgOption{T, OptionState{T}}, ctx::Context{OptionState{T}})::InnerParseResult{OptionState{T}} where {T}
is better than a looser:
function parse(p::ArgOption, ctx::Context)
because the tight signature:
- documents the actual invariant of the parser family
- avoids impossible Parser union branches surviving too long in inference
- gives JET less nonsense to analyze
- makes trimming behavior much more predictable
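The effect of a tight signature can be seen with plain Julia dispatch, independent of OptParse. In this sketch SigCtx, SigOption and sig_step are hypothetical names, and the state shape Vector{T} is chosen only to keep the example short.

```julia
# Hypothetical context and parser types, unrelated to OptParse's real ones.
struct SigCtx{S}
    state::S
end

struct SigOption{T} end

# Tight signature: the context state must be a Vector of the parser's own T.
sig_step(p::SigOption{T}, ctx::SigCtx{Vector{T}}) where {T} = eltype(ctx.state)

ok = sig_step(SigOption{Int}(), SigCtx(Int[]))

# A context carrying the wrong state shape has no applicable method at all.
mismatch_rejected = try
    sig_step(SigOption{Int}(), SigCtx(String[]))
    false
catch err
    err isa MethodError
end
```

With a loose signature such as sig_step(p::SigOption, ctx::SigCtx), the second call would be accepted and the mismatch would only surface later, if at all.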
Context
Context{S} lives in src/core/context.jl.
Conceptually, Context is the current parser execution frame:
- it says which normalized argv buffer is being parsed
- where the parser currently is in that buffer
- what the parser-family-local state currently is
- whether global option parsing has already been terminated by --
It carries:
- buffer::Vector{String}
- pos::Int
- state::S
- optionsTerminated::Bool
The important point is that Context is parameterized by the parser state type. That means “update the state” is not merely a field assignment. It is often an inference checkpoint.
A parser should only ever interact with Context through the helper functions and centralized checkpoints in src/core/context.jl.
In particular, parser-family code should avoid rebuilding contexts ad hoc unless there is a very good reason. The helper API keeps context updates:
- explicit
- type-stable
- grep-friendly
- consistent across parser families
The main helpers are:
- ctx_with_state(ctx, s)
- ctx_restate(ctx, s)
- widen_state(::Type, ctx)
- widen_restate(::Type, ctx, s)
- ctx_with_options_terminated(ctx, flag)
- ctx_hasmore, ctx_hasnone, ctx_peek, ctx_remaining, ctx_length
- consume(ctx, n)
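To show why a state update on a parameterized context is more than a field assignment, here is a minimal sketch of an immutable context with two helper functions. ToyContext and the toy_* helpers are hypothetical analogues written for this example; the field names follow the list above, but the actual behavior of ctx_with_state, ctx_restate and consume in src/core/context.jl is not reproduced here.

```julia
# Hypothetical immutable context mirroring the four documented fields.
struct ToyContext{S}
    buffer::Vector{String}
    pos::Int
    state::S
    optionsTerminated::Bool
end

# Replace the state; the result may have a different state type parameter.
toy_with_state(ctx::ToyContext, s) =
    ToyContext(ctx.buffer, ctx.pos, s, ctx.optionsTerminated)

# Advance the cursor by n tokens without touching the state.
toy_consume(ctx::ToyContext, n::Int) =
    ToyContext(ctx.buffer, ctx.pos + n, ctx.state, ctx.optionsTerminated)

toy_hasmore(ctx::ToyContext) = ctx.pos <= length(ctx.buffer)
toy_peek(ctx::ToyContext) = ctx.buffer[ctx.pos]

ctx0 = ToyContext(["--name", "x"], 1, nothing, false)
ctx1 = toy_consume(toy_with_state(ctx0, 0), 1)
```

Here ctx0 is a ToyContext{Nothing} and ctx1 is a ToyContext{Int}: changing the state changed the type of the whole context, which is the kind of point the text calls an inference checkpoint.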
Flat context access vs optics
Current guidance:
- use direct helpers or centralized checkpoints for Context
- use optics where nested immutable state updates are actually needed
This is why:
- Context updates go through helpers like ctx_with_state and consume
- nested constructor state updates still use PropertyLens / IndexLens
As a rule of thumb:
- for Context, use the helper API
- for nested constructor state stored inside Context.state, optics are still appropriate
Consumed
Consumed lives in src/core/parseresult.jl.
It is a cheap view of consumed tokens:
- it stores the shared input buffer
- it stores one or more ranges into that buffer
- it behaves like an AbstractVector{String}
This avoids eagerly materializing consumed token vectors while still making it easy to:
- inspect consumed tokens in tests
- merge consumptions from nested combinators
- preserve a precise token view when bundled short flags are expanded
Important helpers:
- consumed_empty(ctx)
- merge(::Vector{Consumed})
- as_vector(consumed)
- as_tuple(consumed)
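A view of this kind can be sketched with Julia's AbstractArray interface. The ToyConsumed type and toy_merge function below are hypothetical and written only to illustrate the "shared buffer plus ranges" idea; the real Consumed type and its merge method may differ in layout and signature.

```julia
# Hypothetical token view: a shared buffer plus the ranges that were consumed.
struct ToyConsumed <: AbstractVector{String}
    buffer::Vector{String}
    ranges::Vector{UnitRange{Int}}
end

Base.IndexStyle(::Type{ToyConsumed}) = IndexLinear()
Base.size(c::ToyConsumed) = (sum(length, c.ranges; init=0),)

# Walk the ranges until the one containing logical index i is found.
function Base.getindex(c::ToyConsumed, i::Int)
    @boundscheck checkbounds(c, i)
    for r in c.ranges
        i <= length(r) && return c.buffer[first(r) + i - 1]
        i -= length(r)
    end
    throw(BoundsError(c, i))
end

# Merging only concatenates range lists; no tokens are copied.
toy_merge(a::ToyConsumed, b::ToyConsumed) = ToyConsumed(a.buffer, vcat(a.ranges, b.ranges))

buf = ["cmd", "--a", "1", "--b"]
merged = toy_merge(ToyConsumed(buf, [1:1]), ToyConsumed(buf, [3:4]))
```

Calling collect(merged) materializes the tokens only when a plain vector is actually needed, for example in a test assertion.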
InnerParseSuccess and InnerParseFailure
These also live in src/core/parseresult.jl.
InnerParseSuccess carries:
- consumed::Consumed
- next::Context{S}
- counts_as_match::Bool
counts_as_match is subtle and important.
It exists because not every successful token consumption should count as a semantic parser match. The main current example is --:
- a primitive parser may consume --
- that should update optionsTerminated
- but it should not satisfy an or branch or one slot of a sequence
So:
- “consumed input” and “counts as a semantic match” are intentionally different concepts
Helpers:
- innerOk(ctx, n; nextctx=..., counts_as_match=true)
- innerOk(nextctx, consumed, counts_as_match=true)
- innerErr(ctx, perr; consumed=0)
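The difference between "consumed input" and "counts as a semantic match" can be illustrated with a toy chooser. ToySuccess, toy_flag_step and toy_first_match are hypothetical names for this sketch, and the selection logic is an assumption about how an or-like combinator might use the flag, not a description of OptParse's constructors.

```julia
# Hypothetical success record: how many tokens were consumed, and whether
# the consumption should count as a semantic match.
struct ToySuccess
    consumed::Int
    counts_as_match::Bool
end

# Hypothetical primitive: consumes "--" as an options terminator without
# counting it as a match; consumes its own name as a real match.
function toy_flag_step(name::String, token::String)
    token == "--" && return ToySuccess(1, false)
    token == name && return ToySuccess(1, true)
    return nothing
end

# An or-like chooser: pick the first branch whose success counts as a match.
function toy_first_match(names::Vector{String}, token::String)
    for name in names
        r = toy_flag_step(name, token)
        r !== nothing && r.counts_as_match && return name
    end
    return nothing
end

terminator = toy_flag_step("-v", "--")
chosen = toy_first_match(["-v", "-q"], "-q")
```

In this sketch the terminator is consumed (terminator.consumed is 1), yet toy_first_match(["-v", "-q"], "--") selects no branch.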
Error Model
Structured errors live in src/core/errors.jl.
The core pieces are:
- ErrorPhase
  - ParsePhase
  - ValuePhase
  - CompletePhase
- ErrorDomain
  - one domain per parser family and value parser family
- ErrorSite
  - contextual breadcrumb used for rendered error subjects
- ParseError
  - structured error payload
- ParseException
  - thrown by argparse in non-generated runtime mode
Every parser family or value parser family should define:
- its own error-code enum
- a constructor like argoption_error(...) or integerval_error(...)
- a renderer like argoption_render_error(...)
When resurfacing a child error, add context via:
error_with_context(result, CompletePhase, ERR_SomeDomain, "subject")
This is how the final rendered message accumulates parser-specific context.
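The accumulation of context can be pictured as a growing list of breadcrumbs wrapped around an inner message. The sketch below is hypothetical: ToyError, toy_with_context and toy_render are invented for illustration, and the real ParseError, ErrorSite and renderer functions carry more structure (phase, domain, error code) than a list of strings.

```julia
# Hypothetical error payload: an inner message plus outer-to-inner breadcrumbs.
struct ToyError
    message::String
    context::Vector{String}
end

# Each parser that resurfaces a child error prepends its own subject.
toy_with_context(e::ToyError, subject::String) =
    ToyError(e.message, vcat([subject], e.context))

# Rendering joins the breadcrumbs, outermost first, followed by the message.
toy_render(e::ToyError) = join(vcat(e.context, [e.message]), ": ")

inner = ToyError("expected an integer", String[])
outer = toy_with_context(toy_with_context(inner, "option --port"), "command serve")
rendered = toy_render(outer)
```

Each layer only knows its own subject, so the full path in the rendered message emerges from composition rather than from any single parser.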