Macro define
```rust
define!() { /* proc-macro */ }
```
Allows defining multiple custom rules using a PEG-like syntax.
§Syntax and explanation
First, we have the name, visibility, and “slice type” of the grammar:

```rust
pub grammar Name<Type>
```

The visibility and name are self-explanatory, and the “slice type” determines what type this grammar parses. If you want to parse a string, you can use `str` as your slice type, or if you want to parse bytes, you can create a grammar with type `[u8]`. Note that the type inside a `[T]` can be just about anything, so this works fine:
```rust
enum Token { LParen, RParen, Number(f64), Plus, Minus, Asterisk, Slash }

define! {
    pub grammar Expression<[Token]> { /* ... */ }
}
```
However, due to limitations caused by trait implementation collisions[^1], you can’t use a `T` as a rule for a `[T]` grammar - you have to use a 1-long array.
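For instance, in the `Expression` grammar above, matching a single token is written with a one-element array. A minimal sketch (the full math example at the bottom of this page uses the same pattern; the output type is inferred from that example):

```rust
define! {
    pub grammar Expression<[Token]> {
        // A bare `Token::Plus` won't work as a rule here;
        // the 1-long array `[Token::Plus]` matches exactly one token.
        PlusSign -> &'input [Token] = [Token::Plus];
    }
}
```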
Moving on, we come to defining individual rules, which is better described with a few examples:
```rust
/// Parses either `Tap`, `Tag`, or `Box`.
Sample -> (&'input str, Option<char>) = "Ta", ('p', 'g') : "Box";
```
We see here the basic syntax of defining a rule:

- Any attributes/documentation
- The rule’s name
- The rule’s return type
- `=`
- A `:`-separated list of potential ways to parse the rule (we call these “parsing paths”), where each option is a list of `,`-separated `Rule`s.
A rule’s return type is a tuple with arity determined by its options. If two different paths have different arities, then any tuple elements past the minimum arity among all paths will be wrapped in an `Option`, returning `None` for paths which don’t produce them.
For example, this:

```rust
A -> (char, Option<char>) = 'b', 'c' : 'd';
```

will return `('b', Some('c'))` for `"bc"`, but `('d', None)` for `"d"`.
Paths are parsed sequentially, so in the above example, the `'b', 'c'` path is tried first, and `'d'` is only attempted if it fails.
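This ordered choice means an earlier path that matches a prefix of a later one will win. A hypothetical sketch:

```rust
// On "foobar", the first path matches "foo" and returns immediately,
// leaving "bar" unconsumed - so order longer alternatives first.
Keyword -> &'input str = "foo" : "foobar";
```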
As a special case, a rule with arity 1 will return its inner type instead of a 1-tuple, and a rule with arity 0 will return `()`.
Let’s look at a more complicated example:
```rust
/// Parses an unsigned number, potentially surrounded by parentheses.
pub ParenNumber -> u32 = _ '(', ParenNumber, _ ')' : Number;
/// Parses an unsigned number.
Number -> u32 try_from(u32::from_str) = While::from(char::is_ascii_digit);
```
Here, we see a few things. Firstly, rules support arbitrary documentation and attributes, all of which will be applied to the generated `struct`. Second, there’s support for a visibility in front of a rule - a rule must be `pub` (or `pub(super)`, `pub(crate)`, etc.) to be accessible outside of the grammar’s definition.
After that, we see some new syntax in the form of `_`. Prefixing any part of a path with an underscore makes the grammar neither store its output nor save it as an argument.
Finally, we see `try_from`. This, along with its infallible counterpart `from`, takes the output of your entire rule and passes it into a given function. This function needs to take arguments matching in number and type the elements of the tuple the rule would usually output - in layman’s terms, the tuple is “unpacked” into the function, similar to `...` in JavaScript or `*` in Python. A function given to `from` must return the type of the rule, and a function given to `try_from` must return a `Result<T, E>`, where `T` is the type of the rule and `E` is any type that implements `Error`.
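As a hedged sketch of this unpacking (assuming, as in the math example below, that `Any` parses a single `char` from a `str` grammar):

```rust
use fn_bnf::{define, Any};

define! {
    grammar Unpacking<str> {
        // This rule's output tuple is (char, char), so the closure given to
        // `from` takes two `char` arguments - not a single tuple.
        pub Pair -> String from(|a: char, b: char| format!("{a}{b}")) = Any, Any;
    }
}
```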
We also see that rules can recurse - although this should be avoided if reasonably possible. If a rule recurses too deeply, Rust’s call stack will be exhausted and your program will crash! If you truly need deep recursion, look into crates like `stacker`.
Finally, let’s look at an advanced example:
```rust
pub Doubled<'input, R> {rule: R, _p: PhantomData<&'input str>} -> &'input str
    where R: Rule<'input, str, Output = &'input str>
    = &self.rule, _ arg_0;
```
We see here that rules are simply structs, and can take generics and fields as such. Also of note is the usage of `arg_0`: previously parsed arguments (ignoring silenced ones) are left in scope as `arg_N`, where `N` is the index of the argument. Importantly, `'input` is special - it’s always in scope, even if not declared, and is the lifetime of the data that’s being passed in.
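Since `arg_N` bindings are ordinary values, they can even be used as rules themselves, as `Doubled` above does with `arg_0`. A smaller, hypothetical sketch:

```rust
define! {
    grammar Twice<str> {
        // `arg_0` is the char parsed by `Any`; reusing it as a rule
        // requires the next character to be the same one again.
        pub Twin -> (char, char) = Any, arg_0;
    }
}
```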
Anything that implements `for<'input> Rule<'input, T>` will work in any grammar of slice type `T` - they don’t even have to be from the same macro!
For example, this is fine:
```rust
define! {
    grammar Theta<str> {
        pub Foo -> &'input str = "foo";
    }
}

define! {
    grammar Delta<str> {
        pub Bar -> &'input str = Theta::Foo;
    }
}
```
Alternatively, if, God forbid, something happens so that you need to implement a `Rule` manually, without the help of this macro, look at the documentation for that trait - it’ll tell you everything you need to know.
§Example
Below is the entire source code for `examples/math.rs`, showing a simple CLI calculator app.
```rust
use std::{str::FromStr, io::Write};
use fn_bnf::{define, Any, Rule, While, Fail, errors::Unexpected};

#[derive(Debug, PartialEq, Copy, Clone)]
pub enum Token {
    Number(f64),
    Plus, Minus, Asterisk, Slash, Carat, Percent, Ans,
    LeftParen, RightParen
}

define! {
    grammar MathTokens<str> {
        // Mapping the individual parses to () makes .hoard() create a Vec<()>, which doesn't allocate
        WhitespaceToken -> () = _ (' ', '\n', '\t');
        Whitespace -> () = _ WhitespaceToken, _ WhitespaceToken.hoard();

        pub LangTokens -> Vec<Token> = LangToken.consume_all()
            .map_parsed(|v| v.into_iter().filter_map(|v| v).collect() );
        LangToken -> Option<Token> =
            Num : Plus : Minus : Asterisk : Slash : Percent : Carat
            : LParen : RParen : Ans : _ Whitespace
            : InvalidChar;

        // Since Fail returns !, we can coerce from that to a token
        InvalidChar -> Token from(|_, n| n) = Any, Fail::new(Unexpected::new(arg_0));

        Plus -> Token = '+'.map_parsed(|_| Token::Plus);
        Minus -> Token = '-'.map_parsed(|_| Token::Minus);
        Asterisk -> Token = '*'.map_parsed(|_| Token::Asterisk);
        Slash -> Token = '/'.map_parsed(|_| Token::Slash);
        Percent -> Token = '%'.map_parsed(|_| Token::Percent);
        Carat -> Token = '^'.map_parsed(|_| Token::Carat);
        LParen -> Token = '('.map_parsed(|_| Token::LeftParen);
        RParen -> Token = ')'.map_parsed(|_| Token::RightParen);
        Ans -> Token = "ans".map_parsed(|_| Token::Ans);

        Num -> Token from(|n| Token::Number(n)) =
            ("nan", "NaN").map_parsed(|_| f64::NAN) :
            ("inf", "Infinity").map_parsed(|_| f64::INFINITY) :
            Float;
        Float -> f64 try_from(f64::from_str) = FloatTokens.spanned().map_parsed(|span| span.source);
        FloatTokens -> () = _ UInt, _ FloatFract.attempt(), _ FloatExp.attempt();
        FloatFract -> () = _ '.', _ UInt;
        FloatExp -> () = _ ('e', 'E'), _ ('-', '+').attempt(), _ UInt;
        UInt -> &'input str = While::from(char::is_ascii_digit);
    }
}

define! {
    grammar TokenMath<[Token]> {
        pub Expr -> f64 from(parse_expr) = Prod, SumSuf.consume_all();
        EOF -> () = Rule::<'input, [Token]>::prevent(Any);

        Sum -> f64 from(parse_expr) = Prod, SumSuf.hoard();
        SumSuf -> (&'input [Token], f64) = ([Token::Plus], [Token::Minus]), Prod;
        Prod -> f64 from(parse_expr) = Exp, ProdSuf.hoard();
        ProdSuf -> (&'input [Token], f64) = ([Token::Asterisk], [Token::Slash], [Token::Percent]), Exp;
        Exp -> f64 from(parse_expr) = Neg, ExpSuf.hoard();
        ExpSuf -> (&'input [Token], f64) = [Token::Carat], Neg;

        Neg -> f64 from(|negative, num: f64| if negative {-num} else {num})
            = [Token::Minus].attempt().map_parsed(|opt| opt.is_ok()), Atom;
        Atom -> f64 = _ [Token::LeftParen], Sum, _ [Token::RightParen] : Number;
        Number -> f64 try_from(|token: &Token| {
            let Token::Number(n) = token else { return Err(Unexpected::<Token>::new(*token)); };
            Ok(*n)
        }) = Any;
    }
}

fn parse_expr(mut lhs: f64, suffixes: Vec<(&[Token], f64)>) -> f64 {
    for (op, rhs) in suffixes {
        match op[0] {
            Token::Plus => lhs += rhs,
            Token::Minus => lhs -= rhs,
            Token::Asterisk => lhs *= rhs,
            Token::Slash => lhs /= rhs,
            Token::Percent => lhs %= rhs,
            Token::Carat => lhs = lhs.powf(rhs),
            _ => unreachable!()
        }
    }
    lhs
}

fn main() -> Result<(), std::io::Error> {
    let mut lines = std::io::stdin().lines();
    println!("Input a math expression below, `clear` to clear the console, or `exit` / `quit` to exit.");
    println!("You can access the result of the last expression with `ans`.");
    let mut last_ans = None;
    'outer: loop {
        print!("[?] ");
        std::io::stdout().flush()?;
        let Some(input) = lines.next().transpose()? else { break Ok(()) };
        let input = input.trim_ascii();
        if input.is_empty() { print!("\x1b[1A"); continue; }
        match input {
            "clear" => print!("\x1bc"),
            "exit" | "quit" => break Ok(()),
            _ => {
                let (_, mut tokens) = match MathTokens::LangTokens.parse(&input) {
                    Ok(v) => v,
                    Err(err) => {
                        println!("[!] Failed to parse: {err}");
                        continue;
                    }
                };
                for ans in tokens.iter_mut().filter(|t| matches!(t, Token::Ans)) {
                    let Some(answer) = last_ans else {
                        println!("[!] No previous answer exists");
                        continue 'outer;
                    };
                    *ans = Token::Number(answer);
                }
                let (_, result) = match TokenMath::Expr.parse(tokens.as_ref()) {
                    Ok(v) => v,
                    Err(err) => {
                        println!("[!] Failed to parse: {err}");
                        continue;
                    }
                };
                last_ans = Some(result);
                println!("[=] {result:.}")
            }
        }
    }
}
```
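Since this file lives at `examples/math.rs`, it can presumably be run from the crate root with `cargo run --example math`.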
[^1]: It’s ambiguous between `[T]` and `[&T]`, and between `[T]` and `[(T,)]`. Ideally, the library could `impl !Rule` for these, but even disregarding the fact that that’s nightly, it doesn’t work - at least, for now. Said bounds would be equivalent to specialization, which is a long way off.