MoonBit Pearls Vol.01：使用MoonBit编写Pratt解析器

2025年6月11日 · 阅读需 9 分钟

myfreess

上周 MoonBit 社区发起 MoonBit Pearls 高质量文档与示例征集活动，经过精细筛选，本周正式推出"MoonBit Pearls"专栏的首篇入选文章。专栏作为长期知识沉淀平台，持续收录优质文档。我们期待更多开发者参与后续投稿，共同丰富 MoonBit 社区生态。

以下是首篇投稿正文内容，作者通过完整案例，演示了如何用 MoonBit 编写 Pratt 解析器：

在编译过程中，语法分析（也称为解析，Parsing）是一个关键步骤。解析器的主要职责是将Token流转换成抽象语法树（AST）。

本文将介绍一种解析器的实现算法：Pratt解析(Pratt Parsing)，是自顶向下的算符优先分析法(Top Down Operator Precedence Parsing)，并展示如何用MoonBit来实现它。

为什么用Pratt解析器

几乎每个程序员都不会对中缀表达式感到陌生, 即使是坚定的Lisp/Forth程序员，至少也知道世界上有大半人这样写算术表达式：

24 * (x + 4)

而对于编译器(或者解释器)的编写者而言，这样的中缀表达式要比Lisp所用的前缀表达式和Forth使用的后缀表达式难解析一点。例如，使用朴素的手写递归下降解析器来解析就需要多个互相递归的函数，还得在分析表达式语法时消除左递归，这样的代码在运算符增多时变得很不友好。解析器生成工具在这一问题上也不是很令人满意的选项，以一个简单加法和乘法运算表达式的BNF为例：

Expr ::=
    Factor
    | Expr '+' Factor
Factor ::=
    Atom
    | Factor '*' Atom
Atom ::=
    'number'
    | '(' Expr ')'

这看起来并不是很直观，搞不好还得花时间复习一下大学里上过的形式语言课程。

而有些语言如Haskell支持自定义的中缀运算符，这几乎不太可能简单地用某种解析器生成工具解决。

Pratt解析器很好地解决了中缀表达式解析的问题，与此同时，它还很方便扩展支持添加新的运算符(不需要改源码就可以做到！)。它被著名的编译原理书籍《Crafting Interpreters》推荐和递归下降解析器一同使用，rust-analyzer项目中也使用了它。

结合力

Pratt 解析器中用于描述结合性和优先级的概念叫做binding power(结合力)，对于每个中缀运算符而言，其结合力是一对整数 - 左右各一个。如下所示：

expr:   A     +     B     *     C
power:     3     3     5     5

而其作用和名字非常符合，数字越大，越能优先获取某个操作数(operand, 这个例子中A B C都是操作数)。

上面的例子展示了具有不同优先级的运算符，而同一运算符的结合性通过一大一小的结合力来表示。

expr:   A     +     B     +     C
power:     1     2     1     2

在这个例子中，当解析到B时，由于左边的结合力较大，表达式会变成这样：

expr:   (A + B)     +     C
power:           1     2

接下来让我们看看Pratt解析器在具体执行时如何使用这一概念。

概览与前期准备

Pratt解析器的主体框架大概是这样：

fn parse(self : Tokens, min_bp : Int) -> SExpr ! ParseError {
    ...
    while true {
       parse(...)
    }
    ...
}

从上文可以看出，它是交替使用递归和循环实现的。这其实对应着两种模式：

永远是最左边的表达式在最内层，即"1 + 2 + 3" = "(1 + 2) + 3", 只需要使用循环就能解析
永远最右边的表达式在最内层，即"1 + 2 + 3" = "1 + (2 + 3)", 这只使用递归也可以解析

min_bp是一个代表左侧某个还没有解析完毕的运算符结合力的参数。

我们的目标是读入一个token流，并输出一个不需要考虑优先级的前缀表达式：

enum SExpr {
  (String) -> SExpr
Atom(String
String)
  (Char, Array[SExpr]) -> SExpr
Cons(Char
Char, type Array[T]
An Array is a collection of values that supports random access and can
grow in size.
Array[enum SExpr {
  Atom(String)
  Cons(Char, Array[SExpr])
}
SExpr])
}

impl trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}
Trait for types that can be converted to String
Show for enum SExpr {
  Atom(String)
  Cons(Char, Array[SExpr])
}
SExpr with fn Show::output(self : SExpr, logger : &Logger) -> Unit
output(SExpr
self, &Logger
logger) {
    match SExpr
self {
        (String) -> SExpr
Atom(String
s) => &Logger
logger.fn Logger::write_string(&Logger, String) -> Unit
write_string(String
s)
        (Char, Array[SExpr]) -> SExpr
Cons(Char
op, Array[SExpr]
args) => {
            &Logger
logger.fn Logger::write_char(&Logger, Char) -> Unit
write_char('(')
            &Logger
logger.fn Logger::write_char(&Logger, Char) -> Unit
write_char(Char
op)
            for Int
i = 0; Int
i fn Compare::op_lt(x : Int, y : Int) -> Bool
< Array[SExpr]
args.fn[T] Array::length(self : Array[T]) -> Int
Returns the number of elements in the array.
Parameters:

array : The array whose length is to be determined.
Returns the number of elements in the array as an integer.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
}
length(); Int
i = Int
i fn Add::add(self : Int, other : Int) -> Int
Adds two 32-bit signed integers. Performs two's complement arithmetic, which
means the operation will wrap around if the result exceeds the range of a
32-bit integer.
Parameters:

self : The first integer operand.
other : The second integer operand.
Returns a new integer that is the sum of the two operands. If the
mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to
2,147,483,647), the result wraps around according to two's complement rules.
Example:
test {
  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
}
+ 1 {
                &Logger
logger.fn Logger::write_char(&Logger, Char) -> Unit
write_char(' ')
                &Logger
logger.fn Logger::write_string(&Logger, String) -> Unit
write_string(Array[SExpr]
argsfn[T] Array::op_get(self : Array[T], index : Int) -> T
Retrieves an element from the array at the specified index.
Parameters:

array : The array to get the element from.
index : The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
}
[i].fn[Self : Show] Show::to_string(self : Self) -> String
Default implementation for Show::to_string, uses a StringBuilder
to_string())
            }
            &Logger
logger.fn Logger::write_char(&Logger, Char) -> Unit
write_char(')')
        }
    }
}

test {
    fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:

object : The object to be inspected. Must implement the Show trait.
content : The expected string representation of the object. Defaults to
an empty string.
location : Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location : Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
test {
  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
}
inspect((Char, Array[SExpr]) -> SExpr
Cons('+', [(String) -> SExpr
Atom("3"), (String) -> SExpr
Atom("4")]), String
content="(+ 3 4)")
}

由于这个过程中可能有各种各样的错误，所以parseExpr的返回类型是Sexpr ! ParseError。

不过在开始编写解析器之前，我们还需要对字符串进行分割，得到一个简单的Token流。

enum Token {
  Token
LParen
  Token
RParen
  (String) -> Token
Operand(String
String)
  (Char) -> Token
Operator(Char
Char)
  Token
Eof
} derive(trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}
Trait for types that can be converted to String
Show, trait Eq {
  equal(Self, Self) -> Bool
  op_equal(Self, Self) -> Bool
  not_equal(Self, Self) -> Bool
}
Trait for types whose elements can test for equality
Eq)

struct Tokens {
  mut Int
position : Int
Int
  Array[Token]
tokens : type Array[T]
An Array is a collection of values that supports random access and can
grow in size.
Array[enum Token {
  LParen
  RParen
  Operand(String)
  Operator(Char)
  Eof
} derive(Show, Eq)
Token]
}

这个token流需要实现两个方法：peek() pop()

peek()方法能获取token流中的第一个token，对状态无改变，换言之它是无副作用的，只是偷看一眼将要处理的内容。对于空token流，它返回Eof。

fn fn Tokens::peek(self : Tokens) -> Token
peek(Tokens
self : struct Tokens {
  mut position: Int
  tokens: Array[Token]
}
Tokens) -> enum Token {
  LParen
  RParen
  Operand(String)
  Operator(Char)
  Eof
} derive(Show, Eq)
Token {
  if Tokens
self.Int
position fn Compare::op_lt(x : Int, y : Int) -> Bool
< Tokens
self.Array[Token]
tokens.fn[T] Array::length(self : Array[T]) -> Int
Returns the number of elements in the array.
Parameters:

array : The array whose length is to be determined.
Returns the number of elements in the array as an integer.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
}
length() {
    Tokens
self.Array[Token]
tokens.fn[T] Array::unsafe_get(self : Array[T], idx : Int) -> T
Retrieves the element at the specified index from an array without bounds
checking.
Parameters:

array : The array from which to retrieve the element.
index : The position in the array from which to retrieve the element.
Returns the element at the specified index.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr.unsafe_get(1), content="2")
}
unsafe_get(Tokens
self.Int
position)
  } else {
    Token
Eof
  }
}

pop()在peek()的基础上消耗一个token。

fn fn Tokens::pop(self : Tokens) -> Token
pop(Tokens
self : struct Tokens {
  mut position: Int
  tokens: Array[Token]
}
Tokens) -> enum Token {
  LParen
  RParen
  Operand(String)
  Operator(Char)
  Eof
} derive(Show, Eq)
Token {
  if Tokens
self.Int
position fn Compare::op_lt(x : Int, y : Int) -> Bool
< Tokens
self.Array[Token]
tokens.fn[T] Array::length(self : Array[T]) -> Int
Returns the number of elements in the array.
Parameters:

array : The array whose length is to be determined.
Returns the number of elements in the array as an integer.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
}
length() {
    let Int
pos = Tokens
self.Int
position
    Tokens
self.Int
position fn Add::add(self : Int, other : Int) -> Int
Adds two 32-bit signed integers. Performs two's complement arithmetic, which
means the operation will wrap around if the result exceeds the range of a
32-bit integer.
Parameters:

self : The first integer operand.
other : The second integer operand.
Returns a new integer that is the sum of the two operands. If the
mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to
2,147,483,647), the result wraps around according to two's complement rules.
Example:
test {
  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
}
+= 1
    Tokens
self.Array[Token]
tokens.fn[T] Array::unsafe_get(self : Array[T], idx : Int) -> T
Retrieves the element at the specified index from an array without bounds
checking.
Parameters:

array : The array from which to retrieve the element.
index : The position in the array from which to retrieve the element.
Returns the element at the specified index.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr.unsafe_get(1), content="2")
}
unsafe_get(Int
pos)
  } else {
    Token
Eof
  }
}

tokenize函数负责将一个字符串解析成token流。

fn fn isDigit(this : Char) -> Bool
isDigit(Char
this : Char
Char) -> Bool
Bool {
    Char
this is '0'..='9'
}

fn fn isAlpha(this : Char) -> Bool
isAlpha(Char
this : Char
Char) -> Bool
Bool {
    Char
this is 'A'..='Z' (Bool, Bool) -> Bool
|| Char
this is 'a'..='z'
}

fn fn isWhiteSpace(this : Char) -> Bool
isWhiteSpace(Char
this : Char
Char) -> Bool
Bool {
    Char
this fn Eq::equal(self : Char, other : Char) -> Bool
Compares two characters for equality.
Parameters:

self : The first character to compare.
other : The second character to compare.
Returns true if both characters represent the same Unicode code point,
false otherwise.
Example:
test {
  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
}
== ' ' (Bool, Bool) -> Bool
|| Char
this fn Eq::equal(self : Char, other : Char) -> Bool
Compares two characters for equality.
Parameters:

self : The first character to compare.
other : The second character to compare.
Returns true if both characters represent the same Unicode code point,
false otherwise.
Example:
test {
  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
}
== '\t' (Bool, Bool) -> Bool
|| Char
this fn Eq::equal(self : Char, other : Char) -> Bool
Compares two characters for equality.
Parameters:

self : The first character to compare.
other : The second character to compare.
Returns true if both characters represent the same Unicode code point,
false otherwise.
Example:
test {
  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
}
== '\n'
}

fn fn isOperator(this : Char) -> Bool
isOperator(Char
this : Char
Char) -> Bool
Bool {
    let String
operators = "+-*/"
    String
operators.fn String::contains_char(self : String, c : Char) -> Bool
Returns true if this string contains the given character.
contains_char(Char
this)
}

type! LexError Int
Int

fn fn tokenize(source : String) -> Tokens raise LexError
tokenize(String
source : String
String) -> struct Tokens {
  mut position: Int
  tokens: Array[Token]
}
Tokens!suberror LexError Int
LexError {
    let Array[Token]
tokens = []
    let Array[Char]
source = String
source.fn String::to_array(self : String) -> Array[Char]
Converts the String into an array of Chars.
to_array()
    let StringBuilder
buf = type StringBuilder
StringBuilder::fn StringBuilder::new(size_hint~ : Int) -> StringBuilder
Creates a new string builder with an optional initial capacity hint.
Parameters:

size_hint : An optional initial capacity hint for the internal buffer. If
less than 1, a minimum capacity of 1 is used. Defaults to 0. It is the size of bytes,
not the size of characters. size_hint may be ignored on some platforms, JS for example.
Returns a new StringBuilder instance with the specified initial capacity.
new(Int
size_hint = 100)
    let mut Int
i = 0
    while Int
i fn Compare::op_lt(x : Int, y : Int) -> Bool
< Array[Char]
source.fn[T] Array::length(self : Array[T]) -> Int
Returns the number of elements in the array.
Parameters:

array : The array whose length is to be determined.
Returns the number of elements in the array as an integer.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
}
length() {
        let Char
ch = Array[Char]
source.fn[T] Array::unsafe_get(self : Array[T], idx : Int) -> T
Retrieves the element at the specified index from an array without bounds
checking.
Parameters:

array : The array from which to retrieve the element.
index : The position in the array from which to retrieve the element.
Returns the element at the specified index.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr.unsafe_get(1), content="2")
}
unsafe_get(Int
i)
        Int
i fn Add::add(self : Int, other : Int) -> Int
Adds two 32-bit signed integers. Performs two's complement arithmetic, which
means the operation will wrap around if the result exceeds the range of a
32-bit integer.
Parameters:

self : The first integer operand.
other : The second integer operand.
Returns a new integer that is the sum of the two operands. If the
mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to
2,147,483,647), the result wraps around according to two's complement rules.
Example:
test {
  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
}
+= 1
        if Char
ch fn Eq::equal(self : Char, other : Char) -> Bool
Compares two characters for equality.
Parameters:

self : The first character to compare.
other : The second character to compare.
Returns true if both characters represent the same Unicode code point,
false otherwise.
Example:
test {
  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
}
== '('{
            Array[Token]
tokens.fn[T] Array::push(self : Array[T], value : T) -> Unit
Adds an element to the end of the array.
If the array is at capacity, it will be reallocated.
Example
test {
  let v = []
  v.push(3)
}
push(Token
LParen)
        } else if Char
ch fn Eq::equal(self : Char, other : Char) -> Bool
Compares two characters for equality.
Parameters:

self : The first character to compare.
other : The second character to compare.
Returns true if both characters represent the same Unicode code point,
false otherwise.
Example:
test {
  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
}
== ')' {
            Array[Token]
tokens.fn[T] Array::push(self : Array[T], value : T) -> Unit
Adds an element to the end of the array.
If the array is at capacity, it will be reallocated.
Example
test {
  let v = []
  v.push(3)
}
push(Token
RParen)
        } else if fn isOperator(this : Char) -> Bool
isOperator(Char
ch) {
            Array[Token]
tokens.fn[T] Array::push(self : Array[T], value : T) -> Unit
Adds an element to the end of the array.
If the array is at capacity, it will be reallocated.
Example
test {
  let v = []
  v.push(3)
}
push((Char) -> Token
Operator(Char
ch))
        } else if fn isAlpha(this : Char) -> Bool
isAlpha(Char
ch) {
            StringBuilder
buf.fn Logger::write_char(self : StringBuilder, ch : Char) -> Unit
Writes a character to the StringBuilder.
write_char(Char
ch)
            while Int
i fn Compare::op_lt(x : Int, y : Int) -> Bool
< Array[Char]
source.fn[T] Array::length(self : Array[T]) -> Int
Returns the number of elements in the array.
Parameters:

array : The array whose length is to be determined.
Returns the number of elements in the array as an integer.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
}
length() (Bool, Bool) -> Bool
&& (fn isAlpha(this : Char) -> Bool
isAlpha(Array[Char]
sourcefn[T] Array::op_get(self : Array[T], index : Int) -> T
Retrieves an element from the array at the specified index.
Parameters:

array : The array to get the element from.
index : The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
}
[i]) (Bool, Bool) -> Bool
|| fn isDigit(this : Char) -> Bool
isDigit(Array[Char]
sourcefn[T] Array::op_get(self : Array[T], index : Int) -> T
Retrieves an element from the array at the specified index.
Parameters:

array : The array to get the element from.
index : The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
}
[i]) (Bool, Bool) -> Bool
|| Array[Char]
sourcefn[T] Array::op_get(self : Array[T], index : Int) -> T
Retrieves an element from the array at the specified index.
Parameters:

array : The array to get the element from.
index : The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
}
[i] fn Eq::equal(self : Char, other : Char) -> Bool
Compares two characters for equality.
Parameters:

self : The first character to compare.
other : The second character to compare.
Returns true if both characters represent the same Unicode code point,
false otherwise.
Example:
test {
  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
}
== '_') {
                StringBuilder
buf.fn Logger::write_char(self : StringBuilder, ch : Char) -> Unit
Writes a character to the StringBuilder.
write_char(Array[Char]
sourcefn[T] Array::op_get(self : Array[T], index : Int) -> T
Retrieves an element from the array at the specified index.
Parameters:

array : The array to get the element from.
index : The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
}
[i])
                Int
i fn Add::add(self : Int, other : Int) -> Int
Adds two 32-bit signed integers. Performs two's complement arithmetic, which
means the operation will wrap around if the result exceeds the range of a
32-bit integer.
Parameters:

self : The first integer operand.
other : The second integer operand.
Returns a new integer that is the sum of the two operands. If the
mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to
2,147,483,647), the result wraps around according to two's complement rules.
Example:
test {
  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
}
+= 1
            }
            Array[Token]
tokens.fn[T] Array::push(self : Array[T], value : T) -> Unit
Adds an element to the end of the array.
If the array is at capacity, it will be reallocated.
Example
test {
  let v = []
  v.push(3)
}
push((String) -> Token
Operand(StringBuilder
buf.fn StringBuilder::to_string(self : StringBuilder) -> String
Returns the current content of the StringBuilder as a string.
to_string()))
            StringBuilder
buf.fn StringBuilder::reset(self : StringBuilder) -> Unit
Resets the string builder to an empty state.
reset()
        } else if fn isDigit(this : Char) -> Bool
isDigit(Char
ch) {
            StringBuilder
buf.fn Logger::write_char(self : StringBuilder, ch : Char) -> Unit
Writes a character to the StringBuilder.
write_char(Char
ch)
            while Int
i fn Compare::op_lt(x : Int, y : Int) -> Bool
< Array[Char]
source.fn[T] Array::length(self : Array[T]) -> Int
Returns the number of elements in the array.
Parameters:

array : The array whose length is to be determined.
Returns the number of elements in the array as an integer.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
}
length() (Bool, Bool) -> Bool
&& fn isDigit(this : Char) -> Bool
isDigit(Array[Char]
sourcefn[T] Array::op_get(self : Array[T], index : Int) -> T
Retrieves an element from the array at the specified index.
Parameters:

array : The array to get the element from.
index : The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
}
[i]) {
                StringBuilder
buf.fn Logger::write_char(self : StringBuilder, ch : Char) -> Unit
Writes a character to the StringBuilder.
write_char(Array[Char]
sourcefn[T] Array::op_get(self : Array[T], index : Int) -> T
Retrieves an element from the array at the specified index.
Parameters:

array : The array to get the element from.
index : The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
test {
  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
}
[i])
                Int
i fn Add::add(self : Int, other : Int) -> Int
Adds two 32-bit signed integers. Performs two's complement arithmetic, which
means the operation will wrap around if the result exceeds the range of a
32-bit integer.
Parameters:

self : The first integer operand.
other : The second integer operand.
Returns a new integer that is the sum of the two operands. If the
mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to
2,147,483,647), the result wraps around according to two's complement rules.
Example:
test {
  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
}
+= 1
            }
            Array[Token]
tokens.fn[T] Array::push(self : Array[T], value : T) -> Unit
Adds an element to the end of the array.
If the array is at capacity, it will be reallocated.
Example
test {
  let v = []
  v.push(3)
}
push((String) -> Token
Operand(StringBuilder
buf.fn StringBuilder::to_string(self : StringBuilder) -> String
Returns the current content of the StringBuilder as a string.
to_string()))
            StringBuilder
buf.fn StringBuilder::reset(self : StringBuilder) -> Unit
Resets the string builder to an empty state.
reset()
        } else if fn isWhiteSpace(this : Char) -> Bool
isWhiteSpace(Char
ch) {
            continue
        } else {
            raise (Int) -> LexError
LexError(Int
i)
        }
    } else {
        return struct Tokens {
  mut position: Int
  tokens: Array[Token]
}
Tokens::{ Int
position : 0, Array[Token]
tokens }
    }
}

test {
    fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:

object : The object to be inspected. Must implement the Show trait.
content : The expected string representation of the object. Defaults to
an empty string.
location : Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location : Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
test {
  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
}
inspect(fn tokenize(source : String) -> Tokens raise LexError
tokenize("(((((47)))))").Array[Token]
tokens, String
content=
      #|[LParen, LParen, LParen, LParen, LParen, Operand("47"), RParen, RParen, RParen, RParen, RParen]
    )
    fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:

object : The object to be inspected. Must implement the Show trait.
content : The expected string representation of the object. Defaults to
an empty string.
location : Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location : Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
test {
  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
}
inspect(fn tokenize(source : String) -> Tokens raise LexError
tokenize("13 + 6 + 5 * 3").Array[Token]
tokens, String
content=
      #|[Operand("13"), Operator('+'), Operand("6"), Operator('+'), Operand("5"), Operator('*'), Operand("3")]
    )
}

最后我们还需要一个计算运算符结合力的函数，这可以用简单的match实现。在实际操作中为了便于添加新运算符，应该使用某种键值对容器。

fn fn infix_binding_power(op : Char) -> (Int, Int)?
infix_binding_power(Char
op : Char
Char) -> (Int
Int, Int
Int)? {
  match Char
op {
    '+' => ((Int, Int)) -> (Int, Int)?
Some((1, 2))
    '-' => ((Int, Int)) -> (Int, Int)?
Some((1, 2))
    '/' => ((Int, Int)) -> (Int, Int)?
Some((3, 4))
    '*' => ((Int, Int)) -> (Int, Int)?
Some((3, 4))
    _ => (Int, Int)?
None
  }
}

解析器实现

首先取出第一个token并赋值给变量lhs(left hand side的缩写，表示左侧参数)。

如果它是操作数，就存储下来
如果是左括号，则递归解析出第一个表达式，然后消耗掉一个成对的括号。
其他结果都说明解析出了问题，抛出错误

接着我们试着看一眼第一个运算符：

假如此时结果是Eof，那并不能算失败，一个操作数也可以当成是完整的表达式，直接跳出循环
结果是运算符, 正常返回
结果是右括号，跳出循环
其他结果则返回ParseError

接下来我们需要决定lhs归属于哪个操作符了，这里就要用到min_bp这个参数，它代表左边最近的一个尚未完成解析的操作符的结合力，其初始值为0(没有任何操作符在左边争抢第一个操作数)。不过，此处我们要先做个判断，就是运算符是不是括号 - 假如是括号，说明当前是在解析一个括号里的表达式，也应该跳出循环直接结束。这也是使用peek方法的原因之一，因为我们无法确定到底要不要在这里就消耗掉这个运算符。

在计算好当前运算符op的结合力之后，首先将左侧结合力l_bp和min_bp进行比较：

l_bp小于min_bp，马上break，这样就会将lhs返回给上层还等着右侧参数的运算符
否则用pop方法消耗掉当前操作符，并且递归调用parseExpr获取右侧参数，只是第二个参数使用当前操作符的右结合力r_bp。解析成功之后将结果赋值给lhs，继续循环

type! ParseError (Int
Int, enum Token {
  LParen
  RParen
  Operand(String)
  Operator(Char)
  Eof
} derive(Show, Eq)
Token) derive (trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}
Trait for types that can be converted to String
Show)

fn fn Tokens::parseExpr(self : Tokens, min_bp? : Int) -> SExpr raise ParseError
parseExpr(Tokens
self : struct Tokens {
  mut position: Int
  tokens: Array[Token]
}
Tokens, Int
min_bp~ : Int
Int = 0) -> enum SExpr {
  Atom(String)
  Cons(Char, Array[SExpr])
}
SExpr ! suberror ParseError (Int, Token) derive(Show)
ParseError {
    let mut SExpr
lhs = match Tokens
self.fn Tokens::pop(self : Tokens) -> Token
pop() {
        Token
LParen => {
            let SExpr
expr = Tokens
self.fn Tokens::parseExpr(self : Tokens, min_bp? : Int) -> SExpr raise ParseError
parseExpr()
            if Tokens
self.fn Tokens::peek(self : Tokens) -> Token
peek() is Token
RParen {
                fn[T] ignore(t : T) -> Unit
Evaluates an expression and discards its result. This is useful when you want
to execute an expression for its side effects but don't care about its return
value, or when you want to explicitly indicate that a value is intentionally
unused.
Parameters:

value : The value to be ignored. Can be of any type.
Example:
test {
  let x = 42
  ignore(x) // Explicitly ignore the value
  let mut sum = 0
  ignore([1, 2, 3].iter().each(x => sum = sum + x)) // Ignore the Unit return value of each()
}
ignore(Tokens
self.fn Tokens::pop(self : Tokens) -> Token
pop())
                SExpr
expr
            } else {
                raise ((Int, Token)) -> ParseError
ParseError((Tokens
self.Int
position, Tokens
self.fn Tokens::peek(self : Tokens) -> Token
peek()))
            }
        }
        (String) -> Token
Operand(String
s) => (String) -> SExpr
Atom(String
s)
        Token
t => raise ((Int, Token)) -> ParseError
ParseError((Tokens
self.Int
position fn Sub::sub(self : Int, other : Int) -> Int
Performs subtraction between two 32-bit integers, following standard two's
complement arithmetic rules. When the result overflows or underflows, it
wraps around within the 32-bit integer range.
Parameters:

self : The minuend (the number being subtracted from).
other : The subtrahend (the number to subtract).
Returns the difference between self and other.
Example:
test {
  let a = 42
  let b = 10
  inspect(a - b, content="32")
  let max = 2147483647 // Int maximum value
  inspect(max - -1, content="-2147483648") // Overflow case
}
- 1, Token
t))
    }
    while true {
        let Char
op = match Tokens
self.fn Tokens::peek(self : Tokens) -> Token
peek() {
            Token
Eof | Token
RParen => break
            (Char) -> Token
Operator(Char
op) => Char
op
            Token
t => raise ((Int, Token)) -> ParseError
ParseError((Tokens
self.Int
position, Token
t))
        }
        guard fn infix_binding_power(op : Char) -> (Int, Int)?
infix_binding_power(Char
op) is ((Int, Int)) -> (Int, Int)?
Some((Int
l_bp, Int
r_bp)) else {
            raise ((Int, Token)) -> ParseError
ParseError((Tokens
self.Int
position, (Char) -> Token
Operator(Char
op)))
        }
        if Int
l_bp fn Compare::op_lt(x : Int, y : Int) -> Bool
< Int
min_bp {
            break
        }
        fn[T] ignore(t : T) -> Unit
Evaluates an expression and discards its result. This is useful when you want
to execute an expression for its side effects but don't care about its return
value, or when you want to explicitly indicate that a value is intentionally
unused.
Parameters:

value : The value to be ignored. Can be of any type.
Example:
test {
  let x = 42
  ignore(x) // Explicitly ignore the value
  let mut sum = 0
  ignore([1, 2, 3].iter().each(x => sum = sum + x)) // Ignore the Unit return value of each()
}
ignore(Tokens
self.fn Tokens::pop(self : Tokens) -> Token
pop())
        let SExpr
rhs = Tokens
self.fn Tokens::parseExpr(self : Tokens, min_bp~ : Int) -> SExpr raise ParseError
parseExpr(Int
min_bp = Int
r_bp)
        SExpr
lhs = (Char, Array[SExpr]) -> SExpr
Cons(Char
op, [SExpr
lhs, SExpr
rhs])
        continue
    }
    return SExpr
lhs
}

fn fn parse(s : String) -> SExpr raise
parse(String
s : String
String) -> enum SExpr {
  Atom(String)
  Cons(Char, Array[SExpr])
}
SExpr ! type Error
Error {
    fn tokenize(source : String) -> Tokens raise LexError
tokenize(String
s).fn Tokens::parseExpr(self : Tokens, min_bp? : Int) -> SExpr raise ParseError
parseExpr()
}

现在我们获得了一个可扩展的四则运算表达式解析器，可以在下面测试块中添加更多的例子来验证其正确性。

test {
    fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:

object : The object to be inspected. Must implement the Show trait.
content : The expected string representation of the object. Defaults to
an empty string.
location : Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location : Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
test {
  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
}
inspect(fn parse(s : String) -> SExpr raise
parse("13 + 6 + 5 * 3"), String
content="(+ (+ 13 6) (* 5 3))")
    fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:

object : The object to be inspected. Must implement the Show trait.
content : The expected string representation of the object. Defaults to
an empty string.
location : Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location : Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
test {
  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
}
inspect(fn parse(s : String) -> SExpr raise
parse("3 * 3 + 5 * 5"), String
content="(+ (* 3 3) (* 5 5))")
    fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:

object : The object to be inspected. Must implement the Show trait.
content : The expected string representation of the object. Defaults to
an empty string.
location : Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location : Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
test {
  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
}
inspect(fn parse(s : String) -> SExpr raise
parse("(3 + 4) * 3 * (17 * 5)"), String
content="(* (* (+ 3 4) 3) (* 17 5))")
    fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:

object : The object to be inspected. Must implement the Show trait.
content : The expected string representation of the object. Defaults to
an empty string.
location : Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location : Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
test {
  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
}
inspect(fn parse(s : String) -> SExpr raise
parse("(((47)))"), String
content="47")
}

不过，pratt parser的能力不止于此，它还可以解析前缀运算符(例如按位取反!n)、数组索引运算符arr[i]乃至于三目运算符c ? e1 : e2。关于这方面更详细的解析请见Simple but Powerful Pratt Parsing, 这篇博客的作者在著名的程序分析工具rust-analyzer中实现了一个工业级的pratt parser。

为什么用Pratt解析器​

结合力​

概览与前期准备​

Example

Example

Example

Example

Example

解析器实现​

为什么用Pratt解析器

结合力

概览与前期准备

解析器实现