跳到主要内容

Moonbit 与 llvm 共舞 上篇 - 实现语法前端

· 阅读需 17 分钟


引言

编程语言设计与编译器实现历来被视为计算机科学领域中最具挑战性的课题之一。传统的编译器教学路径往往要求学生首先掌握复杂的理论基础:

  • 自动机理论:有限状态自动机与正则表达式
  • 类型理论:λ演算与类型系统的数学基础
  • 计算机体系结构:从汇编语言到机器码的底层实现

然而,Moonbit作为一门专为现代开发环境设计的函数式编程语言,为我们提供了一个全新的视角。它不仅具备严谨的类型系统和卓越的内存安全保障,更重要的是,其丰富的语法特性和为AI时代量身定制的工具链,使得Moonbit成为学习和实现编译器的理想选择。

系列概述 本系列文章将通过构建一个名为TinyMoonbit的小型编程语言编译器,深入探讨现代编译器实现的核心概念和最佳实践。

  • 上篇:聚焦语言前端的实现,包括词法分析、语法解析和类型检查,最终生成带有完整类型标记的抽象语法树
  • 下篇:深入代码生成阶段,利用Moonbit官方的llvm.mbt​绑定库,将语法树转换为LLVM中间表示,并最终生成RISC-V汇编代码

TinyMoonbit 语言设计

TinyMoonbit是一种系统级编程语言,其抽象层次与C语言相当。虽然在语法设计上大量借鉴了Moonbit的特性,但TinyMoonbit实际并非Moonbit语言的子集,而是一个为测试llvm.mbt​功能完备性兼具教学作用的简化版本。

注:由于篇幅限制,本系列文章所提到的TinyMoonbit实现比真正的TinyMoonbit更加简单,完整版本请参考 TinyMoonbitLLVM

核心特性

TinyMoonbit提供了现代系统编程所需的基础功能:

  • 底层内存操作:直接的指针操作和内存管理
  • 控制流结构:条件分支、循环和函数调用
  • 类型安全:静态类型检查和明确的类型声明
  • 简化设计:为降低实现复杂度,不支持类型推导和闭包等高级特性

语法示例

让我们通过一个经典的斐波那契数列实现来展示TinyMoonbit的语法:

extern fn 
(x : Int) -> Unit
print_int
(
Int
x
:
Int
Int
) ->
Unit
Unit
;
// 递归实现斐波那契数列 fn
(n : Int) -> Int
fib
(
Int
n
:
Int
Int
) ->
Int
Int
{
if
Int
n
(self_ : Int, other : Int) -> Bool
<=
1 {
return
Int
n
;
} return
(n : Int) -> Int
fib
(
Int
n
(self : Int, other : Int) -> Int

Performs subtraction between two 32-bit integers, following standard two's complement arithmetic rules. When the result overflows or underflows, it wraps around within the 32-bit integer range.

Parameters:

  • self : The minuend (the number being subtracted from).
  • other : The subtrahend (the number to subtract).

Returns the difference between self and other.

Example:

  let a = 42
  let b = 10
  inspect(a - b, content="32")
  let max = 2147483647 // Int maximum value
  inspect(max - -1, content="-2147483648") // Overflow case
-
1)
(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+
(n : Int) -> Int
fib
(
Int
n
(self : Int, other : Int) -> Int

Performs subtraction between two 32-bit integers, following standard two's complement arithmetic rules. When the result overflows or underflows, it wraps around within the 32-bit integer range.

Parameters:

  • self : The minuend (the number being subtracted from).
  • other : The subtrahend (the number to subtract).

Returns the difference between self and other.

Example:

  let a = 42
  let b = 10
  inspect(a - b, content="32")
  let max = 2147483647 // Int maximum value
  inspect(max - -1, content="-2147483648") // Overflow case
-
2);
} fn main() -> Unit {
(x : Int) -> Unit
print_int
(
(n : Int) -> Int
fib
(10));
}

编译目标

经过完整的编译流程后,上述代码将生成如下的LLVM中间表示:

; ModuleID = 'tinymoonbit'
source_filename = "tinymoonbit"

define i32 @fib(i32 %0) {
entry:
  %1 = alloca i32, align 4
  store i32 %0, ptr %1, align 4
  %2 = load i32, ptr %1, align 4
  %3 = icmp sle i32 %2, 1
  br i1 %3, label %4, label %6

4:                                                ; preds = %entry
  %5 = load i32, ptr %1, align 4
  ret i32 %5

6:                                                ; preds = %4, %entry
  %7 = load i32, ptr %1, align 4
  %8 = sub i32 %7, 1
  %9 = call i32 @fib(i32 %8)
  %10 = load i32, ptr %1, align 4
  %11 = sub i32 %10, 2
  %12 = call i32 @fib(i32 %11)
  %13 = add i32 %9, %12
  ret i32 %13
}

define void @main() {
entry:
  %0 = call i32 @fib(i32 10)
  call void @print_int(i32 %0)
}

declare void @print_int(i32 %0)

第二章:词法分析

词法分析(Lexical Analysis)构成了编译过程的第一道关卡,其核心使命是将连续的字符流转换为具有语义意义的词法单元(Tokens)序列。这个看似简单的转换过程,实际上是整个编译器流水线的基石。

从字符到符号:Token的设计与实现

考虑以下代码片段:

let 
Int
x
:
Int
Int
= 5;

经过词法分析器处理后,将产生如下的Token序列:

(Keyword "let") → (Identifier "x") → (Symbol ":") →
(Type "Int") → (Operator "=") → (IntLiteral 5) → (Symbol ";")

这个转换过程需要处理多种复杂情况:

  1. 空白符过滤:跳过空格、制表符和换行符
  2. 关键字识别:区分保留字与用户定义标识符
  3. 数值解析:正确识别整数、浮点数的边界
  4. 运算符处理:区分单字符和多字符运算符

Token类型系统设计

基于TinyMoonbit的语法规范,我们将所有可能的符号分类为以下Token类型:

pub enum Token {
  
(Bool) -> Token
Bool
(
Bool
Bool
) // 布尔值:true, false
(Int) -> Token
Int
(
Int
Int
) // 整数:1, 2, 3, ...
(Double) -> Token
Double
(
Double
Double
) // 浮点数:1.0, 2.5, 3.14, ...
(String) -> Token
Keyword
(
String
String
) // 保留字:let, if, while, fn, return
(String) -> Token
Upper
(
String
String
) // 类型标识符:首字母大写,如 Int, Double, Bool
(String) -> Token
Lower
(
String
String
) // 变量标识符:首字母小写,如 x, y, result
(String) -> Token
Symbol
(
String
String
) // 运算符和标点:+, -, *, :, ;, ->
(Char) -> Token
Bracket
(
Char
Char
) // 括号类:(, ), [, ], {, }
Token
EOF
// 文件结束标记
} derive(
trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}

Trait for types that can be converted to String

Show
,
trait Eq {
  equal(Self, Self) -> Bool
  op_equal(Self, Self) -> Bool
}

Trait for types whose elements can test for equality

Eq
)

利用模式匹配

Moonbit的强大模式匹配能力使我们能够以一种前所未有的优雅方式实现词法分析器。与传统的有限状态自动机方法相比,这种基于模式匹配的实现更加直观和易于理解。

核心分析函数

pub fn 
(code : String) -> Array[Token]
lex
(
String
code
:
String
String
) ->
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[
enum Token {
  Bool(Bool)
  Int(Int)
  Double(Double)
  Keyword(String)
  Upper(String)
  Lower(String)
  Symbol(String)
  Bracket(Char)
  EOF
} derive(Show, Eq)
Token
] {
let
Array[Token]
tokens
=
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
::
(capacity? : Int) -> Array[Token]

Creates a new empty array with an optional initial capacity.

Parameters:

  • capacity : The initial capacity of the array. If 0 (default), creates an array with minimum capacity. Must be non-negative.

Returns a new empty array of type Array[T] with the specified initial capacity.

Example:

  let arr : Array[Int] = Array::new(capacity=10)
  inspect(arr.length(), content="0")
  inspect(arr.capacity(), content="10")

  let arr : Array[Int] = Array::new()
  inspect(arr.length(), content="0")
new
()
loop
String
code
[:] {
// 跳过空白字符
StringView
[' ' | '\n' | '\r' | '\t', ..rest]
=>
continue
StringView
rest
// 处理单行注释
StringView
[.."//", ..rest]
=>
continue loop
StringView
rest
{
StringView
['\n' | '\r', ..rest_str]
=> break
StringView
rest_str
StringView
[_, ..rest_str]
=> continue
StringView
rest_str
StringView
[] as rest_str
=> break
StringView
rest_str
} // 识别多字符运算符(顺序很重要!)
StringView
[.."->", ..rest]
=> {
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
(String) -> Token
Symbol
("->")); continue
StringView
rest
}
StringView
[.."==", ..rest]
=> {
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
(String) -> Token
Symbol
("==")); continue
StringView
rest
}
StringView
[.."!=", ..rest]
=> {
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
(String) -> Token
Symbol
("!=")); continue
StringView
rest
}
StringView
[.."<=", ..rest]
=> {
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
(String) -> Token
Symbol
("<=")); continue
StringView
rest
}
StringView
[..">=", ..rest]
=> {
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
(String) -> Token
Symbol
(">=")); continue
StringView
rest
}
// 识别单字符运算符和标点符号 [':' | '.' | ',' | ';' | '+' | '-' | '*' | '/' | '%' | '>' | '<' | '=' as c, ..rest] => {
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
(String) -> Token
Symbol
("\{
Char
c
}"))
continue
StringView
rest
} // 识别括号
StringView
[
Char
'(' | ')' | '[' | ']' | '{' | '}' as c
StringView
, ..rest]
=> {
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
(Char) -> Token
Bracket
(
Char
c
))
continue
StringView
rest
} // 识别标识符和字面量
StringView
['a'..='z', ..] as code
=> {
let (
Token
tok
,
StringView
rest
) =
(StringView) -> (Token, StringView)
lex_ident
(
StringView
code
);
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
Token
tok
)
continue
StringView
rest
} ['A'..='Z', ..] => { ... } ['0'..='9', ..] => { ... } // 到达文件末尾 [] => {
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
Token
EOF
); break
Array[Token]
tokens
}
} }

关键字识别策略

标识符解析需要特别处理关键字的识别:

pub fn 
(rest : StringView) -> (Token, StringView)
let_ident
(
StringView
rest
:
type StringView

StringView represents a view of a String that maintains proper Unicode character boundaries. It allows safe access to a substring while handling multi-byte characters correctly.

@string.View
) -> (
enum Token {
  Bool(Bool)
  Int(Int)
  Double(Double)
  Keyword(String)
  Upper(String)
  Lower(String)
  Symbol(String)
  Bracket(Char)
  EOF
} derive(Show, Eq)
Token
,
type StringView

StringView represents a view of a String that maintains proper Unicode character boundaries. It allows safe access to a substring while handling multi-byte characters correctly.

@string.View
) {
// 预定义关键字映射表 let
Unit
keyword_map
=
Unit
Map
.
(Array[(String, Token)]) -> Unit
from_array
([
("let", Token::
(String) -> Token
Keyword
("let")),
("fn", Token::
(String) -> Token
Keyword
("fn")),
("if", Token::
(String) -> Token
Keyword
("if")),
("else", Token::
(String) -> Token
Keyword
("else")),
("while", Token::
(String) -> Token
Keyword
("while")),
("return", Token::
(String) -> Token
Keyword
("return")),
("extern", Token::
(String) -> Token
Keyword
("extern")),
("true", Token::
(Bool) -> Token
Bool
(true)),
("false", Token::
(Bool) -> Token
Bool
(false)),
]) let
Array[Char]
identifier_chars
=
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
::
(capacity? : Int) -> Array[Char]

Creates a new empty array with an optional initial capacity.

Parameters:

  • capacity : The initial capacity of the array. If 0 (default), creates an array with minimum capacity. Must be non-negative.

Returns a new empty array of type Array[T] with the specified initial capacity.

Example:

  let arr : Array[Int] = Array::new(capacity=10)
  inspect(arr.length(), content="0")
  inspect(arr.capacity(), content="10")

  let arr : Array[Int] = Array::new()
  inspect(arr.length(), content="0")
new
()
let
StringView
remaining
= loop
StringView
rest
{
StringView
[
Char
'a'..='z' | 'A'..='Z' | '0'..='9' | '_' as c
StringView
, ..rest_str]
=> {
Array[Char]
identifier_chars
.
(self : Array[Char], value : Char) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
Char
c
)
continue
StringView
rest_str
}
StringView
_ as rest_str
=> break
StringView
rest_str
} let
String
ident
=
(ArrayView[Char]) -> String
String::
(chars : ArrayView[Char]) -> String

Convert char array to string.

  let s = @string.from_array(['H', 'e', 'l', 'l', 'o'])
  assert_eq(s, "Hello")

Do not convert large datas to Array[Char] and build a string with String::from_array.

For efficiency considerations, it's recommended to use Buffer instead.

from_array
(
Array[Char]
identifier_chars
)
let
Token
token
=
Unit
keyword_map
.
(Unit) -> Unit
get
(
Unit
identifier
).
(() -> Token) -> Token
or_else
(() => Token::
(String) -> Token
Lower
(
String
ident
))
(
Token
token
,
StringView
remaining
)
}

💡 Moonbit语法特性深度解析

上述词法分析器的实现充分展示了Moonbit在编译器开发中的几个突出优势:

  1. 函数式循环构造
loop initial_value {
  pattern1 => continue new_value1
  pattern2 => continue new_value2
  pattern3 => break final_value
}

loop​并非传统意义上的循环结构,而是一种函数式循环

  • 接受一个初始参数作为循环状态
  • 通过模式匹配处理不同情况
  • continue​传递新状态到下一次迭代
  • break​终止循环并返回最终值
  1. 字符串视图与模式匹配

Moonbit的字符串模式匹配功能极大简化了文本处理:

// 匹配单个字符
['a', ..rest] => // 以字符'a'开头

// 匹配字符范围
['a'..='z' as c, ..rest] => // 小写字母,绑定到变量c

// 匹配字符串字面量
[.."hello", ..rest] => // 等价于 ['h','e','l','l','o', ..rest]

// 匹配多个可能的字符
[' ' | '\t' | '\n', ..rest] => // 任意空白字符
  1. 模式匹配优先级的重要性

⚠️ 重要提醒:匹配顺序至关重要

在编写模式匹配规则时,必须将更具体的模式放在更一般的模式之前。例如:

// ✅ 正确的顺序
loop code[:] {
  [.."->", ..rest] => { ... }     // 先匹配多字符运算符
  ['-' | '>' as c, ..rest] => { ... }  // 再匹配单字符
}

// ❌ 错误的顺序 - "->"将永远无法被匹配
loop code[:] {
  ['-' | '>' as c, ..rest] => { ... }
  [.."->", ..rest] => { ... }     // 永远不会执行
}

通过这种基于模式匹配的方法,我们不仅避免了复杂的状态机实现,还获得了更清晰、更容易维护的代码结构。


第三章:语法分析与抽象语法树构建

语法分析(Syntactic Analysis)是编译器的第二个核心阶段,其任务是将词法分析产生的Token序列重新组织为具有层次结构的抽象语法树(Abstract Syntax Tree, AST)。这个过程不仅要验证程序是否符合语言的语法规则,更要为后续的语义分析和代码生成提供结构化的数据表示。

抽象语法树设计:程序的结构化表示

在构建语法分析器之前,我们需要精心设计AST的结构。这个设计决定了如何表示程序的语法结构,以及后续编译阶段如何处理这些结构。

1. 核心类型系统

首先,我们定义TinyMoonbit类型系统在AST中的表示:

pub enum Type {
  
Type
Unit
// 单位类型,表示无返回值
Type
Bool
// 布尔类型:true, false
Type
Int
// 32位有符号整数
Type
Double
// 64位双精度浮点数
} derive(
trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}

Trait for types that can be converted to String

Show
,
trait Eq {
  equal(Self, Self) -> Bool
  op_equal(Self, Self) -> Bool
}

Trait for types whose elements can test for equality

Eq
,
trait ToJson {
  to_json(Self) -> Json
}

Trait for types that can be converted to Json

ToJson
)
pub fn
(type_name : String) -> Type
parse_type
(
String
type_name
:
String
String
) ->
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
{
match
String
type_name
{
"Unit" => Type::
Type
Unit
"Bool" => Type::
Type
Bool
"Int" => Type::
Type
Int
"Double" => Type::
Type
Double
_ =>
(msg : String) -> Type

Aborts the program with an error message. Always causes a panic, regardless of the message provided.

Parameters:

  • message : A string containing the error message to be displayed when aborting.

Returns a value of type T. However, this function never actually returns a value as it always causes a panic.

abort
("Unknown type: \{
String
type_name
}")
} }

2. 分层的AST节点设计

我们采用分层设计来清晰地表示程序的不同抽象层次:

  1. 原子表达式(AtomExpr) 代表不可再分解的基本表达式单元:
pub enum AtomExpr {
  
(Bool) -> AtomExpr
Bool
(
Bool
Bool
) // 布尔字面量
(Int) -> AtomExpr
Int
(
Int
Int
) // 整数字面量
(Double) -> AtomExpr
Double
(
Double
Double
) // 浮点数字面量
(String, ty~ : Type?) -> AtomExpr
Var
(
String
String
, mut
Type?
ty
~ :
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
?) // 变量引用
(Expr, ty~ : Type?) -> AtomExpr
Paren
(
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
, mut
Type?
ty
~ :
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
?) // 括号表达式
(String, Array[Expr], ty~ : Type?) -> AtomExpr
Call
(
String
String
,
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
], mut
Type?
ty
~ :
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
?) // 函数调用
} derive(
trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}

Trait for types that can be converted to String

Show
,
trait Eq {
  equal(Self, Self) -> Bool
  op_equal(Self, Self) -> Bool
}

Trait for types whose elements can test for equality

Eq
,
trait ToJson {
  to_json(Self) -> Json
}

Trait for types that can be converted to Json

ToJson
)
  1. 复合表达式(Expr) 可以包含运算符和多个子表达式的更复杂结构:
pub enum Expr {
  
(AtomExpr, ty~ : Type?) -> Expr
AtomExpr
(
enum AtomExpr {
  Bool(Bool)
  Int(Int)
  Double(Double)
  Var(String, ty~ : Type?)
  Paren(Expr, ty~ : Type?)
  Call(String, Array[Expr], ty~ : Type?)
} derive(Show, Eq, ToJson)
AtomExpr
, mut
Type?
ty
~ :
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
?) // 原子表达式包装
(String, Expr, ty~ : Type?) -> Expr
Unary
(
String
String
,
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
, mut
Type?
ty
~ :
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
?) // 一元运算:-, !
(String, Expr, Expr, ty~ : Type?) -> Expr
Binary
(
String
String
,
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
,
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
, mut
Type?
ty
~ :
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
?) // 二元运算:+, -, *, /, ==, !=, 等
} derive(
trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}

Trait for types that can be converted to String

Show
,
trait Eq {
  equal(Self, Self) -> Bool
  op_equal(Self, Self) -> Bool
}

Trait for types whose elements can test for equality

Eq
,
trait ToJson {
  to_json(Self) -> Json
}

Trait for types that can be converted to Json

ToJson
)
  1. 语句(Stmt) 代表程序中的可执行单元:
pub enum Stmt {
  
(String, Type, Expr) -> Stmt
Let
(
String
String
,
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
,
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
) // 变量声明:let x : Int = 5;
(String, Expr) -> Stmt
Assign
(
String
String
,
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
) // 赋值语句:x = 10;
(Expr, Array[Stmt], Array[Stmt]) -> Stmt
If
(
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
,
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[
enum Stmt {
  Let(String, Type, Expr)
  Assign(String, Expr)
  If(Expr, Array[Stmt], Array[Stmt])
  While(Expr, Array[Stmt])
  Return(Expr?)
  Expr(Expr)
} derive(Show, Eq, ToJson)
Stmt
],
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[
enum Stmt {
  Let(String, Type, Expr)
  Assign(String, Expr)
  If(Expr, Array[Stmt], Array[Stmt])
  While(Expr, Array[Stmt])
  Return(Expr?)
  Expr(Expr)
} derive(Show, Eq, ToJson)
Stmt
]) // 条件分支:if-else
(Expr, Array[Stmt]) -> Stmt
While
(
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
,
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[
enum Stmt {
  Let(String, Type, Expr)
  Assign(String, Expr)
  If(Expr, Array[Stmt], Array[Stmt])
  While(Expr, Array[Stmt])
  Return(Expr?)
  Expr(Expr)
} derive(Show, Eq, ToJson)
Stmt
]) // 循环语句:while
(Expr?) -> Stmt
Return
(
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
?) // 返回语句:return expr;
(Expr) -> Stmt
Expr
(
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
) // 单表达式语句
} derive(
trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}

Trait for types that can be converted to String

Show
,
trait Eq {
  equal(Self, Self) -> Bool
  op_equal(Self, Self) -> Bool
}

Trait for types whose elements can test for equality

Eq
,
trait ToJson {
  to_json(Self) -> Json
}

Trait for types that can be converted to Json

ToJson
)
  1. 顶层结构 函数定义和完整程序:
pub struct Function {
  
String
name
:
String
String
// 函数名
Array[(String, Type)]
params
:
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[(
String
String
,
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
)] // 参数列表:[(参数名, 类型)]
Type
ret_ty
:
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
// 返回类型
Array[Stmt]
body
:
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[
enum Stmt {
  Let(String, Type, Expr)
  Assign(String, Expr)
  If(Expr, Array[Stmt], Array[Stmt])
  While(Expr, Array[Stmt])
  Return(Expr?)
  Expr(Expr)
} derive(Show, Eq, ToJson)
Stmt
] // 函数体语句序列
} derive(
trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}

Trait for types that can be converted to String

Show
,
trait Eq {
  equal(Self, Self) -> Bool
  op_equal(Self, Self) -> Bool
}

Trait for types whose elements can test for equality

Eq
,
trait ToJson {
  to_json(Self) -> Json
}

Trait for types that can be converted to Json

ToJson
)
// 程序定义为函数名到函数定义的映射 pub type Program
type Map[K, V]

Mutable linked hash map that maintains the order of insertion, not thread safe.

Example

  let map = { 3: "three", 8 :  "eight", 1 :  "one"}
  assert_eq(map.get(2), None)
  assert_eq(map.get(3), Some("three"))
  map.set(3, "updated")
  assert_eq(map.get(3), Some("updated"))
Map
[
String
String
,
struct Function {
  name: String
  params: Array[(String, Type)]
  ret_ty: Type
  body: Array[Stmt]
} derive(Show, Eq, ToJson)
Function
]

设计要点:类型标记的可变性

注意到每个表达式节点都包含一个 mut ty~ : Type?​ 字段。这个设计允许我们在类型检查阶段填充类型信息,而不需要重新构建整个AST。

递归下降解析:自顶向下的构建策略

递归下降(Recursive Descent)是一种自顶向下的语法分析方法,其核心思想是为每个语法规则编写一个对应的解析函数。在Moonbit中,模式匹配使这种方法的实现变得异常优雅。

解析原子表达式

pub fn 
(tokens : ArrayView[Token]) -> (AtomExpr, ArrayView[Token]) raise
parse_atom_expr
(
ArrayView[Token]
tokens
:
#builtin.valtype
type ArrayView[T]

An ArrayView represents a view into a section of an array without copying the data.

Example

  let arr = [1, 2, 3, 4, 5]
  let view = arr[1:4]  // Creates a view of elements at indices 1,2,3
  assert_eq(view[0], 2)
  assert_eq(view.length(), 3)
ArrayView
[
enum Token {
  Bool(Bool)
  Int(Int)
  Double(Double)
  Keyword(String)
  Upper(String)
  Lower(String)
  Symbol(String)
  Bracket(Char)
  EOF
} derive(Show, Eq)
Token
]
) -> (
enum AtomExpr {
  Bool(Bool)
  Int(Int)
  Double(Double)
  Var(String, ty~ : Type?)
  Paren(Expr, ty~ : Type?)
  Call(String, Array[Expr], ty~ : Type?)
} derive(Show, Eq, ToJson)
AtomExpr
,
#builtin.valtype
type ArrayView[T]

An ArrayView represents a view into a section of an array without copying the data.

Example

  let arr = [1, 2, 3, 4, 5]
  let view = arr[1:4]  // Creates a view of elements at indices 1,2,3
  assert_eq(view[0], 2)
  assert_eq(view.length(), 3)
ArrayView
[
enum Token {
  Bool(Bool)
  Int(Int)
  Double(Double)
  Keyword(String)
  Upper(String)
  Lower(String)
  Symbol(String)
  Bracket(Char)
  EOF
} derive(Show, Eq)
Token
]) raise {
match
ArrayView[Token]
tokens
{
// 解析字面量
ArrayView[Token]
[
(Bool) -> Token
Bool
ArrayView[Token]
(
Bool
b
ArrayView[Token]
), ..rest]
=> (AtomExpr::
(Bool) -> AtomExpr
Bool
(
Bool
b
),
ArrayView[Token]
rest
)
ArrayView[Token]
[
(Int) -> Token
Int
ArrayView[Token]
(
Int
i
ArrayView[Token]
), ..rest]
=> (AtomExpr::
(Int) -> AtomExpr
Int
(
Int
i
),
ArrayView[Token]
rest
)
ArrayView[Token]
[
(Double) -> Token
Double
ArrayView[Token]
(
Double
d
ArrayView[Token]
), ..rest]
=> (AtomExpr::
(Double) -> AtomExpr
Double
(
Double
d
),
ArrayView[Token]
rest
)
// 解析函数调用:func_name(arg1, arg2, ...)
ArrayView[Token]
[
(String) -> Token
Lower
ArrayView[Token]
(
String
func_name
ArrayView[Token]
),
(Char) -> Token
Bracket
ArrayView[Token]
('('), ..rest]
=> {
let (
Array[Expr]
args
,
Unit
rest
) =
(ArrayView[Token]) -> (Array[Expr], Unit)
parse_argument_list
(
ArrayView[Token]
rest
)
match
Unit
rest
{
Unit
[
(Char) -> _/0
Bracket
Unit
(')'), ..remaining]
=>
(AtomExpr::
(String, Array[Expr], ty~ : Type?) -> AtomExpr
Call
(
String
func_name
,
Array[Expr]
args
,
Type?
ty
=
Type?
None
),
ArrayView[Token]
remaining
)
_ => raise
Error
SyntaxError
("Expected ')' after function arguments")
} } // 解析变量引用
ArrayView[Token]
[
(String) -> Token
Lower
ArrayView[Token]
(
String
var_name
ArrayView[Token]
), ..rest]
=>
(AtomExpr::
(String, ty~ : Type?) -> AtomExpr
Var
(
String
var_name
,
Type?
ty
=
Type?
None
),
ArrayView[Token]
rest
)
// 解析括号表达式:(expression)
ArrayView[Token]
[
(Char) -> Token
Bracket
ArrayView[Token]
('('), ..rest]
=> {
let (
Expr
expr
,
ArrayView[Token]
rest
) =
(tokens : ArrayView[Token]) -> (Expr, ArrayView[Token]) raise
parse_expression
(
ArrayView[Token]
rest
)
match
ArrayView[Token]
rest
{
ArrayView[Token]
[
(Char) -> Token
Bracket
ArrayView[Token]
(')'), ..remaining]
=>
(AtomExpr::
(Expr, ty~ : Type?) -> AtomExpr
Paren
(
Expr
expr
,
Type?
ty
=
Type?
None
),
ArrayView[Token]
remaining
)
_ => raise
Error
SyntaxError
("Expected ')' after expression")
} } _ => raise
Error
SyntaxError
("Expected atomic expression")
} }

解析语句

语句解析需要根据开头的关键字分发到不同的处理函数:

pub fn 
(tokens : ArrayView[Token]) -> (Stmt, ArrayView[Token])
parse_stmt
(
ArrayView[Token]
tokens
:
#builtin.valtype
type ArrayView[T]

An ArrayView represents a view into a section of an array without copying the data.

Example

  let arr = [1, 2, 3, 4, 5]
  let view = arr[1:4]  // Creates a view of elements at indices 1,2,3
  assert_eq(view[0], 2)
  assert_eq(view.length(), 3)
ArrayView
[
enum Token {
  Bool(Bool)
  Int(Int)
  Double(Double)
  Keyword(String)
  Upper(String)
  Lower(String)
  Symbol(String)
  Bracket(Char)
  EOF
} derive(Show, Eq)
Token
]) -> (
enum Stmt {
  Let(String, Type, Expr)
  Assign(String, Expr)
  If(Expr, Array[Stmt], Array[Stmt])
  While(Expr, Array[Stmt])
  Return(Expr?)
  Expr(Expr)
} derive(Show, Eq, ToJson)
Stmt
,
#builtin.valtype
type ArrayView[T]

An ArrayView represents a view into a section of an array without copying the data.

Example

  let arr = [1, 2, 3, 4, 5]
  let view = arr[1:4]  // Creates a view of elements at indices 1,2,3
  assert_eq(view[0], 2)
  assert_eq(view.length(), 3)
ArrayView
[
enum Token {
  Bool(Bool)
  Int(Int)
  Double(Double)
  Keyword(String)
  Upper(String)
  Lower(String)
  Symbol(String)
  Bracket(Char)
  EOF
} derive(Show, Eq)
Token
]) {
match
ArrayView[Token]
tokens
{
// 解析let语句 [
(String) -> Token
Keyword
("let"),
(String) -> Token
Lower
(
String
var_name
),
(String) -> Token
Symbol
(":"), ..] => { /* ... */ }
// 解析if/while/return语句
ArrayView[Token]
[
(String) -> Token
Keyword
ArrayView[Token]
("if"), .. rest]
=>
(ArrayView[Token]) -> (Stmt, ArrayView[Token])
parse_if_stmt
(
ArrayView[Token]
rest
)
ArrayView[Token]
[
(String) -> Token
Keyword
ArrayView[Token]
("while"), .. rest]
=>
(ArrayView[Token]) -> (Stmt, ArrayView[Token])
parse_while_stmt
(
ArrayView[Token]
rest
)
ArrayView[Token]
[
(String) -> Token
Keyword
ArrayView[Token]
("return"), .. rest]
=> { /* ... */ }
// 解析赋值语句
ArrayView[Token]
[
(String) -> Token
Lower
ArrayView[Token]
(_),
(String) -> Token
Symbol
ArrayView[Token]
("="), .. rest]
=>
(ArrayView[Token]) -> (Stmt, ArrayView[Token])
parse_assign_stmt
(
ArrayView[Token]
tokens
)
// 解析单表达式语句
ArrayView[Token]
[
(String) -> Token
Lower
ArrayView[Token]
(_),
(String) -> Token
Symbol
ArrayView[Token]
("="), .. rest]
=>
(ArrayView[Token]) -> (Stmt, ArrayView[Token])
parse_single_expr_stmt
(
ArrayView[Token]
tokens
)
_ => { /* 错误处理 */ } } }

难点:处理运算符优先级:

表达式解析中最复杂的部分是处理运算符优先级,我们需要确保1 + 2 _ 3被正确解析为1 + (2 _ 3)而不是(1 + 2) * 3。

💡 Moonbit高级特性应用

自动派生功能

pub enum Expr {
  // ...
} derive(Show, Eq, ToJson)

Moonbit的 derive​ 功能自动为类型生成常用的实现,这里我们使用三个:

  • Show​:提供调试输出功能
  • Eq​:支持相等性比较
  • ToJson​:序列化为JSON格式,便于调试和持久化

这些自动生成的功能在编译器开发中极为有用,特别是在调试和测试阶段。

错误处理机制

pub fn 
(tokens : ArrayView[Token]) -> (Expr, ArrayView[Token]) raise
parse_expression
(
ArrayView[Token]
tokens
:
#builtin.valtype
type ArrayView[T]

An ArrayView represents a view into a section of an array without copying the data.

Example

  let arr = [1, 2, 3, 4, 5]
  let view = arr[1:4]  // Creates a view of elements at indices 1,2,3
  assert_eq(view[0], 2)
  assert_eq(view.length(), 3)
ArrayView
[
enum Token {
  Bool(Bool)
  Int(Int)
  Double(Double)
  Keyword(String)
  Upper(String)
  Lower(String)
  Symbol(String)
  Bracket(Char)
  EOF
} derive(Show, Eq)
Token
]) -> (
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
,
#builtin.valtype
type ArrayView[T]

An ArrayView represents a view into a section of an array without copying the data.

Example

  let arr = [1, 2, 3, 4, 5]
  let view = arr[1:4]  // Creates a view of elements at indices 1,2,3
  assert_eq(view[0], 2)
  assert_eq(view.length(), 3)
ArrayView
[
enum Token {
  Bool(Bool)
  Int(Int)
  Double(Double)
  Keyword(String)
  Upper(String)
  Lower(String)
  Symbol(String)
  Bracket(Char)
  EOF
} derive(Show, Eq)
Token
]) raise {
// raise关键字表示此函数可能抛出异常 }

Moonbit的 raise​ 机制提供了结构化的错误处理,使得语法错误能够被准确定位和报告。

通过这种分层设计和递归下降的解析策略,我们构建了一个既灵活又高效的语法分析器,为后续的类型检查阶段奠定了坚实的基础。


第四章:类型检查与语义分析

语义分析是编译器设计中承上启下的关键阶段。虽然语法分析确保了程序结构的正确性,但这并不意味着程序在语义上是有效的。类型检查作为语义分析的核心组成部分,负责验证程序中所有操作的类型一致性,确保类型安全和运行时的正确性。

作用域管理:构建环境链

类型检查面临的首要挑战是正确处理变量的作用域(Scope)。在程序的不同层次(全局、函数、块级别),同一个变量名可能指向不同的实体。我们采用环境链(Environment Chain)的经典设计来解决这个问题:

pub struct TypeEnv[K, V] {
  
TypeEnv[K, V]?
parent
:
struct TypeEnv[K, V] {
  parent: TypeEnv[K, V]?
  data: Map[K, V]
}
TypeEnv
[

type parameter K

K
,

type parameter V

V
]? // 指向父环境的引用
Map[K, V]
data
:
type Map[K, V]

Mutable linked hash map that maintains the order of insertion, not thread safe.

Example

  let map = { 3: "three", 8 :  "eight", 1 :  "one"}
  assert_eq(map.get(2), None)
  assert_eq(map.get(3), Some("three"))
  map.set(3, "updated")
  assert_eq(map.get(3), Some("updated"))
Map
[

type parameter K

K
,

type parameter V

V
] // 当前环境的变量绑定
}

环境链的核心是变量查找算法,它遵循词法作用域的规则:

pub fn 
struct TypeEnv[K, V] {
  parent: TypeEnv[K, V]?
  data: Map[K, V]
}
TypeEnv
::
(self : TypeEnv[K, V], key : K) -> V?
get
[K :
trait Eq {
  equal(Self, Self) -> Bool
  op_equal(Self, Self) -> Bool
}

Trait for types whose elements can test for equality

Eq
+
trait Hash {
  hash_combine(Self, Hasher) -> Unit
  hash(Self) -> Int
}

Trait for types that can be hashed

The hash method should return a hash value for the type, which is used in hash tables and other data structures. The hash_combine method is used to combine the hash of the current value with another hash value, typically used to hash composite types.

When two values are equal according to the Eq trait, they should produce the same hash value.

The hash method does not need to be implemented if hash_combine is implemented, When implemented separately, hash does not need to produce a hash value that is consistent with hash_combine.

Hash
, V](
TypeEnv[K, V]
self
:
struct TypeEnv[K, V] {
  parent: TypeEnv[K, V]?
  data: Map[K, V]
}
Self
[

type parameter K

K
,

type parameter V

V
],
K
key
:

type parameter K

K
) ->

type parameter V

V
? {
match
TypeEnv[K, V]
self
.
Map[K, V]
data
.
(self : Map[K, V], key : K) -> V?

Retrieves the value associated with a given key in the hash map.

Parameters:

  • self : The hash map to search in.
  • key : The key to look up in the map.

Returns Some(value) if the key exists in the map, None otherwise.

Example:

  let map = { "key": 42 }
  inspect(map.get("key"), content="Some(42)")
  inspect(map.get("nonexistent"), content="None")
get
(
K
key
) {
(V) -> V?
Some
(
V
value
) =>
(V) -> V?
Some
(
V
value
) // 在当前环境中找到
V?
None
=>
match
TypeEnv[K, V]
self
.
TypeEnv[K, V]?
parent
{
(TypeEnv[K, V]) -> TypeEnv[K, V]?
Some
(
TypeEnv[K, V]
parent_env
) =>
TypeEnv[K, V]
parent_env
.
(self : TypeEnv[K, V], key : K) -> V?
get
(
K
key
) // 递归查找父环境
TypeEnv[K, V]?
None
=>
V?
None
// 到达顶层环境,变量未定义
} } }

设计原则:词法作用域

这种设计确保了变量的查找遵循词法作用域规则:

  1. 首先在当前作用域中查找
  2. 如果未找到,向上层作用域递归查找
  3. 直到找到变量或到达全局作用域

类型检查器架构

单纯的环境管理还不足以完成类型检查任务。某些操作(如函数调用)需要访问全局的程序信息。因此,我们设计了一个综合的类型检查器:

pub struct TypeChecker {
  
TypeEnv[String, Type]
local_env
:
struct TypeEnv[K, V] {
  parent: TypeEnv[K, V]?
  data: Map[K, V]
}
TypeEnv
[
String
String
,
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
] // 本地变量环境
Function
current_func
:
struct Function {
  name: String
  params: Array[(String, Type)]
  ret_ty: Type
  body: Array[Stmt]
} derive(Show, Eq, ToJson)
Function
// 当前检查的函数
Program
program
:
type Program Map[String, Function]
Program
// 完整的程序信息
}

部分节点类型检查的实现

类型检查器的核心是对不同AST节点应用相应的类型规则。以下是表达式类型检查的实现:

pub fn 
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Expr
::
(self : Expr, env : TypeEnv[String, Type]) -> Type raise
check_type
(
Expr
self
:
enum Expr {
  AtomExpr(AtomExpr, ty~ : Type?)
  Unary(String, Expr, ty~ : Type?)
  Binary(String, Expr, Expr, ty~ : Type?)
} derive(Show, Eq, ToJson)
Self
,
TypeEnv[String, Type]
env
:
struct TypeEnv[K, V] {
  parent: TypeEnv[K, V]?
  data: Map[K, V]
}
TypeEnv
[
String
String
,
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
]
) ->
enum Type {
  Unit
  Bool
  Int
  Double
} derive(Show, Eq, ToJson)
Type
raise {
match
Expr
self
{
// 原子表达式的类型检查
(AtomExpr, ty~ : Type?) -> Expr
AtomExpr
Expr
(
AtomExpr
atom_expr
Expr
, ..) as node
=> {
let
Type
ty
=
AtomExpr
atom_expr
.
(TypeEnv[String, Type]) -> Type
check_type
(
TypeEnv[String, Type]
env
)
Expr
node
Unit
.ty =
(Type) -> Type?
Some
Unit
(
Type
ty
Unit
)
// 填充类型信息
Type
ty
} // 一元运算的类型检查
(String, Expr, ty~ : Type?) -> Expr
Unary
Expr
("-",
Expr
expr
Expr
, ..) as node
=> {
let
Type
ty
=
Expr
expr
.
(self : Expr, env : TypeEnv[String, Type]) -> Type raise
check_type
(
TypeEnv[String, Type]
env
)
Expr
node
Unit
.ty =
(Type) -> Type?
Some
Unit
(
Type
ty
Unit
)
Type
ty
} // 二元运算的类型检查
(String, Expr, Expr, ty~ : Type?) -> Expr
Binary
Expr
(""+,
Expr
lhs
Expr
,
Expr
rhs
Expr
, ..) as node
=> {
let
Type
lhs_type
=
Expr
lhs
.
(self : Expr, env : TypeEnv[String, Type]) -> Type raise
check_type
(
TypeEnv[String, Type]
env
)
let
Type
rhs_type
=
Expr
rhs
.
(self : Expr, env : TypeEnv[String, Type]) -> Type raise
check_type
(
TypeEnv[String, Type]
env
)
// 确保操作数类型一致 guard
Type
lhs_type
(Type, Type) -> Bool

automatically derived

==
Type
rhs_type
else {
raise
Error
TypeCheckError
(
"Binary operation requires matching types, got \{
Type
lhs_type
} and \{
Type
rhs_type
}"
) } let
Type
result_type
= match
String
op
{
// 比较运算符总是返回布尔值 "==" | "!=" | "<" | "<=" | ">" | ">=" => Type::
Type
Bool
// 算术运算符等保持操作数类型 _ =>
Type
lhs_type
}
Expr
node
Unit
.ty =
(Type) -> Type?
Some
Unit
(
Type
result_type
Unit
)
Type
result_type
} } }

** 💡 Moonbit枚举修改技巧 **

在类型检查过程中,我们需要为AST节点填充类型信息。Moonbit提供了一种优雅的方式来修改枚举变体的可变字段:

pub enum Expr {
  AtomExpr(AtomExpr, mut ty~ : Type?)
  Unary(String, Expr, mut ty~ : Type?)
  Binary(String, Expr, Expr, mut ty~ : Type?)
} derive(Show, Eq, ToJson)

通过在模式匹配中使用 as​ 绑定,我们可以获得对枚举变体的引用并修改其可变字段:

match expr {
  AtomExpr(atom_expr, ..) as node => {
    let 
?
ty
=
Unit
atom_expr
.
(Unit) -> ?
check_type
(
Unit
env
)
node.ty = Some(ty) // 修改可变字段 ty } // ... }

这种设计避免了重新构建整个AST的开销,同时保持了函数式编程的风格。


完整编译流程展示

经过词法分析、语法分析和类型检查三个阶段,我们的编译器前端已经能够将源代码转换为完全类型化的抽象语法树。让我们通过一个简单的例子来展示完整的过程:

源代码示例

fn 
(x : Int, y : Int) -> Int
add
(
Int
x
:
Int
Int
,
Int
y
:
Int
Int
) ->
Int
Int
{
return
Int
x
(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+
Int
y
;
}

编译输出:类型化AST

利用 derive(ToJson)​ 功能,我们可以将最终的AST输出为JSON格式进行查看:

{
  "functions": {
    "add": {
      "name": "add",
      "params": [
        ["x", { "$tag": "Int" }],
        ["y", { "$tag": "Int" }]
      ],
      "ret_ty": { "$tag": "Int" },
      "body": [
        {
          "$tag": "Return",
          "0": {
            "$tag": "Binary",
            "0": "+",
            "1": {
              "$tag": "AtomExpr",
              "0": {
                "$tag": "Var",
                "0": "x",
                "ty": { "$tag": "Int" }
              },
              "ty": { "$tag": "Int" }
            },
            "2": {
              "$tag": "AtomExpr",
              "0": {
                "$tag": "Var",
                "0": "y",
                "ty": { "$tag": "Int" }
              },
              "ty": { "$tag": "Int" }
            },
            "ty": { "$tag": "Int" }
          }
        }
      ]
    }
  }
}

从这个JSON输出中,我们可以清楚地看到:

  1. 完整的函数签名:包括参数列表和返回类型
  2. 类型标记的AST节点:每个表达式都携带了类型信息
  3. 结构化的程序表示:为后续的代码生成阶段提供了清晰的数据结构

结语

通过本篇文章,我们深入探讨了编译器前端的完整实现流程。从字符流到类型化的抽象语法树,我们见证了Moonbit语言在编译器构建中的独特优势:

核心收获

  1. 模式匹配的威力:Moonbit的字符串模式匹配和结构化模式匹配极大简化了词法分析和语法分析的实现
  2. 函数式编程范式loop​构造、环境链和不可变数据结构的结合,提供了既优雅又高效的解决方案
  3. 类型系统的表达力:通过枚举的可变字段和trait对象,我们能够构建既类型安全又灵活的数据结构
  4. 工程化特性derive​功能、结构化错误处理和JSON序列化等特性,大大提升了开发效率

展望下篇

在掌握了语法前端的实现之后,下篇文章将引导我们进入更加激动人心的代码生成阶段。我们将:

  • 深入了解LLVM中间表示的设计哲学
  • 探索Moonbit官方llvm.mbt​绑定库的使用方法
  • 实现从AST到LLVM IR的完整转换
  • 生成可执行的RISC-V汇编代码

编译器的构建是一个复杂而富有挑战性的过程,但正如我们在本篇中所展示的,Moonbit为这个过程提供了强大而优雅的工具。让我们在下篇中继续这段令人兴奋的编译器构建之旅。

资源推荐