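Building an HTTP File Server with MoonBit

October 22, 2025 · 17 min read

In this article, I will show how to write a simple HTTP file server using MoonBit's asynchronous programming support and the moonbitlang/async library. If you have used Python before, you may know that it ships with a very convenient built-in HTTP server module. Running python -m http.server starts a file server in the current directory, which is handy for purposes such as sharing files over a local network. In this article, we will implement a program with similar functionality in MoonBit and use it to explore MoonBit's async programming support. We will also add one useful feature that python -m http.server lacks: downloading an entire directory as a zip file.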

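A brief history of asynchronous programming

Asynchronous programming lets a program handle multiple tasks at the same time. A file server, for example, may be accessed by many users at once, and it needs to serve all of them simultaneously while keeping each user's experience as smooth and low-latency as possible. In a typical asynchronous program such as a server, each task spends most of its time waiting for IO and only a small fraction on actual computation. As a result, we can handle a large number of tasks at once without much computing power. The trick is to switch between tasks frequently: when a task starts waiting for IO, stop working on it and immediately switch to a task that is ready to run.

In the past, asynchronous programs were often implemented with multithreading: each task corresponded to one operating-system thread. However, OS threads consume a fair amount of resources, and switching between them is expensive. So, since the start of the 21st century, the event loop has become the main way to implement asynchronous programs. The whole program takes the form of one big loop. In each iteration, the program checks which IO operations have completed, then runs the tasks that were waiting on those operations until they issue their next IO request and go back to waiting. In this paradigm, task switching happens within a single user-space thread, so its overhead is very low.

However, writing an event loop by hand is painful. Because the code of a single task is split up and executed across many loop iterations, the program's logic is no longer contiguous, which makes event-loop-based programs hard to write and debug. Fortunately, like most other modern programming languages, MoonBit offers native support for asynchronous programming. Users can write asynchronous code the same way they write synchronous code, and MoonBit automatically splits it into separate pieces. The moonbitlang/async library, in turn, provides the event loop and implementations of the various IO primitives, and is responsible for actually running the asynchronous code.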
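
To make the task-switching idea concrete, here is a minimal sketch in Python's asyncio rather than MoonBit (the names worker and run_demo are made up for this illustration). Two tasks share one thread; whenever one of them waits, the event loop runs the other:

```python
import asyncio


async def worker(name: str, delay: float, log: list[str]) -> None:
    log.append(f"{name}: start")
    # Simulated IO wait: while this task sleeps, the event loop runs other tasks.
    await asyncio.sleep(delay)
    log.append(f"{name}: done")


async def run_demo() -> list[str]:
    log: list[str] = []
    # Both workers run concurrently on a single thread.
    await asyncio.gather(worker("slow", 0.05, log), worker("fast", 0.01, log))
    return log


events = asyncio.run(run_demo())
print(events)
```

Although slow is started first, fast finishes first, because the loop switches tasks at every wait instead of blocking on the first one.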

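Asynchronous programming in MoonBit

In MoonBit, an asynchronous function is declared with the async fn syntax. Asynchronous functions look exactly like synchronous ones, except that at run time they may be suspended partway through and resumed some time later, which is what enables switching between tasks. Loops and other control-flow constructs work as usual inside asynchronous functions; the MoonBit compiler automatically transforms them into asynchronous form.

Unlike many other languages, MoonBit does not require special syntax such as await when calling an asynchronous function; the compiler infers which calls are asynchronous. However, if you view the code in an IDE or text editor with MoonBit support, asynchronous calls are rendered in italics and calls that may raise errors are underlined. So when reading code, you can still spot every asynchronous call at a glance.

Another indispensable component of an asynchronous program is the implementation of the event loop, task scheduling, and the various IO primitives. In MoonBit, this is provided by the moonbitlang/async library. It supports asynchronous operations such as network IO, file IO, and process creation, along with a set of APIs for managing asynchronous tasks. Next, we will introduce the features of moonbitlang/async as we write the HTTP file server.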

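The skeleton of an HTTP server

A typical HTTP server is structured as follows:

The server listens on a TCP port and waits for connection requests from users
After accepting a TCP connection from a user, the server reads the user's request from the connection, processes it, and sends the result back to the user

Each of these tasks should run asynchronously: while handling the first user's request, the server should keep waiting for new connections and respond promptly to the next user's connection request. If several users connect at the same time, the server should handle all of their requests concurrently. Throughout this process, every potentially time-consuming operation, such as network IO and file IO, should be asynchronous, so that it does not block the program or delay other tasks.

moonbitlang/async provides a helper function, @http.run_server, that does all of this for us, setting up an HTTP server and running it: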

async fn server_main(path~ : String, port~ : Int) -> Unit {
  @http.run_server(@socket.Addr::parse("[::]:\{port}"), fn (conn, addr) {
    @pipe.stderr.write("received new connection from \{addr}\n")
    handle_connection(path, conn)
  })
}

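server_main takes two parameters: path is the directory the file server serves, and port is the port the server listens on. In moonbitlang/async, all asynchronous code can be cancelled, and cancelled asynchronous code raises an error, so every asynchronous function can raise errors. For this reason, async fn in MoonBit raises errors by default, with no need for an explicit raise annotation.

In server_main, we use @http.run_server to create an HTTP server and run it. @http is an alias for moonbitlang/async/http, the package in moonbitlang/async that provides HTTP parsing and related support. The first argument of @http.run_server is the address the server listens on. Here we pass [::]:port, which means listening on port port and accepting connections from any network interface. moonbitlang/async has native IPv4/IPv6 dual-stack support, so this server accepts both IPv4 and IPv6 connections. The second argument of @http.run_server is a callback that handles connections from users. The callback takes two parameters. The first is the connection from the user, of type @http.ServerConnection, which @http.run_server accepts and creates automatically. The second is the user's network address. Here we use the handle_connection function to handle the user's requests; its implementation is given below. @http.run_server automatically creates a separate concurrent task for each connection and runs handle_connection in it, so the server can run several instances of handle_connection at once and handle multiple connections.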
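
For readers more familiar with Python, the same accept-and-dispatch structure can be sketched with asyncio.start_server. This is only an analogue of @http.run_server, not its implementation; the handler below answers every request with a fixed placeholder body, and whether "::" also accepts IPv4 connections depends on the operating system:

```python
import asyncio


async def handle_connection(reader, writer) -> None:
    """Read one request head and reply with a fixed body (placeholder logic)."""
    peer = writer.get_extra_info("peername")
    print(f"received new connection from {peer}")
    # Consume the request line and headers, up to the blank line that ends them.
    while True:
        line = await reader.readline()
        if line in (b"\r\n", b"\n", b""):
            break
    body = b"hello\n"
    writer.write(
        b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\nConnection: close\r\n\r\n"
        % len(body)
    )
    writer.write(body)
    await writer.drain()
    writer.close()


async def server_main(port: int) -> None:
    # Each accepted connection runs handle_connection in its own task,
    # so one slow client does not block the others.
    server = await asyncio.start_server(handle_connection, "::", port)
    async with server:
        await server.serve_forever()
```

Calling asyncio.run(server_main(8000)) would start this toy server; it runs until interrupted.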

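Handling user requests

Next, we implement handle_connection, the function that actually handles user requests. It takes two parameters: base_path is the directory served by the file server, and conn is the connection from the user.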

async fn handle_connection(
  base_path : String,
  conn : @http.ServerConnection,
) -> Unit {
  for {
    let request = conn.read_request()
    conn.skip_request_body()
    guard request.meth is Get else {
      conn
      ..send_response(501, "Not Implemented")
      ..write("This request is not implemented")
      ..end_response()
    }
    let (path, download_zip) = match request.path {
      [ ..path, .."?download_zip" ] => (path.to_string(), true)
      path => (path, false)
    }
    if download_zip {
      serve_zip(conn, base_path + path)
    } else {
      let file = @fs.open(base_path + path, mode=ReadOnly) catch {
        _ => {
          conn
          ..send_response(404, "NotFound")
          ..write("File not found")
          ..end_response()
          continue
        }
      }
      defer file.close()
      if file.kind() is Directory {
        if download_zip {
        } else {
          serve_directory(conn, file.as_dir(), path~)
        }
      } else {
        server_file(conn, file, path~)
      }
    }
  }
}

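In handle_connection, the program uses one big loop to keep reading requests from the connection and handling them. In each iteration, we first read a request from the user with conn.read_request(). conn.read_request() reads only the header of the HTTP request, which lets users stream large bodies. Since our file server only handles GET requests, we do not need any information from the request body. We therefore call conn.skip_request_body() to skip the body, so that the next request can be read correctly.

Next, if the request is not a GET request, the else block of the guard statement runs. The code after the guard statement is skipped, and we move on to the next iteration to handle the next request. In the else block, we use conn.send_response(..) to send the user a "Not Implemented" reply. conn.send_response(..) sends the header of the reply; after that, we write the body with conn.write(..). Once everything has been written, we call conn.end_response() to signal that the reply is complete.

Here we want to implement a useful feature that python -m http.server does not have: downloading an entire directory as a zip file. If the requested URL has the form /path/to/directory?download_zip, we pack /path/to/directory into a .zip file and send it to the user. This is implemented by the serve_zip function.

Since we are implementing a file server, the path in the user's GET request maps directly to the corresponding path under base_path. @fs is an alias for moonbitlang/async/fs, the package in moonbitlang/async that provides file IO. Here we open the corresponding file with @fs.open. If opening the file fails, we send the user a 404 reply to say that the file does not exist.

If the requested file exists, we need to send it to the user. Before that, though, remember to use defer file.close() so that the resources held by file are released promptly. file.kind() tells us what kind of file it is. If the requested path is a directory, it needs special treatment: a directory cannot be sent directly, so we return an HTML page generated from its contents, letting the user see which files it contains and click through to them. This is done by the serve_directory function. If the user requested a regular file, we simply transfer its contents. This is done by the server_file function.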
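
The routing decision described above can be summarised in a small Python sketch. The function name classify_request and its string labels are hypothetical, chosen only for this illustration; like the walkthrough, it concatenates paths directly and does no sanitising:

```python
import os

ZIP_SUFFIX = "?download_zip"


def classify_request(base_path: str, url_path: str) -> tuple[str, str]:
    """Return (action, filesystem_path) for the path of a GET request."""
    if url_path.endswith(ZIP_SUFFIX):
        # Strip the marker and zip the directory it refers to.
        return ("zip", base_path + url_path[: -len(ZIP_SUFFIX)])
    full_path = base_path + url_path
    if not os.path.exists(full_path):
        return ("not_found", full_path)  # would become a 404 reply
    if os.path.isdir(full_path):
        return ("directory", full_path)  # would become an HTML listing
    return ("file", full_path)  # would be streamed as is
```

Keeping the decision in one pure function like this makes each branch easy to check without opening a socket.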

The code for sending a regular file to the user is as follows:

async fn server_file(
  conn : @http.ServerConnection,
  file : @fs.File,
  path~ : String,
) -> Unit {
  let content_type = match path {
    [.., .. ".png"] => "image/png"
    [.., .. ".jpg"] | [.., .. ".jpeg"] => "image/jpeg"
    [.., .. ".html"] => "text/html"
    [.., .. ".css"] => "text/css"
    [.., .. ".js"] => "text/javascript"
    [.., .. ".mp4"] => "video/mp4"
    [.., .. ".mpv"] => "video/mpv"
    [.., .. ".mpeg"] => "video/mpeg"
    [.., .. ".mkv"] => "video/x-matroska"
    _ => "application/octet-stream"
  }
  conn
  ..send_response(200, "OK", extra_headers={ "Content-Type": content_type })
  ..write_reader(file)
  ..end_response()
}

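Here we fill in the Content-Type field of the HTTP reply according to the file's extension. This way, when a user opens an image, video, or HTML file in the browser, the content can be previewed directly rather than downloaded first and opened locally. For other files, Content-Type is application/octet-stream, which makes the browser download the file.

We again use conn.send_response to send the reply to the user. The extra_headers parameter lets us add extra HTTP headers to the reply. The body of the reply is the content of the file. Here, conn.write_reader automatically streams the content of file to the user. Suppose the user requests a video file and plays it in the browser. If we first read the whole video into memory and then sent it, the user would receive nothing until the server had read the entire file, so the server would respond more slowly, and reading the entire video would also waste a lot of memory. With write_reader, @http.ServerConnection automatically splits the file content into small chunks and sends them piece by piece, so the video starts playing right away and memory usage drops considerably.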
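
The two ideas in this function, choosing Content-Type by extension and streaming in chunks, can be sketched in Python as follows. This is an illustration only: the helper names and the chunk size are assumptions, not a description of what write_reader does internally:

```python
import io
from typing import BinaryIO, Iterator

CONTENT_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".html": "text/html",
    ".css": "text/css",
    ".js": "text/javascript",
    ".mp4": "video/mp4",
}


def content_type_for(path: str) -> str:
    """Pick a Content-Type from the file extension, with a download fallback."""
    for suffix, content_type in CONTENT_TYPES.items():
        if path.endswith(suffix):
            return content_type
    return "application/octet-stream"


def iter_chunks(source: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the source piece by piece, so at most one chunk is held in memory."""
    while chunk := source.read(chunk_size):
        yield chunk


# Tiny demonstration with an in-memory "file" and a deliberately small chunk size.
chunks = list(iter_chunks(io.BytesIO(b"abcdefg"), chunk_size=3))
print(chunks)
```

A server would write each chunk to the connection as soon as it is produced, which is why the client can start playing a video before the whole file has been read.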

Next, let's implement serve_directory, the function that displays a directory:

async fn serve_directory(
  conn : @http.ServerConnection,
  dir : @fs.Directory,
  path~ : String,
) -> Unit {
  let files = dir.read_all()
  files.sort()
  conn
  ..send_response(200, "OK", extra_headers={ "Content-Type": "text/html" })
  ..write("<!DOCTYPE html><html><head></head><body>")
  ..write("<h1>\{path}</h1>\n")
  ..write("<div style=\"margin: 1em; font-size: 15pt\">\n")
  ..write("<a href=\"\{path}?download_zip\">download as zip</a><br/><br/>\n")
  if path[:-1].rev_find("/") is Some(index) {
    let parent = if index == 0 { "/" } else { path[:index].to_string() }
    conn.write("<a href=\"\{parent}\">..</a><br/><br/>\n")
  }
  for file in files {
    let file_url = if path[path.length() - 1] != '/' {
      "\{path}/\{file}"
    } else {
      "\{path}\{file}"
    }
    conn.write("<a href=\"\{file_url}\">\{file}</a><br/>\n")
  }
  conn
  ..write("</div></body></html>")
  ..end_response()
}

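Here we first read the list of files in the directory and sort it. Then we assemble an HTML page from the directory's contents. The main content of the page is the files in the directory: each file gets a link that shows its name, and clicking the link navigates to that file. We implement this with the HTML <a> element. If the directory is not the root, we put a special link, .., at the top of the page; clicking it navigates to the parent directory. The page also contains a download as zip link, which packs the current directory into a zip file and downloads it.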
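
The page-building logic, including the parent-link rule, can be mirrored in a short Python sketch. The function render_listing is hypothetical, and the html.escape calls are an addition of this sketch that the walkthrough does not include:

```python
import html


def render_listing(path: str, names: list[str]) -> str:
    """Build a minimal HTML index for the directory at URL path `path`."""
    parts = [
        "<!DOCTYPE html><html><head></head><body>",
        f"<h1>{html.escape(path)}</h1>",
        f'<a href="{html.escape(path)}?download_zip">download as zip</a><br/><br/>',
    ]
    # Find the last "/" while ignoring a trailing one; the root "/" has no parent.
    index = path[:-1].rfind("/")
    if index != -1:
        parent = "/" if index == 0 else path[:index]
        parts.append(f'<a href="{html.escape(parent)}">..</a><br/><br/>')
    prefix = path if path.endswith("/") else path + "/"
    for name in sorted(names):
        url = html.escape(prefix + name)
        parts.append(f'<a href="{url}">{html.escape(name)}</a><br/>')
    parts.append("</body></html>")
    return "\n".join(parts)


page = render_listing("/docs", ["b.txt", "a.txt"])
print(page)
```

For /docs the parent link points to /, while the root directory gets no .. link at all, matching the rule described above.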

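Packing a directory into a zip file

Next, we implement the feature that packs a directory into a zip file and serves it to the user. For simplicity, we use the system's zip command. serve_zip is implemented as follows: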

async fn serve_zip(
  conn : @http.ServerConnection,
  path : String,
) -> Unit {
  let full_path = @fs.realpath(path)
  let zip_name = if full_path[:].rev_find("/") is Some(i) {
    full_path[i+1:].to_string()
  } else {
    path
  }
  @async.with_task_group(fn(group) {
    let (we_read_from_zip, zip_write_to_us) = @process.read_from_process()
    defer we_read_from_zip.close()
    group.spawn_bg(fn() {
      let exit_code = @process.run(
        "zip",
        [ "-q", "-r", "-", path ],
        stdout=zip_write_to_us,
      )
      if exit_code != 0 {
        fail("zip failed with exit code \{exit_code}")
      }
    })
    conn
    ..send_response(200, "OK", extra_headers={
      "Content-Type": "application/octet-stream",
      "Content-Disposition": "filename=\{zip_name}.zip",
    })
    ..write_reader(we_read_from_zip)
    ..end_response()
  })
}

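At the start of serve_zip, we first compute the name of the .zip file the user will download. Next, we use @async.with_task_group to create a new task group. Task groups are the core construct for managing asynchronous tasks in moonbitlang/async, and every asynchronous task must be created inside a task group. Before describing with_task_group, let's look at the rest of serve_zip. First, we use @process.read_from_process() to create a temporary pipe. Data written to one end of the pipe can be read from the other end, so it can be used to read the output of a process. The write end of the pipe, zip_write_to_us, is given to the zip command, which writes the compressed result to it. We read zip's output from the read end, we_read_from_zip, and send it to the user.

Next, we create a separate task in the new task group and run the zip command in it with @process.run. @process is an alias for moonbitlang/async/process, the package in moonbitlang/async for invoking external processes. The arguments we pass to zip have the following meanings:

-q: do not print log messages
-r: compress the whole directory recursively
-: write the result to stdout
path: the directory to compress

When calling @process.run, we pass stdout=zip_write_to_us to redirect zip's stdout to zip_write_to_us and capture its output. Compared with creating a temporary file, this has two advantages:

Data passes between zip and us entirely in memory, with no slow disk IO
While zip is still compressing, we can already send the finished parts to the user, which is more efficient

@process.run waits for zip to finish and returns its exit code. If the exit code is not 0, zip failed, and we raise an error.

While zip is running, we go on to send the reply header with conn.send_response(..). Then we use conn.write_reader(we_read_from_zip) to send zip's output to the user. The Content-Disposition HTTP header lets us specify the name of the zip file the user downloads.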
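
As a rough Python analogue of reading a child process's output through a pipe (using asyncio's subprocess API, not moonbitlang/async), the hypothetical helper stream_command below forwards each chunk to a callback as soon as it arrives and then reports the exit code:

```python
import asyncio
from typing import Callable


async def stream_command(args: list[str], sink: Callable[[bytes], None]) -> int:
    """Run a command, pass its stdout to `sink` chunk by chunk, return its exit code."""
    proc = await asyncio.create_subprocess_exec(
        *args, stdout=asyncio.subprocess.PIPE
    )
    assert proc.stdout is not None  # guaranteed because stdout=PIPE was requested
    # Read from the pipe while the child is still running; nothing touches the disk.
    while chunk := await proc.stdout.read(64 * 1024):
        sink(chunk)
    return await proc.wait()
```

Assuming a zip executable is installed, asyncio.run(stream_command(["zip", "-q", "-r", "-", "some_dir"], send_chunk)) would hand the archive to a send_chunk callback piece by piece; both some_dir and send_chunk are placeholders.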

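So far, everything looks reasonable. But why create a new task group here? Why not simply offer an API that creates a new task directly? There is a recurring pattern in asynchronous programming: it is fairly easy to write a program that behaves correctly when everything goes right, but hard to write one that still behaves correctly when something goes wrong. For serve_zip, for example:

What should we do if the zip command fails?
What should we do if a network error occurs partway through sending the data, or the user closes the connection?

If the zip command fails, the whole serve_zip function should fail too. Because the user may already have received some incomplete data, it is hard to bring the connection back to a normal state, so all we can do is close the whole connection. If a network error occurs partway through sending the data, we should stop zip. Its result is no longer useful, and letting it keep running only wastes resources. In the worst case, because we no longer read zip's output, the pipe used to communicate with zip may fill up, and zip may then block forever on writing to the pipe, becoming a stuck process that never exits.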

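In the code above, we did not write any explicit error-handling logic, yet when these errors occur, the program behaves as expected. The magic lies in the semantics of @async.with_task_group and the structured concurrency paradigm behind it. Roughly, @async.with_task_group(f) works as follows:

It creates a new task group, group, and runs f(group)
f can create new tasks in group through functions such as group.spawn_bg(..)
with_task_group returns only when all tasks in group have finished
If any task in group fails, with_task_group fails as well, and the other tasks in group are cancelled automatically

The last point is the key to correct error handling:

If the task that runs zip fails, the error propagates to the whole task group. The main task, which sends the reply to the user, is cancelled automatically, and the error then propagates upward through with_task_group, closing the connection
If the main task that sends the reply fails, the error likewise propagates to the whole task group. @process.run is cancelled, at which point it automatically sends a termination signal to zip and stops it

So when writing asynchronous programs with moonbitlang/async, we only need to insert task groups at the appropriate places according to the program's structure, and with_task_group takes care of all the remaining error-handling details. This is the power of the structured concurrency paradigm used by moonbitlang/async: by guiding how we structure programs, it helps us write clearer asynchronous code and, almost without our noticing, makes asynchronous programs behave correctly even when errors occur.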

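Running the server

At this point the whole HTTP server is implemented, and we can run it. MoonBit supports asynchronous code natively: you can define the entry point of an asynchronous program with async fn main, or test asynchronous code directly with async test. Here we run the HTTP server in the current directory, serving its files to users, and have it listen on port 8000: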

async test {
  server_main(path=".", port=8000)
}

Run the source of this document with moon test moonbit_http_server.mbt.md, then open http://127.0.0.1:8000 in a browser to use the file server we implemented.

For more on the features of moonbitlang/async, see its API documentation and GitHub repo.

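A First Look at JavaScript Interop in MoonBit

September 25, 2025 · 14 min read

Introduction

In today's software world, no programming language can be an island. For an emerging general-purpose language like MoonBit, seamless integration with existing ecosystems is essential if it is to thrive within the vast technology landscape.

MoonBit provides several compilation backends, including JavaScript, which opens the door to the broad JavaScript ecosystem. Whether for front-end development in the browser or for back-end applications on Node.js, this integration greatly widens MoonBit's range of applications, letting developers enjoy MoonBit's type safety and performance while reusing the tens of thousands of existing JavaScript libraries.

In this article, using Node.js as the example environment, we will explore the MoonBit JavaScript FFI step by step, from basic function calls to more complex type and error handling, and show how to build a clean bridge between MoonBit and the JavaScript world.

Prerequisites

Before we begin, we need some basic project configuration. If you do not have a project yet, you can create a new MoonBit project with the moon new tool.

To tell the MoonBit toolchain that our target platform is JavaScript, we add the following to the moon.mod.json file in the project root: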

{
  "preferred-target": "js"
}

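This setting tells the compiler to use the JavaScript backend by default when running commands such as moon build or moon check. If you would rather specify it ad hoc on the command line, the --target=js flag has the same effect.

Building the project

With the configuration in place, just run the familiar build command in the project root:

> moon build

After the command succeeds, because our project contains an executable entry point by default, you can find the build output under target/js/debug/build/. MoonBit generates three files for us:

.js file: the compiled JavaScript source.
.js.map file: the source map used for debugging.
.d.ts file: the TypeScript declaration file, which makes integration into TypeScript projects easier.

The first JavaScript API call

MoonBit's FFI design is consistent in principle. As with calling C or other languages, we define an external call through a function declaration with the extern keyword: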

extern "js" fn consoleLog(msg : String) -> Unit = "(msg) => console.log(msg)"

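This line of code is the heart of the FFI. Let's break it down:

extern "js": declares that this is an external function pointing into the JavaScript environment.
fn consoleLog(msg : String) -> Unit: the function's type signature in MoonBit. It takes one parameter of type String and returns the unit value (Unit).
"(msg) => console.log(msg)": the string literal to the right of the equals sign is the core of this FFI declaration; it must contain a native JavaScript function.

Here we use a concise arrow function. The MoonBit compiler embeds this code verbatim into the generated .js file, which is how the call from MoonBit to JavaScript is realized.

Tip: if your JavaScript snippet is fairly complex, you can use the #| syntax to define a multi-line string for better readability.

Once this FFI declaration is in place, we can call consoleLog from MoonBit code just like an ordinary function: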

test "hello" {
  consoleLog("Hello from JavaScript!")
}

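Run moon test, and you will see the message printed by JavaScript's console.log in the console. Our first bridge is built!

Bridging JavaScript types

Getting a call through is only the first step. The real challenge is handling the type differences between the two languages. MoonBit is statically typed, while JavaScript is dynamically typed. How to build a safe and reliable type mapping between the two is a central question in FFI design.

Below, going from easy to hard, we look case by case at how to bridge different JavaScript types in MoonBit.

JavaScript types that need no conversion

The simplest case is when a MoonBit type, compiled to the JavaScript backend, is represented directly by the corresponding native JavaScript type. Such values can be passed across as is, with no conversion.

Common "zero-cost" type correspondences are shown in the table below:

MoonBit type	JavaScript type
`String`	`string`
`Bool`	`boolean`
`Int`, `UInt`, `Float`, `Double`	`number`
`BigInt`	`bigint`
`Bytes`	`Uint8Array`
`Array[T]`	`Array<T>`
Function types	`Function`

With these correspondences, we can already bind many simple JavaScript functions. In fact, the earlier console.log binding already relied on the correspondence between MoonBit's String type and JavaScript's string type.

Note: preserving the internal invariants of MoonBit types

One important detail is that all of MoonBit's standard numeric types (Int, Float, and so on) correspond to the number type in JavaScript, that is, an IEEE 754 double-precision floating-point number. This means that once an integer value crosses the FFI boundary into JavaScript, it follows floating-point semantics, which can lead to results that are unexpected from MoonBit's point of view, such as different integer overflow behavior: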

extern "js" fn incr(x : Int) -> Int = "(x) => x + 1"

test "incr" {
  // In MoonBit, @int.max_value + 1 overflows and wraps around
  inspect(@int.max_value + 1, content="-2147483648")
  // In JavaScript, it is treated as a floating-point number and does not overflow
  inspect(incr(@int.max_value), content="2147483648") // ???
}

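This is fundamentally invalid: by the internal invariant of MoonBit's Int, its value can never be 2147483648, which exceeds the maximum the type allows. Downstream MoonBit code that relies on this invariant may then behave unexpectedly. Similar problems can arise with other data types that cross the FFI boundary, so keep this in mind when writing such code.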
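
The arithmetic behind the two results can be worked through in a few lines of Python, which models 32-bit wraparound and double-precision addition explicitly. The helper wrap_int32 is written for this illustration and is not part of any MoonBit or JavaScript API:

```python
INT_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit integer


def wrap_int32(n: int) -> int:
    """Reduce n to a signed 32-bit value using two's-complement wraparound."""
    return (n + 2**31) % 2**32 - 2**31


# 32-bit integer addition wraps around to the minimum value.
wrapped = wrap_int32(INT_MAX + 1)

# Double-precision addition, as performed on a JavaScript number, simply keeps going.
as_double = float(INT_MAX) + 1.0

print(wrapped, as_double)
```

The second value lies outside the signed 32-bit range, which is exactly the invariant violation described above.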

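External JavaScript types

Of course, the JavaScript world is far richer than the basic types above. We will soon run into undefined, null, symbol, and all sorts of complex host objects. These types have no direct counterpart in MoonBit.

For these cases, MoonBit provides the #external annotation. It acts like a contract that tells the compiler: "Trust me, this type really exists in the outside world (JavaScript). You do not need to care about its internal structure; just treat it as an opaque handle."

For example, we can define a type representing JavaScript's undefined like this: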

#external
type Undefined

extern "js" fn Undefined::new() -> Self = "() => undefined"

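However, a standalone Undefined type is not very useful, because in practice undefined usually appears as part of a union type, such as string | undefined.

A more practical approach is to create an Optional[T] type that corresponds exactly to T | undefined in JavaScript and converts easily to and from MoonBit's built-in T? (Option[T]) type.

To get there, we first need a type that can represent "any" JavaScript value, similar to any in TypeScript. This is exactly where #external comes in: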

#external
pub type Value

Accordingly, we also need ways to obtain the undefined value and to check whether a value is undefined:

extern "js" fn Value::undefined() -> Value =
  #| () => undefined

extern "js" fn Value::is_undefined(self : Self) -> Bool =
  #| (n) => Object.is(n, undefined)

To make debugging easier, we also implement the Show trait for Value so that it can be printed:

pub impl Show for Value with output(self, logger) {
  logger.write_string(self.to_string())
}

pub extern "js" fn Value::to_string(self : Value) -> String =
  #| (self) =>
  #|   self === undefined ? 'undefined'
  #|     : self === null ? 'null'
  #|     : self.toString()

Next comes the "magic" of the whole conversion. We define two special conversion functions:

fn[T] Value::cast_from(value : T) -> Value = "%identity"

fn[T] Value::cast(self : Self) -> T = "%identity"

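What is %identity?

%identity is a special intrinsic provided by MoonBit. It is a "zero-cost" type conversion: it is type-checked at compile time but has no effect at run time. It simply tells the compiler: "As the developer, I know this value's real type better than you do, so treat it as this other type."

This is a double-edged sword: it gives FFI boundary code great expressive power, but misuse can break type safety. Its use should therefore be strictly confined to FFI-related code.

With these building blocks, we can start assembling Optional[T]: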

#external
type Optional[_] // corresponds to T | undefined

/// Create an undefined Optional
fn[T] Optional::undefined() -> Optional[T] {
  Value::undefined().cast()
}

/// Check whether an Optional is undefined
fn[T] Optional::is_undefined(self : Optional[T]) -> Bool {
  self |> Value::cast_from |> Value::is_undefined
}

/// Unwrap T from Optional[T]; panics if the value is undefined
fn[T] Optional::unwrap(self : Self[T]) -> T {
  guard !self.is_undefined() else { abort("Cannot unwrap an undefined value") }
  Value::cast_from(self).cast()
}

/// Convert Optional[T] to MoonBit's built-in T?
fn[T] Optional::to_option(self : Optional[T]) -> T? {
  guard !Value::cast_from(self).is_undefined() else { None }
  Some(Value::cast_from(self).cast())
}

/// Create Optional[T] from MoonBit's built-in T?
fn[T] Optional::from_option(value : T?) -> Optional[T] {
  guard value is Some(v) else { Optional::undefined() }
  Value::cast_from(v).cast()
}

test "Optional from and to Option" {
  let optional = Optional::from_option(Some(3))
  inspect(optional.unwrap(), content="3")
  inspect(optional.is_undefined(), content="false")
  inspect(optional.to_option(), content="Some(3)")
  let optional : Optional[Int] = Optional::from_option(None)
  inspect(optional.is_undefined(), content="true")
  inspect(optional.to_option(), content="None")
}

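With this combination of techniques, we have found a safe and ergonomic way to express T | undefined in MoonBit's type system. The same approach also works for other JavaScript-specific types such as null, symbol, and RegExp.

Handling JavaScript errors

A robust FFI layer must handle errors gracefully. By default, if JavaScript code throws an exception during an FFI call, MoonBit's try-catch mechanism does not catch it; the exception aborts the whole program instead: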

// This FFI call throws an exception
extern "js" fn boom_naive() -> Value raise = "(u) => undefined.toString()"

test "boom_naive" {
  // This crashes the test process outright instead of returning a `Result` through `try?`
  inspect(try? boom_naive()) // failed: TypeError: Cannot read properties of undefined (reading 'toString')
}

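The correct approach is to wrap the call in a try...catch statement on the JavaScript side, and then find a way to pass either the successful result or the caught error back to MoonBit. We could of course do this directly in the JavaScript code of each extern "js" declaration, but there is a more reusable solution.

First, we define an Error_ type that wraps errors coming from JavaScript: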

suberror Error_ Value

pub impl Show for Error_ with output(self, logger) {
  logger.write_string("@js.Error: ")
  let Error_(inner) = self
  logger.write_object(inner)
}

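Next, we define the core FFI wrapper function, Error_::wrap_ffi. It runs an operation (op) on the JavaScript side and calls one of two callbacks (on_ok or on_error) depending on whether the operation succeeded: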

extern "js" fn Error_::wrap_ffi(
  op : () -> Value,
  on_ok : (Value) -> Unit,
  on_error : (Value) -> Unit,
) -> Unit =
  #| (op, on_ok, on_error) => { try { on_ok(op()); } catch (e) { on_error(e); } }

Finally, using this FFI function and MoonBit closures, we can build Error_::wrap, a MoonBit-style function that returns T raise Error_:

fn[T] Error_::wrap(
  op : () -> Value,
  map_ok~ : (Value) -> T = Value::cast,
) -> T raise Error_ {
  // A variable used to carry the outcome out of the closures
  let mut res : Result[Value, Error_] = Ok(Value::undefined())
  // Call the FFI with two closures that update res according to what happened in JS
  Error_::wrap_ffi(op, fn(v) { res = Ok(v) }, fn(e) { res = Err(Error_(e)) })
  // Inspect res, then return the result or raise the error
  match res {
    Ok(v) => map_ok(v)
    Err(e) => raise e
  }
}
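
To make the control flow easier to follow, here is a rough TypeScript sketch of the same pattern. It is not part of the MoonBit bindings: wrapFfi, wrap, and Result are stand-in names chosen for illustration, and unknown plays the role of Value.

```typescript
// Illustrative TypeScript mirror; wrapFfi, wrap and Result are hypothetical names.
type Result<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Same shape as the JavaScript lambda behind Error_::wrap_ffi:
// run op and report the outcome through one of the two callbacks.
function wrapFfi(
  op: () => unknown,
  onOk: (value: unknown) => void,
  onError: (error: unknown) => void,
): void {
  try {
    onOk(op());
  } catch (e) {
    onError(e);
  }
}

// Counterpart of Error_::wrap: the callbacks write into a local variable,
// which is inspected once the FFI call has returned.
function wrap<T>(op: () => unknown, mapOk: (value: unknown) => T): Result<T> {
  let res = { ok: true, value: undefined } as Result<unknown>;
  wrapFfi(
    op,
    (v) => { res = { ok: true, value: v }; },
    (e) => { res = { ok: false, error: e }; },
  );
  if (res.ok) {
    return { ok: true, value: mapOk(res.value) };
  }
  return res;
}

// Same failure as boom(): calling a method on undefined throws a TypeError.
const nothing: any = undefined;
const outcome = wrap(() => nothing.toString(), (v) => v);
console.log(outcome.ok); // prints false
```

The local variable is needed because the callbacks can only communicate by side effect; the wrapper then turns that side effect back into an ordinary return value or error.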

Now we can safely call the function that threw an exception earlier, and handle any resulting error in pure MoonBit code:

extern "js" fn boom() -> Value = "(u) => undefined.toString()"

test "boom" {
  let result = try? Error_::wrap(boom)
  inspect(
    (result : Result[Value, Error_]),
    content="Err(@js.Error: TypeError: Cannot read properties of undefined (reading 'toString'))",
  )
}

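Binding external JavaScript APIs

At this point we have the key techniques for handling types and errors, so it is time to look further afield, at the whole Node.js and NPM ecosystem. The entry point to all of it is a binding for the require() function.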

extern "js" fn require_ffi(path : String) -> Value = "(path) => require(path)"

/// A more convenient wrapper that supports chained property access, e.g. require("a", keys=["b", "c"])
pub fn require(path : String, keys~ : Array[String] = []) -> Value {
  keys.fold(init=require_ffi(path), Value::get_with_string)
}

// ... where Value::get_with_string is defined as follows:

fn[T] Value::get_with_string(self : Self, key : String) -> T {
  self.get_ffi(Value::cast_from(key)).cast()
}

extern "js" fn Value::get_ffi(self : Self, key : Self) -> Self = "(obj, key) => obj[key]"
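
The keys parameter is just a left fold over property lookups. The TypeScript sketch below shows the same idea on a plain object; fakeModule and getPath are names made up for this illustration, and the object stands in for whatever require() would return.

```typescript
// Hypothetical stand-in for a loaded module; not a real package.
const fakeModule: Record<string, unknown> = {
  posix: { basename: (p: string) => p.slice(p.lastIndexOf("/") + 1) },
};

// Mirrors keys.fold(init=require_ffi(path), Value::get_with_string):
// start from the root object and index into it once per key.
function getPath(root: unknown, keys: string[] = []): unknown {
  return keys.reduce<unknown>(
    (obj, key) => (obj as Record<string, unknown>)[key],
    root,
  );
}

const basename = getPath(fakeModule, ["posix", "basename"]) as (p: string) => string;
console.log(basename("/foo/bar/baz/asdf/quux.html")); // prints "quux.html"
```

With an empty key list the fold returns the root object unchanged, which is why keys can default to [].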

With this require function, we can easily load Node.js built-in modules such as node:path and call their methods:

// Load the basename function from the node:path module
let basename : (String) -> String = require("node:path", keys=["basename"]).cast()

test "require Node API" {
  inspect(basename("/foo/bar/baz/asdf/quux.html"), content="quux.html")
}

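Better still, the same approach lets us call the vast number of third-party libraries on NPM. Let's take simple-statistics, a popular statistics library, as an example.

First, just as in a standard JavaScript project, we initialize package.json and install the dependency. We use pnpm here, but npm or yarn work as well: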

> pnpm init
> pnpm install simple-statistics

With that in place, we can require the library directly from MoonBit code and obtain its standardDeviation function:

let standard_deviation : (Array[Double]) -> Double = require(
  "simple-statistics",
  keys=["standardDeviation"],
).cast()

Now, whether we use moon run or moon test, MoonBit correctly loads the dependency through Node.js, runs the code, and returns the result we expect.

test "require external lib" {
  inspect(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]), content="2")
}
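
The expected value can be checked by hand. The sketch below assumes that standardDeviation computes the population standard deviation (dividing by n rather than n - 1), which is consistent with the result of 2 asserted above. populationStandardDeviation is a name made up for this illustration and is not the library's code.

```typescript
// Population standard deviation: square root of the mean squared deviation.
function populationStandardDeviation(xs: number[]): number {
  const mean = xs.reduce((sum, x) => sum + x, 0) / xs.length;
  const variance = xs.reduce((sum, x) => sum + (x - mean) ** 2, 0) / xs.length;
  return Math.sqrt(variance);
}

// mean = 40 / 8 = 5, squared deviations sum to 32, variance = 32 / 8 = 4
console.log(populationStandardDeviation([2, 4, 4, 4, 5, 5, 7, 9])); // prints 2
```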

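This is genuinely exciting. With just a few lines of FFI code, we have connected MoonBit's type-safe world to NPM's large and mature ecosystem.

Conclusion

In this article we took a first look at how MoonBit interacts with JavaScript, from basic type bindings to more involved error handling and on to the integration of external libraries. These features build a bridge between MoonBit's static type system and dynamically typed JavaScript, and they reflect MoonBit's thinking about cross-language interoperability as a modern programming language. Developers get MoonBit's type safety and modern language features together with access to the large JavaScript ecosystem, which considerably broadens the range of applications open to MoonBit.

Of course, with great power comes great responsibility: FFI is powerful, but in real projects type conversions and error boundaries still need careful handling to keep programs robust.

For developers who want to use JavaScript libraries to extend their MoonBit applications, these FFI techniques are an essential skill. Applied well, they let us build high-quality applications that combine the strengths of MoonBit with the resources of the JavaScript ecosystem.

To learn more about MoonBit's ongoing work on JavaScript interoperability, take a look at mooncakes.io, a web frontend built with MoonBit, and rabbit-tea, the UI library behind it.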

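Two Ways to Implement a Regular Expression Engine: Derivatives and the Thompson Virtual Machine

September 10, 2025 · 12 min read

Regular expression engines can be implemented in many ways, each with its own trade-offs in performance, memory consumption, and implementation complexity. This article presents two matching methods that are mathematically equivalent but behave very differently in practice: the Brzozowski derivative method and the Thompson virtual machine method.

Both methods work on the same abstract syntax tree representation, which gives us a common basis for a direct performance comparison. The core idea is that these seemingly different methods solve the same problem with different computational strategies: one relies on algebraic transformation, the other on program execution.

Conventions and definitions

To establish a common foundation, both engines use the same abstract syntax tree (AST) representation, which describes the basic constructs of a regular expression as a tree: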

enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, ToJson, Hash, Eq)

In addition, we provide smart constructors to simplify building regular expressions:

fn Ast::chr(chr : Char) -> Ast {
  Chr(chr)
}

fn Ast::seq(self : Ast, other : Ast) -> Ast {
  Seq(self, other)
}

fn Ast::rep(self : Ast, n? : Int) -> Ast {
  Rep(self, n)
}

fn Ast::opt(self : Ast) -> Ast {
  Opt(self)
}

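The AST defines four basic regular expression operations:

Chr(Char) - matches a single character literal
Seq(Ast, Ast) - sequence: one pattern immediately followed by another
Rep(Ast, Int?) - repetition: None means unbounded repetition, Some(n) means exactly n repetitions
Opt(Ast) - optional match, equivalent to pattern? in standard regex syntax

For example, the regular expression (ab*)? denotes an optional sequence ('a' followed by zero or more 'b's) and can be built like this: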

Ast::chr('a').seq(Ast::chr('b').rep()).opt()

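The Brzozowski derivative method

The derivative method comes from formal language theory and processes regular expressions through algebraic transformation. For each input character, it computes the "derivative" of the regular expression, which in effect asks: "after consuming this character, what remains to be matched?" The result is a new regular expression that represents the remaining pattern.

To represent derivatives and nullability explicitly, we extend the basic Ast type: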

enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare, ToJson)

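The constructors of Exp have the following meanings:

Nil - a pattern that can never match, i.e. the empty set
Eps - matches the empty string
Chr(Char) - matches a single character
Alt(Exp, Exp) - alternation (or): a choice between two patterns
Seq(Exp, Exp) - concatenation: two patterns matched one after the other
Rep(Exp) - repetition: zero or more occurrences of a pattern

The Exp::of_ast function converts an Ast into the more expressive Exp format: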

fn Exp::of_ast(ast : Ast) -> Exp {
  match ast {
    Chr(c) => Chr(c)
    Seq(a, b) => Seq(Exp::of_ast(a), Exp::of_ast(b))
    Rep(a, None) => Rep(Exp::of_ast(a))
    Rep(a, Some(n)) => {
      let sec = Exp::of_ast(a)
      let mut exp = sec
      for _ in 1..<n {
        exp = Seq(exp, sec)
      }
      exp
    }
    Opt(a) => Alt(Exp::of_ast(a), Eps)
  }
}

Likewise, we provide smart constructors for Exp to simplify building patterns:

fn Exp::seq(a : Exp, b : Exp) -> Exp {
  match (a, b) {
    (Nil, _) | (_, Nil) => Nil
    (Eps, b) => b
    (a, Eps) => a
    (a, b) => Seq(a, b)
  }
}

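The smart constructor for Alt is especially important: it guarantees that every Exp we build satisfies the "similarity" normalization requirement from Brzozowski's original paper. Two regular expressions are considered similar if one can be transformed into the other by the following rules: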

\begin{align} & A \mid \emptyset &&\rightarrow A \\ & A \mid B &&\rightarrow B \mid A \\ & A \mid (B \mid C) &&\rightarrow (A \mid B) \mid C \end{align}

We therefore normalize Alt construction so that association and the order of alternatives are always consistent:

fn Exp::alt(a : Exp, b : Exp) -> Exp {
  match (a, b) {
    (Nil, b) => b
    (a, Nil) => a
    (Alt(a, b), c) => a.alt(b.alt(c))
    (a, b) => {
      if a == b {
        a
      } else if a > b {
        Alt(b, a)
      } else {
        Alt(a, b)
      }
    }
  }
}

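The nullable function determines whether a pattern can match successfully without consuming any input (that is, whether it matches the empty string):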

fn Exp::nullable(self : Exp) -> Bool {
  match self {
    Nil => false
    Eps => true
    Chr(_) => false
    Alt(l, r) => l.nullable() || r.nullable()
    Seq(l, r) => l.nullable() && r.nullable()
    Rep(_) => true
  }
}

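The deriv function computes the derivative of a pattern with respect to a given character, transforming the pattern according to the rules of Brzozowski derivative theory. We have reordered the rules to match the order of the cases in the deriv implementation: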

\begin{align} D_{a} \emptyset &= \emptyset \\ D_{a} \epsilon &= \emptyset \\ D_{a} a &= \epsilon \\ D_{a} b &= \emptyset & \text{ for }(a \neq b) \\ D_{a} (P \mid Q) &= (D_{a} P) \mid (D_{a} Q) \\ D_{a} (P \cdot Q) &= (D_{a} P \cdot Q) \mid (\nu(P) \cdot D_{a} Q) \\ D_{a} (P\ast) &= D_{a} P \cdot P\ast \\ \end{align}

fn Exp::deriv(self : Exp, c : Char) -> Exp {
  match self {
    Nil => self
    Eps => Nil
    Chr(d) if d == c => Eps
    Chr(_) => Nil
    Alt(l, r) => l.deriv(c).alt(r.deriv(c))
    Seq(l, r) => {
      let dl = l.deriv(c)
      if l.nullable() {
        dl.seq(r).alt(r.deriv(c))
      } else {
        dl.seq(r)
      }
    }
    Rep(e) => e.deriv(c).seq(self)
  }
}
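
As a worked example, take the pattern (ab*)? from earlier, which Exp::of_ast turns into Alt(Seq(a, Rep(b)), Eps), and the input "ab". Here ν(P) stands for ε when P is nullable and for the empty set otherwise, so ν(a) is the empty set. Applying the rules, and simplifying the way the smart constructors do (ε followed by P becomes P, and P alternated with the empty set becomes P), gives:

\begin{align} D_{a} ((a \cdot b\ast) \mid \epsilon) &= D_{a} (a \cdot b\ast) \mid D_{a} \epsilon \\ &= ((D_{a} a \cdot b\ast) \mid (\nu(a) \cdot D_{a} (b\ast))) \mid \emptyset \\ &= ((\epsilon \cdot b\ast) \mid (\emptyset \cdot D_{a} (b\ast))) \mid \emptyset \\ &= b\ast \\ D_{b} (b\ast) &= D_{b} b \cdot b\ast \\ &= \epsilon \cdot b\ast \\ &= b\ast \end{align}

The remaining pattern b* is nullable, so "ab" matches the whole pattern.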

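To keep the implementation simple, we only perform strict matching, meaning the pattern must match the entire input string. Accordingly, we check the nullability of the final pattern only after all input characters have been processed: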

fn Exp::matches(self : Exp, s : String) -> Bool {
  loop (self, s.view()) {
    (Nil, _) => {
      return false
    }
    (e, []) => {
      return e.nullable()
    }
    (e, [c, .. s]) => {
      continue (e.deriv(c), s)
    }
  }
}
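
For readers who want to experiment outside MoonBit, the whole derivative matcher fits in a short TypeScript sketch. This is an illustrative port rather than the article's code: the tagged-object encoding and all names are assumptions, and the alt constructor here is deliberately simplified (it drops Nil but does not reorder or deduplicate alternatives the way Exp::alt does).

```typescript
// Hypothetical TypeScript mirror of Exp; names are illustrative only.
type Exp =
  | { tag: "Nil" }
  | { tag: "Eps" }
  | { tag: "Chr"; c: string }
  | { tag: "Alt"; l: Exp; r: Exp }
  | { tag: "Seq"; l: Exp; r: Exp }
  | { tag: "Rep"; e: Exp };

const Nil: Exp = { tag: "Nil" };
const Eps: Exp = { tag: "Eps" };
const chr = (c: string): Exp => ({ tag: "Chr", c });
const rep = (e: Exp): Exp => ({ tag: "Rep", e });

// Simplified smart constructors: they only drop Nil and Eps.
function seq(l: Exp, r: Exp): Exp {
  if (l.tag === "Nil" || r.tag === "Nil") return Nil;
  if (l.tag === "Eps") return r;
  if (r.tag === "Eps") return l;
  return { tag: "Seq", l, r };
}
function alt(l: Exp, r: Exp): Exp {
  if (l.tag === "Nil") return r;
  if (r.tag === "Nil") return l;
  return { tag: "Alt", l, r };
}

function nullable(e: Exp): boolean {
  switch (e.tag) {
    case "Nil": return false;
    case "Eps": return true;
    case "Chr": return false;
    case "Alt": return nullable(e.l) || nullable(e.r);
    case "Seq": return nullable(e.l) && nullable(e.r);
    case "Rep": return true;
  }
}

function deriv(e: Exp, c: string): Exp {
  switch (e.tag) {
    case "Nil": return Nil;
    case "Eps": return Nil;
    case "Chr": return e.c === c ? Eps : Nil;
    case "Alt": return alt(deriv(e.l, c), deriv(e.r, c));
    case "Seq": {
      const dl = seq(deriv(e.l, c), e.r);
      return nullable(e.l) ? alt(dl, deriv(e.r, c)) : dl;
    }
    case "Rep": return seq(deriv(e.e, c), e);
  }
}

// Strict matching: take one derivative per character, then test nullability.
function matches(e: Exp, s: string): boolean {
  let cur = e;
  for (const c of s) {
    if (cur.tag === "Nil") return false;
    cur = deriv(cur, c);
  }
  return nullable(cur);
}

// (ab*)? written as Alt(Seq(a, Rep(b)), Eps), the shape Exp::of_ast produces.
const pattern = alt(seq(chr("a"), rep(chr("b"))), Eps);
console.log(matches(pattern, ""), matches(pattern, "abbb"), matches(pattern, "ba"));
// prints: true true false
```

Without the normalization in Exp::alt, repeated derivatives of some patterns can grow without bound, which is exactly the problem the similarity rules address.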

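The virtual machine method

The virtual machine method compiles a regular expression into bytecode instructions for a simple virtual machine. This turns pattern matching into program execution: the virtual machine simulates all possible execution paths of a nondeterministic finite automaton (NFA) at the same time.

In his classic 1968 paper, Ken Thompson described an engine that compiles regular expression patterns into IBM 7094 machine code. The key insight is to avoid exponential backtracking by maintaining multiple execution threads that advance through the input in lockstep, one character at a time, exploring all possible matching paths simultaneously.

Instruction set and program representation

The virtual machine runs on four basic instructions, each corresponding to a different NFA operation: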

enum Ops {
  Done
  Char(Char)
  Jump(Int)
  Fork(Int)
} derive(Show, ToJson)

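Each instruction has a specific role in the NFA simulation: Done marks a successful match and corresponds to match in Thompson's original design; Char(c) consumes the input character c and moves on to the next instruction; Jump(addr) jumps unconditionally to address addr, like Thompson's jmp; Fork(addr) creates two execution paths, one continuing with the next instruction and the other jumping to addr, which corresponds to Thompson's split.

The Fork instruction is the key to handling nondeterminism in patterns, such as alternation and repetition, where several execution paths must be explored at once. It corresponds directly to ε-transitions in an NFA, where control flow can branch without consuming input.

We define a Prg type that wraps an array of instructions and provides convenient methods for building and manipulating bytecode programs: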

type Prg Array[Ops] derive(Show, ToJson)

fn Prg::push(self : Prg, inst : Ops) -> Unit {
  self.inner().push(inst)
}

fn Prg::length(self : Prg) -> Int {
  self.inner().length()
}

fn Prg::op_set(self : Prg, index : Int, inst : Ops) -> Unit {
  self.inner()[index] = inst
}

Compiling the AST to bytecode

The Prg::of_ast function uses standard NFA construction techniques to translate AST patterns into virtual machine instructions:

Seq(a, b):

code for a
code for b

Rep(a, None) (unbounded repetition):

    Fork L1, L2
L1: code for a
    Jump L1
L2:

Rep(a, Some(n)) (fixed repetition):

code for a
code for a
... (n times) ...

Opt(a) (optional):

    Fork L1, L2
L1: code for a
L2:

Note that the Fork constructor takes only one address argument, because we always want execution to continue with the next instruction after a Fork.

fn Prg::of_ast(ast : Ast) -> Prg {
  fn compile(prog : Prg, ast : Ast) -> Unit {
    match ast {
      Chr(chr) => prog.push(Char(chr))
      Seq(l, r) => {
        compile(prog, l)
        compile(prog, r)
      }
      Rep(e, None) => {
        let fork = prog.length()
        prog.push(Fork(0))
        compile(prog, e)
        prog.push(Jump(fork))
        prog[fork] = Fork(prog.length())
      }
      Rep(e, Some(n)) =>
        for _ in 0..<n {
          compile(prog, e)
        }
      Opt(e) => {
        let fork_inst = prog.length()
        prog.push(Fork(0))
        compile(prog, e)
        prog[fork_inst] = Fork(prog.length())
      }
    }
  }

  let prog : Prg = []
  compile(prog, ast)
  prog.push(Done)
  prog
}
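
To see what the compiler emits, it helps to run the same scheme on the earlier example (ab*)?. The TypeScript sketch below is a hypothetical port of the compilation step only: the tagged-object encoding and the names compile, show, and listing are assumptions made for illustration. It follows the same back-patching approach, pushing a placeholder Fork and fixing its target once the body has been emitted.

```typescript
// Hypothetical TypeScript mirror of Ast, Ops and Prg::of_ast.
type Ast =
  | { tag: "Chr"; c: string }
  | { tag: "Seq"; l: Ast; r: Ast }
  | { tag: "Rep"; e: Ast; n: number | null }
  | { tag: "Opt"; e: Ast };

type Op =
  | { op: "Done" }
  | { op: "Char"; c: string }
  | { op: "Jump"; addr: number }
  | { op: "Fork"; addr: number };

function compile(ast: Ast): Op[] {
  const prog: Op[] = [];
  const go = (node: Ast): void => {
    switch (node.tag) {
      case "Chr":
        prog.push({ op: "Char", c: node.c });
        break;
      case "Seq":
        go(node.l);
        go(node.r);
        break;
      case "Rep":
        if (node.n === null) {
          const fork = prog.length;
          prog.push({ op: "Fork", addr: 0 }); // placeholder, patched below
          go(node.e);
          prog.push({ op: "Jump", addr: fork });
          prog[fork] = { op: "Fork", addr: prog.length };
        } else {
          for (let i = 0; i < node.n; i++) go(node.e);
        }
        break;
      case "Opt": {
        const fork = prog.length;
        prog.push({ op: "Fork", addr: 0 }); // placeholder, patched below
        go(node.e);
        prog[fork] = { op: "Fork", addr: prog.length };
        break;
      }
    }
  };
  go(ast);
  prog.push({ op: "Done" });
  return prog;
}

function show(o: Op): string {
  switch (o.op) {
    case "Done": return "Done";
    case "Char": return `Char ${o.c}`;
    case "Jump": return `Jump ${o.addr}`;
    case "Fork": return `Fork ${o.addr}`;
  }
}

// (ab*)? as an AST: Opt(Seq(Chr('a'), Rep(Chr('b'), None)))
const ast: Ast = {
  tag: "Opt",
  e: {
    tag: "Seq",
    l: { tag: "Chr", c: "a" },
    r: { tag: "Rep", e: { tag: "Chr", c: "b" }, n: null },
  },
};
const listing = compile(ast).map((o, i) => `${i}: ${show(o)}`);
console.log(listing.join("\n"));
// prints:
// 0: Fork 5
// 1: Char a
// 2: Fork 5
// 3: Char b
// 4: Jump 2
// 5: Done
```

Both Fork instructions target address 5: skipping the optional group and leaving the b* loop lead to the same Done instruction.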

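The virtual machine execution loop

In Rob Pike's implementation, the virtual machine runs one more round after the input string ends in order to handle the final accepting state. To make this explicit, our matches function implements the core execution loop in two phases:

Phase one: character processing. For each input character, process all active threads in the current context. If a Char instruction matches the current character, create a new thread in the next context. Jump and Fork instructions immediately spawn new threads in the current context. Once all threads have been processed, swap the contexts and move on to the next character.

Phase two: final acceptance. After all input has been processed, check whether any remaining thread is at a Done instruction, still following Jump and Fork instructions, which do not consume input. If any thread reaches Done, return true.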

fn Prg::matches(self : Prg, data : @string.View) -> Bool {
  let Prg(prog) = self
  let mut curr = Ctx::new(prog.length())
  let mut next = Ctx::new(prog.length())
  curr.add(0)
  // Phase 1: consume the input one character at a time.
  for c in data {
    while curr.pop() is Some(pc) {
      match prog[pc] {
        Done => ()
        // A matching Char advances the thread into the next context.
        Char(char) if char == c => {
          next.add(pc + 1)
        }
        // Jump and Fork consume no input, so they stay in the current context.
        Jump(jump) =>
          curr.add(jump)
        Fork(fork) => {
          curr.add(fork)
          curr.add(pc + 1)
        }
        _ => ()
      }
    }
    // Swap the two contexts and clear the one that becomes `next`.
    let temp = curr
    curr = next
    next = temp
    next.reset()
  }
  // Phase 2: after the input ends, look for a thread that reaches Done.
  while curr.pop() is Some(pc) {
    match prog[pc] {
      Done => return true
      Jump(x) => curr.add(x)
      Fork(x) => {
        curr.add(x)
        curr.add(pc + 1)
      }
      _ => ()
    }
  }
  false
}

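In Rob Pike's original blog post, he uses a recursive function to handle Fork and Jump instructions so that threads execute in priority order. Here we instead use a stack-like structure to manage all execution threads, which maintains thread priority naturally: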

struct Ctx {
  deque : @deque.T[Int]
  visit : FixedArray[Bool]
}

fn Ctx::new(length : Int) -> Ctx {
  { deque: @deque.new(), visit: FixedArray::make(length, false) }
}

fn Ctx::add(self : Ctx, pc : Int) -> Unit {
  if !self.visit[pc] {
    self.deque.push_back(pc)
    self.visit[pc] = true
  }
}

fn Ctx::pop(self : Ctx) -> Int? {
  match self.deque.pop_back() {
    Some(pc) => {
      self.visit[pc] = false
      Some(pc)
    }
    None => None
  }
}

fn Ctx::reset(self : Ctx) -> Unit {
  self.deque.clear()
  self.visit.fill(false)
}

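The visit array filters out lower-priority duplicate threads. When adding a new thread, we first check the visit array to see whether that thread is already in the deque. If it is, the thread is simply discarded; otherwise it is pushed onto the deque and marked as visited. This mechanism matters for patterns such as (a?)* that could otherwise expand without bound, and it effectively prevents infinite loops and exponential thread blow-up.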
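
The deduplication step can be tried in isolation. The sketch below is a hypothetical stand-in for Ctx, named Worklist here and backed by a plain Array instead of @deque so that it is self-contained; only the add logic mirrors the article.

```moonbit
// Hypothetical stand-in for the article's Ctx (Array-backed, illustration only).
struct Worklist {
  stack : Array[Int]
  visit : FixedArray[Bool]
}

fn Worklist::new(length : Int) -> Worklist {
  { stack: [], visit: FixedArray::make(length, false) }
}

// Queue `pc` unless a thread at the same instruction is already queued.
fn Worklist::add(self : Worklist, pc : Int) -> Unit {
  if !self.visit[pc] {
    self.stack.push(pc)
    self.visit[pc] = true
  }
}

test "duplicate threads are dropped" {
  let w = Worklist::new(4)
  w.add(2)
  w.add(2) // already queued, so this call is ignored
  w.add(3)
  inspect(w.stack.length(), content="2")
}
```

Because at most one thread per instruction index can be queued at a time, the work list never holds more entries than the program has instructions.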

Benchmarks and performance analysis

We compare the two approaches on a pathological case that challenges many regular expression implementations:

test (b : @bench.T) {
  let n = 15
  let txt = "a".repeat(n)
  let chr = Ast::chr('a')
  let ast : Ast = chr.opt().rep(n~).seq(chr.rep(n~))
  let exp = Exp::of_ast(ast)
  b.bench(name="derive", () => exp.matches(txt) |> ignore())
  let tvm = Prg::of_ast(ast)
  b.bench(name="thompson", () => tvm.matches(txt) |> ignore())
}

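The pattern (a?){n}a{n} is a classic case of exponential blow-up in backtracking engines. Each of the n optional a? terms can either consume an 'a' or match nothing, so a naive implementation faces a search space of up to 2^n combinations when matching n 'a' characters.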

name     time (mean ± σ)         range (min … max)
derive     41.78 µs ±   0.14 µs    41.61 µs …  42.13 µs  in 10 ×   2359 runs
thompson   12.79 µs ±   0.04 µs    12.74 µs …  12.84 µs  in 10 ×   7815 runs

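The benchmark results show that the virtual machine approach is clearly faster than the derivative approach in this case (about 12.8 µs versus 41.8 µs per run, roughly 3.3 times faster). The derivative approach has to allocate intermediate regular expression structures frequently, which adds overhead and slows it down. The virtual machine, by contrast, executes a fixed set of instructions, and once the deque has grown to its full size it rarely needs to allocate anything new.

The derivative approach, however, is simpler to analyze theoretically. Termination of the algorithm is easy to prove, because the number of derivatives to compute is bounded by the size of the AST and strictly decreases with each recursive call to the deriv function. The virtual machine approach is different: if the input Prg contains an infinite loop, the program may never terminate, so thread priorities must be handled carefully to avoid infinite loops and exponential growth in the number of threads.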

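prettyprinter: Solving structured data printing with function composition

September 3, 2025 · 9 min read

Printing structured data is a common problem in programming, especially for debugging and logging. How do we display a complex data structure and adapt its layout to the screen width? For example, given an array literal [a,b,c], we would like to print it on one line when the screen is wide enough, and to break lines and indent automatically when it is not. Traditional solutions tend to rely on manual string concatenation and hand-maintained indentation state, which is both tedious and error-prone.

This article walks through the implementation of prettyprinter, a practical approach based on function composition. Prettyprinter gives users a set of functions that compose into a Doc value describing how something should be printed. The final string is then generated from the Doc and a width setting. Because the approach is compositional, users can reuse existing code and implement printing for their data structures declaratively.

SimpleDoc primitives

We first define SimpleDoc with the four simplest primitives, which handle basic string concatenation and line breaks.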

enum SimpleDoc {
  Empty
  Line
  Text(String)
  Cat(SimpleDoc, SimpleDoc)
}

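Empty: the empty string
Line: a line break
Text(String): a text fragment that contains no line breaks
Cat(SimpleDoc, SimpleDoc): two SimpleDoc values combined in order

Following these definitions, we can implement a simple render function. It uses a stack to hold the SimpleDoc values still to be processed and converts them to strings one by one.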

fn SimpleDoc::render(doc : SimpleDoc) -> String {
  let buf = StringBuilder::new()
  let stack = [doc]
  while stack.pop() is Some(doc) {
    match doc {
      Empty => ()
      Line => {
        buf..write_string("\n")
      }
      Text(text) => {
        buf.write_string(text)
      }
      Cat(left, right) =>
        stack..push(right)..push(left)
    }
  }
  buf.to_string()
}

Writing a test shows that SimpleDoc is as expressive as String: Empty corresponds to "", Line to "\n", Text("a") to "a", and Cat(Text("a"), Text("b")) to "a" + "b".

test "simple doc" {
  let doc : SimpleDoc = Cat(Text("hello"), Cat(Line, Text("world")))
  inspect(
    doc.render(),
    content=(
      #|hello
      #|world
    ),
  )
}

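For now, just like String, it cannot conveniently handle indentation or switch between layouts. Adding three more primitives is enough to solve these problems.

ExtendDoc: Nest, Choice, Group

Next, building on SimpleDoc, we add three new primitives, Nest, Choice and Group, to handle more complex printing needs.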

enum ExtendDoc {
  Empty
  Line
  Text(String)
  Cat(ExtendDoc, ExtendDoc)
  Nest(Int, ExtendDoc)
  Choice(ExtendDoc, ExtendDoc)
  Group(ExtendDoc)
}

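Nest: Nest(Int, ExtendDoc) handles indentation. The first argument is the number of spaces to indent and the second is the inner ExtendDoc. When the inner ExtendDoc contains a Line, render emits that many spaces right after the line break. Indentation accumulates when Nest is nested.
Choice: Choice(ExtendDoc, ExtendDoc) holds two ways of printing. Usually the first argument is a more compact layout without line breaks and the second is a layout that contains Line. render uses the first layout in compact mode and the second otherwise.
Group: Group(ExtendDoc) groups an ExtendDoc and switches the mode used to print it depending on its length and the remaining space. If enough space remains, it is printed in compact mode; otherwise the layout with line breaks is used.

Computing the required space

To implement Group we need to compute how much space an ExtendDoc requires, so that we can decide whether to use compact mode. We can add a space() method to ExtendDoc that computes the space needed by each layout fragment.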

let max_space = 9999

fn ExtendDoc::space(self : Self) -> Int {
  match self {
    Empty => 0
    Line => max_space
    Text(str) => str.length()
    Cat(a, b) => a.space() + b.space()
    Nest(_, a) | Choice(a, _) | Group(a) => a.space()
  }
}

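For Line, we assume it always takes up an effectively unbounded amount of space (the max_space sentinel). This guarantees that if a Group contains a Line, render will not enter compact mode while processing the inner ExtendDoc.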
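
To see the effect of this rule, the sketch below repeats the ExtendDoc and space definitions from above so that it is self-contained, then measures two documents. The values flat and hard are made up for illustration.

```moonbit
enum ExtendDoc {
  Empty
  Line
  Text(String)
  Cat(ExtendDoc, ExtendDoc)
  Nest(Int, ExtendDoc)
  Choice(ExtendDoc, ExtendDoc)
  Group(ExtendDoc)
}

let max_space : Int = 9999

fn ExtendDoc::space(self : ExtendDoc) -> Int {
  match self {
    Empty => 0
    Line => max_space
    Text(str) => str.length()
    Cat(a, b) => a.space() + b.space()
    // Choice is measured by its first (compact) alternative.
    Nest(_, a) | Choice(a, _) | Group(a) => a.space()
  }
}

test "space of flat and hard-broken documents" {
  // Choice(Empty, Line) counts as Empty, so only the three texts add up.
  let flat : ExtendDoc = Group(
    Cat(Text("("), Cat(Choice(Empty, Line), Cat(Text("expr"), Text(")")))),
  )
  inspect(flat.space(), content="6")
  // A bare Line pushes the total past any realistic width.
  let hard : ExtendDoc = Cat(Text("a"), Cat(Line, Text("b")))
  inspect(hard.space() >= max_space, content="true")
}
```

A Line wrapped in Choice is measured by its compact alternative, so it does not prevent the group from fitting, whereas a bare Line always does.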

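Implementing ExtendDoc::render

We implement ExtendDoc::render on top of SimpleDoc::render. After printing a substructure, render has to return to the previous indentation level before printing what follows, so the stack additionally stores two pieces of state for each pending ExtendDoc: its indentation and whether it is in compact mode. We also maintain a column variable, updated during rendering, that records how many characters have been used on the current line, so that the remaining space on the line can be computed. Finally, the function takes an extra width parameter giving the maximum width of each line.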

fn ExtendDoc::render(doc : ExtendDoc, width~ : Int = 80) -> String {
  let buf = StringBuilder::new()
  let stack = [(0, false, doc)] // start with no indentation, in non-compact mode
  let mut column = 0
  while stack.pop() is Some((indent, fit, doc)) {
    match doc {
      Empty => ()
      Line => {
        buf..write_string("\n")
        // print the required indentation after the line break
        for _ in 0..<indent {
          buf.write_string(" ")
        }
        // reset the character count of the current line
        column = indent
      }
      Text(text) => {
        buf.write_string(text)
        // update the character count of the current line
        column += text.length()
      }
      Cat(left, right) =>
        stack..push((indent, fit, right))..push((indent, fit, left))
      Nest(n, doc) => stack..push((indent + n, fit, doc)) // increase the indentation
      Choice(a, b) =>
        stack.push(if fit { (indent, fit, a) } else { (indent, fit, b) })
      Group(doc) => {
        // If already in compact mode, keep using the compact layout. Otherwise,
        // enter compact mode if the content to print fits on the current line.
        let fit = fit || column + doc.space() <= width
        stack.push((indent, fit, doc))
      }
    }
  }
  buf.to_string()
}

Now let us describe an (expr) with ExtendDoc and print it under different width settings:

let softline : ExtendDoc = Choice(Empty, Line)

impl Add for ExtendDoc with op_add(a, b) {
  Cat(a, b)
}

test "tuple" {
  let tuple : ExtendDoc = Group(
    Text("(") + Nest(2, softline + Text("expr")) + softline + Text(")"),
  )
  inspect(tuple.render(width=40), content="(expr)")
  inspect(
    tuple.render(width=5),
    content=(
      #|(
      #|  expr
      #|)
    ),
  )
}

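We first combine Empty and Line to define softline, which does not break in compact mode. render starts printing in non-compact mode by default, so we need to wrap the whole expression in a Group. That way the expression is printed on a single line when the width is sufficient, and is broken and indented automatically when it is not. To reduce nested parentheses and improve readability, the + operator is overloaded for ExtendDoc here.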
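To see why the Group matters, the sketch below renders a similar parenthesized document without the outer Group. It assumes the ExtendDoc constructors, softline, the + operator and render defined in this article are in scope; the name ungrouped is only for illustration.

```moonbit
test "softline without Group" {
  // A parenthesized expression that is NOT wrapped in Group.
  let ungrouped = Text("(") + Nest(2, softline + Text("expr")) + softline + Text(")")
  // render starts in non-compact mode, so both softlines turn into
  // line breaks even though 40 columns would be enough for "(expr)".
  inspect(
    ungrouped.render(width=40),
    content=(
      #|(
      #|  expr
      #|)
    ),
  )
}
```

Wrapping the same expression in Group lets render try compact mode first, which is what produces the one-line "(expr)" in the test above.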

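Combinator functions

In practice, users of a prettyprinter mostly work with functions composed on top of the ExtendDoc primitives, such as the softline used earlier. The following sections introduce some useful functions that help with structured printing.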

softline & softbreak

let softbreak : ExtendDoc = Choice(Text(" "), Line)

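This is similar to softline, except that it inserts an extra space in compact mode. Note that within the same level of Group, every Choice consistently selects either compact or non-compact mode.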

let abc : ExtendDoc = Text("abc")

let def : ExtendDoc = Text("def")

let ghi : ExtendDoc = Text("ghi")

test "softbreak" {
  let doc : ExtendDoc = Group(abc + softbreak + def + softbreak + ghi)
  inspect(doc.render(width=20), content="abc def ghi")
  inspect(
    doc.render(width=10),
    content=(
      #|abc
      #|def
      #|ghi
    ),
  )
}
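The rule that every Choice in the same Group picks the same mode can be contrasted with a nested Group. The sketch below assumes the definitions from this article (abc, def, ghi, softbreak, render) and the behaviour visible in the Json tests later on, where an inner Group stays compact while the outer one breaks; the name nested is illustrative.

```moonbit
test "nested Group" {
  // In compact mode the outer Group needs 11 columns ("abc def ghi"),
  // so at width=10 its softbreak becomes a line break.
  // The inner Group only needs 7 columns ("def ghi"), so it stays compact.
  let nested = Group(abc + softbreak + Group(def + softbreak + ghi))
  inspect(
    nested.render(width=10),
    content=(
      #|abc
      #|def ghi
    ),
  )
}
```

Each Group makes its own decision, so splitting a layout into nested Groups is how parts of a document are allowed to break independently.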

autoline & autobreak

let autoline : ExtendDoc = Group(softline)

let autobreak : ExtendDoc = Group(softbreak)

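autoline and autobreak implement a layout similar to word wrapping in a text editor: fit as much content as possible on one line, and break the line when it overflows.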

test {
  let doc : ExtendDoc = Group(
    abc + autobreak + def + autobreak + ghi,
  )
  inspect(doc.render(width=10), content="abc def ghi")
  inspect(
    doc.render(width=5),
    content=(
      #|abc def
      #|ghi
    ),
  )
  inspect(
    doc.render(width=3),
    content=(
      #|abc
      #|def
      #|ghi
    ),
  )
}

sepby

fn sepby(xs : Array[ExtendDoc], sep : ExtendDoc) -> ExtendDoc {
  match xs {
    [] => Empty
    [x, .. xs] => xs.fold(init=x, (a, b) => a + sep + b)
  }
}

sepby inserts the separator sep between consecutive ExtendDoc values.

let comma : ExtendDoc = Text(",")

test {
  let layout = Group(sepby([abc, def, ghi], comma + softbreak))
  inspect(layout.render(width=40), content="abc, def, ghi")
  inspect(
    layout.render(width=10),
    content=(
      #|abc,
      #|def,
      #|ghi
    ),
  )
}
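The two boundary cases of sepby follow directly from its pattern match. The following sketch assumes sepby, comma, abc and render from this article are in scope; the test name is illustrative.

```moonbit
test "sepby edge cases" {
  // An empty array yields Empty, which renders as an empty string.
  inspect(sepby([], comma).render(width=10), content="")
  // A single element is returned as is; no separator is added.
  inspect(sepby([abc], comma).render(width=10), content="abc")
}
```

Because the separator only appears between elements, sepby never produces a leading or trailing separator.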

surround

fn surround(m : ExtendDoc, l : ExtendDoc, r : ExtendDoc) -> ExtendDoc {
  l + m + r
}

surround adds brackets or other delimiters on both sides of an ExtendDoc.

test {
  inspect(surround(abc, Text("("), Text(")")).render(), content="(abc)")
}

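Printing Json

Using the functions defined above, we can implement a function that prints Json. It recursively processes each element of the Json value and generates the corresponding layout.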

fn pretty(x : Json) -> ExtendDoc {
  fn comma_list(xs, l, r) {
    (Nest(2, softline + sepby(xs, comma + softbreak)) + softline)
    |> surround(l, r)
    |> Group
  }

  match x {
    Array(elems) => {
      let elems = elems.iter().map(pretty).collect()
      comma_list(elems, Text("["), Text("]"))
    }
    Object(pairs) => {
      let pairs = pairs
        .iter()
        .map(p => Group(Text(p.0.escape()) + Text(": ") + pretty(p.1)))
        .collect()
      comma_list(pairs, Text("{"), Text("}"))
    }
    String(s) => Text(s.escape())
    Number(i) => Text(i.to_string())
    False => Text("false")
    True => Text("true")
    Null => Text("null")
  }
}

As the tests below show, the layout of the Json adjusts automatically to the print width.

test {
  let json : Json = {
    "key1": "string",
    "key2": [12345, 67890],
    "key3": [
      { "field1": 1, "field2": 2 },
      { "field1": 1, "field2": 2 },
      { "field1": [1, 2], "field2": 2 },
    ],
  }
  inspect(
    pretty(json).render(width=80),
    content=(
      #|{
      #|  "key1": "string",
      #|  "key2": [12345, 67890],
      #|  "key3": [
      #|    {"field1": 1, "field2": 2},
      #|    {"field1": 1, "field2": 2},
      #|    {"field1": [1, 2], "field2": 2}
      #|  ]
      #|}
    ),
  )
  inspect(
    pretty(json).render(width=30),
    content=(
      #|{
      #|  "key1": "string",
      #|  "key2": [12345, 67890],
      #|  "key3": [
      #|    {"field1": 1, "field2": 2},
      #|    {"field1": 1, "field2": 2},
      #|    {
      #|      "field1": [1, 2],
      #|      "field2": 2
      #|    }
      #|  ]
      #|}
    ),
  )
  inspect(
    pretty(json).render(width=20),
    content=(
      #|{
      #|  "key1": "string",
      #|  "key2": [
      #|    12345,
      #|    67890
      #|  ],
      #|  "key3": [
      #|    {
      #|      "field1": 1,
      #|      "field2": 2
      #|    },
      #|    {
      #|      "field1": 1,
      #|      "field2": 2
      #|    },
      #|    {
      #|      "field1": [
      #|        1,
      #|        2
      #|      ],
      #|      "field2": 2
      #|    }
      #|  ]
      #|}
    ),
  )
}

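Summary

This article showed how to implement a simple prettyprinter that prints structured data by composing functions. By defining a set of primitives and combinator functions, we can control the print format flexibly and adjust the layout automatically to the screen width.

The current implementation can be optimized further, for example by memoizing the computation of space to improve performance. ExtendDoc::render could take an additional ribbon parameter, count the whitespace and the other text on the current line separately, and add an extra condition to the compact-mode check in Group to control the information density of each line. More primitives could also be added to support features such as hanging indentation and a minimum number of line breaks. Readers interested in more design and implementation details can refer to "A prettier printer" by Philip Wadler, as well as the prettyprinter implementations in Haskell, OCaml, and other languages.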

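Mini-adapton: Implementing Incremental Computation in MoonBit

August 27, 2025 · 10 min read

Introduction

Let's first get a feel for what incremental computation looks like through an Excel-like example. First, define a dependency graph like this:

In this graph, the value of t1 is computed as n1 + n2, and the value of t2 is computed as t1 + n3.

When we ask for the value of t2, the computation defined by the graph is executed: t1 is first computed from n1 + n2, and then t2 is computed from t1 + n3. This process is the same as in non-incremental computation.

Things change once we start modifying the values of n1, n2, or n3. Suppose we swap the values of n1 and n2 and then ask for the value of t2 again. In non-incremental computation, both t1 and t2 are recomputed. In fact t2 does not need to be recomputed, because neither of the two values it depends on, t1 and n3, has changed (swapping the values of n1 and n2 does not change the value of t1).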
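For contrast, here is a minimal non-incremental sketch of the same graph built from plain closures. It uses no incremental library; the variable names simply mirror the graph, and cnt is a hypothetical counter added to observe how often t2 runs.

```moonbit
test "non-incremental baseline" {
  // Counts how many times the body of t2 is executed.
  let mut cnt = 0
  let mut n1 = 1
  let mut n2 = 2
  let n3 = 3
  // Plain closures: every call recomputes from scratch.
  let t1 = fn() { n1 + n2 }
  let t2 = fn() {
    cnt += 1
    t1() + n3
  }
  inspect(t2(), content="6")
  inspect(cnt, content="1")
  // Swap n1 and n2: t1 still evaluates to 3, yet t2 runs again.
  n1 = 2
  n2 = 1
  inspect(t2(), content="6")
  inspect(cnt, content="2")
}
```

The second call returns the same result but still increments the counter, which is exactly the redundant work an incremental framework tries to avoid.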

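The code below implements the example we just described. We use Cell::new to define inputs such as n1, n2, and n3 that need no computation, and Thunk::new to define values such as t1 and t2 that are computed.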

test {
  // a counter to record the times of t2's computation
  let mut cnt = 0
  // start define the graph
  let n1 = Cell::new(1)
  let n2 = Cell::new(2)
  let n3 = Cell::new(3)
  let t1 = Thunk::new(fn() {
    n1.get() + n2.get()
  })
  let t2 = Thunk::new(fn() {
    cnt += 1
    t1.get() + n3.get()
  })
  // get the value of t2
  inspect(t2.get(), content="6")
  inspect(cnt, content="1")
  // swap value of n1 and n2
  n1.set(2)
  n2.set(1)
  inspect(t2.get(), content="6")
  // t2 does not recompute
inspect(Int
cnt, String
content="1")
}

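In this article, we will show how to implement an incremental computation library in MoonBit. The library's API consists of the functions used in the example above: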

Cell::new
Cell::get
Cell::set
Thunk::new
Thunk::get

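Problem analysis and solution

To implement this library, there are three main problems to solve:

How to build the dependency graph at runtime

Because the library is written in MoonBit, there is no simple way to build the dependency graph statically: MoonBit does not yet support any metaprogramming mechanism. We therefore have to build the graph dynamically. In fact, all we care about is which thunks or cells a given thunk depends on, so a good moment to build the graph is when the user calls Thunk::get. Take the example above: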

let n1 = Cell::new(1)
let n2 = Cell::new(2)
let n3 = Cell::new(3)
let t1 = Thunk::new(fn() { n1.get() + n2.get() })
let t2 = Thunk::new(fn() { t1.get() + n3.get() })
t2.get()

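When the user calls t2.get(), we learn at runtime that t1.get() and n3.get() are called inside it. So t1 and n3 are dependencies of t2, and we can build a graph with edges from t2 to t1 and from t2 to n3.

The same process happens when t1.get() is called.

So the plan is:

We define a stack that records which thunk we are currently evaluating. A stack is the right structure here because we are effectively recording the call stack of each get.
When get is called on a node, we mark that node as a dependency of the thunk on top of the stack; if the node is itself a thunk, we then push it onto the stack.
When a thunk's get finishes, we pop it off the stack.

Let's walk through the example above under this algorithm:

When we call t2.get, we push t2 onto the stack.
When t1.get is called inside t2.get, we record t1 as a dependency of t2 and push t1 onto the stack.
When n1.get is called inside t1.get, we record n1 as a dependency of t1.
The same happens for n2.
When t1.get finishes, we pop t1 off the stack.
When we call n3.get, we record n3 as a dependency of t2.

Besides these edges from parent to child, it is useful to also record an edge from each child back to its parent, so that we can later traverse the graph in the reverse direction.

In the code that follows, outgoing_edges refers to edges from a parent to its child dependencies, and incoming_edges refers to edges from a child dependency to its parents.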
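
The stack-based plan above can be sketched in a few lines of Python. This is only an illustration of the edge-recording idea, not the MoonBit implementation: the helper record_dependency and the name fields are hypothetical, and this version deliberately does no caching, so every get re-runs the computation.

```python
# Illustrative sketch: record dependency edges at runtime using a stack
# of the thunks whose get() is currently running.
node_stack = []


def record_dependency(node):
    # Hypothetical helper: if some thunk is being evaluated,
    # `node` is one of its dependencies.
    if node_stack:
        parent = node_stack[-1]
        parent.outgoing_edges.append(node)   # parent -> child
        node.incoming_edges.append(parent)   # child -> parent


class Cell:
    def __init__(self, name, value):
        self.name, self.value = name, value
        self.incoming_edges = []  # nodes that depend on this cell

    def get(self):
        record_dependency(self)
        return self.value


class Thunk:
    def __init__(self, name, compute):
        self.name, self.compute = name, compute
        self.incoming_edges = []  # nodes that depend on this thunk
        self.outgoing_edges = []  # nodes this thunk depends on

    def get(self):
        record_dependency(self)
        node_stack.append(self)   # self is now the thunk being evaluated
        try:
            return self.compute()  # no caching: this sketch only tracks edges
        finally:
            node_stack.pop()


n1, n2, n3 = Cell("n1", 1), Cell("n2", 2), Cell("n3", 3)
t1 = Thunk("t1", lambda: n1.get() + n2.get())
t2 = Thunk("t2", lambda: t1.get() + n3.get())
result = t2.get()
t2_deps = [d.name for d in t2.outgoing_edges]
t1_deps = [d.name for d in t1.outgoing_edges]
```

After the single call to t2.get(), t2_deps and t1_deps hold the names of the dependencies discovered for each thunk, in the order in which they were read.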

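How to mark stale nodes

When we call Cell::set, the cell itself and every node that depends on it should be marked as stale (dirty). This flag later serves as one of the criteria for deciding whether a thunk needs to be recomputed. It is essentially a traversal that starts from a leaf of the graph and follows the edges backwards. We can express the algorithm in the following pseudo-MoonBit code: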

fn dirty(node: Node) -> Unit {
  for n in node.incoming_edges {
    n.set_dirty(true)
    dirty(n)
  }
}
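
As a rough illustration, the same backward traversal can be written in Python. The Node class and its fields here are assumptions made for the sketch. It also skips nodes that are already marked, so shared dependents are visited only once.

```python
# Illustrative sketch: mark every transitive dependent of a node as stale.
class Node:
    def __init__(self, name):
        self.name = name
        self.is_dirty = False
        self.incoming_edges = []  # nodes that depend on this one


def dirty(node):
    """Mark every node that transitively depends on `node` as stale."""
    for dependent in node.incoming_edges:
        if not dependent.is_dirty:   # skip nodes that are already marked
            dependent.is_dirty = True
            dirty(dependent)


# Graph: t1 depends on n1, and t2 depends on t1 and n3.
n1, n3, t1, t2 = Node("n1"), Node("n3"), Node("t1"), Node("t2")
n1.incoming_edges.append(t1)
t1.incoming_edges.append(t2)
n3.incoming_edges.append(t2)

n1.is_dirty = True   # setting a cell marks the cell itself ...
dirty(n1)            # ... and then everything that depends on it
stale = sorted(n.name for n in (n1, n3, t1, t2) if n.is_dirty)
```

Here n3 stays clean because nothing it depends on was changed.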

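How to decide whether a thunk needs to be recomputed

When we call Thunk::get, we need to decide whether it has to be recomputed. The method from the previous section alone is not enough: if staleness were the only criterion, unnecessary computation would inevitably happen. Consider the example from the beginning: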

n1.set(2)
n2.set(1)
inspect(t2.get(), content="6")

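When we swap the values of n1 and n2, then n1, n2, t1 and t2 should all be marked as stale. But when we call t2.get, there is actually no need to recompute t2, because the value of t1 has not changed.

This tells us that, besides staleness, we also have to consider whether a dependency's value differs from its previous value. A node should be recomputed only if it is stale and at least one of its dependencies has a value different from last time.

We can describe the algorithm in the following pseudo-MoonBit code: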

fn propagate(self: Node) -> Unit {
  // When a node is stale, it may need to be recomputed
  if self.is_dirty() {
    // After this pass, it is no longer stale
    self.set_dirty(false)
    for dependency in self.outgoing_edges() {
      // Recursively bring each dependency up to date
      dependency.propagate()
      // If the value of a dependency changed, this node must be recomputed
      if dependency.is_changed() {
        // Remove all outgoing_edges; they are rebuilt during evaluation
        self.outgoing_edges().clear()
        self.evaluate()
        return
      }
    }
  }
}

Implementation

Based on the pseudocode above, the implementation is fairly straightforward.

First, we define Cell:

struct Cell[A] {
  mut is_dirty : Bool
  mut value : A
  mut is_changed : Bool
  incoming_edges : Array[&Node]
}

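Since a Cell can only be a leaf of the dependency graph, it has no outgoing_edges. The Node trait that appears here abstracts over the nodes of the dependency graph.

Next, we define Thunk: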

struct Thunk[A] {
  mut is_dirty : Bool
  mut value : A?
  mut is_changed : Bool
  thunk : () -> A
  incoming_edges : Array[&Node]
  outgoing_edges : Array[&Node]
}

The value of a Thunk is optional because it only exists after Thunk::get has been called for the first time.

Implementing new for both types is simple:

fn[A : Eq] Cell::new(value : A) -> Cell[A] {
  Cell::{
    is_changed: false,
    value,
    incoming_edges: [],
    is_dirty: false,
  }
}

fn[A : Eq] Thunk::new(thunk : () -> A) -> Thunk[A] {
  Thunk::{
    value: None,
    is_changed: false,
    thunk,
    incoming_edges: [],
    outgoing_edges: [],
    is_dirty: false,
  }
}

Thunk and Cell are the two kinds of nodes in the dependency graph, and we can abstract over them with a trait Node:

trait Node {
  is_dirty(Self) -> Bool
  set_dirty(Self, Bool) -> Unit
  incoming_edges(Self) -> Array[&Node]
  outgoing_edges(Self) -> Array[&Node]
  is_changed(Self) -> Bool
  evaluate(Self) -> Unit
}

We implement this trait for both types:

impl[A] Node for Cell[A] with incoming_edges(self) {
  self.incoming_edges
}

impl[A] Node for Cell[A] with outgoing_edges(_self) {
  []
}

impl[A] Node for Cell[A] with is_dirty(self) {
  self.is_dirty
}

impl[A] Node for Cell[A] with set_dirty(self, new_dirty) {
  self.is_dirty = new_dirty
}

impl[A] Node for Cell[A] with is_changed(self) {
  self.is_changed
}

impl[A] Node for Cell[A] with evaluate(_self) {
  ()
}

impl[A : Eq] Node for Thunk[A] with is_changed(self) {
  self.is_changed
}

impl[A : Eq] Node for Thunk[A] with outgoing_edges(self) {
  self.outgoing_edges
}

impl[A : Eq] Node for Thunk[A] with incoming_edges(self) {
  self.incoming_edges
}

impl[A : Eq] Node for Thunk[A] with is_dirty(self) {
  self.is_dirty
}

impl[A : Eq] Node for Thunk[A] with set_dirty(self, new_dirty) {
  self.is_dirty = new_dirty
}

impl[A : Eq] Node for Thunk[A] with evaluate(self) {
  node_stack.push(self)
  let value = (self.thunk)()
  self.is_changed = match self.value {
    None => true
    Some(v) => v != value
  }
  self.value = Some(value)
  node_stack.unsafe_pop() |> ignore
}

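The only non-trivial implementation here is evaluate for Thunk. We first push the thunk onto the stack so that dependencies can be recorded against it. node_stack is defined as follows: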

let node_stack : Array[&Node] = []

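Then we perform the actual computation and compare the newly computed value with the previous one to update self.is_changed. is_changed later helps us decide whether a thunk needs to be recomputed.

The implementations of dirty and propagate are almost identical to the pseudocode above: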

fn &Node::dirty(self : &Node) -> Unit {
  for dependent in self.incoming_edges() {
    if not(dependent.is_dirty()) {
      dependent.set_dirty(true)
      dependent.dirty()
    }
  }
}

fn &Node::propagate(self : &Node) -> Unit {
  if self.is_dirty() {
    self.set_dirty(false)
    for dependency in self.outgoing_edges() {
      dependency.propagate()
      if dependency.is_changed() {
        self.outgoing_edges().clear()
        self.evaluate()
        return
      }
    }
  }
}

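With these helper functions, the three main APIs, Cell::get, Cell::set and Thunk::get, are fairly easy to implement.

To get the value of a cell, we simply return the value field of the struct. Before that, however, if get is called inside a Thunk::get, we first record the cell as a dependency.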

fn[A] Cell::get(self : Cell[A]) -> A {
  if node_stack.last() is Some(target) {
    target.outgoing_edges().push(self)
    self.incoming_edges.push(target)
  }
  self.value
}

When we change the value of a cell, we first make sure that the is_changed and dirty states are updated correctly, and then mark each of its parents as stale.

fn[A : Eq] Cell::set(self : Cell[A], new_value : A) -> Unit {
  if self.value != new_value {
    self.is_changed = true
    self.value = new_value
    self.set_dirty(true)
    &Node::dirty(self)
  }
}

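As with Cell::get, when implementing Thunk::get we first record self as a dependency. We then pattern-match on self.value: if it is None, this is the first time the user asks for the value of this thunk, so we can simply compute it directly; if it is Some, we use propagate to make sure that only the thunks that actually need it are recomputed.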

fn[A : Eq] Thunk::get(self : Thunk[A]) -> A {
  if node_stack.last() is Some(target) {
    target.outgoing_edges().push(self)
    self.incoming_edges.push(target)
  }
  match self.value {
    None => self.evaluate()
    Some(_) => &Node::propagate(self)
  }
  self.value.unwrap()
}
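
To cross-check the algorithm as a whole, here is a compact Python port of the same design. It is a sketch under a few assumptions rather than a translation of the library: a shared Node base class stands in for the trait, a has_value flag stands in for the optional value, and the record method and the calls counter are names introduced only for this illustration.

```python
# Illustrative Python port of the Cell/Thunk design described above.
node_stack = []  # thunks currently being evaluated


class Node:
    def __init__(self):
        self.is_dirty = False
        self.is_changed = False
        self.incoming_edges = []  # nodes that depend on this node
        self.outgoing_edges = []  # nodes this node depends on

    def evaluate(self):
        pass  # cells have nothing to recompute

    def record(self):
        # Register this node as a dependency of the thunk being evaluated.
        if node_stack:
            target = node_stack[-1]
            target.outgoing_edges.append(self)
            self.incoming_edges.append(target)

    def dirty(self):
        for dependent in self.incoming_edges:
            if not dependent.is_dirty:
                dependent.is_dirty = True
                dependent.dirty()

    def propagate(self):
        if self.is_dirty:
            self.is_dirty = False
            for dependency in list(self.outgoing_edges):
                dependency.propagate()
                if dependency.is_changed:
                    self.outgoing_edges.clear()  # rebuilt by evaluate()
                    self.evaluate()
                    return


class Cell(Node):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def get(self):
        self.record()
        return self.value

    def set(self, new_value):
        if self.value != new_value:
            self.is_changed = True
            self.value = new_value
            self.is_dirty = True
            self.dirty()


class Thunk(Node):
    def __init__(self, compute):
        super().__init__()
        self.compute = compute
        self.value = None
        self.has_value = False

    def evaluate(self):
        node_stack.append(self)
        try:
            value = self.compute()
        finally:
            node_stack.pop()
        self.is_changed = (not self.has_value) or self.value != value
        self.value, self.has_value = value, True

    def get(self):
        self.record()
        if self.has_value:
            self.propagate()
        else:
            self.evaluate()
        return self.value


n1, n2, n3 = Cell(1), Cell(2), Cell(3)
calls = {"t2": 0}  # counts how often t2's body actually runs


def compute_t2():
    calls["t2"] += 1
    return t1.get() + n3.get()


t1 = Thunk(lambda: n1.get() + n2.get())
t2 = Thunk(compute_t2)
first = t2.get()
n1.set(2)
n2.set(1)
second = t2.get()  # t1 is recomputed, but its value is unchanged
```

Swapping n1 and n2 makes t1 recompute, but because its result is equal to the previous one, the body of t2 does not run a second time.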

References

Adapton: Composable, demand-driven incremental computation, PLDI 2014: the original Adapton paper
illusory0x0/adapton.mbt: a MoonBit implementation of the Adapton library

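A Guide to Integrating MoonBit with Python

August 19, 2025 · 13 min read

Introduction

Python, with its concise syntax and vast ecosystem, has become one of the most popular programming languages today. Yet discussion of its performance bottlenecks, and of how maintainable its dynamic type system is in large projects, has never stopped. To address these challenges, the developer community has explored several optimization paths.

The python.mbt tool, officially released by MoonBit, offers a new perspective. It lets developers call Python code directly from within a MoonBit environment. The combination aims to merge MoonBit's static type safety and high-performance potential with Python's mature ecosystem. With python.mbt, developers can use Python's rich libraries while benefiting from MoonBit's static analysis and modern build and test tooling, which makes it possible to build large-scale, high-performance systems software.

This article takes a close look at how python.mbt works and offers a practical guide. It answers some common questions, such as: How does python.mbt work? Is it slower than native Python because it adds an intermediate layer? What advantages does it have over existing tools such as pybind11 for C++ or PyO3 for Rust? To answer these questions, we first need to understand the basic workflow of the Python interpreter.

How the Python interpreter works

The Python interpreter executes code in three main stages:

Parsing: this stage consists of lexical analysis and syntax analysis. The interpreter breaks the human-readable Python source code into tokens and then, following the grammar rules, organizes those tokens into a tree structure, the abstract syntax tree (AST).

For example, take the following Python code: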

def add(x, y):
  return x + y

a = add(1, 2)
print(a)

We can use Python's ast module to inspect the AST it produces:

Module(
    body=[
        FunctionDef(
            name='add',
            args=arguments(
                args=[
                    arg(arg='x'),
                    arg(arg='y')]),
            body=[
                Return(
                    value=BinOp(
                        left=Name(id='x', ctx=Load()),
                        op=Add(),
                        right=Name(id='y', ctx=Load())))]),
        Assign(
            targets=[
                Name(id='a', ctx=Store())],
            value=Call(
                func=Name(id='add', ctx=Load()),
                args=[
                    Constant(value=1),
                    Constant(value=2)])),
        Expr(
            value=Call(
                func=Name(id='print', ctx=Load()),
                args=[
                    Name(id='a', ctx=Load())]))])
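
A dump like the one above can be produced with a few lines of standard-library Python. This is a minimal sketch; the exact fields printed may differ slightly between Python versions.

```python
import ast

source = """
def add(x, y):
  return x + y

a = add(1, 2)
print(a)
"""

tree = ast.parse(source)
dump = ast.dump(tree, indent=4)  # the indent= argument needs Python 3.9+
print(dump)

# The three top-level statements of the module, by node type.
top_level = [type(node).__name__ for node in tree.body]
```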

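Compilation: next, the Python interpreter compiles the AST into a lower-level, more linear intermediate representation, namely bytecode. This is a platform-independent instruction set designed for the Python virtual machine (PVM).

Using Python's dis module, we can view the bytecode for the code above: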

  2           LOAD_CONST               0 (<code object add>)
              MAKE_FUNCTION
              STORE_NAME               0 (add)

  5           LOAD_NAME                0 (add)
              PUSH_NULL
              LOAD_CONST               1 (1)
              LOAD_CONST               2 (2)
              CALL                     2
              STORE_NAME               1 (a)

  6           LOAD_NAME                2 (print)
              PUSH_NULL
              LOAD_NAME                1 (a)
              CALL                     1
              POP_TOP
              RETURN_CONST             3 (None)
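
A listing of this kind can be reproduced as follows. This is a sketch using only the standard library; opcode names and layout vary between CPython versions, so the output on your machine may not match the listing above exactly.

```python
import dis

source = """
def add(x, y):
  return x + y

a = add(1, 2)
print(a)
"""

# Compile the module to a code object, then disassemble it.
code = compile(source, "<example>", "exec")
dis.dis(code)

# The global names referenced by the module-level bytecode.
names = code.co_names
# The opcode names of the module-level instructions.
opnames = [ins.opname for ins in dis.get_instructions(code)]
```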

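Execution: finally, the Python virtual machine (PVM) executes the bytecode instructions one by one. Each instruction corresponds to a C function call inside the CPython interpreter. For example, LOAD_NAME looks up a variable and BINARY_OP performs a binary operation. This instruction-by-instruction interpretation is the main source of Python's performance overhead. Even a simple 1 + 2 has to go through the whole pipeline of parsing, compilation and virtual-machine execution.

Understanding this pipeline helps us understand the basic ideas behind Python performance optimization, as well as the design philosophy of python.mbt.

Paths to optimizing Python performance

Today there are two mainstream ways to improve the performance of Python programs:

Just-in-time (JIT) compilation. Projects such as PyPy analyze the running program and compile frequently executed "hot" bytecode into highly optimized native machine code, bypassing the PVM's interpretation and greatly speeding up compute-intensive tasks. However, JIT is no panacea: it cannot solve the problems inherent in Python being a dynamically typed language, such as the difficulty of effective static analysis in large projects, which makes software maintenance challenging.
Native extensions. Developers can use languages such as C++ (with pybind11) or Rust (with PyO3) to call Python functionality directly, or write performance-critical modules in those languages and call them from Python. This approach achieves near-native performance, but it requires developers to be proficient in both Python and a complex systems language. The learning curve is steep, and the barrier is high for most Python programmers.

python.mbt is also a form of native extension. Compared with languages such as C++ and Rust, however, it tries to strike a new balance between performance, ease of use and engineering capability, and it puts more emphasis on using Python functionality directly from MoonBit.

High-performance core: MoonBit is a statically typed, compiled language whose code can be compiled efficiently to native machine code. Developers can implement compute-intensive logic in MoonBit and obtain high performance at the root.
Seamless Python calls: python.mbt interacts directly with CPython's C API to call Python modules and functions. This keeps call overhead to a minimum: it bypasses Python's parsing and compilation stages and goes straight to the virtual-machine execution layer.
A gentler learning curve: compared with C++ and Rust, MoonBit's syntax is more modern and concise, and it comes with solid support for functional programming, a documentation system, unit testing and static analysis tools, which makes it friendlier to developers used to Python.
Better engineering and AI collaboration: MoonBit's strong type system and clear interface definitions make the intent of code more explicit and easier for static analysis tools and AI coding assistants to understand. This helps maintain code quality in large projects and improves the efficiency and accuracy of coding together with AI.

Using pre-wrapped Python libraries in MoonBit

To make things easier for developers, MoonBit will officially wrap mainstream Python libraries once the build system and IDE are mature. After wrapping, users can use these Python libraries in their projects just like importing ordinary MoonBit packages. Below we use the matplotlib plotting library as an example.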

First, add the matplotlib dependency, either in moon.mod.json in your project root or from the terminal:

moon update
moon add Kaida-Amethyst/matplotlib

Then, declare the import in the moon.pkg.json of the sub-package that uses the library. Here we follow the Python convention and give it the alias plt:

{
  "import": [
    {
      "path": "Kaida-Amethyst/matplotlib",
      "alias": "plt"
    }
  ]
}

Once configured, you can call matplotlib from MoonBit code to draw plots:

let sin : (Double) -> Double = @math.sin

fn main {
  let x = Array::makei(100, fn(i) { i.to_double() * 0.1 })
  let y = x.map(sin)

  // For type safety, the wrapped subplots interface always returns a tuple of a fixed type.
  // This avoids Python's dynamic behavior of returning different object types depending on the arguments.
  let (_, axes) = plt::subplots(1, 1)

  // Use the .. cascade call syntax
  axes[0][0]
  ..plot(x, y, color = Green, linestyle = Dashed, linewidth = 2)
  ..set_title("Sine of x")
  ..set_xlabel("x")
  ..set_ylabel("sin(x)")

  @plt.show()
}

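At present, on macOS and Linux the MoonBit build system handles the dependencies automatically. On Windows, users may need to install a C compiler and configure the Python environment by hand. Future versions of the MoonBit IDE aim to simplify this process.

Using Unwrapped Python Modules in MoonBit

The Python ecosystem is vast, and even with AI assistance it is unrealistic to rely entirely on official wrappers. Fortunately, the core functionality of python.mbt lets us interact with any Python module directly. Below, we use the simple time module from the Python standard library to demonstrate the process.

Adding python.mbt

First, make sure your MoonBit toolchain is up to date, then add the python.mbt dependency:

moon update
moon add Kaida-Amethyst/python

Next, import it in your package's moon.pkg.json:

{
  "import": ["Kaida-Amethyst/python"]
}

python.mbt automatically handles initializing (Py_Initialize) and shutting down the Python interpreter, so developers do not need to manage it manually.

Importing a Python Module

Use the @python.pyimport function to import a module. To avoid the performance cost of repeated imports, we recommend caching the imported module object with a closure: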

// Define a struct that holds the Python module object, for better type safety
pub struct TimeModule {
  time_mod: PyModule
}

// Define a function that returns a closure; the closure is used to obtain the TimeModule instance
fn import_time_mod() -> () -> TimeModule {
  // The import is performed only on the first call
  guard @python.pyimport("time") is Some(time_mod) else {
    println("Failed to load Python module: time")
    panic("ModuleLoadError")
  }
  let time_mod = TimeModule::{ time_mod }
  // The returned closure captures the time_mod variable
  fn () { time_mod }
}

// Create a global time_mod "getter" function
let time_mod: () -> TimeModule = import_time_mod()

In subsequent code, always obtain the module by calling time_mod() rather than import_time_mod.

Converting Between MoonBit and Python Objects

To call Python functions, we need to convert between MoonBit objects and Python objects (PyObject).

Integers: use PyInteger::from to create a PyInteger from an Int64, and to_int64() to convert back.

test "py_integer_conversion" {
  let n: Int64 = 42
  let py_int = PyInteger::from(n)
  inspect(py_int, content="42")
  assert_eq(py_int.to_int64(), 42L)
}

Floats: use PyFloat::from and to_double.

test "py_float_conversion" {
  let n: Double = 3.5
  let py_float = PyFloat::from(n)
  inspect(py_float, content="3.5")
  assert_eq(py_float.to_double(), 3.5)
}

Strings: use PyString::from and to_string.

test "py_string_conversion" {
  let py_str = PyString::from("hello")
  inspect(py_str, content="'hello'")
  assert_eq(py_str.to_string(), "hello")
}

Lists (List): you can create an empty PyList and then append elements, or create one directly from an Array[&IsPyObject].

test "py_list_from_array" {
  let one = PyInteger::from(1)
  let two = PyFloat::from(2.0)
  let three = PyString::from("three")
  let arr: Array[&IsPyObject] = [one, two, three]

  let list = PyList::from(arr)
  inspect(list, content="[1, 2.0, 'three']")
}

Tuples (Tuple): a PyTuple needs its size specified up front, and is then filled element by element with the set method.

test "py_tuple_creation" {
  let tuple = PyTuple::new(3)
  tuple
  ..set(0, PyInteger::from(1))
  ..set(1, PyFloat::from(2.0))
  ..set(2, PyString::from("three"))

  inspect(tuple, content="(1, 2.0, 'three')")
}

Dictionaries (Dict): PyDict primarily supports strings as keys. Use new to create a dictionary and set to add key-value pairs. For non-string keys, use set_by_obj.

test "py_dict_creation" {
  let dict = PyDict::new()
  dict
  ..set("one", PyInteger::from(1))
  ..set("two", PyFloat::from(2.0))

  inspect(dict, content="{'one': 1, 'two': 2.0}")
}
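The expected strings in these tests look like the output of Python's own repr(), which would explain why the string value appears in single quotes. The sketch below checks the same values on the Python side. It assumes that python.mbt displays wrapped objects through Python's repr; that is an inference from the outputs above, not a documented guarantee.

```python
# Python-side view of the values built in the MoonBit tests above.
# Assumption: python.mbt shows wrapped objects using Python's repr().
values = {
    "int": 42,
    "float": 3.5,
    "str": "hello",
    "list": [1, 2.0, "three"],
    "tuple": (1, 2.0, "three"),
    "dict": {"one": 1, "two": 2.0},
}

# repr() is the text Python itself would show for each object.
reprs = {name: repr(value) for name, value in values.items()}

for name, text in reprs.items():
    print(f"{name:5} -> {text}")
```

Note that only the string gains quotes; numbers and containers print as they were written.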

When retrieving elements from Python compound types, python.mbt performs a runtime type check and returns an Optional[PyObjectEnum] to ensure type safety.

test "py_list_get" {
  let list = PyList::new()
  list.append(PyInteger::from(1))
  list.append(PyString::from("hello"))

  inspect(list.get(0).unwrap(), content="PyInteger(1)")
  inspect(list.get(1).unwrap(), content="PyString('hello')")
  inspect(list.get(2), content="None") // An out-of-bounds index returns None
}

Calling Functions in a Module

Calling a function takes two steps: first obtain the function object with get_attr, then perform the call with invoke. The return value of invoke is a PyObject that needs to be pattern matched and converted.

Below are MoonBit wrappers for time.sleep and time.time:

// Wrapper for time.sleep
pub fn sleep(seconds: Double) -> Unit {
  let lib = time_mod()
  guard lib.time_mod.get_attr("sleep") is Some(PyCallable(f)) else {
    println("get function `sleep` failed!")
    panic()
  }
  let args = PyTuple::new(1)
  args.set(0, PyFloat::from(seconds))
  match (try? f.invoke(args)) {
    // sleep returns nothing useful, so success maps to the unit value
    Ok(_) => ()
    Err(_) => {
      println("invoke `sleep` failed!")
      panic()
    }
  }
}

// Wrapper for time.time
pub fn time() -> Double {
  let lib = time_mod()
  guard lib.time_mod.get_attr("time") is Some(PyCallable(f)) else {
    println("get function `time` failed!")
    panic()
  }
  match (try? f.invoke()) {
    Ok(Some(PyFloat(t))) => t.to_double()
    _ => {
      println("invoke `time` failed!")
      panic()
    }
  }
}
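For comparison, here is what the same two steps amount to on the Python side. This is only a sketch: call_module_function is a hypothetical helper written for this illustration, not part of python.mbt or the standard library.

```python
import time


def call_module_function(module, name, *args):
    """Look up `name` on `module` and call it, mirroring get_attr + invoke."""
    func = getattr(module, name, None)  # step 1: the get_attr lookup
    if not callable(func):              # corresponds to matching PyCallable
        raise AttributeError(f"get function `{name}` failed!")
    return func(*args)                  # step 2: invoke with positional args


start = call_module_function(time, "time")
call_module_function(time, "sleep", 0.01)
elapsed = call_module_function(time, "time") - start
print(f"slept for about {elapsed:.3f} s")
```

The lookup can fail (no such attribute, or the attribute is not callable) independently of the call itself, which is why the MoonBit wrappers handle the two steps separately.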

With the wrappers in place, we can use them from MoonBit in a type-safe way:

test "sleep" {
  let start = time()
  sleep(1)
  let end = time()

  println("start = \{start}")
  println("end = \{end}")
}

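Practical Advice

Define clear boundaries: treat python.mbt as the "glue layer" connecting MoonBit and the Python ecosystem. Keep core computation and business logic in MoonBit to benefit from its performance and type system, and use python.mbt only when necessary, that is, when you need a library that exists only in Python.
Replace magic strings with ADTs: many Python functions accept specific strings as arguments to control their behavior. In a MoonBit wrapper, these "magic strings" should be converted into algebraic data types (ADTs), that is, enums. This uses MoonBit's type system to move runtime value checks to compile time, which greatly improves the robustness of the code.
Thorough error handling: for brevity, the examples in this article use panic or return simple strings. In production code, define dedicated error types and propagate and handle them through the Result type, providing clear error context.

Map keyword arguments: Python functions use keyword arguments (kwargs) extensively, as in plot(color='blue', linewidth=2). These map elegantly onto MoonBit's labeled arguments. When writing wrappers, prefer labeled arguments to offer a similar development experience.

For example, a Python function that accepts kwargs:

# graphics.py
def draw_line(points, color="black", width=1):
    # ... drawing logic ...
    print(f"Drawing line with color {color} and width {width}")

Its MoonBit wrapper can be designed as:

fn draw_line(points: Array[Point], color~: Color = Black, width~: Int = 1) -> Unit {
  let points : PyList = ... // convert Array[Point] to PyList

  // Build args
  let args = PyTuple::new(1)
  args..set(0, points)

  // Build kwargs
  let kwargs = PyDict::new()
  kwargs
  ..set("color", PyString::from(color))
  ..set("width", PyInteger::from(width))
  // f is the Python draw_line callable, obtained with get_attr as in the time wrappers
  match (try? f.invoke(args~, kwargs~)) {
    Ok(_) => ()
    _ => {
      // Handle the error
    }
  }
}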
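Seen from the Python side, the wrapper above packs its positional arguments into a tuple and its keyword arguments into a dict, then makes a single call. The sketch below shows that shape together with the "enum instead of magic string" advice. The Color enum and draw_line_wrapped are hypothetical names used for illustration, and draw_line returns its message here instead of printing it so the result can be inspected.

```python
from enum import Enum


class Color(Enum):
    # Hypothetical enum standing in for the MoonBit `Color` ADT.
    BLACK = "black"
    GREEN = "green"


def draw_line(points, color="black", width=1):
    # Same signature as graphics.py above, but returning the message.
    return f"Drawing line with color {color} and width {width}"


def draw_line_wrapped(points, color=Color.BLACK, width=1):
    args = (points,)                                 # positional arguments as a tuple
    kwargs = {"color": color.value, "width": width}  # keyword arguments as a dict
    return draw_line(*args, **kwargs)                # one call with both


message = draw_line_wrapped([(0, 0), (1, 1)], color=Color.GREEN, width=2)
print(message)
```

The string "green" exists only at the boundary; callers of the wrapper can pass nothing but a valid Color member.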

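Beware of dynamism: always remember that Python is dynamically typed. Any data obtained from Python should be treated as "untrusted" and must be strictly type-checked and validated. Avoid unwrap where possible, and handle every possible case safely through pattern matching.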
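As a small illustration of why such checks matter, the sketch below validates values on the Python side before trusting them. The helper names checked_seconds and safe_get are hypothetical, chosen for this example only.

```python
import time


def checked_seconds(value):
    """Return `value` as a float, or None if it is not a real number."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    return None


def safe_get(items, index):
    """Return items[index], or None when the index is out of range."""
    return items[index] if 0 <= index < len(items) else None


now = checked_seconds(time.time())
print(checked_seconds("1.5"), safe_get([1, "hello"], 2))
```

Returning None for every unexpected case plays the same role as matching on an optional value in MoonBit: the caller is forced to decide what to do when the data is not what it hoped for.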

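Conclusion

This article has walked through how python.mbt works and shown how to use it to call Python code from MoonBit, whether through pre-wrapped libraries or by interacting with Python modules directly. python.mbt is more than a tool; it represents an idea of integration: combining MoonBit's static analysis, high performance, and engineering strengths with Python's vast and mature ecosystem. We hope this article offers developers in the MoonBit and Python communities a new and more powerful option for building future software.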

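MoonBit C-FFI Development Guide

August 14, 2025 · 17 min read

Introduction

MoonBit is a modern functional programming language with a rigorous type system, highly readable syntax, and a toolchain designed for AI. However, reinventing the wheel is not a good idea. Countless time-tested, high-performance libraries are written in C (or in languages compatible with the C ABI, such as C++ and Rust). From low-level hardware access to complex scientific computing to graphics rendering, the C ecosystem is a rich mine of resources.

So can modern MoonBit work together with these classic C libraries, letting the pioneers of the new world use the powerful tools of the old one? The answer is yes. Through the C Foreign Function Interface (C-FFI), MoonBit can call C functions, connecting the two worlds.

This article is your guide to MoonBit's C-FFI, one step at a time. Using a concrete example (creating MoonBit bindings for mymath, a math library written in C), we will learn how to handle different kinds of data, pointers, structs, and even function pointers.

Prerequisites

To bind to any C library, we need to know which functions its header file declares, how to find the header file, and how to find the library file. For the task in this article, the header of the C math library is mymath.h, which defines the functions and types we want to call from MoonBit. We assume that mymath is installed on the system: the header is found with -I/usr/include at compile time, and the library is linked with -L/usr/lib -lmymath. Part of mymath.h is shown below.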

// mymath.h

// --- Basic functions ---
void print_version();
int version_major();
int is_normal(double input);

// --- Floating-point computation ---
float sinf(float input);
float cosf(float input);
float tanf(float input);
double sin(double input);
double cos(double input);
double tan(double input);

// --- Strings and pointers ---
int parse_int(char* str);
char* version();
int tan_with_errcode(double input, double* output);

// --- Array operations ---
int sin_array(int input_len, double* inputs, double* outputs);
int cos_array(int input_len, double* inputs, double* outputs);
int tan_array(int input_len, double* inputs, double* outputs);

// --- Structs and complex types ---
typedef struct {
  double real;
  double img;
} Complex;

Complex* new_complex(double r, double i);
void multiply(Complex* a, Complex* b, Complex** result);
void init_n_complexes(int n, Complex** complex_array);

// --- Function pointers ---
void for_each_complex(int n, Complex** arr, void (*call_back)(Complex*));

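The Groundwork

Before writing any FFI code, we need to build the bridge between MoonBit and C code.

Compiling to Native

First, the MoonBit code needs to be compiled to native machine code. This is done with the following command:

moon build --target native

This command compiles your MoonBit project to C code, then uses the system C compiler (such as GCC or Clang) to compile it into the final executable. The generated C files are located under target/native/release/build/, in subdirectories named after each package. For example, main/main.mbt is compiled to target/native/release/build/main/main.c.

Configuring Linking

Compiling alone is not enough; we also need to tell the MoonBit compiler how to find and link against our mymath library. This is configured in the project's moon.pkg.json file.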

{
  "supported-targets": ["native"],
  "link": {
    "native": {
      "cc": "clang",
      "cc-flags": "-I/usr/include",
      "cc-link-flags": "-L/usr/lib -lmymath"
    }
  }
}

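cc: the compiler used to compile C code, such as clang or gcc.
cc-flags: flags needed when compiling C files, typically used to specify header search paths (-I).
cc-link-flags: flags needed when linking, typically used to specify library search paths (-L) and the libraries to link (-l).

We also need a "glue" C file, named cwrap.c here, that includes the C library's header and the MoonBit runtime header.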

// cwrap.c
#include <mymath.h>
#include <moonbit.h>

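This glue file must also be made known to the MoonBit compiler through moon.pkg.json:

{
  // ... other configuration
  "native-stub": ["cwrap.c"]
}

With this configuration in place, our project is ready to link against the mymath library.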

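The First FFI Call

Everything is ready, so let's make our first real cross-language call. An external C function is declared in MoonBit with the following syntax:

extern "C" fn moonbit_function_name(arg: Type) -> ReturnType = "c_function_name"

extern "C": tells the MoonBit compiler that this is an external C function.
moonbit_function_name: the function name used in MoonBit code.
"c_function_name": the name of the C function that is actually linked.

Let's try it out with version_major, the simplest function in mymath.h:

extern "C" fn version_major() -> Int = "version_major"

Note: MoonBit performs aggressive dead code elimination (DCE). If you only declare the FFI function above but never actually call it in your code (for example, from the main function), the compiler treats it as unused and does not include its declaration in the generated C code. So make sure you call it in at least one place!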

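Navigating the Type System Chasm

The real challenge lies in handling the differences between the two languages' data types. Some of the more complex cases require some knowledge of C.

3.1 Basic Types

For basic numeric types, there is a direct and clear correspondence between MoonBit and C.

MoonBit Type	C Type	Notes
`Int`	`int32_t`
`Int64`	`int64_t`
`UInt`	`uint32_t`
`UInt64`	`uint64_t`
`Float`	`float`
`Double`	`double`
`Bool`	`int32_t`	C had no native `bool` before C99; a boolean is usually represented as `int32_t` (0/1)
`Unit`	`void` (return value)	Used when the C function returns nothing
`Byte`	`uint8_t`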
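The widths in this table can be checked from any language with an FFI layer. The sketch below uses Python's ctypes purely as a measuring tool; the dictionary simply restates the table, and the sizes are what ctypes reports on the machine running the script.

```python
import ctypes

# MoonBit type -> (ctypes equivalent of the C type, expected size in bytes)
type_map = {
    "Int": (ctypes.c_int32, 4),
    "Int64": (ctypes.c_int64, 8),
    "UInt": (ctypes.c_uint32, 4),
    "UInt64": (ctypes.c_uint64, 8),
    "Float": (ctypes.c_float, 4),
    "Double": (ctypes.c_double, 8),
    "Bool": (ctypes.c_int32, 4),
    "Byte": (ctypes.c_uint8, 1),
}

sizes = {name: ctypes.sizeof(ctype) for name, (ctype, _) in type_map.items()}

for name, (ctype, expected) in type_map.items():
    print(f"{name:7} expected {expected} byte(s), measured {sizes[name]}")

# mymath.h uses plain `int`; mapping it to Int assumes `int` is 32 bits wide.
print("C int is 32-bit here:", ctypes.sizeof(ctypes.c_int) == 4)
```

The last line matters because the header declares plain int rather than int32_t: the mapping to Int is only correct on platforms where the two have the same width.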

Based on this table, we can easily write FFI declarations for most of the simple functions in mymath.h:

extern "C" fn print_version() -> Unit = "print_version"
extern "C" fn version_major() -> Int = "version_major"

// The return value is semantically a boolean, so MoonBit's Bool type is clearer
extern "C" fn is_normal(input: Double) -> Bool = "is_normal"

extern "C" fn sinf(input: Float) -> Float = "sinf"
extern "C" fn cosf(input: Float) -> Float = "cosf"
extern "C" fn tanf(input: Float) -> Float = "tanf"

extern "C" fn sin(input: Double) -> Double = "sin"
extern "C" fn cos(input: Double) -> Double = "cos"
extern "C" fn tan(input: Double) -> Double = "tan"

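3.2 Strings

Things start to get interesting when strings are involved. You might be tempted to map C's char* to MoonBit's String, but this is a common pitfall.

MoonBit's String and C's char* have completely different memory layouts. A char* is a pointer to a sequence of bytes terminated by \0, whereas a MoonBit String is a GC-managed object that carries length information and UTF-16 encoded data.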
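To make the difference concrete, the following sketch prints the raw bytes of the same text in both encodings. It uses Python's built-in codecs only to show the byte patterns; it does not reproduce the MoonBit object header or length field.

```python
text = "hi!"

# What a C char* points at: one byte per ASCII character plus a NUL terminator.
c_string = text.encode("ascii") + b"\x00"

# The UTF-16 (little-endian) code units that make up the string data.
utf16_units = text.encode("utf-16-le")

print(c_string)      # b'hi!\x00'
print(utf16_units)   # b'h\x00i\x00!\x00'
```

Every ASCII character occupies two bytes in UTF-16, and the second of those bytes is zero, which a C function would read as the end of the string.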

Passing Arguments: From MoonBit to C

When we need to pass a MoonBit string to a C function that accepts char* (such as parse_int), we must convert it manually. A recommended approach is to convert it to the Bytes type.

// A helper that converts a MoonBit String into the null-terminated byte array C expects.
// Caveat: to_bytes() returns the string's UTF-16 little-endian code units, so for ASCII
// text every other byte is 0. A C function that expects 8-bit characters needs the
// string encoded as UTF-8 (or ASCII) before it is passed across.
fn string_to_c_bytes(s: String) -> Bytes {
  let arr = s.to_bytes().to_array()
  // Make sure it ends with \0
  if arr.last() != Some(0) {
    arr.push(0)
  }
  Bytes::from_array(arr)
}

// FFI declaration; note that the parameter type is Bytes
#borrow(s) // Tell the compiler we only borrow s, so it should not increase its reference count
extern "C" fn __parse_int(s: Bytes) -> Int = "parse_int"

// Wrap it in a user-friendly MoonBit function
fn parse_int(str: String) -> Int {
  let s = string_to_c_bytes(str)
  __parse_int(s)
}
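One thing to watch in this conversion is the encoding of the bytes that reach C. The sketch below assumes, as the description of String above suggests, that the bytes handed over are UTF-16 little-endian code units, and shows what a C function scanning for a NUL terminator would see in that case compared with UTF-8. The helper c_view is hypothetical and exists only for this demonstration.

```python
import ctypes


def c_view(data: bytes) -> bytes:
    """Return what C code reading a char* would see: bytes up to the first NUL."""
    buf = ctypes.create_string_buffer(data)  # copies data and appends a NUL
    return buf.value                         # .value stops at the first NUL byte


utf16 = "123".encode("utf-16-le") + b"\x00"  # UTF-16 LE bytes plus terminator
utf8 = "123".encode("utf-8") + b"\x00"       # UTF-8 bytes plus terminator

print(c_view(utf16))  # b'1'
print(c_view(utf8))   # b'123'
```

If the C library expects ordinary 8-bit text, the string therefore has to be encoded as UTF-8 (or ASCII) before the terminator is appended; otherwise a function like parse_int would only ever see the first character.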

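The #borrow attribute: #borrow is an optimization hint. It tells the compiler that the C function only "borrows" the parameter and does not take ownership of it. This avoids unnecessary reference-counting operations and prevents potential memory leaks.

Return Values: From C to MoonBit

Conversely, when a C function returns a char* (such as version), the situation is more complicated. We must never declare it as returning Bytes or String directly:

// Wrong!
extern "C" fn version() -> Bytes = "version"

This is because the C function returns only a raw pointer, which lacks the header information that the MoonBit GC requires. Converting it directly like this causes a crash at runtime.

The correct approach is to treat the returned char* as an opaque handle, then write a conversion function in the C "glue" code that manually converts it into a valid MoonBit string.

MoonBit side:

// 1. Declare an external type that represents a C string pointer
#extern
type CStr

// 2. Declare an FFI function that calls the C wrapper
extern "C" fn CStr::to_string(self: Self) -> String = "cstr_to_moonbit_str"

// 3. Declare the original C function, which returns our opaque type
extern "C" fn __version() -> CStr = "version"

// 4. Wrap it in a safe MoonBit function
fn version() -> String {
  __version().to_string()
}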

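C side (add to cwrap.c):

#include <string.h> // for strlen

// This function correctly converts a char* into a moonbit_string_t with a GC header
moonbit_string_t cstr_to_moonbit_str(char *ptr) {
  if (ptr == NULL) {
    return moonbit_make_string(0, 0);
  }
  int32_t len = strlen(ptr);
  // moonbit_make_string allocates a MoonBit string object with a GC header
  moonbit_string_t ms = moonbit_make_string(len, 0);
  for (int i = 0; i < len; i++) {
    // Assumes ASCII-compatible data; the cast through unsigned char
    // prevents sign extension where char is a signed type
    ms[i] = (uint16_t)(unsigned char)ptr[i];
  }
  // Note: whether free(ptr) is needed depends on the C library's API contract.
  // If the memory returned by version() must be freed by the caller, free it here.
  return ms;
}

Although this pattern may look cumbersome at first, it guarantees memory safety and is the standard way to handle C string return values.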
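The copy loop in the C wrapper turns each input byte into one 16-bit code unit. The sketch below mimics that step in Python (assuming each byte is zero-extended) to show why the "ASCII-compatible" assumption in the comment matters. widen_to_utf16 is a hypothetical name for this illustration.

```python
import struct


def widen_to_utf16(raw: bytes) -> str:
    """Mimic the copy loop: one input byte becomes one 16-bit code unit."""
    units = struct.pack(f"<{len(raw)}H", *raw)  # each byte zero-extended to uint16
    return units.decode("utf-16-le")


ascii_result = widen_to_utf16(b"1.2.3")
non_ascii_result = widen_to_utf16("é".encode("utf-8"))

print(ascii_result)      # 1.2.3
print(non_ascii_result)  # two characters (U+00C3, U+00A9) instead of é
```

ASCII text survives unchanged, but a multi-byte UTF-8 sequence is split into unrelated characters. A library that returns non-ASCII text would need a real UTF-8 to UTF-16 conversion in the glue code instead of a byte-by-byte copy.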

3.3 The Art of Pointers: Passing by Reference and Arrays

C makes heavy use of pointers to implement "output parameters" and to pass arrays. MoonBit provides dedicated types for this.

Out-Parameters for a Single Value

When a C function uses a pointer to return an additional value, as in tan_with_errcode(double input, double* output), MoonBit uses the Ref[T] type as the counterpart.

extern "C" fn tan_with_errcode(input: Double, output: Ref[Double]) -> Int = "tan_with_errcode"

In MoonBit, Ref[T] is a struct containing a single field of type T. When it is passed to C, MoonBit passes the address of this struct. From C's point of view, a pointer to struct { T val; } and a pointer to T refer to the same memory address, so this works directly.
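The address equivalence can be observed with Python's ctypes, which lays out structures the way a C compiler does. In this sketch RefDouble is a hypothetical stand-in for Ref[Double], and tan_with_errcode is a Python function playing the role of the C function; neither is the real MoonBit or mymath code.

```python
import ctypes
import math


class RefDouble(ctypes.Structure):
    # Stand-in for Ref[Double]: a struct with a single field.
    _fields_ = [("val", ctypes.c_double)]


def tan_with_errcode(x, output):
    """Stand-in for the C function: writes through a double* and returns 0."""
    output[0] = math.tan(x)
    return 0


ref = RefDouble(0.0)

# Reinterpret the struct pointer as a pointer to its first (and only) field.
as_double_ptr = ctypes.cast(ctypes.pointer(ref), ctypes.POINTER(ctypes.c_double))

errcode = tan_with_errcode(1.0, as_double_ptr)
print(errcode, ref.val)
```

Because the field sits at offset 0, the write through the double pointer lands in ref.val, which is exactly what the caller reads afterwards.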

Arrays: Passing Collections of Data

When a C function needs to work on an array (for example double* inputs), MoonBit maps it to the FixedArray[T] type. In memory, a FixedArray[T] is simply a contiguous block of elements of type T, and its pointer can be passed directly to C.

extern "C" fn sin_array(len: Int, inputs: FixedArray[Double], outputs: FixedArray[Double]) -> Int = "sin_array"
extern "C" fn cos_array(len: Int, inputs: FixedArray[Double], outputs: FixedArray[Double]) -> Int = "cos_array"
extern "C" fn tan_array(len: Int, inputs: FixedArray[Double], outputs: FixedArray[Double]) -> Int = "tan_array"
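The calling convention behind these declarations (a length plus two pointers into contiguous buffers, with the caller allocating the output) can be sketched with ctypes arrays. Here sin_array is a Python stand-in for the C function, written only to show how the buffers are used.

```python
import ctypes
import math


def sin_array(input_len, inputs, outputs):
    """Stand-in for the C function, indexing through two double* pointers."""
    for i in range(input_len):
        outputs[i] = math.sin(inputs[i])
    return 0


DoubleArray3 = ctypes.c_double * 3  # three doubles laid out back to back
xs = DoubleArray3(0.0, 0.5, 1.0)
ys = DoubleArray3()                 # zero-initialized output buffer

double_ptr = ctypes.POINTER(ctypes.c_double)
status = sin_array(len(xs), ctypes.cast(xs, double_ptr), ctypes.cast(ys, double_ptr))
results = list(ys)
print(status, results)
```

The output buffer must already have room for input_len elements before the call, since the callee only receives a pointer and cannot grow it.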

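3.4 External Types: Embracing Opaque C Structs

For a C struct such as Complex, the best practice is usually to treat it as an "opaque type". In MoonBit we only create a reference (or handle) to it, without caring about its internal fields.

This is done with the #extern type syntax:

#extern
type Complex

This declaration tells MoonBit: "There is an external type named Complex. You do not need to know its internal structure; just pass it around as a pointer-sized handle." In the generated C code, the Complex type is handled as void*. This is usually safe, because all operations on Complex are performed inside the C library, and the MoonBit side only passes the pointer along.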
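The opaque-handle idea can be sketched in a few lines of ctypes. The "library" side below knows the struct layout, while the caller only ever holds an address. The accessor complex_real is hypothetical (mymath.h declares no such function) and is included only to show that the handle is interpreted on the library side alone.

```python
import ctypes


class _Complex(ctypes.Structure):
    # Layout known only to the "library" side of this sketch.
    _fields_ = [("real", ctypes.c_double), ("img", ctypes.c_double)]


_live_objects = []  # keeps allocations alive, playing the role of the C heap


def new_complex(r, i):
    """Return an opaque handle (an address), like a Complex* seen as void*."""
    obj = _Complex(r, i)
    _live_objects.append(obj)
    return ctypes.addressof(obj)


def complex_real(handle):
    """Library-side accessor: only here is the handle treated as a struct."""
    return ctypes.cast(handle, ctypes.POINTER(_Complex)).contents.real


handle = new_complex(1.5, -2.0)  # the caller just stores and passes this value
real_part = complex_real(handle)
print(real_part)
```

The caller never reads a field directly, so the library is free to change the struct layout without breaking the binding.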

Following this principle, we can write the FFI for the Complex-related functions in mymath.h:

// C: Complex* new_complex(double r, double i);
// Returns a pointer to a Complex, which in MoonBit means returning a Complex handle
extern "C" fn new_complex(r: Double, i: Double) -> Complex = "new_complex"

// C: void multiply(Complex* a, Complex* b, Complex** result);
// Complex* corresponds to Complex, and Complex** corresponds to Ref[Complex]
extern "C" fn multiply(a: Complex, b: Complex, res: Ref[Complex]) -> Unit = "multiply"

// C: void init_n_complexes(int n, Complex** complex_array);
// Here Complex** is used as an array, so it corresponds to FixedArray[Complex]
extern "C" fn init_n_complexes(n: Int, complex_array: FixedArray[Complex]) -> Unit = "init_n_complexes"

Best practice: wrap the raw FFI. Exposing FFI functions directly confuses users (with types such as Ref and FixedArray, for example). We strongly recommend building a more MoonBit-friendly API on top of the FFI declarations.

// Define methods on the Complex type to hide the FFI details
fn Complex::mul(self: Complex, other: Complex) -> Complex {
  // Create a temporary Ref to receive the result
  let res: Ref[Complex] = Ref::{ val: new_complex(0, 0) }
  multiply(self, other, res)
  res.val // Return the result
}

fn init_n(n: Int) -> Array[Complex] {
  // Create the array with FixedArray::make
  let arr = FixedArray::make(n, new_complex(0, 0))
  init_n_complexes(n, arr)
  // Convert the FixedArray into the more user-friendly Array
  Array::from_fixed_array(arr)
}

3.5 Function Pointers: When C Needs to Call Back into MoonBit

The most complex function in mymath.h is for_each_complex, which takes a function pointer as a parameter.

void for_each_complex(int n, Complex** arr, void (*call_back)(Complex*));

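A common misconception is that the MoonBit closure type (Complex) -> Unit can be mapped directly to a C function pointer. It cannot, because under the hood a MoonBit closure is a structure with two parts: a pointer to the actual function code, and a pointer to the environment data it captured.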
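The two-part structure described above can be modelled in plain C to see why it does not fit into a callback slot. This is only a sketch under assumptions: the Closure struct and the names add_captured, add_one, call_closure, call_plain, and demo_closure are hypothetical, and the layout MoonBit really uses may differ in detail.

```c
/* Hypothetical model of a closure: code pointer plus captured environment. */
typedef struct {
    int (*code)(void *env, int x); /* the function body */
    void *env;                     /* data captured from the enclosing scope */
} Closure;

/* What a C API expects: a bare code pointer with no environment slot. */
typedef int (*PlainCallback)(int x);

/* A closure body: it needs env to find the captured value. */
int add_captured(void *env, int x) {
    return x + *(int *)env;
}

/* A capture-free function: usable directly as a C callback. */
int add_one(int x) {
    return x + 1;
}

int call_closure(Closure c, int x) {
    return c.code(c.env, x);
}

int call_plain(PlainCallback f, int x) {
    return f(x);
}

/* Builds a closure over `captured` and applies it to x. */
int demo_closure(int captured, int x) {
    Closure c = { add_captured, &captured };
    return call_closure(c, x);
}
```

In this model demo_closure(10, 5) evaluates to 15 and call_plain(add_one, 5) to 6. add_captured cannot be handed to call_plain, because a plain callback has nowhere to carry the extra env argument.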

To pass a pure function pointer that captures no environment, MoonBit provides the FuncRef type:

extern "C" fn for_each_complex(
  n: Int,
  arr: FixedArray[Complex],
  call_back: FuncRef[(Complex) -> Unit] // wrap the function type in FuncRef
) -> Unit = "for_each_complex"

Any function type wrapped in FuncRef is converted to a standard C function pointer when it is passed to C.

How do you declare a FuncRef? Just use let. As long as the function captures no outer variables, the declaration succeeds.

fn print_complex(c: Complex) -> Unit { ... }

fn main {
  let print_complex: FuncRef[(Complex) -> Unit] = (c) => print_complex(c)
  // ...
}

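Stop 4: Advanced Topic: GC Management

We have covered most of the type-conversion issues, but one major problem remains: memory management. C relies on manual malloc/free, while MoonBit has automatic garbage collection (GC). When a C library creates an object (as new_complex does), who is responsible for freeing it?

Can we do without GC?

Some library authors choose not to integrate with the GC and leave all destruction to the user. This is reasonable for certain libraries: high-performance computing or graphics libraries, for example, give up some GC features for the sake of performance or stability. The cost is that more is demanded of the programmer. Most libraries should still provide GC integration to improve the user experience.

Ideally, we want MoonBit's GC to manage the lifetime of these C objects automatically. MoonBit offers two mechanisms for this.

4.1 The Simple Case

If the C struct is very simple and you are sure that its memory layout is stable on every platform, you can redefine it directly in MoonBit.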

// mymath.h: typedef struct { double real; double img; } Complex;
// MoonBit:
struct Complex {
  r: Double,
  i: Double
}

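With this definition, Complex becomes a real MoonBit object. The MoonBit compiler manages its memory automatically and adds a GC header. When you pass it to a C function, MoonBit passes a pointer to its data portion, which usually works.

But this approach has serious limitations:

It requires you to know the C struct's exact memory layout, alignment, and so on, which can be fragile.
If a C function returns a Complex*, you cannot use it directly. As with string return values, you must write a C wrapper function that copies the C struct's data into a newly created MoonBit Complex object that has a GC header.

So this approach only suits the simplest cases. For most scenarios we recommend the more robust finalizer approach.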
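One way to make the layout assumption less fragile is to let the C compiler check it. The sketch below assumes the struct definition quoted in the mymath.h comment above and a C11 compiler; the helper complex_layout_is_packed is a hypothetical name added for illustration.

```c
#include <stddef.h>

/* The struct as quoted from mymath.h in the comment above. */
typedef struct { double real; double img; } Complex;

/* Compile-time checks (C11): the build fails if the layout is not
   two tightly packed doubles. */
_Static_assert(offsetof(Complex, real) == 0,
               "real must be the first field");
_Static_assert(offsetof(Complex, img) == sizeof(double),
               "img must follow real with no padding");
_Static_assert(sizeof(Complex) == 2 * sizeof(double),
               "no trailing padding expected");

/* Runtime view of the same facts, convenient for a quick test. */
int complex_layout_is_packed(void) {
    return offsetof(Complex, img) == sizeof(double)
        && sizeof(Complex) == 2 * sizeof(double);
}
```

These checks only cover the C side. They do not prove that the MoonBit struct is laid out the same way, so they complement the caution above rather than replace it.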

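4.2 The Complex Case: Using Finalizers

This is a more general and safer method. The core idea is to create a MoonBit object that wraps the C pointer, and to tell MoonBit's GC that when the wrapper is collected it should call a specific C function (the finalizer) to release the underlying C pointer.

The process takes a few steps:

1. Declare two types in MoonBit

#extern
type C_Complex // the raw, opaque C pointer

type Complex C_Complex // a MoonBit type that wraps a C_Complex internally

type Complex C_Complex is a special declaration. It creates a MoonBit object type named Complex that has one internal field of type C_Complex. We can access this field through the .inner() method.

2. Provide a finalizer and a wrapper function in C

We need a C function that frees a Complex object, and a function that creates our GC-enabled MoonBit wrapper object.

C side (add to cwrap.c):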

#include "moonbit.h" // declares moonbit_make_external_object

// The mymath library should provide a function that frees a Complex; assume it is free_complex
// void free_complex(Complex* c);

// We need a void* version of the finalizer for the MoonBit GC to call
void free_complex_finalizer(void* obj) {
    // The layout of a MoonBit external object is { void (*finalizer)(void*); T data; }
    // We need to extract the real Complex pointer from obj
    // Assume the MoonBit Complex wrapper has a single field
    Complex* c_obj = *((Complex**)obj);
    free_complex(c_obj); // call the real destructor, if the mymath library provides one
    // free(c_obj); // if it was allocated with plain malloc
}

// What the MoonBit Complex wrapper looks like in C
typedef struct {
  Complex* val;
} MoonBit_Complex;

// Function that creates the MoonBit wrapper object
MoonBit_Complex* new_mbt_complex(Complex* c_complex) {
  // `moonbit_make_external_object` is the key:
  // it creates a GC-managed external object and registers its finalizer.
  MoonBit_Complex* mbt_complex = moonbit_make_external_object(
      &free_complex_finalizer,
      sizeof(MoonBit_Complex)
  );
  mbt_complex->val = c_complex;
  return mbt_complex;
}

3. Use the wrapper function in MoonBit

Now, instead of calling new_complex directly, we call our wrapper function new_mbt_complex.

// FFI declaration pointing at our C wrapper function
extern "C" fn __new_managed_complex(c_complex: C_Complex) -> Complex = "new_mbt_complex"

// The original C new_complex function returns a raw pointer
extern "C" fn __new_unmanaged_complex(r: Double, i: Double) -> C_Complex = "new_complex"

// The safe, GC-friendly new function that is exposed to users
fn Complex::new(r: Double, i: Double) -> Complex {
  let c_ptr = __new_unmanaged_complex(r, i)
  __new_managed_complex(c_ptr)
}

Now, when an object created by Complex::new is no longer used in MoonBit, the GC automatically calls free_complex_finalizer and safely releases the memory allocated by the C library.
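To see the wrapper-plus-finalizer mechanics in isolation, here is a self-contained C mock. Everything prefixed with mock_ is a hypothetical stand-in written for this sketch, not the MoonBit runtime: mock_make_external imitates the "finalizer header followed by payload" layout described above, and mock_collect plays the role of the GC reclaiming the wrapper. The counter freed_count exists only so the effect can be observed.

```c
#include <stdlib.h>

typedef struct { double real; double img; } Complex;

/* Mock runtime: a header holding the finalizer, followed by the payload. */
typedef struct {
    void (*finalize)(void *payload);
} MockHeader;

void *mock_make_external(void (*finalize)(void *), size_t payload_size) {
    MockHeader *h = malloc(sizeof(MockHeader) + payload_size);
    if (h == NULL) return NULL;
    h->finalize = finalize;
    return h + 1; /* the payload starts right after the header */
}

/* Stand-in for the collector: run the finalizer, then free the wrapper. */
void mock_collect(void *payload) {
    MockHeader *h = (MockHeader *)payload - 1;
    h->finalize(payload);
    free(h);
}

int freed_count = 0; /* observation aid only */

void free_complex(Complex *c) {
    free(c);
    freed_count++;
}

/* Same shape as the finalizer above: the payload holds a Complex*. */
void free_complex_finalizer(void *obj) {
    Complex *c = *(Complex **)obj;
    free_complex(c);
}

void *wrap_complex(Complex *c) {
    Complex **slot = mock_make_external(free_complex_finalizer,
                                        sizeof(Complex *));
    if (slot != NULL) *slot = c;
    return slot;
}

/* Allocates a Complex, wraps it, "collects" the wrapper, and reports
   how many Complex objects have been freed so far. */
int demo_finalizer(void) {
    Complex *c = malloc(sizeof *c);
    if (c == NULL) return -1;
    c->real = 1.0;
    c->img = 2.0;
    void *wrapper = wrap_complex(c);
    if (wrapper == NULL) { free(c); return -1; }
    mock_collect(wrapper);
    return freed_count;
}
```

The first call to demo_finalizer returns 1: collecting the wrapper triggered exactly one free_complex call, which is the behaviour the real finalizer registration is meant to provide.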

When we need to pass our managed Complex object to other C functions, we simply use the .inner() method:

// Suppose there is a C function `double length(Complex*);`
extern "C" fn length(c_complex: C_Complex) -> Double = "length"

fn Complex::length(self: Self) -> Double {
  // self.inner() returns the inner C_Complex (that is, the C pointer)
  length(self.inner())
}

Conclusion

This article walked through the C-FFI workflow in MoonBit, from basic types to complex struct types to function pointer types, and closed with how MoonBit's GC can manage C objects. We hope it helps readers who are developing their own libraries.

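Dancing with LLVM in MoonBit (Part 2): Generating the LLVM Backend

August 6, 2025 · 18 min read

Introduction

In programming language design, the syntax frontend is responsible for understanding and validating a program's structure and semantics, while the compiler backend takes on the job of turning those abstractions into executable machine code. Implementing a backend requires a deep understanding of the target architecture as well as command of the optimization techniques needed to generate efficient code.

LLVM (originally short for Low Level Virtual Machine) is a mature, widely used compiler infrastructure that gives us a powerful and flexible solution. By converting a program into LLVM Intermediate Representation (IR), we can use LLVM's toolchain to compile code for many target architectures, including RISC-V, ARM, and x86.

MoonBit's LLVM Ecosystem

MoonBit officially provides two important LLVM-related projects:

**llvm.mbt**: MoonBit bindings for the original LLVM, giving direct access to the llvm-c interface. It requires a full LLVM toolchain installation, works only with the native backend, and leaves compilation and linking to you, but it generates IR that is fully compatible with the original LLVM.

**MoonLLVM**: a reimplementation of LLVM in pure MoonBit. It generates LLVM IR with no external dependencies and supports the JavaScript and WebAssembly backends.

This article uses llvm.mbt, whose API design follows the well-regarded inkwell library from the Rust ecosystem.

In the previous article, "Dancing with LLVM in MoonBit: Implementing a Modern Compiler (Part 1)", we completed the conversion from source code to a typed abstract syntax tree. This part builds on that result and focuses on the core techniques and implementation details of code generation.

Chapter 1: Representing the LLVM Type System in MoonBit

Before diving into code generation, we first need to understand how llvm.mbt represents LLVM's concepts in MoonBit's type system. LLVM's type system is fairly complex and has several layers, including basic types, composite types, and function types.

Trait Object: An Abstract Representation of Types

In llvm.mbt's API you will frequently meet the core concept &Type. It is not a concrete struct or enum but a trait object. You can think of it as the functional counterpart of an abstract base class in object-oriented programming.

// &Type is a trait object that stands for any LLVM type
let some_type: &Type = context.i32_type()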

Type Identification and Conversion

To find the concrete type behind a &Type, we perform a runtime type check through the as_type_enum interface:

pub fn identify_type(ty: &Type) -> String {
  match ty.as_type_enum() {
    IntType(int_ty) => "Integer type with \{int_ty.get_bit_width()} bits"
    FloatType(float_ty) => "Floating point type"
    PointerType(ptr_ty) => "Pointer type"
    FunctionType(func_ty) => "Function type"
    ArrayType(array_ty) => "Array type"
    StructType(struct_ty) => "Structure type"
    VectorType(vec_ty) => "Vector type"
    ScalableVectorType(svec_ty) => "Scalable vector type"
    MetadataType(meta_ty) => "Metadata type"
  }
}

Safe Type Conversion Strategies

When we are confident that a &Type has a specific type, several conversion approaches are available:

Direct conversion (for cases where the type is certain)

let ty: &Type = context.i32_type()
let i32_ty = ty.into_int_type()  // direct conversion; errors are handled by llvm.mbt
let bit_width = i32_ty.get_bit_width()  // call a method specific to IntType

Defensive conversion (recommended for production)

let ty: &Type = get_some_type()  // a type of unknown kind obtained from somewhere

guard ty.as_type_enum() is IntType(i32_ty) else {
  raise CodeGenError("Expected integer type, got \{ty}")
}

// i32_ty can now be used safely
let bit_width = i32_ty.get_bit_width()
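The "check the kind first, then use the specific view" pattern is not tied to llvm.mbt. As a rough analogy in C, the sketch below uses a tagged struct; TypeKind, AnyType, and try_get_int_width are hypothetical names invented for this illustration and do not correspond to any LLVM API.

```c
#include <stddef.h>

typedef enum { KIND_INT, KIND_FLOAT, KIND_POINTER } TypeKind;

typedef struct {
    TypeKind kind;
    unsigned bit_width; /* meaningful only when kind == KIND_INT */
} AnyType;

/* Defensive conversion: returns 1 and writes the width only when the
   type really is an integer type; otherwise returns 0 and leaves
   *out_width untouched so the caller can report the mismatch. */
int try_get_int_width(const AnyType *ty, unsigned *out_width) {
    if (ty == NULL || ty->kind != KIND_INT) {
        return 0;
    }
    *out_width = ty->bit_width;
    return 1;
}
```

A caller checks the return value before touching the width, which mirrors the guard ... else { raise ... } shape of the defensive conversion above.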

Constructing Composite Types

LLVM supports several composite types, which are usually constructed through methods on the basic types:

pub fn create_composite_types(context: @llvm.Context) -> Unit {
  let i32_ty = context.i32_type()
  let f64_ty = context.f64_type()

  // Array type: [16 x i32]
  let i32_array_ty = i32_ty.array_type(16)

  // Function type: i32 (i32, i32)
  let add_func_ty = i32_ty.fn_type([i32_ty, i32_ty])

  // Struct type: {i32, f64}
  let struct_ty = context.struct_type([i32_ty, f64_ty])

  // Pointer type (all pointers are opaque in LLVM 18+)
  let ptr_ty = i32_ty.ptr_type()

  // Print the type information for verification
  println("Array type: \{i32_array_ty}")      // [16 x i32]
  println("Function type: \{add_func_ty}")    // i32 (i32, i32)
  println("Struct type: \{struct_ty}")        // {i32, f64}
  println("Pointer type: \{ptr_ty}")          // ptr
}

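Important Note: Opaque Pointers

In LLVM 18, every pointer type is an opaque pointer. (Opaque pointers became the default in LLVM 15, and typed pointers were removed in LLVM 17.) This means that whatever a pointer points to, it is written as ptr in the IR, and the pointee type is no longer visible in the type system.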
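A loose C analogy may help: with opaque pointers, the type lives on the memory access rather than on the pointer, much as a void* in C says nothing about what it points to. The helper names load_i32, load_f32, and store_i32 below are hypothetical and exist only for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* The pointer itself is untyped; each access states how to interpret
   the bytes, similar in spirit to `load i32, ptr %p` versus
   `load float, ptr %p` in LLVM IR. */
int32_t load_i32(const void *p) {
    int32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

float load_f32(const void *p) {
    float v;
    memcpy(&v, p, sizeof v);
    return v;
}

void store_i32(void *p, int32_t v) {
    memcpy(p, &v, sizeof v);
}
```

The same void* parameter type serves every pointee. What differs is the operation applied to it, which is why the IR printer shows only ptr.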

Chapter 2: The LLVM Value System and the BasicValue Concept

Compared with the type system, LLVM's value system is somewhat more complex. Like inkwell, llvm.mbt splits values into two important abstraction layers: Value and BasicValue. The difference is whether we look at where a value comes from or at how it is used:

Value: concerned with how a value is produced (a constant, an instruction result, and so on)
BasicValue: concerned with which basic type a value has (integer, floating point, pointer, and so on)

Practical Example

pub fn demonstrate_value_system(context: Context, builder: Builder) -> Unit {
  let i32_ty = context.i32_type()

  // Create two integer constants; these are IntValue directly
  let const1 = i32_ty.const_int(10)  // Value: IntValue, BasicValue: IntValue
  let const2 = i32_ty.const_int(20)  // Value: IntValue, BasicValue: IntValue

  // Perform the addition; the result is an InstructionValue
  let add_result = builder.build_int_add(const1, const2)

  // In different contexts we need different views:

  // As an instruction, to inspect its properties
  let instruction = add_result.as_instruction()
  println("Instruction opcode: \{instruction.get_opcode()}")

  // As a basic value, to get its type
  let basic_value = add_result.into_basic_value()
  println("Result type: \{basic_value.get_type()}")

  // As an integer value, for further computation
  let int_value = add_result.into_int_value()
  let final_result = builder.build_int_mul(int_value, const1)
}

A Complete Classification of Value Types

ValueEnum: all possible value types

pub enum ValueEnum {
  IntValue(IntValue)              // integer value
  FloatValue(FloatValue)          // floating-point value
  PointerValue(PointerValue)      // pointer value
  StructValue(StructValue)        // struct value
  FunctionValue(FunctionValue)    // function value
  ArrayValue(ArrayValue)          // array value
  VectorValue(VectorValue)        // vector value
  PhiValue(PhiValue)              // phi node value
  ScalableVectorValue(ScalableVectorValue)  // scalable vector value
  MetadataValue(MetadataValue)    // metadata value
  CallSiteValue(CallSiteValue)    // call site value
  GlobalValue(GlobalValue)        // global value
  InstructionValue(InstructionValue)  // instruction value
} derive(Show)

BasicValueEnum: values that have a basic type

pub enum BasicValueEnum {
  ArrayValue(ArrayValue)              // array value
  IntValue(IntValue)                  // integer value
  FloatValue(FloatValue)              // floating-point value
  PointerValue(PointerValue)          // pointer value
  StructValue(StructValue)            // struct value
  VectorValue(VectorValue)            // vector value
  ScalableVectorValue(ScalableVectorValue)  // scalable vector value
} derive(Show)

💡 Best Practices for Value Conversion

During real code generation we often need to convert between different views of a value:

pub fn value_conversion_patterns(instruction_result: &Value) -> Unit {
  // Pattern 1: I know what type this is, so convert directly
  let int_val = instruction_result.into_int_value()

  // Pattern 2: I only need a basic value and do not care about the concrete type
  let basic_val = instruction_result.into_basic_value()

  // Pattern 3: defensive programming, check before converting
  match instruction_result.as_value_enum() {
    // Handle integer values
    IntValue(int_val) => handle_integer(int_val)
    // Handle floating-point values
    FloatValue(float_val) => handle_float(float_val)
    _ => raise CodeGenError("Unexpected value type")
  }
}

With this two-layer abstraction, llvm.mbt preserves the completeness of LLVM's value system while giving MoonBit developers an intuitive, easy-to-use interface.

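Chapter 3: LLVM IR Generation in Practice

With the type and value systems understood, let us walk through a complete example of generating LLVM IR with llvm.mbt. The example implements a simple muladd function and shows the full flow from initialization to instruction generation.

Setting Up the Infrastructure

Every LLVM program starts by creating three core components:

pub fn initialize_llvm() -> (Context, Module, Builder) {
  // 1. Create the LLVM context, the container for all LLVM objects
  let context = @llvm.Context::create()

  // 2. Create a module, the container for functions and global variables
  let module = context.create_module("demo_module")

  // 3. Create the IR builder, which is used to generate instructions
  let builder = context.create_builder()

  (context, module, builder)
}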

A Simple Function Generation Example

Let us implement a function that computes (a * b) + c:

pub fn generate_muladd_function() -> String {
  // Initialize the LLVM infrastructure
  let (context, module, builder) = initialize_llvm()

  // Define the function signature
  let i32_ty = context.i32_type()
  let func_type = i32_ty.fn_type([i32_ty, i32_ty, i32_ty])
  let func_value = module.add_function("muladd", func_type)

  // Create the function's entry basic block
  let entry_block = context.append_basic_block(func_value, "entry")
  builder.position_at_end(entry_block)

  // Get the function parameters
  let arg_a = func_value.get_nth_param(0).unwrap().into_int_value()
  let arg_b = func_value.get_nth_param(1).unwrap().into_int_value()
  let arg_c = func_value.get_nth_param(2).unwrap().into_int_value()

  // Generate the computation instructions
  let mul_result = builder.build_int_mul(arg_a, arg_b).into_int_value()
  let add_result = builder.build_int_add(mul_result, arg_c)

  // Generate the return instruction
  let _ = builder.build_return(add_result)

  // Output the generated IR
  module.dump()
}

The Generated LLVM IR

Running the code above produces the following LLVM intermediate representation:

; ModuleID = 'demo_module'
source_filename = "demo_module"

define i32 @muladd(i32 %0, i32 %1, i32 %2) {
entry:
  %3 = mul i32 %0, %1
  %4 = add i32 %3, %2
  ret i32 %4
}
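For readers more at home in C, the IR above corresponds roughly to the function below. This is an illustrative sketch, not output from any tool. Unsigned arithmetic is used on purpose, because plain mul and add in LLVM IR wrap on overflow whereas signed overflow in C is undefined; the temporaries t3 and t4 are named after %3 and %4.

```c
#include <stdint.h>

/* Mirrors the IR line by line. */
int32_t muladd(int32_t a, int32_t b, int32_t c) {
    uint32_t t3 = (uint32_t)a * (uint32_t)b; /* %3 = mul i32 %0, %1 */
    uint32_t t4 = t3 + (uint32_t)c;          /* %4 = add i32 %3, %2 */
    /* Converting back to int32_t is implementation-defined for values
       above INT32_MAX; GCC and Clang reduce modulo 2^32. */
    return (int32_t)t4;                      /* ret i32 %4 */
}
```

For example, muladd(3, 4, 5) computes 3 * 4 = 12 and then 12 + 5 = 17.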

💡 Code Generation Best Practices

Naming Conventions

Builder methods for instructions that return a value take a labeled argument, name, which gives the instruction's result a name.

let mul_result = builder.build_int_mul(lhs, rhs, name="temp_product")
let final_result = builder.build_int_add(mul_result, offset, name="final_sum")

Error Handling

Use raise rather than panic for error handling, and treat cases that cannot be decided up front as recoverable errors.

// Check operations that may fail
match func_value.get_nth_param(index) {
  Some(param) => param.into_int_value()
  None => raise CodeGenError("Function parameter \{index} not found")
}

Chapter 4: Implementing the TinyMoonbit Compiler

Now let us turn to the actual compiler implementation and convert the abstract syntax tree built in Part 1 into LLVM IR.

Type Mapping: From Parser to LLVM

First we need a mapping between the TinyMoonbit type system and the LLVM type system:

pub struct CodeGen {
  parser_program : Program                    // AST representation of the source program
  llvm_context : @llvm.Context               // LLVM context
  llvm_module : @llvm.Module                 // LLVM module
  builder : @llvm.Builder                    // IR builder
  llvm_functions : Map[String, @llvm.FunctionValue]  // function table
}

pub fn convert_type(self : Self, parser_type : Type) -> &@llvm.Type raise {
  match parser_type {
    Type::Unit => self.llvm_context.void_type() as &@llvm.Type
    Type::Bool => self.llvm_context.bool_type()
    Type::Int => self.llvm_context.i32_type()
    Type::Double => self.llvm_context.f64_type()
    // Extend with more types as needed
  }
}

Environment Management: Mapping Variables to Values

During code generation we need to maintain a mapping from variable names to LLVM values:

pub struct Env {
  parent : Env?                               // reference to the parent environment
  symbols : Map[String, &@llvm.Value]         // local variable mapping

  // Global information
  codegen : CodeGen                           // reference to the code generator
  parser_function : Function                  // AST of the current function
  llvm_function : @llvm.FunctionValue         // LLVM representation of the current function
}

pub fn get_symbol(self : Self, name : String) -> &@llvm.Value? {
  match self.symbols.get(name) {
    Some(value) => Some(value)
    None =>
      match self.parent {
        Some(parent_env) => parent_env.get_symbol(name)
        None => None
      }
  }
}
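The lookup rule in get_symbol (try the current scope, then walk the parent chain) can be shown in isolation. The C sketch below is a simplified model under stated assumptions: Scope, MAX_SYMBOLS, and scope_lookup are hypothetical names, symbols are stored in small fixed arrays instead of a map, and plain int values stand in for LLVM values.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 8

typedef struct Scope {
    const struct Scope *parent;     /* NULL for the outermost scope */
    const char *names[MAX_SYMBOLS];
    int values[MAX_SYMBOLS];        /* stand-in for LLVM values */
    size_t count;
} Scope;

/* Returns a pointer to the value bound to `name` in the innermost scope
   that defines it, or NULL if no scope in the chain defines it. */
const int *scope_lookup(const Scope *scope, const char *name) {
    for (const Scope *s = scope; s != NULL; s = s->parent) {
        for (size_t i = 0; i < s->count; i++) {
            if (strcmp(s->names[i], name) == 0) {
                return &s->values[i];
            }
        }
    }
    return NULL;
}
```

With an outer scope binding x and y and an inner scope rebinding only x, a lookup of x from the inner scope finds the inner binding (shadowing), a lookup of y falls through to the outer scope, and an unknown name yields NULL.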

Variable Handling: Memory Allocation Strategy

TinyMoonbit, as a systems-level language, supports reassigning variables. Because LLVM IR is in SSA (Static Single Assignment) form, we use the alloca + load/store pattern to implement mutable variables:

pub fn Stmt::emit(self : Self, env : Env) -> Unit raise {
  match self {
    // Variable declaration, for example let x : Int = 5;
    Let(var_name, var_type, init_expr) => {
      // Convert the type and allocate stack space
      let llvm_type = env.codegen.convert_type(var_type)
      let alloca = env.codegen.builder.build_alloca(llvm_type, var_name)

      // Record the allocated pointer in the symbol table
      env.symbols.set(var_name, alloca as &@llvm.Value)

      // Evaluate the initializer expression
      let init_value = init_expr.emit(env).into_basic_value()

      // Store the initial value into the allocated memory
      let _ = env.codegen.builder.build_store(alloca, init_value)
    }

    // Variable assignment: x = 10;
    Assign(var_name, rhs_expr) => {
      // Look up the variable's memory address in the symbol table
      guard env.get_symbol(var_name) is Some(var_ptr) else {
        raise CodeGenError("Undefined variable: \{var_name}")
      }

      // Evaluate the right-hand side expression
      let rhs_value = rhs_expr.emit(env).into_basic_value()

      // Store the new value into the variable's memory
      let _ = env.codegen.builder.build_store(var_ptr, rhs_value)
    }

    // Other statement kinds...
    _ => { /* handle other statements */ }
  }
}

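Design Decision: Why alloca?

In functional languages, immutable variables map directly to SSA values. But TinyMoonbit allows variables to be reassigned, which conflicts with SSA's rule that each variable is assigned exactly once.

The alloca + load/store pattern is the standard way to handle mutable variables:

alloca: allocate memory on the stack

store: write a value to memory

load: read a value from memory

LLVM's optimization passes automatically turn simple allocas back into SSA values (the mem2reg optimization).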
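The difference between the two forms can be mimicked in C for a straight-line snippet such as x = 5; x = x + 1; return x * 2. This is a hand-written illustration, not compiler output; the function names with_stack_slot and as_ssa_values are hypothetical.

```c
/* Mutable variable as a stack slot: every read is a load and every
   write is a store through the slot's address. */
int with_stack_slot(void) {
    int slot;        /* alloca i32 */
    int *x = &slot;
    *x = 5;          /* store the initial value */
    int t1 = *x;     /* load */
    *x = t1 + 1;     /* store the reassigned value */
    int t2 = *x;     /* load again */
    return t2 * 2;
}

/* The same computation after promotion to values: each name is
   assigned exactly once, so no memory traffic is needed. */
int as_ssa_values(void) {
    const int x0 = 5;
    const int x1 = x0 + 1;
    return x1 * 2;
}
```

Both functions return 12. The first shape is what a simple code generator emits, and the second is roughly what a promotion pass such as mem2reg recovers from it.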

Expression Code Generation

Code generation for expressions is relatively straightforward. It mostly calls the instruction-building method that matches the expression type:

fn Expr::emit(self: Self, env: Env) -> &@llvm.Value raise {
  match self {
    AtomExpr(atom_expr, ..) => atom_expr.emit(env)
    Unary("-", expr, ty=Some(Int)) => {
      let value = expr.emit(env).into_int_value()
      let zero = env.codegen.llvm_context.i32_type().const_zero()
      env.codegen.builder.build_int_sub(zero, value)
    }
    Unary("-", expr, ty=Some(Double)) => {
      let value = expr.emit(env).into_float_value()
      env.codegen.builder.build_float_neg(value)
    }
    Binary("+", lhs, rhs, ty=Some(Int)) => {
      let lhs_val = lhs.emit(env).into_int_value()
      let rhs_val = rhs.emit(env).into_int_value()
      env.codegen.builder.build_int_add(lhs_val, rhs_val)
    }
    // ... others
  }
}

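Technical Detail: Floating-Point Negation

Note that for floating-point negation we use build_float_neg rather than subtracting the operand from zero. The reasons are:

IEEE 754 semantics: negation and 0.0 - x are not the same operation. For x = +0.0, 0.0 - x gives +0.0, while negation gives -0.0; negation simply flips the sign bit, including for special values such as NaN and ∞.

Performance: a dedicated negation is typically just a sign-bit flip, which is cheap on modern processors.

Exactness: because negation only changes the sign bit, no rounding is involved for any input.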
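The signed-zero case is easy to observe from C, where unary minus on a double is a true negation. The sketch below assumes IEEE 754 doubles and the default round-to-nearest mode; negate_by_fneg, negate_by_sub, and sign_bit_set are hypothetical helper names.

```c
#include <math.h>

/* True negation: flips the sign bit. */
double negate_by_fneg(double x) {
    return -x;
}

/* Subtraction from zero: looks equivalent, but is not for x = +0.0. */
double negate_by_sub(double x) {
    return 0.0 - x;
}

/* 1 if the sign bit of x is set, 0 otherwise. */
int sign_bit_set(double x) {
    return signbit(x) ? 1 : 0;
}
```

For x = 0.0, negate_by_fneg produces -0.0 (sign bit set) while negate_by_sub produces +0.0 (sign bit clear). For ordinary non-zero inputs such as 2.5 the two agree.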

Chapter 5: Implementing Control-Flow Instructions

Control flow is the skeleton of program logic and includes conditional branches and loops. In LLVM IR, control flow is implemented with basic blocks and branch instructions. Each basic block is a sequence of instructions with no internal jumps, and blocks are connected to each other by branch instructions.

Conditional Branches: Implementing if-else

A conditional branch needs several basic blocks to represent the different execution paths:

fn Stmt::emit(self: Self, env: Env) -> Unit raise {
  let ctx = env.codegen.llvm_context
  let llvm_func = env.llvm_function
  let builder = env.codegen.builder
  match self {
    If(cond, then_stmts, else_stmts) => {
      let cond_val = cond.emit(env).into_int_value()

      // Create three basic blocks
      let then_block = ctx.append_basic_block(llvm_func)
      let else_block = ctx.append_basic_block(llvm_func)
      let merge_block = ctx.append_basic_block(llvm_func)

      // Emit the conditional branch
      let _ = builder.build_conditional_branch(
        cond_val, then_block, else_block,
      )

      // Generate code for then_block
      builder.position_at_end(then_block)
      let then_env = env.sub_env()
      then_stmts.each(s => s.emit(then_env))
      let _ = builder.build_unconditional_branch(merge_block)

      // Generate code for else_block
      builder.position_at_end(else_block)
      let else_env = env.sub_env()
      else_stmts.each(s => s.emit(else_env))
      let _ = builder.build_unconditional_branch(merge_block)

      // After code generation, the builder must be positioned at merge_block
      builder.position_at_end(merge_block)

    }
    // ...
  }
}

Example of the Generated LLVM IR

For the following TinyMoonbit code:

if x > 0 {
  y = x + 1;
} else {
  y = x - 1;
}

LLVM IR similar to the following is generated:

  %1 = load i32, ptr %x, align 4
  %2 = icmp sgt i32 %1, 0
  br i1 %2, label %if.then, label %if.else

if.then:                                          ; preds = %0
  %3 = load i32, ptr %x, align 4
  %4 = add i32 %3, 1
  store i32 %4, ptr %y, align 4
  br label %if.end

if.else:                                          ; preds = %0
  %5 = load i32, ptr %x, align 4
  %6 = sub i32 %5, 1
  store i32 %6, ptr %y, align 4
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  ; subsequent code...
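The block structure of this IR can be reproduced in C with labels and goto, which makes the three-block shape easy to step through in a debugger. This is a hand-written illustration: branch_demo is a hypothetical function, and y is returned so that the effect of each path can be observed.

```c
/* Each label corresponds to one basic block of the IR above. */
int branch_demo(int x) {
    int y;
    /* br i1 %2, label %if.then, label %if.else */
    if (x > 0) goto if_then; else goto if_else;

if_then:
    y = x + 1;
    goto if_end;   /* br label %if.end */

if_else:
    y = x - 1;
    goto if_end;   /* br label %if.end */

if_end:
    return y;
}
```

Both paths end with an explicit jump to if_end, just as both IR blocks end with br label %if.end: a basic block never falls through into the next one.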

Loops: Implementing while

Implementing a loop requires particular care in wiring the condition check and the loop body together correctly:

fn Stmt::emit(self: Self, env: Env) -> Unit raise {
  let ctx = env.codegen.llvm_context
  let llvm_func = env.llvm_function
  let builder = env.codegen.builder
  match self {
    While(cond, body) => {
      // Create three blocks
      let cond_block = ctx.append_basic_block(llvm_func)
      let body_block = ctx.append_basic_block(llvm_func)
      let merge_block = ctx.append_basic_block(llvm_func)

      // First jump unconditionally to the cond block
      let _ = builder.build_unconditional_branch(cond_block)
      builder.position_at_end(cond_block)

      // Generate the condition code in the cond block, then the conditional branch
      let cond_val = cond.emit(env).into_int_value()
      let _ = builder.build_conditional_branch(
        cond_val, body_block, merge_block,
      )
      builder.position_at_end(body_block)

      // Generate the body; it must end with an unconditional branch back to the cond block
      let body_env = env.sub_env()
      body.each(s => s.emit(body_env))
      let _ = builder.build_unconditional_branch(cond_block)

      // After code generation, position the builder at the merge block
      builder.position_at_end(merge_block)
    }
    // ...
  }
}

Example of the Generated LLVM IR

For the TinyMoonbit code:

while i < 10 {
  i = i + 1;
}

the following is generated:

  br label %while.cond

while.cond:                                       ; preds = %while.body, %0
  %1 = load i32, ptr %i, align 4
  %2 = icmp slt i32 %1, 10
  br i1 %2, label %while.body, label %while.end

while.body:                                       ; preds = %while.cond
  %3 = load i32, ptr %i, align 4
  %4 = add i32 %3, 1
  store i32 %4, ptr %i, align 4
  br label %while.cond

while.end:                                        ; preds = %while.cond
  ; subsequent code...
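As with the conditional, the loop's block structure maps directly onto C labels. The sketch below is a hand-written illustration of the same three blocks; count_up_to_ten is a hypothetical function that takes the starting value of i and returns its final value.

```c
/* Each label corresponds to one basic block of the IR above. */
int count_up_to_ten(int i) {
    goto while_cond;    /* br label %while.cond */

while_cond:
    /* br i1 %2, label %while.body, label %while.end */
    if (i < 10) goto while_body; else goto while_end;

while_body:
    i = i + 1;
    goto while_cond;    /* back edge to the condition block */

while_end:
    return i;
}
```

The condition lives in its own block because it has two predecessors, the code before the loop and the end of the body, which matches the preds comment on while.cond in the IR.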

**💡 Key Points of Control-Flow Design**

Naming Strategy for Basic Blocks

The append_basic_block function also takes the labeled argument name.

// Use descriptive block names to make debugging and reading easier
let then_block = context.append_basic_block(func, name="if.then")
let else_block = context.append_basic_block(func, name="if.else")
let merge_block = context.append_basic_block(func, name="if.end")

Scope Management

// Create a separate scope for each branch and loop body
let branch_env = env.sub_env()
branch_stmts.each(stmt => stmt.emit(branch_env))

Builder Position Management

At the end, make sure the instruction builder is positioned on the correct basic block.

// Always make sure the builder points at the correct basic block
builder.position_at_end(merge_block)
// Generate instructions in this block...

Chapter 6: From LLVM IR to Machine Code

Once the complete LLVM IR has been generated, we need to turn it into assembly for the target machine. Although llvm.mbt provides a full target machine configuration API, for learning purposes we can use a simpler approach.

Compiling with the llc Toolchain

The most direct approach is to write the generated LLVM IR to a file and then compile it with the LLVM toolchain:

Calling the Module's dump function is enough; println also works.

let gen : CodeGen = ...
let prog = gen.llvm_module
prog.dump() // dump is preferred: slightly faster than println, with the same output

// or println(prog)

A Complete Compilation Example

Let us look at a complete compilation flow from source code to assembly:

TinyMoonbit source code

fn factorial(n: Int) -> Int {
  if n <= 1 {
    return 1;
  }
  return n * factorial(n - 1);
}

fn main() -> Unit {
  let result: Int = factorial(5);
  print_int(result);
}

Generated LLVM IR

; ModuleID = 'tinymoonbit'
source_filename = "tinymoonbit"

define i32 @factorial(i32 %0) {
entry:
  %1 = alloca i32, align 4
  store i32 %0, ptr %1, align 4
  %2 = load i32, ptr %1, align 4
  %3 = icmp sle i32 %2, 1
  br i1 %3, label %4, label %6

4:                                                ; preds = %entry
  ret i32 1

6:                                                ; preds = %entry
  %7 = load i32, ptr %1, align 4
  %8 = load i32, ptr %1, align 4
  %9 = sub i32 %8, 1
  %10 = call i32 @factorial(i32 %9)
  %11 = mul i32 %7, %10
  ret i32 %11
}

define void @main() {
entry:
  %0 = alloca i32, align 4
  %1 = call i32 @factorial(i32 5)
  store i32 %1, ptr %0, align 4
  %2 = load i32, ptr %0, align 4
  call void @print_int(i32 %2)
  ret void
}

declare void @print_int(i32 %0)

Generating RISC-V Assembly with llc

# Emit the LLVM IR
moon run main --target native > fact.ll

# Generate 64-bit RISC-V assembly
llc -march=riscv64 -mattr=+m -o fact.s fact.ll

Excerpt of the generated RISC-V assembly

factorial:
.Lfunc_begin0:
	.cfi_startproc
	addi	sp, sp, -32
	.cfi_def_cfa_offset 32
	sd	ra, 24(sp)
	.cfi_offset ra, -8
	sd	s0, 16(sp)
	.cfi_offset s0, -16
	addi	s0, sp, 32
	.cfi_def_cfa s0, 0
	sw	a0, -20(s0)
	lw	a0, -20(s0)
	li	a1, 1
	blt	a1, a0, .LBB0_2
	li	a0, 1
	j	.LBB0_3
.LBB0_2:
	lw	a0, -20(s0)
	lw	a1, -20(s0)
	addi	a1, a1, -1
	sw	a0, -24(s0)
	mv	a0, a1
	call	factorial
	lw	a1, -24(s0)
	mul	a0, a1, a0
.LBB0_3:
	ld	ra, 24(sp)
	ld	s0, 16(sp)
	addi	sp, sp, 32
	ret

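Conclusion

Across the two articles in this series we have built a complete, if deliberately small, compiler: from lexical analysis of the character stream, through construction of the abstract syntax tree, to LLVM IR generation and machine-code output.

Recap

Part 1:

An elegant lexer based on pattern matching
A recursive-descent parser
A complete type checker
Scope management with an environment chain

Part 2:

An in-depth look at LLVM's type and value system
Variable management under SSA form
Correct implementation of control-flow instructions
A complete code-generation pipeline

Moonbit's Strengths for Compiler Development

This project showed us what Moonbit brings to compiler construction:

Expressive pattern matching: greatly reduces the complexity of AST processing and type analysis.
Functional programming style: immutable data structures and pure functions keep compiler logic clear and reliable.
A modern type system: trait objects, generics, and error handling provide ample room for abstraction.
Solid engineering features: derive, JSON serialization, and similar features noticeably speed up development.

Closing Remarks

Compiler technology is where computer-science theory meets engineering practice. With a modern tool like Moonbit, we can explore this long-established yet still vibrant field more elegantly and efficiently.

We hope this series proves useful to readers on their way into compiler design.

Recommended Resources

Moonbit official documentation

llvm.mbt documentation

llvm.mbt project

Official LLVM tutorial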

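Dancing with LLVM in Moonbit, Part 1: Implementing the Language Frontend

August 4, 2025 · 17 min read

Introduction

Programming language design and compiler implementation have long been regarded as among the most challenging topics in computer science. The traditional path for teaching compilers usually requires students to first master a heavy theoretical foundation:

Automata theory: finite-state automata and regular expressions
Type theory: λ-calculus and the mathematical foundations of type systems
Computer architecture: the low-level path from assembly language to machine code

Moonbit, a functional programming language designed for modern development environments, offers a fresh perspective. Beyond a rigorous type system and strong memory-safety guarantees, its rich syntax and a toolchain tailored for the AI era make it a good fit for learning about and implementing compilers.

Series Overview

This series builds a compiler for a small programming language called TinyMoonbit, and uses it to explore the core concepts and best practices of modern compiler implementation.

Part 1: focuses on the language frontend, covering lexical analysis, parsing, and type checking, and ends with an abstract syntax tree carrying complete type annotations

Part 2: covers code generation, using Moonbit's official llvm.mbt bindings to translate the syntax tree into LLVM intermediate representation and finally into RISC-V assembly

TinyMoonbit Language Design

TinyMoonbit is a systems-level programming language at roughly the same level of abstraction as C. Although its syntax borrows heavily from Moonbit, TinyMoonbit is not actually a subset of Moonbit; it is a simplified language built both to test the completeness of llvm.mbt and to serve as teaching material.

Note: because of space constraints, the TinyMoonbit implementation described in this series is simpler than the real TinyMoonbit. For the full version, see TinyMoonbitLLVM.

Core Features

TinyMoonbit provides the basics needed for modern systems programming:

✅ Low-level memory operations: direct pointer manipulation and memory management
✅ Control flow: conditionals, loops, and function calls
✅ Type safety: static type checking and explicit type declarations
❌ Simplified design: to keep the implementation small, advanced features such as type inference and closures are not supported

Syntax Example

A classic Fibonacci implementation shows what TinyMoonbit looks like: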

extern fn print_int(x : Int) -> Unit;

// Recursive implementation of the Fibonacci sequence
fn fib(n : Int) -> Int {
  if n <= 1 {
    return n;
  }
  return fib(n - 1) + fib(n - 2);
}

fn main() -> Unit {
  print_int(fib(10));
}

Compilation Target

After the full compilation pipeline, the code above yields the following LLVM intermediate representation:

; ModuleID = 'tinymoonbit'
source_filename = "tinymoonbit"

define i32 @fib(i32 %0) {
entry:
  %1 = alloca i32, align 4
  store i32 %0, ptr %1, align 4
  %2 = load i32, ptr %1, align 4
  %3 = icmp sle i32 %2, 1
  br i1 %3, label %4, label %6

4:                                                ; preds = %entry
  %5 = load i32, ptr %1, align 4
  ret i32 %5

6:                                                ; preds = %entry
  %7 = load i32, ptr %1, align 4
  %8 = sub i32 %7, 1
  %9 = call i32 @fib(i32 %8)
  %10 = load i32, ptr %1, align 4
  %11 = sub i32 %10, 2
  %12 = call i32 @fib(i32 %11)
  %13 = add i32 %9, %12
  ret i32 %13
}

define void @main() {
entry:
  %0 = call i32 @fib(i32 10)
  call void @print_int(i32 %0)
  ret void
}

declare void @print_int(i32 %0)

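Chapter 2: Lexical Analysis

Lexical analysis is the first stage of compilation. Its job is to turn a continuous stream of characters into a sequence of meaningful lexical units (tokens). The conversion looks simple, but the rest of the compiler pipeline is built on it.

From Characters to Symbols: Designing and Implementing Tokens

Consider the following snippet: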

let x : Int = 5;

After the lexer processes it, we get the following token sequence:

(Keyword "let") → (Identifier "x") → (Symbol ":") →
(Type "Int") → (Operator "=") → (IntLiteral 5) → (Symbol ";")

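The conversion has to handle several cases:

Whitespace filtering: skip spaces, tabs, and newlines
Keyword recognition: distinguish reserved words from user-defined identifiers
Number parsing: find the boundaries of integers and floating-point numbers correctly
Operator handling: distinguish single-character from multi-character operators

Designing the Token Type

Based on TinyMoonbit's grammar, we classify every possible symbol into one of the following token types: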

pub enum Token {
  Bool(Bool)       // Boolean values: true, false
  Int(Int)         // Integers: 1, 2, 3, ...
  Double(Double)   // Floating-point numbers: 1.0, 2.5, 3.14, ...
  Keyword(String)  // Reserved words: let, if, while, fn, return
  Upper(String)    // Type identifiers: start with an uppercase letter, e.g. Int, Double, Bool
  Lower(String)    // Variable identifiers: start with a lowercase letter, e.g. x, y, result
  Symbol(String)   // Operators and punctuation: +, -, *, :, ;, ->
  Bracket(Char)    // Brackets: (, ), [, ], {, }
  EOF              // End-of-file marker
} derive(Show, Eq)

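Using Pattern Matching

Moonbit's pattern matching lets us implement the lexer compactly. Compared with the traditional finite-state-automaton approach, a pattern-matching implementation is more direct and easier to follow.

The Core Lexing Function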

pub fn lex(code: String) -> Array[Token] {
  let tokens = Array::new()

  loop code[:] {
    // Skip whitespace
    [' ' | '\n' | '\r' | '\t', ..rest] =>
      continue rest

    // Handle single-line comments
    [.."//", ..rest] =>
      continue loop rest {
        ['\n' | '\r', ..rest_str] => break rest_str
        [_, ..rest_str] => continue rest_str
        [] as rest_str => break rest_str
      }

    // Recognize multi-character operators (order matters!)
    [.."->", ..rest] => { tokens.push(Symbol("->")); continue rest }
    [.."==", ..rest] => { tokens.push(Symbol("==")); continue rest }
    [.."!=", ..rest] => { tokens.push(Symbol("!=")); continue rest }
    [.."<=", ..rest] => { tokens.push(Symbol("<=")); continue rest }
    [..">=", ..rest] => { tokens.push(Symbol(">=")); continue rest }

    // Recognize single-character operators and punctuation
    [':' | '.' | ',' | ';' | '+' | '-' | '*' |
     '/' | '%' | '>' | '<' | '=' as c, ..rest] => {
      tokens.push(Symbol("\{c}"))
      continue rest
    }

    // Recognize brackets
    ['(' | ')' | '[' | ']' | '{' | '}' as c, ..rest] => {
      tokens.push(Bracket(c))
      continue rest
    }

    // Recognize identifiers and literals
    ['a'..='z', ..] as code => {
      let (tok, rest) = lex_ident(code);
      tokens.push(tok)
      continue rest
    }

    ['A'..='Z', ..] => { ... }
    ['0'..='9', ..] => { ... }

    // End of input reached
    [] => { tokens.push(EOF); break tokens }
  }
}

Keyword Recognition

Identifier lexing needs special handling to recognize keywords:

pub fn lex_ident(rest: @string.View) -> (Token, @string.View) {
  // Predefined keyword table
  let keyword_map = Map::from_array([
    ("let", Token::Keyword("let")),
    ("fn", Token::Keyword("fn")),
    ("if", Token::Keyword("if")),
    ("else", Token::Keyword("else")),
    ("while", Token::Keyword("while")),
    ("return", Token::Keyword("return")),
    ("extern", Token::Keyword("extern")),
    ("true", Token::Bool(true)),
    ("false", Token::Bool(false)),
  ])

  // Collect identifier characters until the first non-identifier character
  let identifier_chars = Array::new()
  let remaining = loop rest {
    ['a'..='z' | 'A'..='Z' | '0'..='9' | '_' as c, ..rest_str] => {
      identifier_chars.push(c)
      continue rest_str
    }
    _ as rest_str => break rest_str
  }

  // Keywords win over ordinary identifiers
  let ident = String::from_array(identifier_chars)
  let token = keyword_map.get(ident).or_else(() => Token::Lower(ident))

  (token, remaining)
}

💡 A Closer Look at Moonbit Syntax

The lexer above shows several Moonbit features that are especially useful for compiler development:

Functional Loops

loop initial_value {
  pattern1 => continue new_value1
  pattern2 => continue new_value2
  pattern3 => break final_value
}

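loop is not a loop in the traditional sense; it is a functional loop:

It takes an initial argument as the loop state
It handles the different cases by pattern matching
continue passes a new state to the next iteration
break ends the loop and returns the final value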
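
As a small illustration of the loop form described above, the following sketch sums the integers from 1 to n. The function name sum_to is made up for this example and is not part of the TinyMoonbit compiler; carrying the state as a tuple is just one way to thread two values through the loop, and the code assumes a recent Moonbit toolchain and a non-negative n.

```moonbit
// Hypothetical helper for illustration only.
// The loop state is a tuple: (remaining counter, running total).
// Assumes n >= 0.
fn sum_to(n : Int) -> Int {
  loop (n, 0) {
    // Counter exhausted: end the loop and return the accumulated total
    (0, acc) => break acc
    // Otherwise add the counter and pass a new state to the next iteration
    (i, acc) => continue (i - 1, acc + i)
  }
}

fn main {
  println(sum_to(4)) // 4 + 3 + 2 + 1 = 10
}
```

The state never lives in a mutable variable: each iteration receives its state as a value and hands a fresh one to the next, which is the same shape the lexer uses with its remaining-input view.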

String Views and Pattern Matching

Moonbit's string pattern matching greatly simplifies text processing:

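// Match a single character
['a', ..rest] => // starts with the character 'a'

// Match a character range
['a'..='z' as c, ..rest] => // a lowercase letter, bound to the variable c

// Match a string literal
[.."hello", ..rest] => // equivalent to ['h','e','l','l','o', ..rest]

// Match any of several characters
[' ' | '\t' | '\n', ..rest] => // any whitespace character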

Why Pattern Order Matters

⚠️ Important: match order is critical

When writing match rules, more specific patterns must come before more general ones. For example:

// ✅ Correct order
loop code[:] {
  [.."->", ..rest] => { ... }     // match the multi-character operator first
  ['-' | '>' as c, ..rest] => { ... }  // then the single characters
}

// ❌ Wrong order: "->" can never be matched
loop code[:] {
  ['-' | '>' as c, ..rest] => { ... }
  [.."->", ..rest] => { ... }     // never reached
}

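With this pattern-matching approach we avoid writing an explicit state machine and end up with code that is clearer and easier to maintain.

Chapter 3: Parsing and Building the Abstract Syntax Tree

Parsing (syntactic analysis) is the second core stage of the compiler. It reorganizes the token sequence produced by the lexer into a hierarchical abstract syntax tree (AST). Besides checking that the program follows the language's grammar, it must also produce a structured representation for the later semantic-analysis and code-generation stages.

Designing the AST: A Structured Representation of Programs

Before building the parser, we need to design the AST carefully. This design determines how the program's syntactic structure is represented and how later compilation stages process it.

1. The Core Type System

First, we define how TinyMoonbit's types are represented in the AST: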

pub enum Type {
  Unit    // Unit type: no return value
  Bool    // Boolean type: true, false
  Int     // 32-bit signed integer
  Double  // 64-bit double-precision floating-point number
} derive(Show, Eq, ToJson)

pub fn parse_type(type_name: String) -> Type {
  match type_name {
    "Unit" => Type::Unit
    "Bool" => Type::Bool
    "Int" => Type::Int
    "Double" => Type::Double
    _ => abort("Unknown type: \{type_name}")
  }
}

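2. Layered AST Node Design

We use a layered design to represent the different levels of abstraction in a program:

Atomic expressions (AtomExpr) are the basic expression units that cannot be decomposed further: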

pub enum AtomExpr {
  Bool(Bool)                                    // Boolean literal
  Int(Int)                                      // Integer literal
  Double(Double)                                // Floating-point literal
  Var(String, mut ty~ : Type?)                  // Variable reference
  Paren(Expr, mut ty~ : Type?)                  // Parenthesized expression
  Call(String, Array[Expr], mut ty~ : Type?)    // Function call
} derive(Show, Eq, ToJson)

Compound expressions (Expr) are more complex structures that can contain operators and multiple subexpressions:

pub enum Expr {
  AtomExpr(AtomExpr, mut ty~ : Type?)          // Wrapper around an atomic expression
  Unary(String, Expr, mut ty~ : Type?)         // Unary operation: -, !
  Binary(String, Expr, Expr, mut ty~ : Type?)  // Binary operation: +, -, *, /, ==, !=, etc.
} derive(Show, Eq, ToJson)

Statements (Stmt) are the executable units of a program:

pub enum Stmt {
  Let(String, Type, Expr)                      // Variable declaration: let x : Int = 5;
  Assign(String, Expr)                         // Assignment: x = 10;
  If(Expr, Array[Stmt], Array[Stmt])           // Conditional branch: if-else
  While(Expr, Array[Stmt])                     // Loop: while
  Return(Expr?)                                // Return statement: return expr;
  Expr(Expr)                                   // Expression statement
} derive(Show, Eq, ToJson)

Top-level structures, covering function definitions and the complete program:

pub struct Function {
  name : String                     // Function name
  params : Array[(String, Type)]    // Parameter list: [(parameter name, type)]
  ret_ty : Type                     // Return type
  body : Array[Stmt]                // Statements of the function body
} derive(Show, Eq, ToJson)

// A program is a map from function names to function definitions
pub type Program Map[String, Function]

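Design Note: Mutable Type Annotations

Notice that the expression nodes (all except the literal atoms) carry a mut ty~ : Type? field. This lets the type-checking phase fill in type information in place, without rebuilding the whole AST.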
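
To make the in-place update concrete, here is a minimal sketch of filling in a mutable ty field after a node has been built. The Ty and Node types and the annotate function are simplified stand-ins invented for this example, not the article's Type and AtomExpr, and the code assumes a recent Moonbit toolchain in which mutable labelled constructor fields are reached through an as binding.

```moonbit
// Simplified, hypothetical stand-ins for the article's Type and AtomExpr.
enum Ty {
  IntTy
  BoolTy
} derive(Show)

enum Node {
  Var(String, mut ty~ : Ty?)
} derive(Show)

// Record the checked type on the existing node instead of building a new one.
fn annotate(node : Node, ty : Ty) -> Unit {
  match node {
    // Binding the matched constructor with `as` gives access to its mutable field
    Var(_, ..) as v => v.ty = Some(ty)
  }
}

fn main {
  // The parser creates the node with no type information yet
  let node = Node::Var("x", ty=None)
  // A later type-checking pass fills the annotation in place
  annotate(node, IntTy)
  println(node)
}
```

Because the node is mutated rather than replaced, every other part of the tree that already refers to it sees the annotation without any re-linking.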

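Recursive Descent Parsing: A Top-Down Strategy

Recursive descent is a top-down parsing method whose core idea is to write one parsing function per grammar rule. In Moonbit, pattern matching makes this approach particularly concise.

Parsing Atomic Expressions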

pub fn parse_atom_expr(
  tokens: ArrayView[Token]
) -> (AtomExpr, ArrayView[Token]) raise {
  match tokens {
    // Literals
    [Bool(b), ..rest] => (AtomExpr::Bool(b), rest)
    [Int(i), ..rest] => (AtomExpr::Int(i), rest)
    [Double(d), ..rest] => (AtomExpr::Double(d), rest)

    // Function call: func_name(arg1, arg2, ...)
    [Lower(func_name), Bracket('('), ..rest] => {
      let (args, rest) = parse_argument_list(rest)
      match rest {
        [Bracket(')'), ..remaining] =>
          (AtomExpr::Call(func_name, args, ty=None), remaining)
        _ => raise SyntaxError("Expected ')' after function arguments")
      }
    }

    // Variable reference
    [Lower(var_name), ..rest] =>
      (AtomExpr::Var(var_name, ty=None), rest)

    // Parenthesized expression: (expression)
    [Bracket('('), ..rest] => {
      let (expr, rest) = parse_expression(rest)
      match rest {
        [Bracket(')'), ..remaining] =>
          (AtomExpr::Paren(expr, ty=None), remaining)
        _ => raise SyntaxError("Expected ')' after expression")
      }
    }

    _ => raise SyntaxError("Expected atomic expression")
  }
}

Parsing Statements

Statement parsing dispatches to a different handler depending on the leading keyword:

pub fn parse_stmt(tokens : ArrayView[Token]) -> (Stmt, ArrayView[Token]) {
  match tokens {
    // let statement
    [Keyword("let"), Lower(var_name), Symbol(":"), ..] => { /* ... */ }

    // if / while / return statements
    [Keyword("if"), .. rest] => parse_if_stmt(rest)
    [Keyword("while"), .. rest] => parse_while_stmt(rest)
    [Keyword("return"), .. rest] => { /* ... */ }

    // Assignment statement
    [Lower(_), Symbol("="), .. rest] => parse_assign_stmt(tokens)

    // Expression statement, e.g. a call such as print_int(x);
    [Lower(_), Bracket('('), ..] => parse_single_expr_stmt(tokens)

    _ => { /* error handling */ }
  }
}

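The hard part: operator precedence

The most complex part of expression parsing is operator precedence: we need to make sure that 1 + 2 * 3 is parsed as 1 + (2 * 3) rather than (1 + 2) * 3.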
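
One common way to get this behavior is precedence climbing. The sketch below is not the article's parse_expression: it uses a made-up Tok type, evaluates the expression directly instead of building an Expr tree, and the names prec, apply, eval, and climb are all hypothetical. It only aims to show how a minimum-precedence parameter decides which operators the right-hand operand may absorb.

```moonbit
// Illustrative token type, separate from the article's Token enum.
enum Tok {
  Num(Int)
  Op(Char)
}

// Binding power of each operator; 0 means "not a binary operator".
fn prec(op : Char) -> Int {
  match op {
    '+' | '-' => 1
    '*' | '/' => 2
    _ => 0
  }
}

// Apply a binary operator to two already-evaluated operands.
fn apply(op : Char, a : Int, b : Int) -> Int {
  match op {
    '+' => a + b
    '-' => a - b
    '*' => a * b
    _ => a / b
  }
}

// Read one operand, then fold in every following operator whose
// precedence is at least min_prec.
fn eval(tokens : ArrayView[Tok], min_prec : Int) -> (Int, ArrayView[Tok]) {
  match tokens {
    [Num(n), ..rest] => climb(n, rest, min_prec)
    _ => abort("expected a number")
  }
}

fn climb(lhs : Int, tokens : ArrayView[Tok], min_prec : Int) -> (Int, ArrayView[Tok]) {
  match tokens {
    [Op(op), ..after] =>
      if prec(op) >= min_prec {
        // The right operand may only absorb operators that bind tighter,
        // which also makes operators of equal precedence left-associative.
        let (rhs, remaining) = eval(after, prec(op) + 1)
        climb(apply(op, lhs, rhs), remaining, min_prec)
      } else {
        // This operator belongs to an enclosing call; leave it unconsumed
        (lhs, tokens)
      }
    _ => (lhs, tokens)
  }
}

fn main {
  let tokens = [Num(1), Op('+'), Num(2), Op('*'), Num(3)]
  let (value, _) = eval(tokens[:], 1)
  println(value) // 7, i.e. 1 + (2 * 3)
}
```

Replacing the Int results with Expr::Binary nodes would turn the same control flow into a tree-building parser.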

💡 Advanced Moonbit Features in Use

Automatic Derivation

pub enum Expr {
  // ...
} derive(Show, Eq, ToJson)

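Moonbit's derive feature automatically generates common trait implementations for a type. We use three of them here:

Show: provides debug output
Eq: supports equality comparison
ToJson: serializes to JSON, handy for debugging and persistence

These generated implementations are very useful in compiler development, especially during debugging and testing.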
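
The short sketch below shows what the three derived traits offer in practice. The Shape enum is invented for this example and is unrelated to the compiler's AST; the exact text printed by Show and ToJson depends on the toolchain version, so no output is claimed for those two lines.

```moonbit
// Illustrative enum, not part of the compiler.
enum Shape {
  Circle(Int)
  Square(Int)
} derive(Show, Eq, ToJson)

fn main {
  let a = Circle(2)
  let b = Square(2)
  // Show: print a readable form of the value
  println(a)
  // Eq: structural comparison; different constructors, so this prints false
  println(a == b)
  // ToJson: convert to a Json value, then render it as text
  println(a.to_json().stringify())
}
```

For an AST, the same three traits make it easy to print a parsed tree, compare it against an expected tree in a test, or dump it as JSON for inspection.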

Error Handling

pub fn parse_expression(tokens: ArrayView[Token]) -> (Expr, ArrayView[Token]) raise {
  // The raise keyword marks this function as one that may raise an error
}

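Moonbit's raise mechanism provides structured error handling, so syntax errors can be located and reported precisely.

With this layered design and a recursive-descent parsing strategy, we have built a parser that is both flexible and efficient, laying a solid foundation for the type-checking phase that follows.

Chapter 4: Type checking and semantic analysis

Semantic analysis is the pivotal stage that connects the earlier and later phases of a compiler. Parsing guarantees that a program is structurally correct, but that does not mean the program is semantically valid. Type checking, the core of semantic analysis, verifies that the types of all operations in the program are consistent, ensuring type safety and correct behavior at run time.

Scope management: building an environment chain

The first challenge in type checking is handling variable scope correctly. At different levels of a program (global, function, block), the same variable name may refer to different entities. We use the classic environment chain design to solve this: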

pub struct TypeEnv[K, V] {
  parent : TypeEnv[K, V]?     // Reference to the parent environment
  data : Map[K, V]            // Variable bindings of the current environment
}

The core of the environment chain is the variable lookup algorithm, which follows the rules of lexical scoping:

pub fn TypeEnv::get[K : Eq + Hash, V](self : Self[K, V], key : K) -> V? {
  match self.data.get(key) {
    Some(value) => Some(value)    // Found in the current environment
    None =>
      match self.parent {
        Some(parent_env) => parent_env.get(key)  // Look up the parent environment recursively
        None => None              // Reached the top-level environment; the variable is undefined
      }
  }
}

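Design principle: lexical scoping

This design ensures that variable lookup follows the rules of lexical scoping:

First, look in the current scope

If the variable is not found, search the enclosing scopes recursively

Continue until the variable is found or the global scope is reached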
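The same lookup rule can be sketched in TypeScript. The `Scope` class and the variable names below are hypothetical and exist only for this illustration; the article's own implementation is the MoonBit `TypeEnv` shown above.

```typescript
// A scope holds its own bindings plus an optional link to the enclosing scope.
class Scope<V> {
  private data = new Map<string, V>();
  private parent: Scope<V> | undefined;

  constructor(parent?: Scope<V>) {
    this.parent = parent;
  }

  set(key: string, value: V): void {
    this.data.set(key, value);
  }

  // Look in the current scope first, then walk outward through the parents.
  get(key: string): V | undefined {
    if (this.data.has(key)) return this.data.get(key);
    return this.parent?.get(key);
  }
}

const globalScope = new Scope<string>();
globalScope.set("x", "Int");
globalScope.set("y", "Bool");

const fnScope = new Scope<string>(globalScope);
fnScope.set("x", "Double"); // shadows the global x inside the function
```

Looking up `x` from `fnScope` finds the inner binding, looking up `y` falls through to the global scope, and a name bound nowhere comes back as `undefined`.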

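Type checker architecture

Environment management alone is not enough to complete type checking. Some operations, such as function calls, need access to global program information. We therefore design a combined type checker: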

pub struct TypeChecker {
  local_env : TypeEnv[String, Type]    // Local variable environment
  current_func : Function              // The function currently being checked
  program : Program                    // Information about the whole program
}

Type checking selected node kinds

The core of the type checker is applying the appropriate typing rule to each kind of AST node. Here is the implementation of expression type checking:

pub fn Expr::check_type(
  self : Self,
  env : TypeEnv[String, Type]
) -> Type raise {
  match self {
    // Type checking for atomic expressions
    AtomExpr(atom_expr, ..) as node => {
      let ty = atom_expr.check_type(env)
      node.ty = Some(ty)  // Fill in the type information
      ty
    }

    // Type checking for unary operations
    Unary("-", expr, ..) as node => {
      let ty = expr.check_type(env)
      node.ty = Some(ty)
      ty
    }

    // Type checking for binary operations
    Binary(op, lhs, rhs, ..) as node => {
      let lhs_type = lhs.check_type(env)
      let rhs_type = rhs.check_type(env)

      // Make sure both operands have the same type
      guard lhs_type == rhs_type else {
        raise TypeCheckError(
          "Binary operation requires matching types, got \{lhs_type} and \{rhs_type}"
        )
      }

      let result_type = match op {
        // Comparison operators always return a boolean
        "==" | "!=" | "<" | "<=" | ">" | ">=" => Type::Bool

        // Arithmetic and other operators keep the operand type
        _ => lhs_type
      }

      node.ty = Some(result_type)
      result_type
    }
  }
}
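To make the binary-operation rule concrete, here is a small TypeScript sketch of the same idea. The `Ty` and `Expr` shapes and the flat `Map` environment are simplifying assumptions made for this example; they do not mirror the article's full AST or its environment chain.

```typescript
type Ty = "Int" | "Double" | "Bool";

// Hypothetical, simplified expression nodes with an optional type slot.
type Expr =
  | { kind: "lit"; ty: Ty }
  | { kind: "var"; name: string; ty?: Ty }
  | { kind: "binary"; op: string; lhs: Expr; rhs: Expr; ty?: Ty };

const COMPARISONS = new Set(["==", "!=", "<", "<=", ">", ">="]);

// Compute the type of an expression and record it on the node in place.
function checkType(expr: Expr, env: Map<string, Ty>): Ty {
  switch (expr.kind) {
    case "lit":
      return expr.ty;
    case "var": {
      const ty = env.get(expr.name);
      if (ty === undefined) throw new Error(`undefined variable ${expr.name}`);
      expr.ty = ty;
      return ty;
    }
    case "binary": {
      const lhs = checkType(expr.lhs, env);
      const rhs = checkType(expr.rhs, env);
      // Both operands must have the same type.
      if (lhs !== rhs) {
        throw new Error(`Binary operation requires matching types, got ${lhs} and ${rhs}`);
      }
      // Comparisons yield Bool; other operators keep the operand type.
      const result: Ty = COMPARISONS.has(expr.op) ? "Bool" : lhs;
      expr.ty = result;
      return result;
    }
  }
}

const env = new Map<string, Ty>([["x", "Int"], ["y", "Int"]]);
const sum: Expr = {
  kind: "binary",
  op: "+",
  lhs: { kind: "var", name: "x" },
  rhs: { kind: "var", name: "y" },
};
const less: Expr = {
  kind: "binary",
  op: "<",
  lhs: { kind: "var", name: "x" },
  rhs: { kind: "var", name: "y" },
};
```

Checking `sum` yields `Int` and checking `less` yields `Bool`, and in both cases the `ty` slot on the node is filled in as a side effect, just like the `node.ty = Some(...)` assignments above.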

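💡 Tip: mutating enum fields in Moonbit

During type checking we need to fill in type information on AST nodes. Moonbit offers an elegant way to modify the mutable fields of an enum variant: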

pub enum Expr {
  AtomExpr(AtomExpr, mut ty~ : Type?)
  Unary(String, Expr, mut ty~ : Type?)
  Binary(String, Expr, Expr, mut ty~ : Type?)
} derive(Show, Eq, ToJson)

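By using an as binding in pattern matching, we obtain a reference to the enum variant and can modify its mutable fields:

match expr {
  AtomExpr(atom_expr, ..) as node => {
    let ty = atom_expr.check_type(env)
    node.ty = Some(ty)  // Modify the mutable field
    ty
  }
  // ...
}

This design avoids the cost of rebuilding the whole AST while keeping a functional programming style.

The complete compilation pipeline

After lexical analysis, parsing, and type checking, our compiler front end can turn source code into a fully typed abstract syntax tree. Let's walk through the whole process with a simple example:

Source code example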

fn add(x: Int, y: Int) -> Int {
  return x + y;
}

Compiler output: the typed AST

Using derive(ToJson), we can print the final AST as JSON for inspection:

{
  "functions": {
    "add": {
      "name": "add",
      "params": [
        ["x", { "$tag": "Int" }],
        ["y", { "$tag": "Int" }]
      ],
      "ret_ty": { "$tag": "Int" },
      "body": [
        {
          "$tag": "Return",
          "0": {
            "$tag": "Binary",
            "0": "+",
            "1": {
              "$tag": "AtomExpr",
              "0": {
                "$tag": "Var",
                "0": "x",
                "ty": { "$tag": "Int" }
              },
              "ty": { "$tag": "Int" }
            },
            "2": {
              "$tag": "AtomExpr",
              "0": {
                "$tag": "Var",
                "0": "y",
                "ty": { "$tag": "Int" }
              },
              "ty": { "$tag": "Int" }
            },
            "ty": { "$tag": "Int" }
          }
        }
      ]
    }
  }
}

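From this JSON output we can clearly see:

Complete function signatures: including the parameter list and return type
Type-annotated AST nodes: every expression carries type information
A structured program representation: a clear data structure for the later code generation phase

Conclusion

In this article we explored the complete implementation of a compiler front end. From a character stream to a typed abstract syntax tree, we saw Moonbit's distinctive strengths for building compilers:

Key takeaways

The power of pattern matching: Moonbit's string pattern matching and structural pattern matching greatly simplify the implementation of lexical analysis and parsing
Functional programming style: the combination of the loop construct, environment chains, and immutable data structures gives solutions that are both elegant and efficient
An expressive type system: with mutable enum fields and trait objects, we can build data structures that are both type-safe and flexible
Engineering features: derive, structured error handling, and JSON serialization greatly improve development productivity

Looking ahead to the next part

Having covered the front end, the next article moves on to the more exciting code generation phase. We will:

Take a closer look at the design philosophy of LLVM's intermediate representation
Explore how to use llvm.mbt, Moonbit's official binding library
Implement the full translation from the AST to LLVM IR
Generate executable RISC-V assembly code

Building a compiler is a complex and challenging process, but as this article has shown, Moonbit provides powerful and elegant tools for it. Let's continue this compiler-building journey in the next part.

Recommended resources

Moonbit official documentation

llvm.mbt documentation

llvm.mbt project

Official LLVM tutorial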

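Dependency injection in functional programming: the Reader Monad

July 23, 2025 · 9 min read

Anyone who works with hexagonal architecture knows that, to keep the core business logic pure and independent, we put "side effects" such as database access and external API calls into "ports" and "adapters", and then inject them into the application layer through DI. It is fair to say that classic object-oriented and layered architectures cannot do without DI.

Then, when I wanted to build something in MoonBit, I found I couldn't breathe.

We would like to do as the locals do, but in a place as strongly functional as MoonBit there are no classes, no interfaces, and certainly none of the DI containers we are used to. So how do I do DI?

At that point I wondered: software engineering has been around for about 57 years now, so is there really no way to do DI in functional programming?

Yes, my friend, there is. In functional programming it simply takes the form of a monad: the Reader Monad.

What is a Monad

An ordinary function is like an assembly line: you throw in a bag of flour, run straight to the end of the line, and wait for instant noodles to come out. But this line has to handle every complication in the middle automatically:

No flour was put in, or "no order placed, yet expecting delivery" (null)
The dough is too dry and jams the roller (an exception is thrown)
The seasoning machine needs to read today's recipe, for example braised beef or mushroom chicken flavor (reading external configuration)
The packaging machine at the end of the line needs to record how many packs were made today (updating a counter)

A Monad is the "master control system" that manages this complicated assembly line. It packages your data together with the context of the processing flow, making sure the whole process runs smoothly and safely.

In software development, the Monad family has several common members:

Option: handles the "might be absent" case. The box either contains something or is empty
Result: handles the "might fail" case. The box is either green (success) and holds the result, or red (failure) and holds the error information
State Monad: handles the "needs to modify state" case. While producing a result, this box also updates a counter on its side. Put differently, it is useState in React
Future (Promise): handles the "only available later" case. This box hands you a "pickup slip" and promises to deliver the goods in the future
Reader Monad: the box can consult the "environment" at any time, but cannot modify it

Reader Monad

The idea behind the Reader Monad dates back to the 1990s, when it became popular in the community around purely functional languages such as Haskell. To uphold the iron rule of function purity (functions must not have side effects), people needed an elegant way for multiple functions to share the same configuration environment, and the Reader Monad was born to resolve that tension.

Today its use cases are very broad:

Application configuration management: passing around global configuration such as database connection pools, API keys, and feature flags
Request context injection: in a web service, bundling information such as the currently logged-in user into an environment that every function in the request handling chain can use
Implementing hexagonal architecture: in a hexagonal (ports and adapters) architecture, it is used to build a firewall between the core business logic (Domain/Application Layer) and the external infrastructure (Infrastructure Layer)

In short, the Reader Monad is a tool dedicated to handling read-only environment dependencies. It addresses these problems:

Parameter drilling: we don't want to pass a Properties object down through every layer
Decoupling logic from configuration: business code only cares about "what to do", not "where the configuration comes from". This keeps the code clean and very easy to test

Core methods

A Reader library usually contains the following core parts.

Reader::pure

This is like putting a piece of candy straight into a standard lunch box. It wraps an ordinary value into the simplest possible Reader computation, one that depends on nothing.

pure is usually the packaging machine of the assembly line: it puts the final result you computed (a plain value) back onto the Reader "line", which is what people mean by "removing side effects".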

typealias @reader.Reader

// `pure` creates a computation that does not depend on the environment
let pure_reader : Reader[String, Int] = Reader::pure(100)

test {
  // Whatever the environment is (for example "hello"), the result is 100
  assert_eq(pure_reader.run("hello"), 100)
}

Reader::bind

This is the "connector" of the assembly line. For example, it joins the "kneading" step to the "rolling" step and makes sure they form a single production line.

Why do we need it? For automation! bind makes the process fully automatic: you only define each step, and it takes care of passing values along.

fnalias @reader.ask

// Step 1: define a Reader whose job is to read the value from the environment (an Int)
let step1 : Reader[Int, Int] = ask()

// Step 2: define a function that takes a number and returns a new Reader computation
fn step2_func(n : Int) -> Reader[Int, Int] {
  Reader::pure(n * 2)
}

// Use bind to connect the two steps
let computation : Reader[Int, Int] = step1.bind(step2_func)

test {
  // Run the whole computation with the environment set to 5
  // Flow: step1 reads 5 from the environment -> bind hands 5 to step2_func -> step2_func computes 5*2=10 -> pure(10)
  assert_eq(computation.run(5), 10)
}

Reader::map

This is like swapping what is inside the lunch box. It only changes the contents of the box (say, replacing a mint with a liqueur chocolate) and leaves the lunch box itself untouched.

Often we just want a simple transformation of the result; map is more direct than bind and states the intent more clearly.

// `map` only transforms the result; the dependency stays the same
let reader_int : Reader[Unit, Int] = Reader::pure(5)

let reader_string : Reader[Unit, String] = reader_int.map(n => "Value is \{n}")

test {
  assert_eq(reader_string.run(()), "Value is 5")
}

ask

ask is like a worker on the assembly line who can look up at the "recipe" on the wall at any time. It is the only way we actually read the environment.

bind only passes the environment along behind the scenes; when you want to know what the "recipe" actually says, you have to "ask" for it.

// `ask` retrieves the environment directly
let ask_reader : Reader[String, String] = ask()

let result : String = ask_reader.run("This is the environment")

test {
  assert_eq(result, "This is the environment")
}

asks, which we will use often from here on, is just a wrapper around ask().map().
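To see how these pieces fit together, here is a minimal Reader written in TypeScript. It is a sketch for illustration only and is not the implementation of the colmugx/reader package: a Reader is assumed to be nothing more than a wrapped function from an environment to a value.

```typescript
// A Reader wraps a function from an environment R to a result A.
class Reader<R, A> {
  readonly run: (env: R) => A;

  constructor(run: (env: R) => A) {
    this.run = run;
  }

  // Wrap a plain value; the environment is ignored.
  static pure<R, A>(value: A): Reader<R, A> {
    return new Reader<R, A>(() => value);
  }

  // Transform the result without touching the environment.
  map<B>(f: (a: A) => B): Reader<R, B> {
    return new Reader<R, B>((env) => f(this.run(env)));
  }

  // Chain a step that itself returns a Reader, handing both steps the same environment.
  bind<B>(f: (a: A) => Reader<R, B>): Reader<R, B> {
    return new Reader<R, B>((env) => f(this.run(env)).run(env));
  }
}

// Read the whole environment.
function ask<R>(): Reader<R, R> {
  return new Reader<R, R>((env) => env);
}

// Read the environment and project something out of it: ask().map(f).
function asks<R, A>(f: (env: R) => A): Reader<R, A> {
  return ask<R>().map(f);
}

// Same shape as the bind example above: read a number, then double it.
const computation = ask<number>().bind((n) => Reader.pure<number, number>(n * 2));
```

Nothing runs until `run` is called with an environment, which is why a Reader is best thought of as a description of a computation rather than the computation itself.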

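DI versus the Reader Monad

Take a classic example: we are building a UserService that needs a Logger to record logs and a Database to fetch data.

For ordinary DI, I'll use TypeScript, my second-favorite language, as the example: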

interface Logger {
  info(message: string): void
}
interface Database {
  getUserById(id: number): { name: string } | undefined
}

// The business class declares its dependencies through the constructor
class UserService {
  constructor(
    private logger: Logger,
    private db: Database
  ) {}

  getUserName(id: number): string | undefined {
    this.logger.info(`Querying user with id: ${id}`)
    const user = this.db.getUserById(id)
    return user?.name
  }
}

// Create the dependency instances and inject them
const myLogger: Logger = { info: (msg) => console.log(`[LOG] ${msg}`) }
const myDb: Database = {
  getUserById: (id) => (id === 1 ? { name: 'MoonbitLang' } : undefined)
}

const userService = new UserService(myLogger, myDb)
const userName = userService.getUserName(1) // "MoonbitLang"

// In practice we usually let a library manage injection rather than instantiating by hand, for example InversifyJS or... Angular

And what about the Reader Monad?

fnalias @reader.asks

struct User {
  name : String
}

trait Logger {
  info(Self, String) -> Unit
}

trait Database {
  getUserById(Self, Int) -> User?
}

struct AppConfig {
  logger : &Logger
  db : &Database
}

fn getUserName(id : Int) -> Reader[AppConfig, String?] {
  asks(config => {
    config.logger.info("Querying user with id: \{id}")
    let user = config.db.getUserById(id)
    user.map(obj => obj.name)
  })
}

struct LocalDB {}

impl Database for LocalDB with getUserById(_, id) {
  if id == 1 {
    Some({ name: "MoonbitLang" })
  } else {
    None
  }
}

struct LocalLogger {}

impl Logger for LocalLogger with info(_, content) {
  println("\{content}")
}

test "Test UserName" {
  let appConfig = AppConfig::{ db: LocalDB::{  }, logger: LocalLogger::{  } }
  assert_eq(getUserName(1).run(appConfig).unwrap(), "MoonbitLang")
}

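Notice that the getUserName function likewise holds no dependencies; it is only a "description of a computation".

This property makes the Reader Monad a natural fit for hexagonal architecture. The core principle of hexagonal architecture is "dependency inversion": the core business logic should not depend on concrete infrastructure.

The getUserName example illustrates this perfectly. AppConfig is simply a collection of Ports.

The core business logic in getUserName depends only on the AppConfig abstraction. It has no idea whether the backing store is MySQL, PostgreSQL, or a fake implementation such as a Mock DB.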
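For comparison with the class-based TypeScript version earlier, the same logic can be sketched in TypeScript in Reader style. Here a Reader is assumed to be just a function from the environment to a result, and `logs` and `testConfig` are hypothetical test doubles introduced for this example.

```typescript
// In this sketch a Reader is simply a function of the environment.
type Reader<R, A> = (env: R) => A;

interface Logger {
  info(message: string): void;
}
interface Database {
  getUserById(id: number): { name: string } | undefined;
}
// The environment: a bundle of ports.
interface AppConfig {
  logger: Logger;
  db: Database;
}

// Business logic: holds no dependencies, only describes what to do with them.
const getUserName = (id: number): Reader<AppConfig, string | undefined> =>
  (config) => {
    config.logger.info(`Querying user with id: ${id}`);
    return config.db.getUserById(id)?.name;
  };

// Hypothetical in-memory adapters, used here as test doubles.
const logs: string[] = [];
const testConfig: AppConfig = {
  logger: { info: (msg) => { logs.push(msg); } },
  db: { getUserById: (id) => (id === 1 ? { name: "MoonbitLang" } : undefined) },
};
```

Swapping `testConfig` for a configuration backed by a real database requires no change to `getUserName`, which is the dependency inversion described above.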

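But what can't it solve? State mutation.

The environment of a Reader Monad is always read-only. Once injected, it cannot be changed during the whole computation.

If you need mutable state, turn to its sibling, the State Monad.

In other words, its advantage is obvious: it can read configuration from anywhere.

And its drawback is just as obvious: all it can do is read.

A simple i18n utility library

Anyone who does front-end work knows that for i18n we will most likely reach for a library such as i18next. Its core approach is usually to inject an i18n instance into the whole application through React Context, so that any component needing translations can take it straight from the Context. So this can also be seen as a form of dependency injection.

This brings us back to where we started: the reason I went looking for DI (Context) in the first place was to add i18n support to a CLI tool. Of course, what follows is only a simple demonstration.

First, install the dependency: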

moon add colmugx/reader

Next, let's define the environment and dictionary types the i18n library needs.

typealias String as Locale

typealias String as TranslationKey

typealias String as TranslationValue

typealias Map[TranslationKey, TranslationValue] as Translations

typealias Map[Locale, Translations] as Dict

struct I18nConfig {
  // mut is added here only to make the demo easier
  mut lang : Locale
  dict : Dict
}

Next comes the translation function t:

fn t(key : TranslationKey) -> Reader[I18nConfig, TranslationValue] {
  asks(config => config.dict
    .get(config.lang)
    .map(lang_map => lang_map.get(key).unwrap_or(key))
    .unwrap_or(key))
}

That's it. Looks simple, doesn't it?

Next, suppose our CLI tool needs to show a welcome message in a different language depending on the operating system's LANG environment variable.

fn welcome_message(content : String) -> Reader[I18nConfig, String] {
  t("welcome").bind(welcome_text => Reader::pure("\{welcome_text} \{content}"))
}

test {
  let dict : Dict = {
    "en_US": { "welcome": "Welcome To" },
    "zh_CN": { "welcome": "欢迎来到" },
  }

  // Suppose your system language LANG is zh_CN
  let app_config = I18nConfig::{ lang: "zh_CN", dict }
  let msg = welcome_message("MoonbitLang")
  assert_eq(msg.run(app_config), "欢迎来到 MoonbitLang")

  // Switch the language
  app_config.lang = "en_US"
  assert_eq(msg.run(app_config), "Welcome To MoonbitLang")
}
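The fallback behavior of t is easy to miss: when either the locale or the key is absent, the key itself is returned. The TypeScript sketch below illustrates that rule under the assumption that a Reader is a plain function of the configuration; the `fr_FR` locale and the `missing` key are made-up inputs used only to exercise the fallback.

```typescript
type Translations = { [key: string]: string | undefined };
type Dict = { [locale: string]: Translations | undefined };

interface I18nConfig {
  lang: string;
  dict: Dict;
}

// In this sketch a Reader is simply a function of the environment.
type Reader<R, A> = (env: R) => A;

// Look up a key for the configured language; fall back to the key itself.
const t = (key: string): Reader<I18nConfig, string> =>
  (config) => config.dict[config.lang]?.[key] ?? key;

// Compose a message from a translated prefix and caller-supplied content.
const welcomeMessage = (content: string): Reader<I18nConfig, string> =>
  (config) => `${t("welcome")(config)} ${content}`;

const dict: Dict = {
  en_US: { welcome: "Welcome To" },
  zh_CN: { welcome: "欢迎来到" },
};

const msg = welcomeMessage("MoonbitLang");
```

Because `msg` is only a description, the same value can be run against different configurations: `msg({ lang: "zh_CN", dict })` and `msg({ lang: "en_US", dict })` produce the two strings asserted in the MoonBit test, without mutating any shared state.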

欢迎来到 MoonbitLang
