在MoonBit中实现IntMap

键值对容器是现代编程语言必备的标准库成员之一,它应用广泛,所以其基本操作的性能非常重要。函数式语言的键值对容器实现大多基于某种平衡二叉搜索树,这样实现的键值对容器在查找和插入操作上表现优秀,但在需要合并两个键值对容器时表现不佳,命令式语言常用的哈希表也不擅长合并操作。
IntMap是一种为整数特化的不可变键值对容器,它只能以整数为键,通过牺牲一定的通用性,它实现了高效的合并/取交集操作。本文将从最朴素的二叉字典树开始,逐步改进到IntMap.
二叉字典树
二叉字典树是一种使用每个键的二进制表示决定其位置的二叉树,键的二进制表示是一长串有限的0和1,那么假如当前位是0,就向左子树递归,当前位为1则向右子树递归.
///|
enum BinTrie[T] {
BinTrie[T]
Empty
(T) -> BinTrie[T]
Leaf(type parameter T
T)
(left~ : BinTrie[T], right~ : BinTrie[T]) -> BinTrie[T]
Branch(BinTrie[T]
left~ : enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie[type parameter T
T], BinTrie[T]
right~ : enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie[type parameter T
T])
}
要在二叉字典树里查找某个键对应的值,只需要依次读取键的二进制位,根据其值选择向左或者向右移动,直到到达某个叶子节点
此处读取二进制位的顺序是从整数最小位到最高位
fn[T] enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie::fn[T] BinTrie::lookup(self : BinTrie[T], key : UInt) -> T?
lookup(BinTrie[T]
self : enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie[type parameter T
T], UInt
key : UInt
UInt) -> type parameter T
T? {
match BinTrie[T]
self {
BinTrie[T]
Empty => T?
None
(T) -> BinTrie[T]
Leaf(T
value) => (T) -> T?
Some(T
value)
(left~ : BinTrie[T], right~ : BinTrie[T]) -> BinTrie[T]
Branch(BinTrie[T]
left~, BinTrie[T]
right~) =>
if UInt
key fn Mod::mod(self : UInt, other : UInt) -> UInt
Calculates the remainder of dividing one unsigned integer by another.
Parameters:
self : The unsigned integer dividend.
other : The unsigned integer divisor.
Returns the remainder of the division operation.
Throws a panic if other is zero.
Example:
let a = 17U
let b = 5U
inspect(a % b, content="2") // 17 divided by 5 gives quotient 3 and remainder 2
inspect(7U % 4U, content="3")
% 2U fn Eq::equal(self : UInt, other : UInt) -> Bool
Compares two unsigned 32-bit integers for equality.
Parameters:
self : The first unsigned integer operand.
other : The second unsigned integer operand to compare with.
Returns true if both integers have the same value, false otherwise.
Example:
let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
== 0 {
BinTrie[T]
left.fn[T] BinTrie::lookup(self : BinTrie[T], key : UInt) -> T?
lookup(UInt
key fn Div::div(self : UInt, other : UInt) -> UInt
Performs division between two unsigned 32-bit integers. The operation follows
standard unsigned integer division rules, where the result is truncated
towards zero.
Parameters:
self : The dividend (the number to be divided).
other : The divisor (the number to divide by).
Returns an unsigned 32-bit integer representing the quotient of the division.
Example:
let a = 42U
let b = 5U
inspect(a / b, content="8") // Using infix operator
/ 2)
} else {
BinTrie[T]
right.fn[T] BinTrie::lookup(self : BinTrie[T], key : UInt) -> T?
lookup(UInt
key fn Div::div(self : UInt, other : UInt) -> UInt
Performs division between two unsigned 32-bit integers. The operation follows
standard unsigned integer division rules, where the result is truncated
towards zero.
Parameters:
self : The dividend (the number to be divided).
other : The divisor (the number to divide by).
Returns an unsigned 32-bit integer representing the quotient of the division.
Example:
let a = 42U
let b = 5U
inspect(a / b, content="8") // Using infix operator
/ 2)
}
}
}
为了避免创建过多空树,我们不直接调用值构造子,而是使用branch方法
fn[T] enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie::fn[T] BinTrie::br(left : BinTrie[T], right : BinTrie[T]) -> BinTrie[T]
br(BinTrie[T]
left : enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie[type parameter T
T], BinTrie[T]
right : enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie[type parameter T
T]) -> enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie[type parameter T
T] {
match (BinTrie[T]
left, BinTrie[T]
right) {
(BinTrie[T]
Empty, BinTrie[T]
Empty) => BinTrie[T]
Empty
_ => (left~ : BinTrie[T], right~ : BinTrie[T]) -> BinTrie[T]
Branch(BinTrie[T]
left~, BinTrie[T]
right~)
}
}
Patricia Tree
Patricia Tree在二叉字典树的基础上保存了更多信息以加速查找,在每个分叉的地方,它都保留子树中所有键的公共前缀(虽然此处是从最低位开始计算,但我们仍然使用前缀这种说法),并用一个无符号整数标记当前的分支位(branching bit).这样一来,查找时需要经过的分支数量大大减少。
///|
enum PatriciaTree[T] {
PatriciaTree[T]
Empty
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf(Int
key~ : Int
Int, T
value~ : type parameter T
T)
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(
UInt
prefix~ : UInt
UInt,
UInt
mask~ : UInt
UInt,
PatriciaTree[T]
left~ : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T],
PatriciaTree[T]
right~ : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T]
)
}
///|
fn[T] enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree::fn[T] PatriciaTree::lookup(self : PatriciaTree[T], key : Int) -> T?
lookup(PatriciaTree[T]
self : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T], Int
key : Int
Int) -> type parameter T
T? {
match PatriciaTree[T]
self {
PatriciaTree[T]
Empty => T?
None
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf(Int
key=Int
k, T
value~) => if Int
k fn Eq::equal(self : Int, other : Int) -> Bool
Compares two integers for equality.
Parameters:
self : The first integer to compare.
other : The second integer to compare.
Returns true if both integers have the same value, false otherwise.
Example:
inspect(42 == 42, content="true")
inspect(42 == -42, content="false")
== Int
key { (T) -> T?
Some(T
value) } else { T?
None }
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(UInt
prefix~, UInt
mask~, PatriciaTree[T]
left~, PatriciaTree[T]
right~) =>
if Bool
!fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool
match_prefixBool
(UInt
keyBool
=Int
keyBool
.fn Int::reinterpret_as_uint(self : Int) -> UInt
reinterpret the signed int as unsigned int, when the value is
non-negative, i.e, 0..=2^31-1, the value is the same. When the
value is negative, it turns into a large number,
for example, -1 turns into 2^32-1
reinterpret_as_uintBool
(), UInt
prefixBool
~, UInt
maskBool
~) {
T?
None
} else if fn zero(k : UInt, mask~ : UInt) -> Bool
zero(Int
key.fn Int::reinterpret_as_uint(self : Int) -> UInt
reinterpret the signed int as unsigned int, when the value is
non-negative, i.e, 0..=2^31-1, the value is the same. When the
value is negative, it turns into a large number,
for example, -1 turns into 2^32-1
reinterpret_as_uint(), UInt
mask~) {
PatriciaTree[T]
left.fn[T] PatriciaTree::lookup(self : PatriciaTree[T], key : Int) -> T?
lookup(Int
key)
} else {
PatriciaTree[T]
right.fn[T] PatriciaTree::lookup(self : PatriciaTree[T], key : Int) -> T?
lookup(Int
key)
}
}
}
///|
fn fn get_prefix(key : UInt, mask~ : UInt) -> UInt
get_prefix(UInt
key : UInt
UInt, UInt
mask~ : UInt
UInt) -> UInt
UInt {
UInt
key fn BitAnd::land(self : UInt, other : UInt) -> UInt
Performs a bitwise AND operation between two unsigned 32-bit integers. For
each bit position, the result is 1 if the bits at that position in both
operands are 1, and 0 otherwise.
Parameters:
self : The first unsigned 32-bit integer operand.
other : The second unsigned 32-bit integer operand.
Returns an unsigned 32-bit integer representing the result of the bitwise AND
operation.
Example:
let a = 0xF0F0U // 1111_0000_1111_0000
let b = 0xFF00U // 1111_1111_0000_0000
inspect(a & b, content="61440") // 1111_0000_0000_0000 = 61440
& (UInt
mask fn Sub::sub(self : UInt, other : UInt) -> UInt
Performs subtraction between two unsigned 32-bit integers. When the result
would be negative, the function wraps around using modular arithmetic (2^32).
Parameters:
self : The first unsigned 32-bit integer (minuend).
other : The second unsigned 32-bit integer to subtract from the first
(subtrahend).
Returns a new unsigned 32-bit integer representing the difference between the
two numbers. If the result would be negative, it wraps around to a positive
number by adding 2^32 repeatedly until the result is in range.
Example:
let a = 5U
let b = 3U
inspect(a - b, content="2")
let c = 3U
let d = 5U
inspect(c - d, content="4294967294") // wraps around to 2^32 - 2
- 1U)
}
///|
fn fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool
match_prefix(UInt
key~ : UInt
UInt, UInt
prefix~ : UInt
UInt, UInt
mask~ : UInt
UInt) -> Bool
Bool {
fn get_prefix(key : UInt, mask~ : UInt) -> UInt
get_prefix(UInt
key, UInt
mask~) fn Eq::equal(self : UInt, other : UInt) -> Bool
Compares two unsigned 32-bit integers for equality.
Parameters:
self : The first unsigned integer operand.
other : The second unsigned integer operand to compare with.
Returns true if both integers have the same value, false otherwise.
Example:
let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
== UInt
prefix
}
///|
fn fn zero(k : UInt, mask~ : UInt) -> Bool
zero(UInt
k : UInt
UInt, UInt
mask~ : UInt
UInt) -> Bool
Bool {
(UInt
k fn BitAnd::land(self : UInt, other : UInt) -> UInt
Performs a bitwise AND operation between two unsigned 32-bit integers. For
each bit position, the result is 1 if the bits at that position in both
operands are 1, and 0 otherwise.
Parameters:
self : The first unsigned 32-bit integer operand.
other : The second unsigned 32-bit integer operand.
Returns an unsigned 32-bit integer representing the result of the bitwise AND
operation.
Example:
let a = 0xF0F0U // 1111_0000_1111_0000
let b = 0xFF00U // 1111_1111_0000_0000
inspect(a & b, content="61440") // 1111_0000_0000_0000 = 61440
& UInt
mask) fn Eq::equal(self : UInt, other : UInt) -> Bool
Compares two unsigned 32-bit integers for equality.
Parameters:
self : The first unsigned integer operand.
other : The second unsigned integer operand to compare with.
Returns true if both integers have the same value, false otherwise.
Example:
let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
== 0
}
现在branch方法可以做更多优化, 保证Branch节点的子树不含Empty.
///|
fn[T] enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree::fn[T] PatriciaTree::branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
branch(
UInt
prefix~ : UInt
UInt,
UInt
mask~ : UInt
UInt,
PatriciaTree[T]
left~ : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T],
PatriciaTree[T]
right~ : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T],
) -> enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T] {
match (PatriciaTree[T]
left, PatriciaTree[T]
right) {
(PatriciaTree[T]
Empty, PatriciaTree[T]
right) => PatriciaTree[T]
right
(PatriciaTree[T]
left, PatriciaTree[T]
Empty) => PatriciaTree[T]
left
_ => (prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(UInt
prefix~, UInt
mask~, PatriciaTree[T]
left~, PatriciaTree[T]
right~)
}
}
插入与合并
既然类型定义已经确定,接下来要做的就是实现插入和合并操作。由于插入操作也可以看作将一个只有一个叶节点的树与原本的树合并,所以我们优先介绍合并操作的实现。
我们首先讨论可以走捷径的情况:假设我们有两个非空树t0和t1,它们的最长公共前缀分别为p0和p1且p0和p1互不包含, 那么不管t0和t1有多大,合并它们的成本都是一样的,因为只需要创建一个新的Branch节点。我们通过辅助函数join来实现。
生成掩码的函数gen_mask利用了整数二进制补码的一个特性来寻找最低的分支位。
假设输入x的二进制表示为
00100100000
那么,x.lnot()得到
11011011111
加一得到
11011100000