Introduction
This is a general overview of the compiler
Grammar
A program is composed of a list of items
Syntax
Token →
IDENT
| ALPHANUMERIC
| FLOAT_LITERAL
| STRING_LITERAL
| INTEGER_LITERAL
| FLOAT_LITERAL
STRING_LITERAL → " (
~[\ "]
| \ <any char>
)* (non-greedy) "
IDENT → ALPHABETIC ALPHANUMERIC*
ALPHANUMERIC → ALPHABETIC | NUMERIC
ALPHABETIC → [a-z A-Z _]
NUMERIC → [0-9]
FLOAT_LITERAL → NUMERIC+ . NUMERIC+
Paths
A path is a way of refering to named items.
Syntax
PathPrefix → ( self | super )? ::
Path → PathPrefix? IDENT ( :: IDENT )*
Example
mod utils {
fn sum(a: i32, b: i32) -> i32 {
a + b
}
}
fn foo() -> i32 { utils::sum(1, 2) }
let a: i32;
fn main() {
let a: i32 = 1; // Local variable
self::a = foo(); // Access the 'a' variable from the module
}
In the example above we have 4 paths
- Inside sum. Both
aandbare paths. utils::sumis a path composed of two segments:utilsandsumself::ais a path that refers to “the item a inside the current module”
Items
Syntax
Item →
FunctionDecl
| ModuleDecl
| VarDecl
| StructDecl
| UseItem
Use Items
use items allow to bring other items into scope.
Example
mod utils {
fn max(a: i32, b: i32) -> i32 {
if a > b { a } else { b }
}
fn abs(a: i32) -> i32 {
if a > 0 { a } else { a * -1 }
}
}
use utils::max;
use utils as very_util;
fn main() {
let a: i32 = max(1, 2);
let b: i32 = very_util::abs(-3);
}
Modules
A module is a named item that encloses a set items
Syntax
ModuleDecl →
mod IDENT { Item* }
| extern mod IDENT ;
Example
mod lib {
mod math {
fn sum(a: i32, b: i32) -> i32 { a + b }
}
mod math_u16 {
fn sum(a: u16, b: u16) -> u16 { a + b }
}
mod person {
struct Person {
age: u16,
name: &char,
}
fn is_old(p: &Person) -> bool {
(*p).age >= 70
}
}
}
Structs
Functions
Syntax
FunctionDecl →
fn IDENT ( ParamList? ) ( -> Type )? BlockExpr
| extern fn IDENT ( ParamList? ) ( -> Type )? ;
Variables
Expressions
Expressions are the main syntactical construction of the language. They represent an operation that yields a value.
Syntax
Expr →
BlockExpr
| ArithmeticExpr
| LogicalExpr
| CallExpr
| IfExpr
| Path
| LiteralExpr
| ReferenceExpr
| DereferenceExpr
| FieldAccess
| ArrayAccess
| TupleAccess
| ( Expr )
Block Expressions
A BlockExpr is a block that evaluates to an expression. Like any kind of block, it contains a list of statements, and an optional tail expression.
Literal Expressions
Syntax
LiteralExpr →
INTEGER_LITERAL
| FLOAT_LITERAL
| STRING_LITERAL
If Expressions
An if expression evaluates it’s conditional expression.
If it’s true, evaluates it’s then block
If it’s false, evaluates it’s else block
An if expression -unlike the if statement- MUST have both a then and an else blocks.
Operators
Arithmetic expressions
Syntax
ArithmeticExpr →
Expr [* / %] Expr
| Expr [+ -] Expr
Logical Expressions
Syntax
LogicalExpr →
Expr && Expr
| Expr || Expr
(De)Reference expressions
Syntax
ReferenceExpr →
& Expr
| && Expr
DereferenceExpr → * Expr
Access Expressions
Field Access
Syntax
FieldAccess → Expr . IDENT
Array Access
Syntax
ArrayAccess → Expr [ Expr ]
Tuple Access
Syntax
TupleAccess → Expr . INTEGER_LITERAL
Call Expressions
Statements
Syntax
Stmt → Item
IfStmt
| WhileStmt
| BlockStmt
| Expr ;
| break
| continue
| return
WhileStmt → while ( Expr ) Stmt
Types
Syntax
Type →
i32 | i16 | i32 | i64
| u8 | u16 | u32 | u64
| f32 | f64
| char
| bool
| Path
| [ Type ; INTEGER_LITERAL ]
| ( TypeList? )
| & Type
| && Type
Example
let a: i32;
struct Person {
age: u16,
name: &char,
}
let people: [Person, 12];
Abstract Syntax Tree
The AST is a syntactical representation of a program. It is the output of the parser.
This AST doesn’t have any semantic information.
For example, it doesn’t know the different types of binary operations
(arithmetic, logical, etc.). It just sees <OPERAND> <OPERATOR> <OPERAND>.
We try to presserve all the syntactic information of the program (semicolons, parenthesis, etc.), and we don’t make up things that aren’t there.
For example, see ItemKind::Function. Its return type is an Option,
beacuse this program is syntactically correct:
fn foo() { }
but this one is also correct:
fn foo() -> int { ... }
Another example:
a + ( (b) * 4 )
There are a LOT of unnecesary parenthesis here. It’s nice to have the AST preserve this information so we can, for example, check unnecesary parenthesis.
Later, we’ll lower this AST to the HIR, which will strip unnecesary syntactical information. It’ll also remove parenthesis, since their only job is to explicit the precedence of operations, and they become useless after building the AST.
HIR
The Hir (Higher Intermediate Representation) is the result of lowering the AST