Introduction

This is a general overview of the compiler

Grammar

A program is composed of a list of items

^Syntax
Program → Item^*

STRING_LITERAL → " (
        ~[\ "]
        | \ <any char>
    )^{* (non-greedy)} "

IDENT → ALPHABETIC ALPHANUMERIC^*

ALPHANUMERIC → ALPHABETIC | NUMERIC

ALPHABETIC → [a-z A-Z _]

NUMERIC → [0-9]

INTEGER_LITERAL → NUMERIC⁺

FLOAT_LITERAL → NUMERIC⁺ . NUMERIC⁺

Paths

A path is a way of refering to named items.

[path.syntax]

^Syntax
PathPrefix → ( self | super )^? ::

Path → PathPrefix^? IDENT ( :: IDENT )^*

Example

mod utils {
    fn sum(a: i32, b: i32) -> i32 {
        a + b
    }
}

fn foo() -> i32 { utils::sum(1, 2) }

let a: i32;
    
fn main() {
    let a: i32 = 1;     // Local variable
    self::a = foo();    // Access the 'a' variable from the module

}

In the example above we have 4 paths

Inside sum. Both a and b are paths.
utils::sum is a path composed of two segments: utils and sum
self::a is a path that refers to “the item a inside the current module”

Items

[item.syntax]

^Syntax
Item →
      FunctionDecl
    | ModuleDecl
    | VarDecl
    | StructDecl
    | UseItem

Use Items

use items allow to bring other items into scope.

[item.use.syntax]

^Syntax
UseItem →
    use Type as IDENT ;
    | use Path ( as IDENT )^? ;
    | use Path :: * ;

Example

mod utils {
    fn max(a: i32, b: i32) -> i32 {
        if a > b { a } else { b }
    }

    fn abs(a: i32) -> i32 {
        if a > 0 { a } else { a * -1 }
    }
}

use utils::max;
use utils as very_util;

fn main() {
    let a: i32 = max(1, 2);

    let b: i32 = very_util::abs(-3);
}

Modules

A module is a named item that encloses a set items

[item.module.syntax]

^Syntax
ModuleDecl →
mod IDENT { Item^* }
| extern mod IDENT ;

Example

mod lib {

  mod math {
      fn sum(a: i32, b: i32) -> i32 { a + b }
  }

  mod math_u16 {
    fn sum(a: u16, b: u16) -> u16 { a + b }
  }

  mod person {
      struct Person {
          age: u16,
          name: &char,
      }

      fn is_old(p: &Person) -> bool {
          (*p).age >= 70
      }
  }

}

Structs

[item.struct.syntax]

^Syntax
StructDecl → struct IDENT { Field^* }

Field → IDENT : Type

Functions

[item.function.syntax]

^Syntax
FunctionDecl →
fn IDENT ( ParamList^? ) ( -> Type )^? BlockExpr
| extern fn IDENT ( ParamList^? ) ( -> Type )^? ;

Parameter → IDENT : Type

ParamList → Parameter ( , Parameter )^*

Variables

[item.let.syntax]

^Syntax
VarDecl →
let ( : Type )^? IDENT ( = Expr )^? ;
| const : Type IDENT = Expr ;

[expr]

Expressions

Expressions are the main syntactical construction of the language. They represent an operation that yields a value.

[expr.syntax]

Block Expressions

[expr.block.syntax]

^Syntax
BlockExpr → { Stmt^* Expr^? }

A BlockExpr is a block that evaluates to an expression. Like any kind of block, it contains a list of statements, and an optional tail expression.

Literal Expressions

[expr.literal.syntax]

^Syntax
LiteralExpr →
    INTEGER_LITERAL
    | FLOAT_LITERAL
    | STRING_LITERAL

If Expressions

[expr.if.syntax]

^Syntax
IfExpr → if Expr BlockExpr else BlockExpr

An if expression evaluates it’s conditional expression. If it’s true, evaluates it’s then block If it’s false, evaluates it’s else block

An if expression -unlike the if statement- MUST have both a then and an else blocks.

Operators

Arithmetic expressions

[expr.arithmetic.syntax]

^Syntax
ArithmeticExpr →
Expr [* / %] Expr
| Expr [+ -] Expr

Logical Expressions

[expr.logical.syntax]

^Syntax
LogicalExpr →
Expr && Expr
| Expr || Expr

(De)Reference expressions

[expr.ref.syntax]

^Syntax
ReferenceExpr →
& Expr
| && Expr

DereferenceExpr → * Expr

[expr.access]

Access Expressions

Field Access

[expr.access.field.syntax]

^Syntax
FieldAccess → Expr . IDENT

Array Access

[expr.access.array.syntax]

^Syntax
ArrayAccess → Expr [ Expr ]

Tuple Access

[expr.access.tuple.syntax]

^Syntax
TupleAccess → Expr . INTEGER_LITERAL

Call Expressions

[expr.call.syntax]

^Syntax
CallExpr → Expr ( Args^? )

Args → Expr ( , Expr )^*

Statements

[stmt.syntax]

WhileStmt → while ( Expr ) Stmt

IfStmt → if Expr BlockStmt ( else BlockStmt )^?

BlockStmt → { Stmt^* }

Types

[type.syntax]

^Syntax
Type →
      i32 | i16 | i32 | i64
    | u8 | u16 | u32 | u64
    | f32 | f64
    | char
    | bool
    | Path
    | [ Type ; INTEGER_LITERAL ]
    | ( TypeList^? )
    | & Type
    | && Type

TypeList → Type ( , Type )^*

Example

let a: i32;

struct Person {
    age: u16,
    name: &char,
}

let people: [Person, 12];

Abstract Syntax Tree

The AST is a syntactical representation of a program. It is the output of the parser.

This AST doesn’t have any semantic information. For example, it doesn’t know the different types of binary operations (arithmetic, logical, etc.). It just sees <OPERAND> <OPERATOR> <OPERAND>.

We try to presserve all the syntactic information of the program (semicolons, parenthesis, etc.), and we don’t make up things that aren’t there.

For example, see ItemKind::Function. Its return type is an Option, beacuse this program is syntactically correct:

fn foo() { }

but this one is also correct:

fn foo() -> int { ... }

Another example:

a + ( (b) * 4 )

There are a LOT of unnecesary parenthesis here. It’s nice to have the AST preserve this information so we can, for example, check unnecesary parenthesis.

Later, we’ll lower this AST to the HIR, which will strip unnecesary syntactical information. It’ll also remove parenthesis, since their only job is to explicit the precedence of operations, and they become useless after building the AST.

HIR

The Hir (Higher Intermediate Representation) is the result of lowering the AST

Keyboard shortcuts

Compiler Documentation