Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Introduction

This is a general overview of the compiler

Grammar

A program is composed of a list of items

Syntax
ProgramItem*

Syntax
Token
    IDENT
    | ALPHANUMERIC
    | FLOAT_LITERAL
    | STRING_LITERAL
    | INTEGER_LITERAL
    | FLOAT_LITERAL

STRING_LITERAL" (
        ~[\ "]
        | \ <any char>
    )* (non-greedy) "

IDENTALPHABETIC ALPHANUMERIC*

ALPHANUMERICALPHABETIC | NUMERIC

ALPHABETIC → [a-z A-Z _]

NUMERIC → [0-9]

INTEGER_LITERALNUMERIC+

FLOAT_LITERALNUMERIC+ . NUMERIC+

Paths

A path is a way of refering to named items.

Syntax
PathPrefix → ( self | super )? ::

PathPathPrefix? IDENT ( :: IDENT )*

Example

mod utils {
    fn sum(a: i32, b: i32) -> i32 {
        a + b
    }
}

fn foo() -> i32 { utils::sum(1, 2) }

let a: i32;
    
fn main() {
    let a: i32 = 1;     // Local variable
    self::a = foo();    // Access the 'a' variable from the module

}

In the example above we have 4 paths

  • Inside sum. Both a and b are paths.
  • utils::sum is a path composed of two segments: utils and sum
  • self::a is a path that refers to “the item a inside the current module”

Items

Syntax
Item
      FunctionDecl
    | ModuleDecl
    | VarDecl
    | StructDecl
    | UseItem

Use Items

use items allow to bring other items into scope.

Syntax
UseItem
    use Type as IDENT ;
    | use Path ( as IDENT )? ;
    | use Path :: * ;

Example

mod utils {
    fn max(a: i32, b: i32) -> i32 {
        if a > b { a } else { b }
    }

    fn abs(a: i32) -> i32 {
        if a > 0 { a } else { a * -1 }
    }
}

use utils::max;
use utils as very_util;

fn main() {
    let a: i32 = max(1, 2);

    let b: i32 = very_util::abs(-3);
}

Modules

A module is a named item that encloses a set items

Syntax
ModuleDecl
      mod IDENT { Item* }
    | extern mod IDENT ;

Example

mod lib {

  mod math {
      fn sum(a: i32, b: i32) -> i32 { a + b }
  }

  mod math_u16 {
    fn sum(a: u16, b: u16) -> u16 { a + b }
  }

  mod person {
      struct Person {
          age: u16,
          name: &char,
      }

      fn is_old(p: &Person) -> bool {
          (*p).age >= 70
      }
  }

}

Structs

Syntax
StructDeclstruct IDENT { Field* }

FieldIDENT : Type

Functions

Syntax
FunctionDecl
      fn IDENT ( ParamList? ) ( -> Type )? BlockExpr
    | extern fn IDENT ( ParamList? ) ( -> Type )? ;

ParameterIDENT : Type

ParamListParameter ( , Parameter )*

Variables

Syntax
VarDecl
      let ( : Type )? IDENT ( = Expr )? ;
    | const : Type IDENT = Expr ;

Expressions

Expressions are the main syntactical construction of the language. They represent an operation that yields a value.

Syntax
Expr
      BlockExpr
    | ArithmeticExpr
    | LogicalExpr
    | CallExpr
    | IfExpr
    | Path
    | LiteralExpr
    | ReferenceExpr
    | DereferenceExpr
    | FieldAccess
    | ArrayAccess
    | TupleAccess
    | ( Expr )

Block Expressions

Syntax
BlockExpr{ Stmt* Expr? }

A BlockExpr is a block that evaluates to an expression. Like any kind of block, it contains a list of statements, and an optional tail expression.

Literal Expressions

Syntax
LiteralExpr
    INTEGER_LITERAL
    | FLOAT_LITERAL
    | STRING_LITERAL

If Expressions

Syntax
IfExprif Expr BlockExpr else BlockExpr

An if expression evaluates it’s conditional expression. If it’s true, evaluates it’s then block If it’s false, evaluates it’s else block

An if expression -unlike the if statement- MUST have both a then and an else blocks.

Operators

Arithmetic expressions

Syntax
ArithmeticExpr
      Expr [* / %] Expr
    | Expr [+ -] Expr

Logical Expressions

Syntax
LogicalExpr
      Expr && Expr
    | Expr || Expr

(De)Reference expressions

Syntax
ReferenceExpr
      & Expr
    | && Expr

DereferenceExpr* Expr

Access Expressions

Field Access

Syntax
FieldAccessExpr . IDENT

Array Access

Syntax
ArrayAccessExpr [ Expr ]

Tuple Access

Syntax
TupleAccessExpr . INTEGER_LITERAL

Call Expressions

Syntax
CallExprExpr ( Args? )

ArgsExpr ( , Expr )*

Statements

Syntax
StmtItem
      IfStmt
    | WhileStmt
    | BlockStmt
    | Expr ;
    | break
    | continue
    | return

WhileStmtwhile ( Expr ) Stmt

IfStmtif Expr BlockStmt ( else BlockStmt )?

BlockStmt{ Stmt* }

Types

Syntax
Type
      i32 | i16 | i32 | i64
    | u8 | u16 | u32 | u64
    | f32 | f64
    | char
    | bool
    | Path
    | [ Type ; INTEGER_LITERAL ]
    | ( TypeList? )
    | & Type
    | && Type

TypeListType ( , Type )*

Type i32 i16 i32 i64 u8 u16 u32 u64 f32 f64 char bool Path [ Type ; INTEGER_LITERAL ] ( TypeList ) & Type && Type

Example

let a: i32;

struct Person {
    age: u16,
    name: &char,
}

let people: [Person, 12];

Abstract Syntax Tree

The AST is a syntactical representation of a program. It is the output of the parser.

This AST doesn’t have any semantic information. For example, it doesn’t know the different types of binary operations (arithmetic, logical, etc.). It just sees <OPERAND> <OPERATOR> <OPERAND>.

We try to presserve all the syntactic information of the program (semicolons, parenthesis, etc.), and we don’t make up things that aren’t there.

For example, see ItemKind::Function. Its return type is an Option, beacuse this program is syntactically correct:

fn foo() { }

but this one is also correct:

fn foo() -> int { ... }

Another example:

a + ( (b) * 4 )

There are a LOT of unnecesary parenthesis here. It’s nice to have the AST preserve this information so we can, for example, check unnecesary parenthesis.

Later, we’ll lower this AST to the HIR, which will strip unnecesary syntactical information. It’ll also remove parenthesis, since their only job is to explicit the precedence of operations, and they become useless after building the AST.

HIR

The Hir (Higher Intermediate Representation) is the result of lowering the AST