
Guide on Rust

Sep 01, 2024

You may also enjoy Two Years of Rust.

Foreword.

This is a study guide aimed at existing SWEs who are familiar with systems programming and want to pick up Rust in a pragmatic manner. I wanted to write a reference guide with a complete explanation of the different facets of this learning journey.

In terms of picking up Rust as an already seasoned SWE, this is what I’ve found.

The most important challenge is understanding the semantics of ownership and the borrow checker. Rust as a language is unique in that data is immutable by default and memory safe by default, even in concurrency. The reason is that it enforces compile-time rules around the usage of data. Data is owned by a single owner, who can read and write it. It can be borrowed, whereby references are created: at any time there is either any number of readers or a single reader-writer. Ownership is governed by scope - when the owner goes out of scope the value is deallocated, which is called being “dropped”. Likewise, if a value is assigned or passed to a new owner, it is said to be “moved”, and its ownership is transferred to that owner. Again, a value can only have a single owner.

The problems that these rules mitigate are:

  • always valid references to memory - no use-after-free, no dangling pointers, since references to values never outlive the value itself (ie. the memory deallocation of the value always occurs after the lifetime of the reference is over).
  • double free - Rust deallocates for you using Drop.
  • data races - Rust ensures only a single owner of data. It guarantees that accessing values from other threads (concurrently) adheres to rules specified in the Sync and Send traits - namely, a value can be transferred between different threads and that it can be referenced from different threads.
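  • memory leaks - mostly. Values are freed automatically when their owner goes out of scope, although safe Rust can still leak (eg. through Rc<T> reference cycles).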

It turns out that data races are actually just a subclass of memory safety problems.

Writing code for the borrow checker isn’t that hard to master. It usually just requires some behavioural modifications, and learning what the error messages mean:

  • Awareness of when values are moved, and when to create references vs. simply transferring ownership.
  • Solving single ownership:
    • Copying data using clone
    • Creating references to data using &
    • Creating references to data using Arc<T> or Rc<T>
  • Dereferencing with *, and destructuring references with & patterns in closure parameters, in iterator methods and other places.
  • Working with data as immutable from the start. Which typically changes the direction of state in your program → state flows downwards, rather than being written to in arbitrary locations.

Then the next major step is learning how to use traits, how to implement traits, and the common ones that can be #[derive]’d or otherwise implemented (From, inner types).

Result<T,E> and Option<T> are fairly simple, and won’t take more than 30 minutes to learn.

Outside of this, Rust has a very advanced type system and standard library.

  • Traits
  • Algebraic data types (enums and structs), generics and trait bounds
  • Result<T,E> and Option
  • Enums and match syntax
  • Expressions as values - no return needed, first-class expressions in variable assignments.
  • Iterator
  • Vec and the functional methods .map, .fold, .reduce
  • Concurrency primitives
    • Mutex
    • RwLock
    • Arc/Rc
    • Channels
    • Threads
  • Dynamic dispatch using Box<dyn Trait>

As for the Rust tooling:

  • Very performant language
    • Zero cost abstractions
    • No garbage collection - memory is freed deterministically, at points decided at compile time
  • Extremely helpful compiler
    • Type inference everywhere
    • Very helpful error messages with suggestions
  • Standard package manager (cargo).
    • Install crates easily from GitHub or the crates.io package registry.
    • Specify custom toolchain.
  • Standard doc pages. docs.rs
  • Standard testing framework built into the language.
  • Pluggable async runtime.
  • Incredible serialization library serde
  • Expressive macro system (more similar to Lisp than anything).

I’ve ignored one significant aspect of the Rust language, which is reference lifetimes. Lifetimes are an aspect of the language which is more implicit than explicit - they are generally inferred. Nonetheless, being aware of lifetimes is important; they are bundled in with the referencing and ownership section.

Memory model - values, ownership, references.

To summarise the Rust semantics:

  • A value has a single owner, which generally means:
    • Only the owner can read it
    • Only the owner can mutate it
  • Ownership can be transferred:
    • A value is initially owned by a variable.
    • Assigning this variable to another variable transfers ownership. This is called a “move”.
    • Moves are, generally speaking, implicit. You don’t need to annotate them.
    • Passing a variable into a function transfers the ownership to that function. This is also a move.
  • When the owner goes out of scope, the value is dropped (deallocated).
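
The move rules above can be sketched as follows. This is a minimal illustration; the function name consume and the variable names are made up for this example.

```rust
// Hypothetical helper: takes ownership of the String it is given.
fn consume(s: String) -> usize {
    s.len()
} // `s` goes out of scope here, so the String is dropped.

fn main() {
    let a = String::from("hello");
    let b = a; // ownership moves from `a` to `b`
    // println!("{a}"); // would not compile: `a` was moved

    let n = consume(b); // ownership moves into `consume`
    // println!("{b}"); // would not compile: `b` was moved
    println!("length: {n}");
}
```

If the caller still needs the value afterwards, the usual options are to pass a reference instead, or to have the function return the value back.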

Generally speaking though, you want to share data. You want multiple people to be able to read a piece of data:

  • A value can be owned by a single owner, but you can create references.
  • References ensure:
    • Lifetime validity. The value you access will always be there. In C with pointers, there is a chance the value has been freed (deallocated) somewhere else. In Rust, the value is only deallocated (dropped) when the owner goes out of scope, and the compiler checks that no reference outlives that point.
      • “Unlike a pointer, a reference is guaranteed to point to a valid value of a particular type for the life of that reference”
    • Concurrent-safe. The value follows strict synchronisation rules.
      • There are two types of references: mutable and read-only references.
      • As a user of references:
        • if you have a mutable reference, you are the only one who can modify and read the data.
        • you can create as many read-only references as you want, but they cannot exist at the same time as a mutable reference.
      • The rules are:
        • For any given value you can have either:
          • one mutable (write) reference to the value.
          • many immutable (read) references to the value.
  • How do references ensure this?
    • The property: in Rust, if a value is dropped (i.e., goes out of scope) while there is still an active reference to it, the program will fail to compile. Rust’s borrow checker ensures that references (both mutable and immutable) are only valid as long as the data they point to is still alive.
    • The mechanism: when a reference is created, it lives for a lifetime. Rust’s compiler ensures that the lifetime of the reference never exceeds the lifetime of the value. It automatically infers the reference lifetime using the following rules for functions:
      • references created inside a function end, at the latest, when the function returns, unless the function returns references
      • if the function returns references:
        • the returned reference must be derived from the inputs (or be 'static), so its lifetime is tied to the lifetime of an input reference
        • if there is one input reference, then the output references all share the input reference’s lifetime
        • if one of the inputs is a reference to self, then the output references take the lifetime of the self reference.
    • Lifetimes are purely a compile-time check: they are erased before the program runs and do not change how stack or heap memory is allocated. What references buy you at runtime is sharing without copying, wherever the value happens to live.

Generally speaking, Rust builds up these simple primitives to allow for very strong guarantees around memory and thread safety:

  • Values - ie. 5
  • Ownership - ie. let x = 5
    • Allocation.
    • Deallocation - drop when ownership goes out of scope.
    • Safety - only the owner can read or write. There can only be 1 owner.
  • References - ie. let y = &x; let y2 = &x; OR let z = &mut x;
    • Memory sharing - rather than copying values, you can create read-only references to them so other parts of code can use it.
    • Safety - while you have a mutable reference, there can be no other readers of the value. otherwise, you can create as many reader references as possible.
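
A small sketch of the reference rules in practice. The helper shout is hypothetical, and the example relies on the compiler ending a borrow at its last use.

```rust
// Hypothetical helper: appends through an exclusive (mutable) borrow.
fn shout(s: &mut String) {
    s.push('!');
}

fn main() {
    let mut x = String::from("hi");

    // Any number of read-only references may coexist.
    let y = &x;
    let y2 = &x;
    println!("{y} {y2}");

    // `y` and `y2` are not used again, so an exclusive borrow is now allowed.
    let z = &mut x;
    shout(z);

    println!("{x}");
}
```

Moving the first println! below the call to shout would make the program fail to compile, because the read-only borrows would then overlap with the mutable one.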

Ownership.

  • Each value in Rust has an owner.
  • There can only be one owner at a time.
  • When the owner goes out of scope, the value will be dropped.

Memory:

  • When a variable goes out of scope, Rust automatically calls the drop function and cleans up the heap memory for that variable.
  • Rust will never automatically create “deep” copies of your data.

Copy versus move:

// value of x is copied onto y, since x is primitive type, allocated on stack.
// specifically, x implements Copy
let x = 5;
let y = x;

// value of s1 is not copied onto s2, as String is a heap type
// specifically, String does NOT implement Copy
// ownership of the heap memory is transferred (moved) to s2, 
// and s1 is no longer valid
let s1 = String::from("hello");
let s2 = s1;

Semantics:

  • Passing a variable to a function will move or copy, just as assignment does
  • Rust won’t let us annotate a type with Copy if the type, or any of its parts, has implemented the Drop trait. If the type needs something special to happen when the value goes out of scope and we add the Copy annotation to that type, we’ll get a compile-time error

References (borrowing).

let mut s = String::from("hello");

// Example 1: two simultaneous mutable borrows. This does NOT compile.
let r1 = &mut s;
let r2 = &mut s;

println!("{}, {}", r1, r2);
// error: cannot borrow `s` as mutable more than once at a time

// Example 2: the same borrows in separate scopes. This compiles.
{
    let r1 = &mut s;
} // r1 goes out of scope here, so we can make a new reference with no problems.

let r2 = &mut s;

  • For any given value you can have either:
    • one mutable (write) reference to the value.
    • many immutable (read) references to the value.

Lifetimes.

Let’s talk about lifetimes. Lifetimes feature alongside references, and are often inferred by the Rust compiler. However, where Rust cannot statically determine the lifetime of a reference at compile-time, you must explicitly annotate it. This often happens when you are writing functions which return references to their inputs, but rustc is unable to infer which of the inputs is being returned, since that is decided by the function at runtime.

So we sometimes need to annotate lifetimes. This section covers the syntax of lifetimes and the rules that the Rust compiler follows.

Lifetimes - inferred and explicit cases.

Lifetimes are written as an apostrophe followed by a lifetime identifier.

struct Test {}              // A plain struct (it is not stored on the heap unless boxed).

fn demo<'a>(x: &'a Test) {  // `'a` is declared as a generic lifetime parameter.
    let y: &'a Test = x;    // `y`, a reference to a Test, with the explicit lifetime `'a`.
    let z = x;              // `z`, the same reference, with an inferred lifetime.
}

In the above example, the lifetime of `z` is automatically inferred by the Rust compiler. In the next example, we’ll show where it cannot be inferred.

#[derive(Debug)]
struct Point(i32, i32);

// Does NOT compile: error[E0106]: missing lifetime specifier
fn left_most(p1: &Point, p2: &Point) -> &Point {
    if p1.0 < p2.0 {
        p1
    } else {
        p2
    }
}
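
One way to make left_most compile is to annotate a single lifetime shared by both inputs and the output. This is a sketch; giving both parameters the same 'a is one reasonable choice here, since either of them may be returned.

```rust
#[derive(Debug, PartialEq)]
struct Point(i32, i32);

// Both inputs and the output share the lifetime `'a`: the returned
// reference is valid only while both `p1` and `p2` are valid.
fn left_most<'a>(p1: &'a Point, p2: &'a Point) -> &'a Point {
    if p1.0 < p2.0 {
        p1
    } else {
        p2
    }
}

fn main() {
    let a = Point(1, 5);
    let b = Point(3, 2);
    println!("{:?}", left_most(&a, &b));
}
```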

The lifetime is usually named 'a; lifetime names are conventionally very short.

There exists one special lifetime called 'static, meaning the reference is valid for the entire length of the program. String literals and static items have this lifetime, since they are embedded in the binary itself.

Lifetimes - inference rules.

These are the rules of lifetimes:

  1. Each parameter with a reference gets its own distinct lifetime.
  2. If there is exactly one input reference, that lifetime is assigned to all output references.
  3. If there are multiple input references, but one of them is &self or &mut self, the lifetime of self is assigned to all output references.

Rule 1: Each parameter with a reference gets its own distinct lifetime.

fn foo(x: &i32, y: &i32) { }
// is inferred as ->
fn foo<'a, 'b>(x: &'a i32, y: &'b i32) { }

Rule 2: If there is exactly one input reference, that lifetime is assigned to all output references.

fn foo(x: &i32) -> &i32 { x }
// ->
fn foo<'a>(x: &'a i32) -> &'a i32 { x }

Rule 3: If there are multiple input references, but one of them is &self or &mut self, the lifetime of self is assigned to all output references.

Expressed logically:

  • if 1 < N(input_references) && input_references.any(ref ⇒ ref.is_self):
    • output_references.map(ref.lifetime = self.lifetime)
  • if 1 == N(input_references):
    • output_references.map(ref.lifetime = input_references[0].lifetime)
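
Rule 3 can be seen in a method that takes &self plus another reference. The Parser type and its rest method below are hypothetical, written only to illustrate the rule.

```rust
// Hypothetical type for illustration.
struct Parser {
    text: String,
}

impl Parser {
    // Elided form. By rule 3 the output borrows from `self`, not `prefix`:
    //   fn rest<'a, 'b>(&'a self, prefix: &'b str) -> &'a str
    fn rest(&self, prefix: &str) -> &str {
        self.text
            .strip_prefix(prefix)
            .unwrap_or(self.text.as_str())
    }
}

fn main() {
    let p = Parser { text: String::from("hello world") };
    let rest = {
        let prefix = String::from("hello ");
        p.rest(&prefix)
    }; // `prefix` is dropped here, but `rest` only borrows from `p`.
    println!("{rest}");
}
```

Because the returned reference is tied to self only, it stays usable after prefix has been dropped.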

Lifetimes - why the Rust compiler can infer this.

The Rust compiler can assume this for a simple reason:

  • A function cannot return new references:
    • A function cannot return a reference to a value it created, since the value is dropped when the function returns.
  • A function can only return a reference to a value that is owned outside the function, as long as the reference’s lifetime guarantees that the data will remain valid after the function returns:
    • Function parameter
    • Global variable
    • Heap value - here you must return an owned type (like Box<i32>) rather than a reference, or manage the reference’s lifetime explicitly.

// A) FUNCTION PARAMETER.

fn foo(x: &i32) -> &i32 { x }

// B) GLOBAL VARIABLE.

static GLOBAL: i32 = 100;

fn global_ref() -> &'static i32 {
    &GLOBAL  // Safe because GLOBAL has a 'static lifetime
}

See some misconceptions about lifetimes.

Syntax.

Enums

// As defined in the standard library:
pub enum Result<T, E> {
    Ok(T),
    Err(E),
}

// Any error type works here; std::fmt::Error is assumed for illustration.
use std::fmt::Error;

fn example() {
    // Construct a success value.
    // 1. Pick one variant of the enum (std::result::Result::Ok)
    // 2. Declare the type - T=&str, E=Error
    let s: std::result::Result<&str, Error> = std::result::Result::Ok("cool");

    // Match statement.
    match s {
        // Decompose Ok's inner value
        Ok(s) => println!("cool: {}", s),
        Err(err) => println!("not cool {}", err.to_string()),
    }
}

Slices.

A slice is a kind of reference.

You can use the Python/Go style slice syntax.

ie. start..end

And omit the start

ie. ..end

let s = String::from("hello world");

let hello = &s[0..5];
let world = &s[6..11];

Errors/Result.

Result<T,E> is a common pattern in Rust.

Using Result to panic:

use std::fs::File;

// .unwrap() to panic without message.
fn main() {
    let greeting_file = File::open("hello.txt")
        .unwrap();
}

// .expect(msg) to panic with message.
fn main() {
    let greeting_file = File::open("hello.txt")
        .expect("hello.txt should be included in this project");
}

Matching on Result:

let greeting_file_result = File::open("hello.txt");

let greeting_file = match greeting_file_result {
    Ok(file) => file,
    Err(error) => panic!("Problem opening the file: {error:?}"),
};

Matching on different kinds of errors:

let greeting_file = match greeting_file_result {
    Ok(file) => file,
    Err(error) => match error.kind() {
        ErrorKind::NotFound => match File::create("hello.txt") {
            Ok(fc) => fc,
            Err(e) => panic!("Problem creating the file: {e:?}"),
        },
        other_error => {
            panic!("Problem opening the file: {other_error:?}");
        }
    },
};

Returning the error from a method call:

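use std::fs::File;
use std::io::{self, Read};

// Manually using `return`.
fn read_username_from_file() -> Result<String, io::Error> {
    let username_file_result = File::open("hello.txt");

    let mut username_file = match username_file_result {
        Ok(file) => file,
        Err(e) => return Err(e),
    };

    let mut username = String::new();

    match username_file.read_to_string(&mut username) {
        Ok(_) => Ok(username),
        Err(e) => Err(e),
    }
}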

// Automatically using `?` syntax.
fn read_username_from_file() -> Result<String, io::Error> {
    let mut username_file = File::open("hello.txt")?;
    let mut username = String::new();
    username_file.read_to_string(&mut username)?;
    Ok(username)
}

The ? syntax:

  • Error values that have the ? operator called on them go through the from function, defined in the From trait in the standard library, which is used to convert values from one type into another.
  • When the ? operator calls the from function, the error type received is converted into the error type defined in the return type of the current function.
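
To make the From conversion visible, here is a minimal sketch with a custom error type. AppError and double are hypothetical names invented for this example.

```rust
use std::num::ParseIntError;

// Hypothetical application error type.
#[derive(Debug)]
enum AppError {
    BadNumber(ParseIntError),
}

// `?` calls this to turn a ParseIntError into an AppError.
impl From<ParseIntError> for AppError {
    fn from(e: ParseIntError) -> Self {
        AppError::BadNumber(e)
    }
}

fn double(input: &str) -> Result<i32, AppError> {
    // parse() fails with ParseIntError; `?` converts it via From.
    let n: i32 = input.trim().parse()?;
    Ok(n * 2)
}

fn main() {
    println!("{:?}", double("21"));
    println!("{:?}", double("abc"));
}
```

Without the From implementation, the ? in double would fail to compile, since rustc would have no way to convert ParseIntError into AppError.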


Panic/Recover.

  • set_hook
  • catch_unwind
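
A short sketch of both functions from std::panic. The function checked_div is hypothetical. Note that catch_unwind only catches unwinding panics, so it has no effect if the program is built with panic = "abort".

```rust
use std::panic;

// Hypothetical function that panics on a zero divisor.
fn checked_div(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero");
    }
    a / b
}

fn main() {
    // set_hook replaces the default handler that prints the panic message.
    panic::set_hook(Box::new(|info| {
        eprintln!("custom hook: {info}");
    }));

    // catch_unwind returns Ok(value), or Err(payload) if the closure panicked.
    let ok = panic::catch_unwind(|| checked_div(10, 2));
    let bad = panic::catch_unwind(|| checked_div(1, 0));
    println!("ok = {:?}, panicked = {}", ok.ok(), bad.is_err());
}
```

This is closer to a last line of defence (eg. at a thread or FFI boundary) than to try/catch. Ordinary error handling should still go through Result.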

Option<T>

Another important type in Rust is Option, which represents a nullable value.

The definition is as follows:

pub enum Option<T> {
	None,
	Some(T),
}

You can unwrap an Option:

type CouldBeString = Option<String>;

fn test() {
    let x = CouldBeString::Some("hello".to_string());
    let y = x.unwrap();
}

You can also match an Option:

type CouldBeString = Option<String>;

fn test() {
    let x = CouldBeString::Some("hello".to_string());
    match x {
        None => panic!("Woah where did x go"),
        Some(s) => println!("All good we found {s}"),
    }
}

match

You can match multiple patterns in one arm.

match x {
    1 | 2 => println!("one or two"),
    3 => println!("three"),
    _ => println!("anything"),
}

You can match a range:

match x {
    1..=5 => println!("one through five"),
    _ => println!("something else"),
}

Destructuring.

let Point { x: a, y: b } = p;

Destructuring nested enums.

enum Color {
    Rgb(i32, i32, i32),
    Hsv(i32, i32, i32),
}

enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(Color),
}

fn main() {
    let msg = Message::ChangeColor(Color::Hsv(0, 160, 255));

    match msg {
        Message::ChangeColor(Color::Rgb(r, g, b)) => {
            println!("Change color to red {r}, green {g}, and blue {b}");
        }
        Message::ChangeColor(Color::Hsv(h, s, v)) => {
            println!("Change color to hue {h}, saturation {s}, value {v}")
        }
        _ => (),
    }
}

Ignoring values.

fn main() {
    let numbers = (2, 4, 8, 16, 32);

    match numbers {
        (first, _, third, _, fifth) => {
            println!("Some numbers: {first}, {third}, {fifth}")
        }
    }
}

fn main() {
    let numbers = (2, 4, 8, 16, 32);

    match numbers {
        (first, .., last) => {
            println!("Some numbers: {first}, {last}");
        }
    }
}

Other aspects.

Other aspects of Rust:

  • Self - a type alias for the type we are defining or implementing
  • self - method subject or current module
  • static - global variable or lifetime lasting the entire program execution
  • async - return a Future instead of blocking the current thread
  • await - suspend execution until the result of a Future is ready
  • in - part of for loop syntax
  • loop - loop unconditionally
  • super - parent module of the current module

Structs, traits, generics.

Structs.

// A struct with members.
struct Point {
    x: f64,
    y: f64,
}

// A tuple struct with anonymous members.
// Accessible via .0, .1.
struct Point(f64, f64);
let p = Point(0.0, 1.0);
p.0 // 0.0
p.1 // 1.0

// A tuple struct with generic anonymous members.
struct S<T: Display>(T);

Traits.

trait Sum {
    fn sum(&self) -> f64;
}

Trait implementations.

impl Sum for Point {
	fn sum(&self) -> f64 {
		self.x + self.y
	}
}

Generic structs.

In terms of structs:

// Defining a generic struct of two types: X1 and Y1.
struct Point<X1, Y1> {
    x: X1,
    y: Y1,
}

// Implementing methods for this generic struct:
impl<X1, Y1> Point<X1, Y1> {
    fn mixup<X2, Y2>(self, other: Point<X2, Y2>) -> Point<X1, Y2> {
        Point {
            x: self.x,
            y: other.y,
        }
    }
}

Generics carry no runtime cost. Rust accomplishes this by performing monomorphization of the code using generics at compile time. Monomorphization is the process of turning generic code into specific code by filling in the concrete types that are used when compiled.

Generic methods.

In Rust, traits are useful for creating functions and structs that can operate on different types, as constrained by traits.

use std::fmt::Display;

fn print_item<T: Display>(item: T) {
    println!("{}", item);
}

fn main() {
    print_item(42);       // Works with integers
    print_item("Hello");  // Works with strings
}

The T: Display syntax restricts T to types that implement the Display trait, ensuring println! can work with it.

Generic lifetime annotations.

A lifetime parameter is declared inside angle brackets, similarly to a generic type, and can also appear in a bound.

ie. <T: Display + 'a>

#[derive(Debug)]
struct Name<'a> {
    name: &'a str,
}

let name = String::from("Bob");
let n = Name { name: &name };

Generic traits and their implementations.

Traits can also be generic. For example, generic traits are useful for implementing arithmetic.

Here’s an example of a trait for a summable type - a type that implements the sum method:

// Define a generic trait `Summable` for types that contain elements we can add up
trait Summable<T> {
    fn sum(&self) -> T;
}

You can implement the Summable trait for example for the Vec<T> type. How do we do this?

Note that the Vec type is generic over T. Since we are not specifying T, only leaving it abstract, we must declare it again on the impl.

  1. impl<T>. We begin with impl, followed by <T>, since we are not specifying T for Summable.
  2. Summable<T>. Then comes Summable, the name of the trait, followed by <T> again since Summable will take any T.
  3. for Vec<T>. Finally comes the struct we are implementing it for, Vec<T>.

impl<T> Summable<T> for Vec<T> {
}

Then we will implement the method:

impl<T> Summable<T> for Vec<T> {
	fn sum(&self) -> T {
	}
}

Note we have simply taken the method definition from the trait, leaving the parameterised generic type T in place.

Now let’s write the logic.

impl<T> Summable<T> for Vec<T> {
	fn sum(&self) -> T {
		self.iter().fold(
			T::default(), // initial value.
			|acc, &item| acc + item, // closure
		);
	}
}

This creates an iterator over the Vec’s elements (yielding &T), then calls fold, which starts from the initial value (supplied by Default) and adds each item to the accumulator. The &item pattern in the closure parameters destructures the reference, giving us the value itself.

But wait!

  • expected type parameter T. found unit type ()
  • no function or associated item named default found for type parameter T in the current scope

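The first error is caused by the trailing semicolon after fold(...): it turns the expression into a statement, so the function body evaluates to the unit type () - ie. nothing - rather than T. Removing the semicolon fixes it. The second error is because rustc knows nothing about T here, so it cannot find a default function on it.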

We need to actually specify T in terms of the traits we need.

What are the traits we need from T?

  • Default - to call ::default

    • Rust will help you with this.

      “the following trait defines an item default, perhaps you need to restrict type parameter T with it:: + Default"

  • Add - std::ops::Add<Output = T>

  • Copy - because the &item pattern copies each element out of its reference

Let’s go ahead and specify those bounds:

use std::ops::Add;

impl<T: Default + Copy + Add<Output = T>> Summable<T> for Vec<T> {
	fn sum(&self) -> T {
		self.iter().fold(
			T::default(), // initial value.
			|acc, &item| acc + item, // closure
		)
	}
}

This looks really ugly though. We can use where here to separate it:

trait Summable<T> {
    fn sum(&self) -> T;
}

impl<T> Summable<T> for Vec<T> 
    where T: 
        Add<Output = T> 
        + Copy 
        + Default
{
	fn sum(&self) -> T {
		self.iter().fold(
			T::default(), // initial value.
			|acc, &item| acc + item, // closure
		)
	}
}
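
Putting the pieces together, the finished trait can be exercised like this. It is a self-contained sketch; the main function and its values are just for demonstration.

```rust
use std::ops::Add;

trait Summable<T> {
    fn sum(&self) -> T;
}

impl<T> Summable<T> for Vec<T>
where
    T: Add<Output = T> + Copy + Default,
{
    fn sum(&self) -> T {
        // No trailing semicolon: the fold result is the return value.
        self.iter().fold(T::default(), |acc, &item| acc + item)
    }
}

fn main() {
    let ints: Vec<i32> = vec![1, 2, 3, 4];
    let floats: Vec<f64> = vec![0.5, 1.5];
    println!("{}", ints.sum());
    println!("{}", floats.sum());
}
```

The same impl covers every element type that satisfies the three bounds, so i32 and f64 both work without extra code.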

A real example of a generic trait is the From<T> trait, which converts types from one type T into another.

pub trait From<T>: Sized {
    // Required method
    fn from(value: T) -> Self;
}

For example:

use std::convert::From;
struct MyStruct {}

impl From<String> for MyStruct {
    fn from(s: String) -> MyStruct {
        MyStruct{}
    }
}
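
Implementing From also gives you Into for free. In this sketch the name field is an addition made for illustration; the struct above has no fields.

```rust
struct MyStruct {
    name: String, // hypothetical field, added so the conversion is observable
}

impl From<String> for MyStruct {
    fn from(s: String) -> MyStruct {
        MyStruct { name: s }
    }
}

fn main() {
    let a = MyStruct::from(String::from("a"));
    // Into is implemented automatically for any type with a matching From.
    let b: MyStruct = String::from("b").into();
    println!("{} {}", a.name, b.name);
}
```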

Default trait implementations.

Traits can also have default method implementations; they need not be abstract.

pub trait Summary {
    fn summarize(&self) -> String {
        String::from("(Read more...)")
    }
}
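
An implementor can keep the default or override it. Tweet and Article are hypothetical types used only to show both cases.

```rust
pub trait Summary {
    fn summarize(&self) -> String {
        String::from("(Read more...)")
    }
}

struct Tweet; // hypothetical: keeps the default method

struct Article {
    title: String, // hypothetical: overrides the default method
}

// An empty impl block is enough to pick up the default.
impl Summary for Tweet {}

impl Summary for Article {
    fn summarize(&self) -> String {
        format!("{} (Read more...)", self.title)
    }
}

fn main() {
    println!("{}", Tweet.summarize());
    println!("{}", Article { title: String::from("Rust") }.summarize());
}
```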

Type bounds (sumtypes).

Bounds enable you to stipulate the functionality of a type in terms of traits.

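You can combine several traits into one composite bound using +. Note that this is an intersection rather than a sum type in the algebraic sense: the type must implement every trait listed.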

// Display trait.
pub trait Display {
    // Required method
    fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>;
}

// Add trait.
pub trait Add<Rhs = Self> {
    type Output;

    // Required method
    fn add(self, rhs: Rhs) -> Self::Output;
}

// A function that takes any type that implements both Display and Add
fn describe_sum<T: Display + Add<Output = T>>(a: T, b: T) {
    // Sum both `a` and `b`, requiring Add<Output = T>.
    let sum = a + b;
    // Display the `sum`, requiring Display.
    println!("The sum is: {}", sum);
}

Placeholder types.

Placeholder types (officially called associated types) are useful where a trait needs to refer to a type that each implementor chooses. An implementor picks the concrete type exactly once. This contrasts with generic traits, which can be implemented several times for the same type with different type parameters.

pub trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}

struct X {}
impl Iterator for X {
	type Item = u64;
	fn next(&mut self) -> Option<Self::Item> {
		return Some(0);
	}
}
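
Once next is implemented, all of Iterator’s adapter methods come for free. A sketch with a hypothetical Counter type:

```rust
// Hypothetical iterator that counts from 1 up to `limit`.
struct Counter {
    current: u64,
    limit: u64,
}

impl Iterator for Counter {
    type Item = u64; // chosen once for this implementor

    fn next(&mut self) -> Option<Self::Item> {
        if self.current < self.limit {
            self.current += 1;
            Some(self.current)
        } else {
            None
        }
    }
}

fn main() {
    // map and sum are provided by the Iterator trait itself.
    let total: u64 = Counter { current: 0, limit: 5 }.map(|n| n * 2).sum();
    println!("{total}");
}
```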

Function pointers.

fn test2(x: fn () -> u8) -> u8 {
	x()
}

Passing trait implementors using impl Trait.

trait Summable<T> {
    fn sum(&self) -> T;
    fn sum2(&self) -> u8;
}

fn test2<T>(x: impl Summable<T>) -> u8 {
	x.sum2()
}

Returning closures.

A function’s return type must have a known size, so a closure is returned either behind a pointer, ie. a Box, or as impl Fn.

Box holds a pointer to the closure on the heap, typed as a dyn trait object.

fn returns_closure() -> Box<dyn Fn(i32) -> i32> {
    Box::new(|x| x + 1)
}
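
Both ways of returning a closure, side by side. The function names make_op and make_adder are made up for this sketch.

```rust
// Boxed: needed when one function may return different closures,
// because each closure has its own anonymous type.
fn make_op(add: bool) -> Box<dyn Fn(i32) -> i32> {
    if add {
        Box::new(|x| x + 1)
    } else {
        Box::new(|x| x - 1)
    }
}

// Unboxed: `impl Fn` works when exactly one closure type is returned.
// `move` copies `n` into the closure so it can outlive this call.
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

fn main() {
    println!("{}", make_op(true)(1));
    println!("{}", make_op(false)(1));
    println!("{}", make_adder(10)(5));
}
```

The impl Fn form avoids the heap allocation and the dynamic dispatch, at the cost of only supporting a single concrete closure type.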

Type aliases.

type Kilometers = i32;

Recursive data types.

enum List {
    Cons(i32, Box<List>),
    Nil,
}
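
The Box is what gives List a known size: each Cons holds an i32 plus a pointer, rather than another whole List. A sketch of building and walking such a list, where the sum function is hypothetical:

```rust
enum List {
    Cons(i32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

// Hypothetical helper: recursively adds up every element.
fn sum(list: &List) -> i32 {
    match list {
        Cons(value, rest) => value + sum(rest),
        Nil => 0,
    }
}

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
    println!("{}", sum(&list));
}
```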


You can use the Deref trait to treat a type like Box<T> as a reference (ie. just referring to T), improving the readability of your code.

use std::ops::Deref;

// A minimal tuple-struct wrapper around T.
struct MyBox<T>(T);

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
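
With Deref in place, the wrapper can be passed wherever a reference to the inner type is expected. The greet function is hypothetical.

```rust
use std::ops::Deref;

struct MyBox<T>(T);

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

// Hypothetical function that takes a plain &str.
fn greet(name: &str) -> String {
    format!("Hello, {name}!")
}

fn main() {
    let boxed = MyBox(String::from("Rust"));
    // Deref coercion: &MyBox<String> -> &String -> &str.
    println!("{}", greet(&boxed));
    // An explicit `*` goes through the same deref method.
    println!("{}", (*boxed).len());
}
```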

Dynamic dispatch / trait objects.

We can use trait objects in place of a generic or concrete type. Wherever we use a trait object, Rust’s type system will ensure at compile time that any value used in that context will implement the trait object’s trait.

Consequently, we don’t need to know all the possible types at compile time.

Problem: how to allow multiple types that are valid in a particular situation. e.g. a Screen that renders by calling .draw() on each object.

Challenges:

  • Vec<T> can only hold objects of type T.
    • Workaround: we can write an enum to contain multiple types of items.
    • But challenge: the enum won’t be able to be user extensible.

Solution: dynamic dispatch to implementors that adhere to a trait spec, using trait objects.

Example use case. With the method using trait objects, one Screen instance can hold a Vec<T> that contains a Box<Button> as well as a Box<TextField>. Let’s look at how this works, and then we’ll talk about the runtime performance implications.

pub trait Draw {
    fn draw(&self);
}

//
// Concept 1: Using generic type parameters.
// This restricts us to a Screen instance that has a list of components all of type Button or all of type TextField.
//
pub struct Screen<T: Draw> {
    pub components: Vec<T>,
}

impl<T> Screen<T>
where
    T: Draw,
{
    pub fn run(&self) {
        for component in self.components.iter() {
            component.draw();
        }
    }
}

//
// Concept 2: Using trait objects.
// One Screen can now hold a mix of types, as long as each implements Draw.
//
pub struct Screen {
    pub components: Vec<Box<dyn Draw>>,
}

When we use trait objects, Rust must use dynamic dispatch. The compiler doesn’t know all the types that might be used with the code that’s using trait objects, so it doesn’t know which method implemented on which type to call. Instead, at runtime, Rust uses the pointers inside the trait object to know which method to call. This lookup incurs a runtime cost that doesn’t occur with static dispatch. Dynamic dispatch also prevents the compiler from choosing to inline a method’s code, which in turn prevents some optimizations.
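
A runnable sketch of the trait-object version. As an assumption made for this example, draw returns a String instead of drawing anything, so the result can be inspected. Button and TextField are given made-up fields.

```rust
pub trait Draw {
    // Assumption for this sketch: return a description rather than render.
    fn draw(&self) -> String;
}

struct Button {
    label: String,
}

struct TextField {
    text: String,
}

impl Draw for Button {
    fn draw(&self) -> String {
        format!("[{}]", self.label)
    }
}

impl Draw for TextField {
    fn draw(&self) -> String {
        format!("|{}|", self.text)
    }
}

pub struct Screen {
    pub components: Vec<Box<dyn Draw>>,
}

impl Screen {
    pub fn run(&self) -> Vec<String> {
        // Each call is dispatched at runtime through the trait object.
        self.components.iter().map(|c| c.draw()).collect()
    }
}

fn main() {
    let screen = Screen {
        components: vec![
            Box::new(Button { label: String::from("OK") }),
            Box::new(TextField { text: String::from("name") }),
        ],
    };
    for line in screen.run() {
        println!("{line}");
    }
}
```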

Common data types.

  • Vec
    • Created using vec! shorthand
  • [u8; 32] fixed-size arrays, and &[u8] slices.
  • Strings
    • str - a string slice, usually seen borrowed as &str
    • String - an owned, growable string on the heap
  • HashMap<K,V>

Rc and Arc

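We use Rc<T> and Arc<T> when we want multiple parts of our program to share ownership of data on the heap BUT we do not know at compile-time which part will finish using the data last. Both are reference-counted: the value is dropped when the last pointer to it goes away. Rc<T> is for single-threaded use; Arc<T> updates its count atomically, so it can be shared across threads.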
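
A minimal sketch of shared ownership with Rc<T>. The Owner struct is hypothetical.

```rust
use std::rc::Rc;

// Hypothetical owner that holds a handle to a shared heap value.
struct Owner {
    data: Rc<String>,
}

fn main() {
    let data = Rc::new(String::from("shared"));

    // Rc::clone bumps the reference count; the String is not copied.
    let a = Owner { data: Rc::clone(&data) };
    let b = Owner { data: Rc::clone(&data) };
    println!("owners: {}", Rc::strong_count(&data));

    // Whichever owner finishes first just decrements the count.
    drop(a);
    println!("owners: {}, value: {}", Rc::strong_count(&data), b.data);
}
```

Swapping Rc for std::sync::Arc gives the same behaviour with an atomic counter, which is what the threaded examples in the Concurrency section rely on.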

Concurrency.

Constructs:

  • Mutex<T> provides exclusive access.
  • RwLock<T> provides multiple-reader single-writer access.

Footguns:

  • Rust can’t protect you from everything.
  • Rc<T> comes with the risk of creating reference cycles, where two Rc<T> values refer to each other, causing memory leaks.
  • Mutex<T> comes with the risk of creating deadlocks. These occur when an operation needs to lock two resources and two threads have each acquired one of the locks, causing them to wait for each other forever.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];

    for _ in 0..10 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            let mut num = counter.lock().unwrap();

            *num += 1;
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Result: {}", *counter.lock().unwrap());
}

Send

The Send marker trait indicates that ownership of values of the type implementing Send can be transferred between threads. Almost every Rust type is Send, but there are some exceptions, including Rc<T>: this cannot be Send because if you cloned an Rc<T> value and tried to transfer ownership of the clone to another thread, both threads might update the reference count at the same time.

Sync

The Sync marker trait indicates that it is safe for the type implementing Sync to be referenced from multiple threads.
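
Since Send and Sync are marker traits, they can be checked with an empty generic function. The helpers assert_send and assert_sync are a common idiom written out here by hand; they are not part of the standard library.

```rust
use std::sync::{Arc, Mutex};

// Compile-time checks: a call only compiles if `T` has the marker trait.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    assert_send::<String>();
    assert_send::<Arc<Mutex<i32>>>();
    assert_sync::<Arc<Mutex<i32>>>();

    // assert_send::<std::rc::Rc<i32>>();        // would not compile: Rc is not Send
    // assert_sync::<std::cell::RefCell<i32>>(); // would not compile: RefCell is not Sync

    println!("all checks passed at compile time");
}
```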

Go-style channels using mpsc.

use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let val = String::from("hi");
        tx.send(val).unwrap();
    });

    let received = rx.recv().unwrap();
    println!("Got: {received}");
}

Common traits.

  • Comparison traits: Eq, PartialEq, Ord, PartialOrd.

  • Clone, to create T from &T via a copy.

  • Copy, to give a type ‘copy semantics’ instead of ‘move semantics’.

  • Hash, to compute a hash from &T.

  • Default, to create an empty instance of a data type.

  • Debug, to format a value using the {:?} formatter.

  • AsRef.

    Write a Rust function signature that can accept String, &str, Path, and PathBuf using a single parameter.

    The AsRef trait can convert String, &str, and PathBuf to a Path because there are implementations of AsRef on these types to perform the conversion.

    fn sample<P: AsRef<Path>>(p: P) { }
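
A sketch of such a function in use. The name extension_of is made up; it only inspects the path and does not touch the filesystem.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper: returns the file extension of any path-like value.
fn extension_of<P: AsRef<Path>>(p: P) -> Option<String> {
    p.as_ref()
        .extension()
        .map(|ext| ext.to_string_lossy().into_owned())
}

fn main() {
    // One signature, four argument types.
    println!("{:?}", extension_of("notes.txt"));
    println!("{:?}", extension_of(String::from("main.rs")));
    println!("{:?}", extension_of(Path::new("README.md")));
    println!("{:?}", extension_of(PathBuf::from("Makefile")));
}
```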

  • From and Into

Patterns.

.into_iter() vs .iter()

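.iter() creates an iterator over references to the items in a collection, leaving the collection usable afterwards.

.into_iter() takes ownership of the collection and yields the items by value, so the collection cannot be used afterwards.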
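
The difference shows up in the item types. The two helper functions below are hypothetical.

```rust
// Borrows the collection: .iter() yields &String.
fn total_len(names: &[String]) -> usize {
    names.iter().map(|s| s.len()).sum()
}

// Consumes the collection: .into_iter() yields String,
// so each item can be modified and reused without cloning.
fn shout_all(names: Vec<String>) -> Vec<String> {
    names
        .into_iter()
        .map(|mut s| {
            s.push('!');
            s
        })
        .collect()
}

fn main() {
    let names = vec![String::from("a"), String::from("bc")];
    println!("{}", total_len(&names)); // `names` is still usable
    let loud = shout_all(names);       // `names` is moved here
    // println!("{names:?}");          // would not compile: `names` was moved
    println!("{loud:?}");
}
```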

Box<dyn TrieDB>

Rather than messing around with generics, you can use Box and dyn to allow users to pass in types which implement a trait without knowing which type they are.

A Box<dyn Trait> is a pointer to a value stored on the heap, whose methods are called through dynamic dispatch (pointer-to-trait-on-heap).

The Box is needed because the Rust compiler needs to know how much space every function’s return type requires, and a bare dyn Trait has no size known at compile time.

Arc<Mutex<T>>

This is a common pattern to provide:

  • Reference to a value
  • Where you can write
  • and the writes are thread-safe (using a mutex)

Usage:

  • Create the Arc<Mutex<T>>.
  • Clone the Arc wherever you need a reference.
  • Where you need to write, simply call table.lock().unwrap() to get a MutexGuard.
  • The MutexGuard keeps the value locked, and unlocks it automatically when it goes out of scope, via its Drop implementation.
    • If you call a method directly on the returned MutexGuard without assigning it to a variable, the guard is a temporary, so it goes out of scope as soon as the statement finishes, eg.:

      table.lock().unwrap().do_thing()
      
      after do_thing is finished, value is unlocked
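
The usage steps above can be sketched with the standard library alone. The function count_in_parallel and the map keys are hypothetical, made up for this illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: each worker thread writes into one shared table.
fn count_in_parallel(workers: usize) -> HashMap<String, usize> {
    // Step 1: create the Arc<Mutex<T>>.
    let table = Arc::new(Mutex::new(HashMap::new()));

    let handles: Vec<_> = (0..workers)
        .map(|i| {
            // Step 2: clone the Arc for each thread that needs access.
            let table = Arc::clone(&table);
            thread::spawn(move || {
                // Temporary guard: unlocked at the end of this statement.
                table.lock().unwrap().insert(format!("worker-{i}"), i);

                // Named guard: the lock is held until `guard` is dropped
                // at the end of the closure.
                let mut guard = table.lock().unwrap();
                *guard.entry("total".to_string()).or_insert(0) += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Copy the map out while holding the lock briefly.
    let snapshot = table.lock().unwrap().clone();
    snapshot
}

fn main() {
    let result = count_in_parallel(4);
    assert_eq!(result["total"], 4);
    assert_eq!(result.len(), 5); // four worker entries plus "total"
}
```

Note that locking the same std Mutex twice on one thread while the first guard is still alive will deadlock or panic, which is why the first lock above is taken as a temporary.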
      

Example:

async fn discover_peers(
    udp_addr: SocketAddr,
    signer: SigningKey,
    table: Arc<Mutex<KademliaTable>>,
    bootnodes: Vec<BootNode>,
) {
    // (excerpt: the creation of `udp_socket` is not shown here)

    let revalidation_handler = tokio::spawn(peers_revalidation(
        udp_addr,
        udp_socket.clone(),
        table.clone(), // clones the Arc (a pointer), not the KademliaTable
        signer.clone(),
        REVALIDATION_INTERVAL_IN_SECONDS as u64,
    ));

    discovery_startup(
        udp_addr,
        udp_socket.clone(),
        table.clone(), // another handle to the same shared table
        signer.clone(),
        bootnodes,
    )
    .await;

    // a first initial lookup runs without waiting for the interval
    // so we need to allow some time to the pinged peers to ping us back and acknowledge us
    tokio::time::sleep(Duration::from_secs(10)).await;
    let lookup_handler = tokio::spawn(peers_lookup(
        udp_socket.clone(),
        table.clone(), // another handle to the same shared table
        signer.clone(),
        node_id_from_signing_key(&signer),
        PEERS_RANDOM_LOOKUP_TIME_IN_MIN as u64 * 60,
    ));

}

inner()

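Wrapping the Arc<Mutex<StoreInner>> in a tuple struct and exposing it through an .inner() accessor keeps the locking details encapsulated within Store: callers hold a cheaply clonable Store and never touch the Arc or the Mutex directly. The general shape of an inner() accessor, followed by the Store definition: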

struct Wrapper<T> {
    value: T,
}

impl<T> Wrapper<T> {
    fn inner(&self) -> &T {
        &self.value
    }
}

#[derive(Default, Clone)]
pub struct Store(Arc<Mutex<StoreInner>>);

#[derive(Default, Debug)]
struct StoreInner {
    chain_data: ChainData,
    block_numbers: HashMap<BlockHash, BlockNumber>,
    canonical_hashes: HashMap<BlockNumber, BlockHash>,
    bodies: HashMap<BlockHash, BlockBody>,
    headers: HashMap<BlockHash, BlockHeader>,
    // Maps code hashes to code
    account_codes: HashMap<H256, Bytes>,
    // Maps transaction hashes to their blocks (height+hash) and index within the blocks.
    transaction_locations: HashMap<H256, Vec<(BlockNumber, BlockHash, Index)>>,
    // Stores pooled transactions by their hashes
    transaction_pool: HashMap<H256, MempoolTransaction>,
    // Stores the blobs_bundle for each blob transaction in the transaction_pool
    blobs_bundle_pool: HashMap<H256, BlobsBundle>,
    receipts: HashMap<BlockHash, HashMap<Index, Receipt>>,
    state_trie_nodes: NodeMap,
    storage_trie_nodes: HashMap<Address, NodeMap>,
    // TODO (#307): Remove TotalDifficulty.
    block_total_difficulties: HashMap<BlockHash, U256>,
    // Stores local blocks by payload id
    payloads: HashMap<u64, Block>,
}
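
The definition above does not show inner() itself. A minimal sketch of how it could look, assuming it simply locks the mutex and returns the guard. StoreInner is cut down to a single field with simplified types, and the add/get methods are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, MutexGuard};

#[derive(Default, Debug)]
struct StoreInner {
    // Simplified stand-in for the real maps (assumption: String keys).
    block_numbers: HashMap<String, u64>,
}

#[derive(Default, Clone)]
pub struct Store(Arc<Mutex<StoreInner>>);

impl Store {
    // Assumed accessor: lock the mutex and hand back the guard.
    fn inner(&self) -> MutexGuard<'_, StoreInner> {
        self.0.lock().unwrap()
    }

    // Hypothetical write method: the guard is a temporary, so the lock
    // is released as soon as the insert finishes.
    pub fn add_block_number(&self, hash: &str, number: u64) {
        self.inner().block_numbers.insert(hash.to_string(), number);
    }

    // Hypothetical read method.
    pub fn get_block_number(&self, hash: &str) -> Option<u64> {
        self.inner().block_numbers.get(hash).copied()
    }
}

fn main() {
    let store = Store::default();
    let handle = store.clone(); // clones the Arc; both share one StoreInner

    store.add_block_number("0xabc", 7);
    assert_eq!(handle.get_block_number("0xabc"), Some(7));
    assert_eq!(handle.get_block_number("0xdef"), None);
}
```

Because Store derives Clone and only contains an Arc, passing it around by value is cheap and every copy sees the same data.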

Functions which take a closure which returns a Future.

impl AppContext { 
    /// Spawns the future returned by the given function on the thread pool. The closure will be invoked
    /// with [AsyncAppContext], which allows the application state to be accessed across await points.
    pub fn spawn<Fut, R>(&self, f: impl FnOnce(AsyncAppContext) -> Fut) -> Task<R>
    where
        Fut: Future<Output = R> + 'static,
        R: 'static,
    {
        self.foreground_executor.spawn(f(self.to_async()))
    }
}
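
The same signature shape can be tried out without any external crates. This is a sketch under stated assumptions: Ctx and run_with are hypothetical names, and block_on is a toy executor that just polls in a loop, which is only suitable for futures that complete without real waiting. Unlike the spawn method above, it runs the future to completion on the current thread instead of returning a Task.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; enough for futures that never really wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Toy executor: polls the future repeatedly until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Hypothetical context type handed to the closure.
#[derive(Clone)]
struct Ctx {
    name: String,
}

// Takes a closure that receives a context and returns a future.
// `Fut` is the future type, `R` is what the future resolves to.
fn run_with<Fut, R>(ctx: &Ctx, f: impl FnOnce(Ctx) -> Fut) -> R
where
    Fut: Future<Output = R>,
{
    block_on(f(ctx.clone()))
}

fn main() {
    let ctx = Ctx { name: "app".to_string() };
    // The closure returns an async block, which is a Future.
    let len = run_with(&ctx, |cx| async move { cx.name.len() });
    assert_eq!(len, 3);
}
```

The two generic parameters are the key part: the closure type is anonymous, the future it returns is a second anonymous type, and the bound Fut: Future<Output = R> ties them together.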

newtype pattern.

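The newtype pattern in Rust takes an existing type and wraps it in a new single-field tuple struct defined by the developer.

The main purpose of the newtype pattern is to implement traits on types you do not own. The orphan rule only allows a trait implementation when either the trait or the type is local to your crate, and the wrapper is a local type.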
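
A minimal sketch: Display and Vec both come from the standard library, so Display cannot be implemented on Vec<String> directly, but it can be implemented on a local wrapper. The name Wrapper is just illustrative.

```rust
use std::fmt;

// Local tuple struct wrapping a foreign type.
struct Wrapper(Vec<String>);

// Allowed by the orphan rule because Wrapper is defined in this crate.
impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // self.0 accesses the wrapped Vec<String>.
        write!(f, "[{}]", self.0.join(", "))
    }
}

fn main() {
    let w = Wrapper(vec![String::from("hello"), String::from("world")]);
    println!("w = {w}");
    assert_eq!(w.to_string(), "[hello, world]");
}
```

The wrapper does not inherit the methods of the inner type; you reach them through self.0 or by forwarding the ones you need.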

Testing.

pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

// This is a really bad adding function, its purpose is to fail in this
// example.
#[allow(dead_code)]
fn bad_add(a: i32, b: i32) -> i32 {
    a - b
}

#[cfg(test)]
mod tests {
    // Note this useful idiom: importing names from outer (for mod tests) scope.
    use super::*;

    #[test]
    fn test_add() {
        assert_eq!(add(1, 2), 3);
    }

    #[test]
    fn test_bad_add() {
        // This assert would fire and test will fail.
        // Please note, that private functions can be tested too!
        assert_eq!(bad_add(1, 2), 3);
    }
}

Common libraries.

  • tokio - an async runtime.
  • flamegraph - generates flamegraphs for profiling.
  • clap - command-line argument parsing.
  • serde - serialization and deserialization framework for JSON and other formats.