Discussion: New `unchecked` keyword

Published on Feb 28, 2022

A brief note

Rust is a language that aims to empower everyone to build reliable and efficient software. To do so, several things are built into the language’s semantics, most notably lifetimes and the unsafe keyword. To put it in one stroke: through lifetime subtyping and variance, the compiler ensures that when a safe reference is created (by using &ident), it is valid throughout its usage (in the region where the lifetime is valid).

🦔 Hedgy: Wait, did you just say that the `unsafe` keyword is a significant part of building reliable software with Rust? Now you're just contradicting yourself :/
👦🏻 Sayan: Hold on, I'm not done yet!

The unsafe keyword marks regions of code where the person writing the code has to uphold the same guarantees that the compiler would otherwise guarantee. But how does this help in writing safe code? Well, having unsafe code blocks indicates that this specific section of code is gnarly and if something’s going wrong, then you should first check if this block is doing the right thing…or not.

🦔 Hedgy: Ah! Now I see.
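To make this concrete, here is a minimal sketch of such a block. The helper name first_byte is made up for illustration; get_unchecked is the standard slice method that skips the bounds check, so the surrounding code has to guarantee that the index is valid.

```rust
/// Returns the first byte of `bytes`, or `None` if the slice is empty.
/// (`first_byte` is a hypothetical helper, not part of any library.)
fn first_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        return None;
    }
    // SAFETY: we just checked that the slice has at least one element,
    // so index 0 is in bounds.
    Some(unsafe { *bytes.get_unchecked(0) })
}

fn main() {
    assert_eq!(first_byte(b"crab"), Some(b'c'));
    assert_eq!(first_byte(b""), None);
}
```

If this function ever misbehaves in a memory-related way, the single unsafe block is the first place to look.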

In this post, I’d like to discuss another check that can help ensure the correctness of programs: the unchecked keyword. Do note that Rust’s goal is to ensure memory safety: leaks are fine, and so is incorrect logic – because only a flavor of Skynet can ensure that your program is logically correct by determining your intentions ;)

Motivation

Let’s say we’re in a fictitious world where financial institutions have more crabs 🦀 in their offices than cups of coffee ☕. This bank has a library that is used by the bank’s developers to program systems that interact with the bank, and this library has an extreme level of access to the bank’s transactions. The person who created the library has defined an Overdraft type, declared as follows:

struct Overdraft {
    available: u64,
    limit: u64,
}

Apart from all the other associated functions, we’ll look at the withdraw and withdraw_unchecked functions:

impl Overdraft {
    /// Check the available balance and limit, only withdrawing when they are fine
    pub fn withdraw(&mut self, howmuch: u64) -> Result<(), ()> {
        if self.available >= howmuch && howmuch <= self.limit {
            // some very long code block of internal bank
            // stuff that we won't bother with
            self.withdraw_unchecked(howmuch);
            Ok(())
        } else {
            Err(())
        }
    }
    /// Use this when the withdrawal is an emergency and is authorized by
    /// bank stakeholders
    pub fn withdraw_unchecked(&mut self, howmuch: u64) {
        // some very long code block of internal bank
        // stuff that we won't bother with
        self.available -= howmuch;
    }
}

The withdraw function is completely fine: it checks the limit and the available balance before the withdrawal, while the withdraw_unchecked function doesn’t check those and simply withdraws money. In reality, the withdraw function would have several more things going on than a simple conditional; think of stuff like calculating a credit score, asking some intermediary organization or provider, et cetera. This means that the withdraw function is actually pretty expensive to call. That’s why the developers of the library provided the withdraw_unchecked function, which enables the bank’s developers to skip those expensive checks and immediately allow the withdrawal.

What can possibly go wrong? Well, in a rush, one of the bank’s developers calls withdraw_unchecked where withdraw should’ve been called; in that case, the account holder might be allowed to borrow far more than the overdraft allows! Now, this is a logic error (a terrible one, and the bank can definitely go mad at the dev who wrote the program). For whatever reason (a broken test suite, deadlines, et cetera), this made it to the production channel... and the bank lost a lot of money.
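Here is a small sketch of that mistake, using a trimmed-down version of the Overdraft type from above. I’m assuming the check is meant to read "the amount fits within both the available balance and the limit"; the numbers are made up for illustration.

```rust
struct Overdraft {
    available: u64,
    limit: u64,
}

impl Overdraft {
    /// Checked path: refuses anything above the balance or the limit.
    pub fn withdraw(&mut self, howmuch: u64) -> Result<(), ()> {
        if self.available >= howmuch && howmuch <= self.limit {
            self.withdraw_unchecked(howmuch);
            Ok(())
        } else {
            Err(())
        }
    }

    /// Unchecked path: subtracts without looking at the limit.
    pub fn withdraw_unchecked(&mut self, howmuch: u64) {
        self.available -= howmuch;
    }
}

fn main() {
    let mut account = Overdraft { available: 1_000, limit: 100 };

    // The checked call rejects a withdrawal above the limit
    // and leaves the balance untouched.
    assert_eq!(account.withdraw(500), Err(()));
    assert_eq!(account.available, 1_000);

    // The unchecked call happily goes through: a logic error,
    // yet nothing the compiler can complain about.
    account.withdraw_unchecked(500);
    assert_eq!(account.available, 500);
}
```

Both calls compile without a single warning about the skipped checks, which is exactly the problem.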

The bank’s CEO called the CTO and told them to make sure the responsible devs never let this happen again. The CTO passed this on to the devs who maintain the “dangerous library” that is used to make low-level changes to the bank’s transactions, and they wanted to do something about it. They thought they could improve the docs and print, in HUGE BLOCK LETTERS all over the workspace, that the *_unchecked set of functions is only to be called responsibly. But they soon realized that people might still end up calling the library’s privileged functions, and that this had to be prevented right at the library level. So, how could they do it?

With Rust today, the solution that these devs could’ve adopted would be to mark the function unsafe. In order to use this function then, the users of the library must put things into an unsafe block and that will definitely remind them that they’re doing something wrong. However, that is not the intended use of the unsafe keyword. The Rust std documentation notes the use of unsafe as:

Code or interfaces whose memory safety cannot be verified by the type system.
The unsafe keyword has two uses: to declare the existence of contracts the compiler can’t check (unsafe fn and unsafe trait), and to declare that a programmer has checked that these contracts have been upheld (unsafe {} and unsafe impl, but also unsafe fn – see below). They are not mutually exclusive, as can be seen in unsafe fn.[1]

As you can see, the clear intention of the unsafe keyword is to demarcate regions where memory safety cannot be guaranteed by the compiler. However, in the above scenario, simply subtracting a value would not cause memory unsafety; at worst, it could cause an arithmetic underflow (a panic in debug mode and a wrap-around in release), which doesn’t introduce any sort of memory unsafety[2].
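To see what underflow actually does to a u64, here is a short sketch using the explicit arithmetic methods from the standard library, so it behaves the same way in debug and release builds. The amounts are made up for illustration.

```rust
fn main() {
    let available: u64 = 100;
    let howmuch: u64 = 250;

    // checked_sub reports the underflow instead of panicking or wrapping.
    assert_eq!(available.checked_sub(howmuch), None);

    // wrapping_sub shows what a default release build would compute:
    // 100 - 250 wraps around to 2^64 - 150.
    assert_eq!(available.wrapping_sub(howmuch), u64::MAX - 149);

    // overflowing_sub returns the wrapped value plus a flag.
    assert_eq!(available.overflowing_sub(howmuch), (u64::MAX - 149, true));
}
```

The result is a nonsensical balance, but it is still a perfectly valid u64: no memory is read or written out of bounds.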

So, what’s the possible language way to ensure correctness? That’s where I’d like to propose an unchecked keyword.

🦔 Hedgy: And, the worst example of the year 2022 award goes to Sayan
👦🏻 Sayan: Whatever! You get the point, right?
🦔 Hedgy: Yeah, you're trying to reduce logic errors.
👦🏻 Sayan: Right!

But the unchecked keyword does more than attempt to reduce logic errors; it also reduces the ambiguity[3] surrounding unsafe functions. Today, when an external user of the library finds an unsafe function, they have to ask: is it an invariant that they have to uphold to ensure memory safety, or one that they have to uphold to ensure logical correctness? The only way to find out is to rely on the documentation that the crate provides or, as a last resort, check the implementation. With unchecked, it is immediately clear that the function call won’t invalidate any memory safety contracts, but might cause correctness errors which do not lead to memory unsafety.
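A rough sketch of that ambiguity in today’s Rust: the two functions below are both marked unsafe, but only one of them has anything to do with memory safety. The helper byte_at is hypothetical, and marking withdraw_unchecked as unsafe is the workaround discussed earlier, not something I’m recommending.

```rust
struct Overdraft {
    available: u64,
    limit: u64,
}

impl Overdraft {
    /// # Safety
    /// Nothing here can violate memory safety. The real contract is about
    /// correctness: the caller must be authorized to skip the usual checks.
    pub unsafe fn withdraw_unchecked(&mut self, howmuch: u64) {
        self.available -= howmuch;
    }
}

/// # Safety
/// `index` must be less than `bytes.len()`; otherwise this is undefined behavior.
/// (`byte_at` is a hypothetical helper used only for this comparison.)
pub unsafe fn byte_at(bytes: &[u8], index: usize) -> u8 {
    unsafe { *bytes.get_unchecked(index) }
}

fn main() {
    let mut account = Overdraft { available: 1_000, limit: 100 };

    // At the call site the two look exactly the same, even though only
    // the second one can actually break memory safety when misused.
    unsafe { account.withdraw_unchecked(500) };
    let byte = unsafe { byte_at(b"crab", 0) };

    println!("{} {} {}", account.available, account.limit, byte);
}
```

Without reading the docs of each function, the caller cannot tell which kind of contract they just promised to uphold.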

Unchecked functions

An unchecked function is one that doesn’t cause any memory unsafety, but can cause logical inaccuracies. It is declared just like an unsafe function:

unchecked fn withdraw_unchecked() {
    // ... something silly ...
}

where you have unchecked in the function signature. To call an unchecked function, I propose two solutions:

  1. Use unchecked code blocks:
    unchecked {
        silly_a();
        silly_b();
    }
    
  2. Use unchecked before the function call:
    unchecked silly_a();
    unchecked silly_b();
    

unchecked and unsafe overlap

Another important note, for a set A of unchecked function calls/definitions and set B of unsafe function calls/definitions, A ∩ B = Ø, that is there is no overlap between unchecked and unsafe functions. What does this mean?

Let’s say that you have the following functions:

unsafe fn bad_a() {}
unchecked fn silly_a() {}

If you decide to run the below, it will cause an error:

unsafe {
    bad_a(); // this won't error
    silly_a(); // but this will
}

The converse is also true:

unchecked {
    silly_a(); // this won't error
    bad_a(); // this will error
}

The correct way to call them would be:

unsafe {
    bad_a();
}
unchecked {
    silly_a();
}

Now you’re going to say – the developer needs to responsibly write software. Well, I’m going to quote Esteban here:

There are no bad programmers, only insufficiently advanced compilers – Esteban Küber[4]

Being someone in the Rust community, I don’t think you’re going to disagree :D

Alternatives

A possible alternative for someone implementing and using an unchecked function (not possible with a library) is to use a macro. The macro, say one called unchecked, would be used to declare unchecked functions like below:

unchecked! {
    pub fn silly() -> &'static str {
        "You silly"
    }
}

And be called like:

let silly = unchecked_call!(silly);

The macro simply prepends the unchecked_ prefix to the function name so that just calling silly won’t work, but calling unchecked_silly would. This is a very limited workaround because rustdoc will output the name unchecked_silly in the generated documentation anyway, defeating the purpose of having it in the first place. This was also suggested by a user[5].
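For the curious, here is one way such a pair of macros could be sketched with plain macro_rules. As far as I know, declarative macros on stable Rust cannot glue a prefix onto an identifier without outside help, so this sketch swaps in a different trick: it hides the function inside a module named after it, which is an assumption of mine and not the prefix-based design described above. The inner name unchecked is also just a placeholder.

```rust
/// Declares a function that can only be reached as `<name>::unchecked`.
/// (A module-based stand-in for the prefix trick; purely illustrative.)
macro_rules! unchecked {
    ($vis:vis fn $name:ident ( $($arg:ident : $ty:ty),* $(,)? ) $(-> $ret:ty)? $body:block) => {
        #[allow(unused_imports)]
        $vis mod $name {
            // Let the function body see items from the enclosing module.
            use super::*;
            pub fn unchecked($($arg: $ty),*) $(-> $ret)? $body
        }
    };
}

/// Calls a function declared with `unchecked!`, making the intent visible.
macro_rules! unchecked_call {
    ($name:ident $(, $arg:expr)* $(,)?) => {
        $name::unchecked($($arg),*)
    };
}

unchecked! {
    pub fn silly() -> &'static str {
        "You silly"
    }
}

fn main() {
    // A plain `silly()` would not compile, because `silly` is a module here.
    let silly = unchecked_call!(silly);
    println!("{silly}");
}
```

It suffers from the same limitation: the generated docs would show a silly module with an unchecked function inside, so the disguise is only skin-deep.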

How we should look at unsafe and unchecked

Finally, I’d like to exactly define what each keyword means:

  • unsafe: Any function marked as unsafe informs the caller that the function will cause memory unsafety if the mentioned invariants (preferably in the ## Safety part in rustdoc) are not upheld
  • unchecked: Any function marked as unchecked informs the caller that the function does not cause memory unsafety but if the mentioned invariants (preferably in a ## Correctness part in rustdoc) are not upheld, then it may cause logic errors

I hope this clears the ambiguity between unsafe and unchecked. Now one might argue, “but logic errors can happen anywhere.” Sure, they can. But here, you are asserting by using the unchecked keyword that you’re the one upholding the invariants required for correctness when calling this method.

Further discussion

This post is intended to start a discussion, and there’s a lot more that can be added here. For example, what about unchecked traits and unchecked impls? Similarly, what about unchecked unsafe fns, that is, an unsafe function that is also unchecked? These are some unresolved questions that I can think of. Also, is this worth adding to the language, given that it adds to the language’s complexity?

As some users have pointed out[6][7], a clippy lint or something similar for unchecked functions to describe the exact kind of correctness invariant that has to be ensured by the library user might also be a good idea.

I’d love to know what you think. If you’re on a social site where you saw this post, consider dropping a reply to the post or shoot me a DM on Twitter if you’d like to attack me personally ;)