About salsa

Salsa is a Rust framework for writing incremental, on-demand programs -- these are programs that want to adapt to changes in their inputs, continuously producing a new output that is up-to-date. Salsa is based on the incremental recompilation techniques that we built for rustc, and many (but not all) of its users are building compilers or other similar tooling.

If you'd like to learn more about Salsa, you can check out the Hello World example in the repository, or watch some of our YouTube videos.

If you'd like to chat about Salsa, or you think you might like to contribute, please jump on to our Zulip instance at salsa.zulipchat.com.

How to use Salsa

How Salsa works

Video available

To get the most complete introduction to Salsa's inner workings, check out the "How Salsa Works" video. If you'd like a deeper dive, the "Salsa in more depth" video digs into the details of the incremental algorithm.

Key idea

The key idea of salsa is that you define your program as a set of queries. Every query is used like a function K -> V that maps from some key of type K to a value of type V. Queries come in two basic varieties:

  • Inputs: the base inputs to your system. You can change these whenever you like.
  • Functions: pure functions (no side effects) that transform your inputs into other values. The results of function queries are memoized to avoid recomputing them needlessly. When you make changes to the inputs, we'll figure out (fairly intelligently) when we can re-use these memoized values and when we have to recompute them.
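The input/function split can be sketched with hand-rolled memoization in plain Rust. This is not salsa's API -- just an illustration of the two query varieties, with all names invented; salsa automates (and is far smarter about) the invalidation step shown here:

```rust
use std::collections::HashMap;

// Hand-rolled sketch of the input/function split. Salsa automates all
// of this bookkeeping; the names here are illustrative only.
#[derive(Default)]
struct Db {
    // Input query: set directly, changed whenever we like.
    input_string: HashMap<(), String>,
    // Memo table for the derived `length` query.
    length_memo: HashMap<(), usize>,
}

impl Db {
    fn set_input_string(&mut self, key: (), value: String) {
        self.input_string.insert(key, value);
        // Changing an input invalidates dependent memos.
        self.length_memo.clear();
    }

    // Function query: a pure function of the inputs, memoized.
    fn length(&mut self, key: ()) -> usize {
        if let Some(&len) = self.length_memo.get(&key) {
            return len; // reuse the memoized value
        }
        let len = self.input_string.get(&key).map_or(0, |s| s.len());
        self.length_memo.insert(key, len);
        len
    }
}
```

Salsa's real re-use logic is much finer-grained than "clear everything on any input change", but the memoize-and-invalidate shape is the same.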

How to use Salsa in three easy steps

Using salsa is as easy as 1, 2, 3...

  1. Define one or more query groups that contain the inputs and queries you will need. We'll start with one such group, but later on you can use more than one to break up your system into components (or spread your code across crates).
  2. Define the query functions where appropriate.
  3. Define the database, which contains the storage for all the inputs/queries you will be using. The database struct may also contain anything else that your code needs (e.g., configuration data).

To see an example of this in action, check out the hello_world example, which has a number of comments explaining how things work.

Digging into the plumbing

Check out the plumbing chapter to see a deeper explanation of the code that salsa generates and how it connects to the salsa library.

Common patterns

This section documents patterns for using Salsa.

Selection

The "selection" (or "firewall") pattern is when you have a query Qsel that reads from some other query Qbase and extracts some small bit of information from Qbase that it returns. In particular, Qsel does not combine values from other queries. In some sense, then, Qsel is redundant -- you could have just extracted the information from Qbase yourself, and done without the salsa machinery. But Qsel serves a role in that it limits the amount of re-execution that is required when Qbase changes.

Example: the base query

For example, imagine that you have a query parse that parses the input text of a request and returns a ParsedResult, which contains a header and a body:

#[derive(Clone, Debug, PartialEq, Eq)]
struct ParsedResult {
    header: Vec<ParsedHeader>,
    body: String,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct ParsedHeader {
    key: String,
    value: String,
}

#[salsa::query_group(Request)]
trait RequestParser {
    /// The base text of the request.
    #[salsa::input]
    fn request_text(&self) -> String;

    /// The parsed form of the request.
    fn parse(&self) -> ParsedResult;
} 

Example: a selecting query

And now you have a number of derived queries that only look at the header. For example, one might extract the "content-type" header:

#[salsa::query_group(Request)]
trait RequestUtil: RequestParser {
    fn content_type(&self) -> Option<String>;
}

fn content_type(db: &dyn RequestUtil) -> Option<String> {
    db.parse()
        .header
        .iter()
        .find(|header| header.key == "content-type")
        .map(|header| header.value.clone())
} 

Why prefer a selecting query?

This content_type query is an instance of the selection pattern. It only "selects" a small bit of information from the ParsedResult. You might not have made it a query at all, but instead made it a method on ParsedResult.

But using a query for content_type has an advantage: now if there are downstream queries that only depend on the content_type (or perhaps on other headers extracted via a similar pattern), those queries will not have to be re-executed when the request changes unless the content-type header changes. Consider the dependency graph:

request_text  -->  parse  -->  content_type  -->  (other queries)

When the request_text changes, we are always going to have to re-execute parse. If that produces a new parsed result, we are also going to re-execute content_type. But if the result of content_type has not changed, then we will not re-execute the other queries.
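The cutoff behavior can be illustrated with a hand-rolled sketch -- this is not salsa's actual algorithm, and every name is invented -- in which downstream work re-runs only when the selected value actually changes:

```rust
// Hand-rolled sketch of the early-cutoff effect behind the selection
// pattern (not salsa's real algorithm; all names are illustrative).
struct Pipeline {
    last_content_type: Option<Option<String>>,
    downstream_runs: usize,
}

impl Pipeline {
    fn new() -> Self {
        Pipeline { last_content_type: None, downstream_runs: 0 }
    }

    // Stand-in for `parse` + `content_type`: find a "content-type:" line.
    fn parse_content_type(request_text: &str) -> Option<String> {
        request_text
            .lines()
            .find_map(|line| line.strip_prefix("content-type: "))
            .map(|v| v.to_string())
    }

    fn on_request_changed(&mut self, request_text: &str) {
        let ct = Self::parse_content_type(request_text);
        // Early cutoff: skip downstream work if the selected value
        // is unchanged from the previous revision.
        if self.last_content_type.as_ref() != Some(&ct) {
            self.last_content_type = Some(ct);
            self.downstream_runs += 1; // downstream queries re-execute
        }
    }
}
```

Here a change to the body leaves the extracted content-type equal to its previous value, so the downstream counter does not advance -- the same effect the memoized comparison gives you in salsa.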

More levels of selection

In fact, in our example we might consider introducing another level of selection. Instead of having content_type directly access the results of parse, it might be better to insert a selecting query that just extracts the header:

#[salsa::query_group(Request)]
trait RequestUtil: RequestParser {
    fn header(&self) -> Vec<ParsedHeader>;
    fn content_type(&self) -> Option<String>;
}

fn header(db: &dyn RequestUtil) -> Vec<ParsedHeader> {
    db.parse().header.clone()
}

fn content_type(db: &dyn RequestUtil) -> Option<String> {
    db.header()
        .iter()
        .find(|header| header.key == "content-type")
        .map(|header| header.value.clone())
} 

This will result in a dependency graph like so:

request_text  -->  parse  -->  header -->  content_type  -->  (other queries)

The advantage of this is that changes that only affect the "body", or that only touch other parts of the request, will not require us to re-execute content_type at all. This would be particularly valuable if there are a lot of different headers.

A note on cloning and efficiency

In this example, we used common Rust types like Vec and String, and we cloned them quite frequently. This will work just fine in Salsa, but it may not be the most efficient choice. This is because each clone is going to produce a deep copy of the result. As a simple fix, you might convert your data structures to use Arc (e.g., Arc<Vec<ParsedHeader>>), which makes cloning cheap.
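For instance (plain Rust, nothing salsa-specific), cloning an Arc only bumps a reference count; both handles still point at the same allocation:

```rust
use std::sync::Arc;

// Cloning an Arc copies a pointer and bumps a reference count; the
// underlying Vec<String> is never deep-copied. Names are illustrative.
fn share_headers(headers: Vec<String>) -> (Arc<Vec<String>>, Arc<Vec<String>>) {
    let original = Arc::new(headers);
    let cheap_copy = Arc::clone(&original); // O(1), no deep copy
    (original, cheap_copy)
}
```

`Arc::ptr_eq` can be used to check that the two handles share one allocation, which is exactly why the repeated clones in the queries above become cheap.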

On-Demand (Lazy) Inputs

Salsa input queries work best if you can easily provide all of the inputs upfront. However, sometimes the set of inputs is not known beforehand.

A typical example is reading files from disk. While it is possible to eagerly scan a particular directory and create an in-memory file tree in a salsa input query, a more straightforward approach is to read the files lazily. That is, when someone requests the text of a file for the first time:

  1. Read the file from disk and cache it.
  2. Set up a file-system watcher for this path.
  3. Invalidate the cached file once the watcher sends a change notification.
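Stripped of salsa and of real file-system watching, the cache-then-invalidate shape of these three steps looks like this (a plain-Rust sketch with an in-memory "disk"; all names invented):

```rust
use std::collections::HashMap;

// Plain-Rust sketch of lazy read + invalidate. No real file system or
// watcher here; an in-memory map stands in for the disk.
#[derive(Default)]
struct LazyVfs {
    disk: HashMap<String, String>, // stand-in for the file system
    cache: HashMap<String, String>,
    watched: Vec<String>,
}

impl LazyVfs {
    // Steps 1 + 2: on first request, read from "disk", cache, and watch.
    fn read(&mut self, path: &str) -> String {
        if let Some(text) = self.cache.get(path) {
            return text.clone();
        }
        let text = self.disk.get(path).cloned().unwrap_or_default();
        self.watched.push(path.to_string());
        self.cache.insert(path.to_string(), text.clone());
        text
    }

    // Step 3: a change notification drops the cached entry, so the
    // next `read` goes back to "disk".
    fn did_change_file(&mut self, path: &str) {
        self.cache.remove(path);
    }
}
```

In the salsa version below, the memoization is done by the derived query's storage and `invalidate` plays the role of `did_change_file`.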

This is possible to achieve in salsa using a derived query together with report_synthetic_read and invalidate. The setup looks roughly like this:

#[salsa::query_group(VfsDatabaseStorage)]
trait VfsDatabase: salsa::Database + FileWatcher {
    fn read(&self, path: PathBuf) -> String;
}

trait FileWatcher {
    fn watch(&self, path: &Path);
    fn did_change_file(&mut self, path: &Path);
}

fn read(db: &dyn VfsDatabase, path: PathBuf) -> String {
    db.salsa_runtime()
        .report_synthetic_read(salsa::Durability::LOW);
    db.watch(&path);
    std::fs::read_to_string(&path).unwrap_or_default()
}

#[salsa::database(VfsDatabaseStorage)]
struct MyDatabase { ... }

impl FileWatcher for MyDatabase {
    fn watch(&self, path: &Path) { ... }
    fn did_change_file(&mut self, path: &Path) {
        self.query_mut(ReadQuery).invalidate(path);
    }
}
  • We declare the query as a derived query (which is the default).
  • In the query implementation, we don't call any other query and just directly read the file from disk.
  • Because the query doesn't read any inputs, it will be assigned a HIGH durability by default, which we override with report_synthetic_read.
  • The result of the query is cached, and we must call invalidate to clear this cache.

YouTube videos

There are currently two videos about Salsa available:

  • How Salsa Works, which gives a high-level introduction to the key concepts involved and shows how to use salsa;
  • Salsa In More Depth, which digs into the incremental algorithm and explains -- at a high-level -- how Salsa is implemented.

Plumbing

This chapter documents the code that salsa generates and its "inner workings". We refer to this as the "plumbing".

This page walks through the "Hello, World!" example and explains the code that it generates. Please take it with a grain of salt: while we make an effort to keep this documentation up to date, this sort of thing can fall out of date easily. See the page history below for major updates.

If you'd like to see for yourself, you can set the environment variable SALSA_DUMP to 1 while the procedural macro runs, and it will dump the full output to stdout. I recommend piping the output through rustfmt.

History

  • 2020-07-05: Updated to take RFC 6 into account.
  • 2020-06-24: Initial version.

Diagram

Based on the hello world example:

#[salsa::query_group(HelloWorldStorage)]
trait HelloWorld: salsa::Database {
    // For each query, we give the name, some input keys (here, we
    // have one key, `()`) and the output type `Arc<String>`. We can
    // use attributes to give other configuration:
    //
    // - `salsa::input` indicates that this is an "input" to the system,
    //   which must be explicitly set. The `salsa::query_group` method
    //   will autogenerate a `set_input_string` method that can be
    //   used to set the input.
    #[salsa::input]
    fn input_string(&self, key: ()) -> Arc<String>;

    // This is a *derived query*, meaning its value is specified by
    // a function (see Step 2, below).
    fn length(&self, key: ()) -> usize;
}
#[salsa::database(HelloWorldStorage)]
#[derive(Default)]
struct DatabaseStruct {
    storage: salsa::Storage<Self>,
}

impl salsa::Database for DatabaseStruct {}
graph LR
    classDef diagramNode text-align:left;
    subgraph query group
        HelloWorldTrait["trait HelloWorld: Database + HasQueryGroup(HelloWorldStorage)"]
        HelloWorldImpl["impl(DB) HelloWorld for DB
where DB: HasQueryGroup(HelloWorldStorage)"]
        click HelloWorldImpl "http:query_groups.html#impl-of-the-hello-world-trait" "more info"
        HelloWorldStorage["struct HelloWorldStorage"]
        click HelloWorldStorage "http:query_groups.html#the-group-struct-and-querygroup-trait" "more info"
        QueryGroupImpl["impl QueryGroup for HelloWorldStorage
  type DynDb = dyn HelloWorld
  type Storage = HelloWorldGroupStorage__;"]
        click QueryGroupImpl "http:query_groups.html#the-group-struct-and-querygroup-trait" "more info"
        HelloWorldGroupStorage["struct HelloWorldGroupStorage__"]
        click HelloWorldGroupStorage "http:query_groups.html#group-storage" "more info"
        subgraph for each query...
            LengthQuery[struct LengthQuery]
            LengthQueryImpl["impl Query for LengthQuery
  type Key = ()
  type Value = usize
  type Storage = salsa::DerivedStorage(Self)
  type QueryGroup = HelloWorldStorage"]
            LengthQueryFunctionImpl["impl QueryFunction for LengthQuery
  fn execute(db: &dyn HelloWorld, key: ()) -> usize"]
            click LengthQuery "http:query_groups.html#for-each-query-a-query-struct" "more info"
            click LengthQueryImpl "http:query_groups.html#for-each-query-a-query-struct" "more info"
            click LengthQueryFunctionImpl "http:query_groups.html#for-each-query-a-query-struct" "more info"
        end
        class HelloWorldTrait,HelloWorldImpl,HelloWorldStorage,QueryGroupImpl,HelloWorldGroupStorage diagramNode;
        class LengthQuery,LengthQueryImpl,LengthQueryFunctionImpl diagramNode;
    end
    subgraph database
        DatabaseStruct["struct Database { .. storage: Storage(Self) .. }"]
        subgraph for each group...
            HasQueryGroup["impl plumbing::HasQueryGroup(HelloWorldStorage) for DatabaseStruct"]
            click HasQueryGroup "http:database.html#the-hasquerygroup-impl" "more info"
        end
        DatabaseStorageTypes["impl plumbing::DatabaseStorageTypes for DatabaseStruct
  type DatabaseStorage = __SalsaDatabaseStorage"]
        click DatabaseStorageTypes "http:database.html#the-databasestoragetypes-impl" "more info"
        DatabaseStorage["struct __SalsaDatabaseStorage"]
        click DatabaseStorage "http:database.html#the-database-storage-struct" "more info"
        DatabaseOps["impl plumbing::DatabaseOps for DatabaseStruct"]
        click DatabaseOps "http:database.html#the-databaseops-impl" "more info"
        class DatabaseStruct,DatabaseStorage,DatabaseStorageTypes,DatabaseOps,HasQueryGroup diagramNode;
    end
    subgraph salsa crate
        DerivedStorage["DerivedStorage"]
        class DerivedStorage diagramNode;
    end
    LengthQueryImpl --> DerivedStorage;
    DatabaseStruct --> HelloWorldImpl
    HasQueryGroup --> HelloWorldImpl

Query groups and query group structs

When you define a query group trait:

#[salsa::query_group(HelloWorldStorage)]
trait HelloWorld: salsa::Database {
    // For each query, we give the name, some input keys (here, we
    // have one key, `()`) and the output type `Arc<String>`. We can
    // use attributes to give other configuration:
    //
    // - `salsa::input` indicates that this is an "input" to the system,
    //   which must be explicitly set. The `salsa::query_group` method
    //   will autogenerate a `set_input_string` method that can be
    //   used to set the input.
    #[salsa::input]
    fn input_string(&self, key: ()) -> Arc<String>;

    // This is a *derived query*, meaning its value is specified by
    // a function (see Step 2, below).
    fn length(&self, key: ()) -> usize;
}

the salsa::query_group macro generates a number of things, shown in the sample generated code below (details in the sections to come).

Note that there are a number of structs and types (e.g., the group descriptor and associated storage struct) that represent things which don't have "public" names. We currently generate mangled names with __ afterwards, but those names are not meant to be exposed to the user (ideally we'd use hygiene to enforce this).

// First, a copy of the trait, though with extra supertraits and
// sometimes with some extra methods (e.g., `set_input_string`)
trait HelloWorld: 
    salsa::Database + 
    salsa::plumbing::HasQueryGroup<HelloWorldStorage>
{
    fn input_string(&self, key: ()) -> Arc<String>;
    fn set_input_string(&mut self, key: (), value: Arc<String>);
    fn length(&self, key: ()) -> usize;
}

// Next, the "query group struct", whose name was given by the
// user. This struct implements the `QueryGroup` trait which
// defines a few associated types common to the entire group.
struct HelloWorldStorage { }
impl salsa::plumbing::QueryGroup for HelloWorldStorage {
    type DynDb = dyn HelloWorld;
    type GroupStorage = HelloWorldGroupStorage__;
}

// Next, a blanket impl of the `HelloWorld` trait. This impl
// works for any database `DB` that implements the
// appropriate `HasQueryGroup`.
impl<DB> HelloWorld for DB
where
  DB: salsa::Database,
  DB: salsa::plumbing::HasQueryGroup<HelloWorldStorage>,
{
  ...
}

// Next, for each query, a "query struct" that represents it.
// The query struct has inherent methods like `in_db` and
// implements the `Query` trait, which defines various
// details about the query (e.g., its key, value, etc).
pub struct InputQuery { }
impl InputQuery { /* definition for `in_db`, etc */ }
impl salsa::Query for InputQuery {
    /* associated types */
}

// Same as above, but for the derived query `length`.
// For derived queries, we also implement `QueryFunction`
// which defines how to execute the query.
pub struct LengthQuery { }
impl salsa::Query for LengthQuery {
    ...
}
impl salsa::QueryFunction for LengthQuery {
    ...
}

// Finally, the group storage, which contains the actual
// hashmaps and other data used to implement the queries.
struct HelloWorldGroupStorage__ { .. }

The group struct and QueryGroup trait

The group struct is the only thing we generate whose name is known to the user. For a query group named Foo, it is conventionally called FooStorage, hence the name HelloWorldStorage in our example.

Despite the name "Storage", the struct itself has no fields. It exists only to implement the QueryGroup trait. This trait has a number of associated types that reference various bits of the query group, including the actual "group storage" struct:

struct HelloWorldStorage { }
impl salsa::plumbing::QueryGroup for HelloWorldStorage {
    type DynDb = dyn HelloWorld;
    type GroupStorage = HelloWorldGroupStorage__; // generated struct
}

We'll go into detail on these types below and the role they play, but one that we didn't mention yet is GroupData. That is a kind of hack used to manage send/sync around slots, and it gets covered in the section on slots.

Impl of the hello world trait

Ultimately, every salsa query group is going to be implemented by your final database type, which is not currently known to us (it is created by combining multiple salsa query groups). In fact, this salsa query group could be composed into multiple database types. However, we want to generate the impl of the query-group trait here in this crate, because this is the point where the trait definition is visible and known to us (otherwise, we'd have to duplicate the method definitions).

So what we do is that we define a different trait, called plumbing::HasQueryGroup<G>, that can be implemented by the database type. HasQueryGroup is generic over the query group struct. So then we can provide an impl of HelloWorld for any database type DB where DB: HasQueryGroup<HelloWorldStorage>. This HasQueryGroup defines a few methods that, given a DB, give access to the data for the query group and a few other things.

Thus we can generate an impl that looks like:

impl<DB> HelloWorld for DB
where
    DB: salsa::Database,
    DB: salsa::plumbing::HasQueryGroup<HelloWorldStorage>
{
    ...
    fn length(&self, key: ()) -> usize {
      <Self as salsa::plumbing::GetQueryTable<HelloWorldLength__>>::get_query_table(self).get(())
    }
}

You can see that the various methods just hook into generic functions in the salsa::plumbing module. These functions are generic over the query types (HelloWorldLength__) that will be described shortly. The details of the "query table" are covered in a future section, but in short this code pulls out the hashmap for storing the length results and invokes the generic salsa logic to check for a valid result, etc.
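The blanket-impl trick described above can be reproduced in miniature in plain Rust. Every name here is invented for illustration; the point is only the shape: the group trait is implemented once, for any database that can hand out the group's storage:

```rust
// Miniature version of salsa's blanket-impl pattern. All names are
// invented for illustration.
struct MyGroupStorage {
    lengths: Vec<usize>, // stand-in for the group's memo tables
}

trait HasQueryGroup {
    fn group_storage(&self) -> &MyGroupStorage;
}

trait HelloWorld {
    fn length(&self, index: usize) -> usize;
}

// Blanket impl: works for *any* DB that implements HasQueryGroup, so
// the final database type never has to spell out each query method.
impl<DB: HasQueryGroup> HelloWorld for DB {
    fn length(&self, index: usize) -> usize {
        self.group_storage().lengths[index]
    }
}

struct Database {
    storage: MyGroupStorage,
}

impl HasQueryGroup for Database {
    fn group_storage(&self) -> &MyGroupStorage {
        &self.storage
    }
}
```

The database only supplies `group_storage`; the query methods come along for free from the blanket impl, which is exactly the division of labor between the `#[salsa::database]` and `#[salsa::query_group]` macros.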

For each query, a query struct

As we referenced in the previous section, each query in the trait gets a struct that represents it. This struct is named after the query, converted into camel case and with the word Query appended. In typical Salsa workflows, these structs are not meant to be named or used, but in some cases it may be required. For the length query, for example, the struct might look something like:

struct LengthQuery { }

The struct also implements the plumbing::Query trait, which defines a bunch of metadata about the query (and repeats, for convenience, some of the data about the group that the query is in):

            impl salsa::Query for #qt
            {
                type Key = (#(#keys),*);
                type Value = #value;
                type Storage = #storage;

                const QUERY_INDEX: u16 = #query_index;

                const QUERY_NAME: &'static str = #query_name;

                fn query_storage<'a>(
                    group_storage: &'a <Self as salsa::QueryDb<'_>>::GroupStorage,
                ) -> &'a std::sync::Arc<Self::Storage> {
                    &group_storage.#fn_name
                }
            }

Depending on the kind of query, we may also generate other impls, such as an impl of salsa::plumbing::QueryFunction, which defines the methods for executing the body of a query. This impl would then include a call to the user's actual function.

                impl salsa::plumbing::QueryFunction for #qt
                {
                    fn execute(db: &<Self as salsa::QueryDb<'_>>::DynDb, #key_pattern: <Self as salsa::Query>::Key)
                        -> <Self as salsa::Query>::Value {
                        #invoke(db, #(#key_names),*)
                    }

                    #recover
                }

Group storage

The "group storage" is the actual struct that contains all the hashtables and so forth for each query. The types of these are ultimately defined by the Storage associated type for each query type. The struct is generic over the final database type:

struct HelloWorldGroupStorage__ {
    input: <InputQuery as Query>::Storage,
    length: <LengthQuery as Query>::Storage,
}

We also generate some inherent methods. First, a new method that takes the group index as a parameter and passes it along to each of the query storage new methods:

        impl #group_storage {
            #trait_vis fn new(group_index: u16) -> Self {
                #group_storage {
                    #(
                        #queries_with_storage:
                        std::sync::Arc::new(salsa::plumbing::QueryStorageOps::new(group_index)),
                    )*
                }
            }
        }

And then various methods that will dispatch from a DatabaseKeyIndex that corresponds to this query group into the appropriate query within the group. Each has a similar structure of matching on the query index and then delegating to some method defined by the query storage:

        impl #group_storage {
            #trait_vis fn fmt_index(
                &self,
                db: &(#dyn_db + '_),
                input: salsa::DatabaseKeyIndex,
                fmt: &mut std::fmt::Formatter<'_>,
            ) -> std::fmt::Result {
                match input.query_index() {
                    #fmt_ops
                    i => panic!("salsa: impossible query index {}", i),
                }
            }

            #trait_vis fn maybe_changed_since(
                &self,
                db: &(#dyn_db + '_),
                input: salsa::DatabaseKeyIndex,
                revision: salsa::Revision,
            ) -> bool {
                match input.query_index() {
                    #maybe_changed_ops
                    i => panic!("salsa: impossible query index {}", i),
                }
            }

            #trait_vis fn for_each_query(
                &self,
                _runtime: &salsa::Runtime,
                mut op: &mut dyn FnMut(&dyn salsa::plumbing::QueryStorageMassOps),
            ) {
                #for_each_ops
            }
        }

Database

Continuing our dissection, the other thing which a user must define is a database, which looks something like this:

#[salsa::database(HelloWorldStorage)]
#[derive(Default)]
struct DatabaseStruct {
    storage: salsa::Storage<Self>,
}

impl salsa::Database for DatabaseStruct {}

The salsa::database procedural macro takes a list of query group structs (like HelloWorldStorage) and generates the following items:

  • a copy of the database struct it is applied to
  • a struct __SalsaDatabaseStorage that contains all the storage structs for each query group. Note: these are the structs full of hashmaps etc that are generated by the query group procedural macro, not the HelloWorldStorage struct itself.
  • an impl of HasQueryGroup<G> for each query group G
  • an impl of salsa::plumbing::DatabaseStorageTypes for the database struct
  • an impl of salsa::plumbing::DatabaseOps for the database struct

Key constraint: we do not know the names of individual queries

There is one key constraint in the design here. None of this code knows the names of individual queries. It only knows the name of the query group storage struct. This means that we often delegate things to the group -- e.g., the database key is composed of group keys. This is similar to how none of the code in the query group knows the full set of query groups, and so it must use associated types from the Database trait whenever it needs to put something in a "global" context.

The database storage struct

The __SalsaDatabaseStorage struct concatenates all of the query group storage structs. In the hello world example, it looks something like:

struct __SalsaDatabaseStorage {
    hello_world: <HelloWorldStorage as salsa::plumbing::QueryGroup>::GroupStorage
}

We also generate a Default impl for __SalsaDatabaseStorage. It invokes a new method on each group storage with the unique index assigned to that group. This invokes the inherent new method generated by the #[salsa::query_group] macro.

The HasQueryGroup impl

The HasQueryGroup trait allows a given query group to access its definition within the greater database. The impl is generated here:

        has_group_impls.extend(quote! {
            impl salsa::plumbing::HasQueryGroup<#group_path> for #database_name {
                fn group_storage(&self) -> &#group_storage {
                    &self.#db_storage_field.query_store().#group_name_snake
                }
            }
        });

The HasQueryGroup impl combines with the blanket impl from the #[salsa::query_group] macro so that the database can implement the query group trait (e.g., the HelloWorld trait) but without knowing all the names of the query methods and the like.

The DatabaseStorageTypes impl

Then there are a variety of other impls, like this one for DatabaseStorageTypes:

    output.extend(quote! {
        impl salsa::plumbing::DatabaseStorageTypes for #database_name {
            type DatabaseStorage = __SalsaDatabaseStorage;
        }
    });

The DatabaseOps impl

Or this one for DatabaseOps, which defines the for-each method to invoke an operation on every kind of query in the database. It ultimately delegates to the for_each methods for the groups:

    let mut fmt_ops = proc_macro2::TokenStream::new();
    let mut maybe_changed_ops = proc_macro2::TokenStream::new();
    let mut for_each_ops = proc_macro2::TokenStream::new();
    for ((QueryGroup { group_path }, group_storage), group_index) in query_groups
        .iter()
        .zip(&query_group_storage_names)
        .zip(0_u16..)
    {
        fmt_ops.extend(quote! {
            #group_index => {
                let storage: &#group_storage =
                    <Self as salsa::plumbing::HasQueryGroup<#group_path>>::group_storage(self);
                storage.fmt_index(self, input, fmt)
            }
        });
        maybe_changed_ops.extend(quote! {
            #group_index => {
                let storage: &#group_storage =
                    <Self as salsa::plumbing::HasQueryGroup<#group_path>>::group_storage(self);
                storage.maybe_changed_since(self, input, revision)
            }
        });
        for_each_ops.extend(quote! {
            let storage: &#group_storage =
                <Self as salsa::plumbing::HasQueryGroup<#group_path>>::group_storage(self);
            storage.for_each_query(runtime, &mut op);
        });
    }
    output.extend(quote! {
        impl salsa::plumbing::DatabaseOps for #database_name {
            fn ops_database(&self) -> &dyn salsa::Database {
                self
            }

            fn ops_salsa_runtime(&self) -> &salsa::Runtime {
                self.#db_storage_field.salsa_runtime()
            }

            fn ops_salsa_runtime_mut(&mut self) -> &mut salsa::Runtime {
                self.#db_storage_field.salsa_runtime_mut()
            }

            fn fmt_index(
                &self,
                input: salsa::DatabaseKeyIndex,
                fmt: &mut std::fmt::Formatter<'_>,
            ) -> std::fmt::Result {
                match input.group_index() {
                    #fmt_ops
                    i => panic!("salsa: invalid group index {}", i)
                }
            }

            fn maybe_changed_since(
                &self,
                input: salsa::DatabaseKeyIndex,
                revision: salsa::Revision
            ) -> bool {
                match input.group_index() {
                    #maybe_changed_ops
                    i => panic!("salsa: invalid group index {}", i)
                }
            }

            fn for_each_query(
                &self,
                mut op: &mut dyn FnMut(&dyn salsa::plumbing::QueryStorageMassOps),
            ) {
                let runtime = salsa::Database::salsa_runtime(self);
                #for_each_ops
            }
        }
    });

RFCs

The Salsa RFC process is used to describe the motivations for major changes made to Salsa. RFCs are recorded here in the Salsa book as a historical record of the considerations that were raised at the time. Note that the contents of RFCs, once merged, is typically not updated to match further changes. Instead, the rest of the book is updated to include the RFC text and then kept up to date as more PRs land and so forth.

Creating an RFC

If you'd like to propose a major new Salsa feature, simply clone the repository and create a new chapter under the list of RFCs based on the RFC template. Then open a PR with a subject line that starts with "RFC:".

RFC vs Implementation

The RFC can be in its own PR, or it can include the implementation work as well, whatever works best for you.

Does my change need an RFC?

Not all PRs require RFCs. RFCs are only needed for larger features or major changes to how Salsa works. And they don't have to be super complicated, but they should capture the most important reasons you would like to make the change. When in doubt, it's ok to just open a PR, and we can always request an RFC if we want one.

Description/title

Metadata

  • Author: (Github username(s) or real names, as you prefer)
  • Date: (today's date)
  • Introduced in: https://github.com/salsa-rs/salsa/pull/1 (please update once you open your PR)

Summary

Summarize the effects of the RFC in bullet point form.

Motivation

Say something about your goals here.

User's guide

Describe effects on end users here.

Reference guide

Describe implementation details or other things here.

Alternatives and future work

Various downsides, rejected approaches, or other considerations.

Query group traits

Metadata

  • Author: nikomatsakis
  • Date: 2019-01-15
  • Introduced in: https://github.com/salsa-rs/salsa-rfcs/pull/1

Motivation

  • Support dyn QueryGroup for each query group trait as well as impl QueryGroup
    • dyn QueryGroup will be much more convenient, at the cost of runtime efficiency
  • Don't require you to redeclare each query in the final database, just the query groups

User's guide

Declaring a query group

Users will declare query groups by decorating a trait with salsa::query_group:

#[salsa::query_group(MyGroupStorage)]
trait MyGroup {
  // Inputs are annotated with `#[salsa::input]`. For inputs, the final trait will include
  // a `set_my_input(&mut self, key: K1, value: V1)` method automatically added,
  // as well as possibly other mutation methods.
  #[salsa::input]
  fn my_input(&self, key: K1) -> V1;
  
  // "Derived" queries are just a getter.
  fn my_query(&self, key: K2) -> V2;
}

The query_group attribute is a procedural macro. It takes as argument the name of the storage struct for the query group -- this is a struct, generated by the macro, which represents the query group as a whole. It is attached to a trait definition which defines the individual queries in the query group.

The macro generates three things that users interact with:

  • the trait, here named MyGroup. This will be used when writing the definitions for the queries and other code that invokes them.
  • the storage struct, here named MyGroupStorage. This will be used later when constructing the final database.
  • query structs, one per query, named after the query converted to camel case with the word Query appended (e.g., MyInputQuery for my_input). These types are rarely needed, but are presently useful for things like invoking the GC. These types violate our rule that "things the user needs to name should be given names by the user", but we choose not to fully resolve this question in this RFC.

In addition, the macro generates a number of structs that users should not have to be aware of. These are described in the "reference guide" section.

Controlling query modes

Input queries, as described in the trait, are specified via the #[salsa::input] attribute.

Derived queries can be customized by the following attributes, attached to the getter method (e.g., fn my_query(..)):

  • #[salsa::invoke(foo::bar)] specifies the path to the function to invoke when the query is called (default is my_query).
  • #[salsa::volatile] specifies a "volatile" query, which is assumed to read untracked input and hence must be re-executed on every revision.
  • #[salsa::dependencies] specifies a "dependencies-only" query, whose dependencies are tracked as usual but whose value is not memoized.

Creating the database

Creating a salsa database works by using a #[salsa::database(..)] attribute. The .. content should be a list of paths leading to the storage structs for each query group that the database will implement. It is no longer necessary to list the individual queries. In addition to the salsa::database attribute, the struct must have access to a salsa::Runtime and implement the salsa::Database trait. Hence the complete declaration looks roughly like so:

#[salsa::database(MyGroupStorage)]
struct MyDatabase {
  runtime: salsa::Runtime<MyDatabase>,
}

impl salsa::Database for MyDatabase {
  fn salsa_runtime(&self) -> &salsa::Runtime<MyDatabase> {
    &self.runtime
  }
}

This (procedural) macro generates various impls and types that cause MyDatabase to implement all the traits for the query groups it supports, and which customize the storage in the runtime to have all the data needed. Users should not have to interact with these details, and they are written out in the reference guide section.

Reference guide

The goal here is not to give the full details of how to do the lowering, but to describe the key concepts. Throughout the text, we will refer to names (e.g., MyGroup or MyGroupStorage) that appear in the example from the User's Guide -- this indicates that we use whatever name the user provided.

The plumbing::QueryGroup trait

The QueryGroup trait is a new trait added to the plumbing module. It is implemented by the query group storage struct MyGroupStorage. Its role is to link from that struct to the various bits of data that the salsa runtime needs:

pub trait QueryGroup<DB: Database> {
    type GroupStorage;
    type GroupKey;
}

This trait is implemented by the storage struct (MyGroupStorage) in our example. You can see there is a bit of confusing naming going on here -- what we call (for users) the "storage struct" does not actually wind up containing the true storage (that is, the hashmaps and other data salsa uses). Instead, it merely implements the QueryGroup trait, which has associated types that lead us to the structs we need:

  • the group storage contains the hashmaps and things for all the queries in the group
  • the group key is an enum with variants for each of the queries. It basically stores all the data needed to identify some particular query value from within the group -- that is, the name of the query, plus the keys used to invoke it.

As described further on, the #[salsa::query_group] macro generates an impl of this trait for the MyGroupStorage struct, along with the group storage and group key type definitions.

The plumbing::HasQueryGroup<G> trait

HasQueryGroup<G> is a new trait added to the plumbing module. It is implemented by the database struct MyDatabase for every query group that MyDatabase supports. Its role is to offer methods that move back and forth between the context of the full database and the context of an individual query group:

pub trait HasQueryGroup<G>: Database
where
    G: QueryGroup<Self>,
{
    /// Access the group storage struct from the database.
    fn group_storage(db: &Self) -> &G::GroupStorage;

    /// "Upcast" a group key into a database key.
    fn database_key(group_key: G::GroupKey) -> Self::DatabaseKey;
}

Here the "database key" is an enum that contains variants for each group. Its role is to take a group key and put it into the context of the entire database.

The Query trait

The Query trait (pre-existing) is extended to include links to its group, a method to extract the query storage from the group storage, and a method to convert a query key up to the group key:

pub trait Query<DB: Database>: Debug + Default + Sized + 'static {
    /// Type that you give as a parameter -- for queries with zero
    /// or more than one input, this will be a tuple.
    type Key: Clone + Debug + Hash + Eq;

    /// What value does the query return?
    type Value: Clone + Debug;

    /// Internal struct storing the values for the query.
    type Storage: plumbing::QueryStorageOps<DB, Self> + Send + Sync;

    /// Associated query group struct.
    type Group: plumbing::QueryGroup<
        DB,
        GroupStorage = Self::GroupStorage,
        GroupKey = Self::GroupKey,
    >;

    /// Generated struct that contains storage for all queries in a group.
    type GroupStorage;

    /// Type that identifies a particular query within the group + its key.
    type GroupKey;

    /// Extract storage for this query from the storage for its group.
    fn query_storage(group_storage: &Self::GroupStorage) -> &Self::Storage;

    /// Create group key for this query.
    fn group_key(key: Self::Key) -> Self::GroupKey;
}

Converting to/from the context of the full database generically

Putting all the previous plumbing traits together, this means that given:

  • a database DB that implements HasQueryGroup<G>;
  • a group struct G that implements QueryGroup<DB>; and,
  • a query struct Q that implements Query<DB, Group = G>

we can (generically) get the storage for the individual query Q out from the database db via a two-step process:

let group_storage = HasQueryGroup::group_storage(db);
let query_storage = Query::query_storage(group_storage);

Similarly, we can convert the key of an individual query up to the "database key" in a two-step process:

let group_key = Query::group_key(key);
let db_key = HasQueryGroup::database_key(group_key);

Lowering query groups

The role of the #[salsa::query_group(MyGroupStorage)] trait MyGroup { .. } macro is primarily to generate the group storage struct and the impl of QueryGroup. That involves generating the following things:

  • the query trait MyGroup itself, but with:
    • salsa::foo attributes stripped
    • #[salsa::input] methods expanded to include setters:
      • fn set_my_input(&mut self, key: K1, value__: V1);
      • fn set_constant_my_input(&mut self, key: K1, value__: V1);
  • the query group storage struct MyGroupStorage
    • We also generate an impl of QueryGroup<DB> for MyGroupStorage, linking to the internal storage struct and group key enum
  • the individual query types
    • Ideally, we would use Rust hygiene to hide these structs, but as that is not currently possible they are given names based on the queries, converted to camel case (e.g., MyInputQuery and MyQueryQuery).
    • They implement the salsa::Query trait.
  • the internal group storage struct
    • Ideally, we would use Rust hygiene to hide this struct, but as that is not currently possible it is named MyGroupGroupStorage<DB>. Note that it is generic with respect to the database DB. This is because the actual query storage sometimes requires storing database keys, and hence we need to know the final database type.
    • It contains one field per query with a link to the storage information for that query:
      • my_query: <MyQueryQuery as salsa::plumbing::Query<DB>>::Storage
      • (the MyQueryQuery type is also generated, see the "individual query types" below)
    • The internal group storage struct offers a public, inherent method for_each_query:
      • fn for_each_query(db: &DB, op: &mut dyn FnMut(...))
      • this is invoked by the code generated by #[salsa::database] when implementing the for_each_query method of the plumbing::DatabaseOps trait
  • the group key
    • Again, ideally we would use hygiene to hide the name of this struct, but since we cannot, it is named MyGroupGroupKey
    • It is an enum which contains one variant per query with the value being the key:
      • my_query(<MyQueryQuery as salsa::plumbing::Query<DB>>::Key)
    • The group key enum offers a public, inherent method maybe_changed_since:
      • fn maybe_changed_since<DB>(db: &DB, db_descriptor: &DB::DatabaseKey, revision: Revision)
      • it is invoked when implementing maybe_changed_since for the database key

Lowering database storage

The #[salsa::database(MyGroupStorage)] attribute macro creates the links to the query groups. It generates the following things:

  • impl of HasQueryGroup<MyGroup> for MyDatabase
    • Naturally, there is one such impl for each query group.
  • the database key enum
    • Ideally, we would use Rust hygiene to hide this enum, but currently it is called __SalsaDatabaseKey.
    • The database key is an enum with one variant per query group:
      • MyGroupStorage(<MyGroupStorage as QueryGroup<MyDatabase>>::GroupKey)
  • the database storage struct
    • Ideally, we would use Rust hygiene to hide this struct, but currently it is called __SalsaDatabaseStorage.
    • The database storage struct contains one field per query group, storing its internal storage:
      • my_group_storage: <MyGroupStorage as QueryGroup<MyDatabase>>::GroupStorage
  • impl of plumbing::DatabaseStorageTypes for MyDatabase
    • This is a plumbing trait that links to the database storage / database key types.
    • The salsa::Runtime uses it to determine what data to include. The query types use it to determine a database-key.
  • impl of plumbing::DatabaseOps for MyDatabase
    • This contains a for_each_query method, which is implemented by invoking, in turn, the inherent methods defined on each query group storage struct.
  • impl of plumbing::DatabaseKey for the database key enum
    • This contains a method maybe_changed_since. We implement this by matching to get a particular group key, and then invoking the inherent method on the group key struct.

Alternatives

This proposal results from a fair amount of iteration. Compared to the status quo, there is one primary downside. We also explain a few things here that may not be obvious.

Why include a group storage struct?

You might wonder why we need the MyGroupStorage struct at all. It is a touch of boilerplate, but there are several advantages to it:

  • You can't attach associated types to the trait itself. This is because the "type version" of the trait (dyn MyGroup) may not be available, since not all traits are dyn-capable.
  • We try to keep to the principle that "any type that might be named externally from the macro is given its name by the user". In this case, the #[salsa::database] attribute needs to name the group storage structs.
    • In earlier versions, we tried to auto-generate these names, but this failed because sometimes users would want to pub use the query traits and hide their original paths.
    • (One exception to this principle today are the per-query structs.)
  • We expect that we can use the MyGroupStorage to achieve more encapsulation in the future. While the struct must be public and named from the database, the trait (and query key/value types) actually does not have to be.

Downside: Size of a database key

Database keys now wind up with two discriminants: one to identify the group, and one to identify the query. That's a bit sad. This could be overcome by using unsafe code: the idea would be that a group/database key would be stored as the pair of an integer and a union. Each group within a given database would be assigned a range of integer values, and the unions would store the actual key values. We leave such a change for future work.

Future possibilities

Here are some ideas we might want to do later.

No generics

We leave generic parameters on the query group trait etc for future work.

Public / private

We'd like the ability to make more details from the query groups private. This will require some tinkering.

Inline query definitions

Instead of defining queries in separate functions, it might be nice to have the option of defining query methods in the trait itself:

#[salsa::query_group(MyGroupStorage)]
trait MyGroup {
  #[salsa::input]
  fn my_input(&self, key: K1) -> V1;
  
  fn my_query(&self, key: K2) -> V2 {
      // define my-query right here!
  }
}

It's a bit tricky to figure out how to handle this, so that is left for future work. Also, it would mean that the method body itself is inside of a macro (the procedural macro) which can make IDE integration harder.

Non-query functions

It might be nice to be able to include functions in the trait that are not queries, but rather helpers that compose queries. This should be pretty easy, just need a suitable #[salsa] attribute.

Summary

  • We introduce #[salsa::interned] queries which convert a Key type into a numeric index of type Value, where Value is either the type InternId (defined by salsa) or some newtype thereof.
  • Each interned query foo also produces an inverse lookup_foo method that converts back from the Value to the Key that was interned.
  • The InternId type (defined by salsa) is basically a newtype'd integer, but it internally uses NonZeroU32 to enable space-saving optimizations in memory layout.
  • The Value types can be any type that implements the salsa::InternKey trait, also introduced by this RFC. This trait has two methods, from_intern_id and as_intern_id.
  • The interning is integrated into the GC and tracked like any other query, which means that interned values can be garbage-collected, and any computation that was dependent on them will be collected.

Motivation

The need for interning

Many salsa applications wind up needing the ability to construct "interned keys". Frequently this pattern emerges because we wish to construct identifiers for things in the input. These identifiers generally have a "tree-like shape". For example, in a compiler, there may be some set of input files -- these are enumerated in the inputs and serve as the "base" for a path that leads to items in the user's input. But within an input file, there are additional structures, such as struct or impl declarations, and these structures may contain further structures within them (such as fields or methods). This gives rise to a path like so that can be used to identify a given item:

PathData = <file-name>
         | PathData / <identifier>

These paths could be represented in the compiler with an Arc, but because they are omnipresent, it is convenient to intern them instead and use an integer. Integers are Copy types, which is convenient, and they are also small (32 bits typically suffices in practice).

Why interning is difficult today: garbage collection

Unfortunately, integrating interning into salsa at present presents some hard choices, particularly with a long-lived application. You can easily add an interning table into the database, but unless you do something clever, it will simply grow and grow forever. But as the user edits their programs, some paths that used to exist will no longer be relevant -- for example, a given file or impl may be removed, invalidating all those paths that were based on it.

Due to the nature of salsa's recomputation model, it is not easy to detect when paths that used to exist in a prior revision are no longer relevant in the next revision. This is because salsa never explicitly computes "diffs" of this kind between revisions -- it just finds subcomputations that might have gone differently and re-executes them. Therefore, if the code that created the paths (e.g., that processed the result of the parser) is part of a salsa query, it will simply not re-create the invalidated paths -- there is no explicit "deletion" point.

In fact, the same is true of all of salsa's memoized query values. We may find that in a new revision, some memoized query values are no longer relevant. For example, in revision R1, perhaps we computed foo(22) and foo(44), but in the new input, we now only need to compute foo(22). The foo(44) value is still memoized, we just never asked for its value. This is why salsa includes a garbage collector, which can be used to cleanup these memoized values that are no longer relevant.

But using a garbage collection strategy with a hand-rolled interning scheme is not easy. You could trace through all the values in salsa's memoization tables to implement a kind of mark-and-sweep scheme, but that would require for salsa to add such a mechanism. It might also be quite a lot of tracing! The current salsa GC mechanism has no need to walk through the values themselves in a memoization table, it only examines the keys and the metadata (unless we are freeing a value, of course).

How this RFC changes the situation

This RFC presents an alternative. The idea is to move the interning into salsa itself by creating special "interning queries". Dependencies on these queries are tracked like any other query and hence they integrate naturally with salsa's garbage collection mechanisms.

User's guide

This section covers how interned queries are expected to be used.

Declaring an interned query

You can declare an interned query like so:

#[salsa::query_group]
trait Foo {
  #[salsa::interned]
  fn intern_path_data(&self, data: PathData) -> salsa::InternId;
}

Query keys. Like any query, these queries can take any number of keys. If multiple keys are provided, then the interned key is a tuple of each key value. In order to be interned, the keys must implement Clone, Hash and Eq.

Return type. The return type of an interned key may be any type that implements salsa::InternKey: salsa provides an impl for the type salsa::InternId, but you can implement it for your own types.

Inverse query. For each interning query, we automatically generate a reverse query that will invert the interning step. It is named lookup_XXX, where XXX is the name of the query. Hence here it would be fn lookup_intern_path_data(&self, key: salsa::InternId) -> PathData.

The expected usage

Using an interned query is quite straightforward. You simply invoke it with a key, and you will get back an integer, and you can use the generated lookup method to convert back to the original value:

let key = db.intern_path_data(path_data1);
let path_data2 = db.lookup_intern_path_data(key);

Note that the interned value will be cloned -- so, like all Salsa values, it is best if that is a cheap operation. Interestingly, interning can help to keep recursive, tree-shaped values cheap, because the "pointers" within can be replaced with interned keys.

Custom return types

The return type for an intern query does not have to be an InternId. It can be any type that implements the salsa::InternKey trait:

pub trait InternKey {
    /// Create an instance of the intern-key from a `InternId` value.
    fn from_intern_id(v: InternId) -> Self;

    /// Extract the `InternId` with which the intern-key was created.
    fn as_intern_id(&self) -> InternId;
}

Recommended practice

This section shows the recommended practice for using interned keys, building on the Path and PathData example that we've been working with.

Naming Convention

First, note the recommended naming convention: the intern key is Foo and the key's associated data FooData (in our case, Path and PathData). The intern key is given the shorter name because it is used far more often. Moreover, other types should never store the full data, but rather should store the interned key.

Defining the intern key

The intern key should always be a newtype struct that implements the InternKey trait. So, something like this:

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub struct Path(InternId);

impl salsa::InternKey for Path {
    fn from_intern_id(v: InternId) -> Self {
        Path(v)
    }

    fn as_intern_id(&self) -> InternId {
        self.0
    }
}

Convenient lookup method

It is often convenient to add a lookup method to the newtype key:

impl Path {
    // Adding this method is often convenient, since you can then
    // write `path.lookup(db)` to access the data, which reads a bit better.
    pub fn lookup(&self, db: &impl MyDatabase) -> PathData {
        db.lookup_intern_path_data(*self)
    }
}

Defining the data type

Recall that our paths were defined by a recursive grammar like so:

PathData = <file-name>
         | PathData / <identifier>

This recursion is quite typical of salsa applications. The recommended way to encode it in the PathData structure itself is to build on other intern keys, like so:

#[derive(Clone, Hash, Eq, ..)]
enum PathData {
  Root(String),
  Child(Path, String),
  //    ^^^^ Note that the recursive reference here
  //         is encoded as a Path.
}

Note though that the PathData type will be cloned whenever the value for an interned key is looked up, and it may also be cloned to store dependency information between queries. So, as an optimization, you might prefer to avoid String in favor of Arc<String> -- or even intern the strings as well.

Interaction with the garbage collector

Interned keys can be garbage collected as normal, with one caveat. Even if requested, Salsa will never collect the results generated in the current revision. This is because it would permit the same key to be interned twice in the same revision, possibly mapping to distinct intern keys each time.

Note that if an interned key is collected, its index will be re-used. Salsa's dependency tracking system should ensure that anything incorporating the older value is considered dirty, but you may see the same index showing up more than once in the logs.

Reference guide

Interned keys are implemented using a hash-map that maps from the interned data to its index, as well as a vector containing (for each index) various bits of data. In addition to the interned data, we must track the revision in which the value was interned and the revision in which it was last accessed, to help manage the interaction with the GC. Finally, we have to track some sort of free list that tracks the keys that are being re-used. The current implementation never actually shrinks the vectors and maps from their maximum size, but this might be a useful thing to be able to do (this is effectively a memory allocator, so standard allocation strategies could be used here).

InternId

Presently the InternId type is implemented to wrap a NonZeroU32:

pub struct InternId {
    value: NonZeroU32,
}

This means that Option<InternId> (or Option<Path>, continuing our example from before) will only be a single word. To accommodate this, the InternId constructors require that the value is less than InternId::MAX; the value is deliberately set low (currently to 0xFFFF_FF00) to allow for more sentinel values in the future (Rust doesn't presently expose the capability of having sentinel values other than zero on stable, but it is possible on nightly).

Alternatives and future work

None at present.

Summary

Allow specifying a dependency on a query group without making it a super trait.

Motivation

Currently, there's only one way to express that queries from group A can use queries from another group B: namely, making B a super-trait of A:

#[salsa::query_group(AStorage)]
trait A: B {

}

This approach works and allows one to express complex dependencies. However, it falls down when one wants to make the dependency a private implementation detail: clients with db: &impl A can freely call B's methods on the db.

This is a bad situation from a software engineering point of view: if everything is accessible, it's hard to draw a distinction between the public API and private implementation details. In the context of salsa the situation is even worse, because it breaks the "firewall" pattern. It's customary to wrap low-level, frequently-changing or volatile queries in higher-level queries which produce stable results and contain invalidation. In current salsa, however, it's very easy to accidentally call a low-level volatile query instead of its wrapper, introducing an undesired dependency.

User's guide

To specify query dependencies, a requires attribute should be used:

#[salsa::query_group(SymbolsDatabaseStorage)]
#[salsa::requires(SyntaxDatabase)]
#[salsa::requires(EnvDatabase)]
pub trait SymbolsDatabase {
    fn get_symbol_by_name(&self, name: String) -> Symbol;
}

The argument of requires is a path to a trait. The traits from all requires attributes are available when implementing the query:

fn get_symbol_by_name(
    db: &(impl SymbolsDatabase + SyntaxDatabase + EnvDatabase),
    name: String,
) -> Symbol {
    // ...
}

However, these traits are not available without explicit bounds:

fn fuzzy_find_symbol(db: &impl SymbolsDatabase, name: String) {
    // Can't accidentally call methods of the `SyntaxDatabase`
}

Note that, while the RFC does not propose per-query dependencies, a query implementation can voluntarily specify only a subset of the traits from the requires attributes:

fn get_symbol_by_name(
    // Purposefully don't depend on EnvDatabase
    db: &(impl SymbolsDatabase + SyntaxDatabase),
    name: String,
) -> Symbol {
    // ...
}

Reference guide

The implementation is straightforward: the traits from the requires attributes are added to various where bounds. For example, we would generate the following blanket impl for the example above:

impl<T> SymbolsDatabase for T
where
    T: SyntaxDatabase + EnvDatabase,
    T: salsa::plumbing::HasQueryGroup<SymbolsDatabaseStorage>
{
    ...
}

Alternatives and future work

The semantics of requires closely resembles where, so we could imagine a syntax based on magical where clauses:

#[salsa::query_group(SymbolsDatabaseStorage)]
pub trait SymbolsDatabase
    where ???: SyntaxDatabase + EnvDatabase
{
    fn get_symbol_by_name(&self, name: String) -> Symbol;
}

However, it's not obvious what should stand in for ???. Self won't be ideal, because supertraits are sugar for bounds on Self, and we deliberately want different semantics. Perhaps picking a magical identifier like DB would work, though?

One potential future development here is per-query-function bounds, but they can already be simulated by voluntarily requiring fewer bounds in the implementation function.

Another direction for future work is privacy: because the traits from the requires clause are not part of the public interface, in theory it should be possible to restrict their visibility. In practice, this still hits the public-in-private lint, at least with a trivial implementation.

Summary

Add Least Recently Used values eviction as a supplement to garbage collection.

Motivation

Currently, the single mechanism for controlling memory usage in salsa is garbage collection. Experience with rust-analyzer has shown that it is insufficient for two reasons:

  • It's hard to determine which values should be collected. The current implementation in rust-analyzer just periodically clears all values of specific queries.

  • GC is generally run in between revisions. However, especially right after opening a project, the number of values within a single revision can be high. In other words, GC doesn't really help keep peak memory usage under control. While it is possible to run the GC concurrently with calculations (and this is in fact what rust-analyzer is doing right now to try to keep the high-water mark of memory lower), this is highly unreliable and inefficient.

The LRU mechanism targets both of these weaknesses:

  • LRU tracks which values are accessed, and uses this information to determine which values are actually unused.

  • LRU has a fixed cap on the maximal number of entries, thus bounding the memory usage.

User's guide

It is possible to call the set_lru_capacity(n) method on any non-input query. The effect is that the table for that query stores at most n values in the database. If a new value is computed and there are already n existing ones in the database, the least recently used one is evicted. Note that information about query dependencies is not evicted. The LRU capacity can be changed at runtime at any time. n == 0 is a special case which completely disables the LRU logic. LRU is not enabled by default.

Reference guide

Implementation-wise, we store a linked hash map of keys in recently-used order. Because reads of queries are considered uses, we now need to write-lock the query map even when the query is fresh. However, we don't do this bookkeeping if LRU is disabled, so you don't pay for it unless you use it.

A slight complication arises with volatile queries (and, in general, with any query with an untracked input). Similarly to GC, evicting such a query could lead to an inconsistent database. For this reason, volatile queries are never evicted.

Alternatives and future work

LRU is a compromise, as it is prone to both accidentally evicting useful queries and needlessly holding onto useless ones. In particular, in the steady state and without additional GC, memory usage will be proportional to the LRU capacity: it is not only an upper bound, but a lower bound as well!

In theory, some deterministic way of evicting values when you know for sure they are no longer needed may be more efficient. However, it is unclear how exactly that would work! Experiments in rust-analyzer show that it's not easy to tame a dynamic crate graph, and that simplistic phase-based strategies fall down.

It's also worth noting that, unlike GC, LRU can in theory be more memory efficient than deterministic memory management. Unlike a traditional GC, we can safely evict "live" objects and recalculate them later. That makes it possible to use LRU for problems whose working set of "live" queries is larger than the available memory, at the cost of guaranteed recomputations.

Currently, eviction is strictly LRU-based. It should be possible to be smarter, taking into account the size of values and the time required to recompute them when making eviction decisions.

Summary

  • Introduce a user-visible concept of Durability
  • Adjusting the "durability" of an input can allow salsa to skip a lot of validation work
  • Garbage collection -- particularly of interned values -- however becomes more complex
  • Possible future expansion: automatic detection of more "durable" input values

Motivation

Making validation faster by optimizing for "durability"

Presently, salsa's validation logic requires traversing all dependencies to check that they have not changed. This can sometimes be quite costly in practice: rust-analyzer for example sometimes spends as much as 90ms revalidating the results from a no-op change. One option to improve this is simply optimization -- salsa#176 for example reduces validation times significantly, and there remains opportunity to do better still. However, even if we are able to traverse the dependency graph more efficiently, it will still be an O(n) process. It would be nice if we could do better.

One observation is that, in practice, there are often input values that are known to change quite infrequently. For example, in rust-analyzer, the standard library and crates downloaded from crates.io are unlikely to change (though changes are possible; see below). Similarly, the Cargo.toml file for a project changes relatively infrequently compared to the sources. We say then that these inputs are more durable -- that is, they change less frequently.

This RFC proposes a mechanism to take advantage of durability for optimization purposes. Imagine that we have some query Q that depends solely on the standard library. The idea is that we can track the last revision R when the standard library was changed. Then, when traversing dependencies, we can skip traversing the dependencies of Q if it was last validated after the revision R. Put another way, we only need to traverse the dependencies of Q when the standard library changes -- which is unusual. If the standard library does change, for example by user's tinkering with the internal sources, then yes we walk the dependencies of Q to see if it is affected.

User's guide

The durability type

We add a new type salsa::Durability which has three associated constants:

#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Durability(..);

impl Durability {
  // Values that change regularly, like the source to the current crate.
  pub const LOW: Durability;
  
  // Values that change infrequently, like Cargo.toml.
  pub const MEDIUM: Durability;

  // Values that are not expected to change, like sources from crates.io or the stdlib.
  pub const HIGH: Durability;
}

Specifying the durability of an input

When setting an input foo, one can now invoke a method set_foo_with_durability, which takes a Durability as the final argument:

// db.set_foo(key, value) is equivalent to:
db.set_foo_with_durability(key, value, Durability::LOW);

// This would indicate that `foo` is not expected to change: 
db.set_foo_with_durability(key, value, Durability::HIGH);

Durability of interned values

Interned values are always considered Durability::HIGH. This makes sense as many queries that only use high durability inputs will also make use of interning internally. A consequence of this is that they will not be garbage collected unless you use the specific patterns recommended below.

Synthetic writes

Finally, we add one new method, synthetic_write(durability), available on the salsa runtime:

db.salsa_runtime().synthetic_write(Durability::HIGH)

As the name suggests, synthetic_write causes salsa to act as though a write to an input of the given durability had taken place. This can be used for benchmarking, but it's also important for controlling which values get garbage collected, as described below.

Tracing and garbage collection

Durability affects garbage collection. The SweepStrategy struct is modified as follows:

/// Sweeps values which may be outdated, but which have not
/// been verified since the start of the current collection.
/// These are typically memoized values from previous computations
/// that are no longer relevant.
pub fn sweep_outdated(self) -> SweepStrategy;

/// Sweeps values which have not been verified since the start 
/// of the current collection, even if they are known to be 
/// up to date. This can be used to collect "high durability" values
/// that are not *directly* used by the main query.
///
/// So, for example, imagine a main query `result` which relies
/// on another query `threshold` and (indirectly) on a `threshold_inner`:
///
/// ```
/// result(10) [durability: Low]
///    |
///    v
/// threshold(10) [durability: High]
///    |
///    v
/// threshold_inner(10)  [durability: High]
/// ```
///
/// If you modify a low durability input and then access `result`,
/// then `result(10)` and its *immediate* dependencies will 
/// be considered "verified". However, because `threshold(10)` 
/// has high durability and no high durability input was modified,
/// we will not verify *its* dependencies, so `threshold_inner` is not 
/// verified (but it is also not outdated).
///
/// Collecting unverified things would therefore collect `threshold_inner(10)`.
/// Collecting only *outdated* things (i.e., with `sweep_outdated`)
/// would collect nothing -- but this does mean that some high durability
/// queries that are no longer relevant to your main query may stick around.
/// 
/// To get the most precise garbage collection, do a synthetic write with
/// high durability -- this will force us to verify *all* values. You can then
/// sweep unverified values.
pub fn sweep_unverified(self) -> SweepStrategy;

Reference guide

Review: The need for GC to collect outdated values

In general, salsa's lazy validation scheme can lead to the accumulation of garbage that is no longer needed. Consider a query like this one:

fn derived1(db: &impl Database, start: usize) -> usize {
  let middle = db.input(start);
  db.derived2(middle)
}

Now imagine that, on some particular run, we compute derived1(22):

  • derived1(22)
    • executes input(22), which returns 44
    • then executes derived2(44)

The end result of this execution will be a dependency graph like:

derived1(22) -> derived2(44)
  |
  v
input(22)

Now, imagine that the user modifies input(22) to have the value 45. The next time derived1(22) executes, it will load input(22) as before, but then execute derived2(45). This leaves us with a dependency graph as follows:

derived1(22) -> derived2(45)
  |
  v
input(22)       derived2(44)

Notice that we still see derived2(44) in the graph. This is because we memoized the result in the last round and then simply had no use for it in this round. The role of GC is to collect "outdated" values like this one.

Review: Tracing and GC before durability

In the absence of durability, when you execute a query Q in some new revision where Q has not previously executed, salsa must trace back through all the queries that Q depends on to ensure that they are still up to date. As each of Q's dependencies is validated, we mark it to indicate that it has been checked in the current revision (and thus, within a particular revision, we would never validate or trace a particular query twice).

So, to continue our example, when we first executed derived1(22) in revision R1, we might have had a graph like:

derived1(22)   -> derived2(44)
[verified: R1]    [verified: R1]
  |
  v
input(22)

Now, after we modify input(22) and execute derived1(22) again, we would have a graph like:

derived1(22)   -> derived2(45)
[verified: R2]    [verified: R2]
  |
  v
input(22)         derived2(44)
                  [verified: R1]

Note that derived2(44), the outdated value, never had its "verified" revision updated, because we never accessed it.

Salsa leverages this validation stamp to serve as the "marking" phase of a simple mark-sweep garbage collector. The idea is that the sweep method can collect any values that are "outdated" (whose "verified" revision is less than the current revision).

The intended model is that one can do a "mark-sweep" style garbage collection like so:

// Modify some input, triggering a new revision.
db.set_input(22, 45);

// The **mark** phase: execute the "main query", with the intention
// that we wish to retain all the memoized values needed to compute
// this main query, but discard anything else. For example, in an IDE
// context, this might be a "compute all errors" query.
db.derived1(22);

// The **sweep** phase: discard anything that was not traced during
// the mark phase.
db.sweep_all(...);

In the case of our example, when we execute sweep_all, it would collect derived2(44).
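The mark-sweep model above can be illustrated with a toy cache (hypothetical names; salsa's real storage is much richer): executing the main query stamps every memo it touches with the current revision, and the sweep retains only memos stamped in that revision.

```rust
// Toy mark-sweep cache. `touch` plays the role of tracing a memo during
// the mark phase; `sweep_outdated` discards memos whose verified revision
// is older than the current one.

use std::collections::HashMap;

struct Cache {
    current_revision: u64,
    // (query name, key) -> revision in which the memo was last verified
    verified: HashMap<(&'static str, u64), u64>,
}

impl Cache {
    fn new() -> Self {
        Cache { current_revision: 1, verified: HashMap::new() }
    }

    // Mark: stamp a memo as verified in the current revision.
    fn touch(&mut self, query: &'static str, key: u64) {
        self.verified.insert((query, key), self.current_revision);
    }

    // Sweep: collect memos that were not verified in the current revision.
    fn sweep_outdated(&mut self) {
        let rev = self.current_revision;
        self.verified.retain(|_, verified_at| *verified_at == rev);
    }
}
```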

Challenge: Durability lets us avoid tracing

This tracing model is affected by the move to durability. Now, if some derived value has a high durability, we may skip tracing its descendants altogether. This means that they would never be "verified" -- that is, their "verified date" would never be updated.

This is why we modify the definition of "outdated" as follows:

  • For a query value Q with durability D, let R_lc be the revision when values of durability D last changed. Let R_v be the revision when Q was last verified.
  • Q is outdated if R_v < R_lc.
    • In other words, if Q may have changed since it was last verified.
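Assuming a memo records its verified revision R_v and the runtime supplies R_lc for the memo's durability (illustrative names, not salsa's internals), the distinction between "outdated" and merely "unverified" can be written out directly:

```rust
// Illustrative only: a memo with its verification stamp (R_v) and the
// last-changed revision for its durability level (R_lc).
struct Memo {
    verified_at: u64,  // R_v
    last_changed: u64, // R_lc for this memo's durability
}

impl Memo {
    // Outdated: the memo may have changed since it was last verified.
    fn is_outdated(&self) -> bool {
        self.verified_at < self.last_changed
    }

    // Unverified: not traced since the start of the current revision,
    // even if it is known to be up to date (for example, because tracing
    // was skipped thanks to durability).
    fn is_unverified(&self, current_revision: u64) -> bool {
        self.verified_at < current_revision
    }
}
```

A high-durability memo that tracing skipped after a low-durability change is unverified but not outdated; collecting the former is what sweep_unverified does, collecting only the latter is what sweep_outdated does.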

Collecting interned and untracked values

Most values can be collected whenever we like without influencing correctness. However, interned values and those with untracked dependencies are an exception -- they can only be collected when outdated. This is because their values may not be reproducible -- in other words, re-executing an interning query (or one with untracked dependencies, which can read arbitrary program state) twice in a row may produce a different value. In the case of an interning query, for example, we may wind up using a different integer than we did before. If the query is outdated, this is not a problem: anything that depended on its result must also be outdated, and hence would be re-executed and would observe the new value. But if the query is not outdated, then we could get inconsistent results.
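To see why interning is not reproducible, consider a minimal interner (a simplified stand-in for an interning query) that hands out integers in first-seen order:

```rust
// Simplified stand-in for an interning query: values receive integers in
// the order they are first seen, so re-interning the same values in a
// different order yields different integers.

use std::collections::HashMap;

fn intern(table: &mut HashMap<String, u32>, value: &str) -> u32 {
    let next = table.len() as u32;
    *table.entry(value.to_string()).or_insert(next)
}
```

If a non-outdated interned value were collected and later re-created, dependents that memoized the old integer would silently disagree with the new one; collecting only outdated interned values avoids this.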

Alternatives and future work

Rejected: Arbitrary durabilities

We considered permitting arbitrary "levels" of durability -- for example, allowing the user to specify a number -- rather than offering just three. Ultimately it seemed like that level of control wasn't really necessary and that having just three levels would be sufficient and simpler.

Rejected: Durability lattices

We also considered permitting a "lattice" of durabilities -- e.g., to mirror the crate DAG in rust-analyzer -- but this is tricky because the lattice itself would be dependent on other inputs.

Dynamic databases

Metadata

  • Author: nikomatsakis
  • Date: 2020-06-29
  • Introduced in: salsa-rs/salsa#1 (please update once you open your PR)

Summary

  • Retool Salsa's setup so that the generated code for a query group is not dependent on the final database type, and interacts with the database only through dyn trait values.
  • This imposes a certain amount of indirection but has the benefit that when a query group definition changes, less code must be recompiled as a result.
  • Key changes include:
    • Database keys are "interned" in the database to produce a DatabaseKeyIndex.
    • The values for cached queries are stored directly in the hashtable instead of in an Arc. There is still an Arc per cached query, but it stores the dependency information.
    • The various traits are changed to make salsa::Database dyn-safe. Invoking methods on the runtime must now go through a salsa::Runtime trait.
    • The salsa::requires functionality is removed.
  • Upsides of the proposal:
    • Potentially improved recompilation time. Minimal code is regenerated.
    • Removing the DatabaseData unsafe code hack that was required by slots.
  • Downsides of the proposal:
    • The effect on runtime performance must be measured.
    • DatabaseKeyIndex values will leak, as we propose no means to reclaim them. However, the same is true of Slot values today.
    • Storing values for the tables directly in the hashtable makes it less obvious how we would return references to them in a safe fashion (before, I had planned to have a separate module that held onto the Arc for the slot, so we were sure the value would not be deallocated; one can still imagine supporting this feature, but it would require some fancier unsafe code reasoning, although it would be more efficient.)
    • The salsa::requires functionality is removed.

Motivation

Under the current salsa setup, all of the "glue code" that manages cache invalidation and other logic is ultimately parameterized by a type DB that refers to the full database. The problem is that, if you consider a typical salsa crate graph, the actual value for that type is not available until the final database crate is compiled:

graph TD;
  Database["Crate that defines the database"];
  QueryGroupA["Crate with query group A"];
  QueryGroupB["Crate with query group B"];
  SalsaCrate["the `salsa` crate"];
  Database -- depends on --> QueryGroupA;
  Database -- depends on --> QueryGroupB;
  QueryGroupA -- depends on --> SalsaCrate;
  QueryGroupB -- depends on --> SalsaCrate;

The result is that we do not actually compile a good part of the code from QueryGroupA or QueryGroupB until we build the final database crate.

What you can do today: dyn traits

What you can do today is define a "dyn-compatible" query group trait and then write your derived functions using a dyn type as the argument:

#[salsa::query_group(QueryGroupAStorage)]
trait QueryGroupA {
    fn derived(&self, key: usize) -> usize;
}

fn derived(db: &dyn QueryGroupA, key: usize) -> usize {
    key * 2
}

This has the benefit that the derived function is not generic. However, it's still true that the glue code salsa generates will be generic over a DB type -- this includes the impl of QueryGroupA but also the Slot and other machinery. This means that even if the only change is to query group B, in a different crate, the glue code for query group A ultimately has to be recompiled whenever the Database crate is rebuilt (though incremental compilation may help here). Moreover, as reported in salsa-rs/salsa#220, measurements of rust-analyzer suggest that this code may be duplicated and may account for more of the binary than we would expect.

FIXME: I'd like to have better measurements on the above!

Our goal

The primary goal of this RFC is to make it so that the glue code we generate for query groups is not dependent on the database type, thus enabling better incremental rebuilds.

User's guide

Most of the changes in this RFC are "under the hood". But there are various user-visible changes proposed here.

All query groups must be dyn safe

The largest one is that all Salsa query groups must now be dyn-safe. The existing salsa query methods are all dyn-safe, so what this really implies is that one cannot have super-traits that use generic methods or other things that are not dyn safe. For example, this query group would be illegal:

#[salsa::query_group(QueryGroupAStorage)]
trait QueryGroupA: Foo {
}

trait Foo {
    fn method<T>(t: T) { }
}

We could support query groups that are not dyn-safe, but it would require us to have two "similar but different" ways of generating plumbing, and I'm not convinced that it's worth it. Moreover, it would require some form of opt-in, which would add user-facing complexity as well.

All query functions must take a dyn database

You used to be able to implement queries by using impl MyDatabase, like so:

fn my_query(db: &impl MyDatabase, ...) { .. }

but you must now use dyn MyDatabase:

fn my_query(db: &dyn MyDatabase, ...) { .. }

Databases embed a Storage<DB> with a fixed field name

The "Hello World" database becomes the following:

#[salsa::database(QueryGroup1, ..., QueryGroupN)]
struct MyDatabase {
    storage: salsa::Storage<Self>
}

impl salsa::Database for MyDatabase {}

In particular:

  • You now embed a salsa::Storage<Self> instead of a salsa::Runtime<Self>
  • The field must be named storage by default; we can include a #[salsa::storage_field(xxx)] annotation to change that default if desired.
    • Or we could scrape the struct declaration and infer it, I suppose.
  • You no longer have to define the salsa_runtime and salsa_runtime_mut methods; they move to the DatabaseOps trait and are implemented by doing self.storage.runtime() and so forth.

Why these changes, and what is this Storage struct? This is because the actual storage for queries is moving outside of the runtime. The Storage struct just combines the Runtime (whose type no longer references DB directly) with an Arc<DB::Storage>. The full type of Storage, since it includes the database type, cannot appear in any public interface; it is used only by the various impls created by salsa::database.

Instead of db.query(Q), you write Q.in_db(&db)

As a consequence of the previous point, the existing query and query_mut methods on the salsa::Database trait are changed to methods on the query types themselves. So instead of db.query(SomeQuery), one would write SomeQuery.in_db(&db) (or in_db_mut). This both helps by making the salsa::Database trait dyn-safe and also works better with the new use of dyn types, since it permits a coercion from &db to the appropriate dyn database type at the point of call.

The salsa-event mechanism will move to dynamic dispatch

A further consequence is that the existing salsa_event method will be simplified and made suitable for dynamic dispatch. It used to take a closure that would produce the event if necessary; it now simply takes the event itself. This is partly because events themselves no longer contain complex information: they used to have database-keys, which could require expensive cloning, but they now have simple indices.

fn salsa_event(&self, event: Event) {
    #![allow(unused_variables)]
}

This may imply some runtime cost, since various parts of the machinery invoke salsa_event, and those calls will now be virtual calls. They would previously have been static calls that would likely have been optimized away entirely.

It is however possible that ThinLTO or other such optimization could remove those calls, this has not been tested, and in any case the runtime effects are not expected to be high, since all the calls will always go to the same function.

The salsa::requires function is removed

We currently offer a feature for "private" dependencies between query groups called #[salsa::requires(ExtraDatabase)]. This then requires query functions to be written like:

fn query_fn(db: &impl Database + ExtraDatabase, ...) { }

This format is not compatible with dyn, so this feature is removed.

Reference guide

Example

To explain the proposal, we'll use the Hello World example, lightly adapted:

#[salsa::query_group(HelloWorldStorage)]
trait HelloWorld: salsa::Database {
    #[salsa::input]
    fn input_string(&self, key: ()) -> Arc<String>;

    fn length(&self, key: ()) -> usize;
}

fn length(db: &dyn HelloWorld, (): ()) -> usize {
    // Read the input string:
    let input_string = db.input_string(());

    // Return its length:
    input_string.len()
}

#[salsa::database(HelloWorldStorage)]
struct DatabaseStruct {
    runtime: salsa::Runtime<DatabaseStruct>,
}

impl salsa::Database for DatabaseStruct {
    fn salsa_runtime(&self) -> &salsa::Runtime<Self> {
        &self.runtime
    }

    fn salsa_runtime_mut(&mut self) -> &mut salsa::Runtime<Self> {
        &mut self.runtime
    }
}

Identifying queries using the DatabaseKeyIndex

We introduce the following struct that represents a database key using a series of indices:

struct DatabaseKeyIndex {
    /// Identifies the query group.
    group_index: u16,

    /// Identifies the query within the group.
    query_index: u16,

    /// Identifies the key within the query.
    key_index: u32,
}

This struct allows the various query group structs to refer to database keys without having to use a type like DB::DatabaseKey that is dependent on the DB.

The group/query indices will be assigned by the salsa::database and salsa::query_group macros respectively. When query group storage is created, it will be passed in its group index by the database. Each query will be able to access its query-index through the Query trait, as they are statically known at the time that the query is compiled (the group index, in contrast, depends on the full set of groups for the database).

The key index can be assigned by the query as it executes without any central coordination. Each query will use an IndexMap (from the indexmap crate) mapping Q::Key -> QueryState. Inserting new keys into this map also creates new indices, and it is possible to index into the map in O(1) time later to obtain the state (or key) for a given index. This map replaces the existing Q::Key -> Arc<Slot<..>> map that is used today.

One notable implication: we cannot remove entries from the query index map (e.g., for GC) because that would invalidate the existing indices. We can however replace the query-state with a "not computed" value. This is not new: slots already take this approach today. In principle, we could extend the tracing GC to permit compressing and perhaps even rewriting indices, but it's not clear that this is a problem in practice.
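The index-assignment scheme can be sketched with a plain map plus an insertion-ordered vector standing in for the indexmap crate (illustrative only, not salsa's implementation): new keys receive the next index, existing keys keep theirs, and index-to-key lookup is O(1).

```rust
// Sketch of stable key-index assignment. Entries are never removed, so
// outstanding indices stay valid -- matching the implication described
// in the text above.

use std::collections::HashMap;
use std::hash::Hash;

struct KeyInterner<K: Hash + Eq + Clone> {
    indices: HashMap<K, u32>,
    keys: Vec<K>,
}

impl<K: Hash + Eq + Clone> KeyInterner<K> {
    fn new() -> Self {
        KeyInterner { indices: HashMap::new(), keys: Vec::new() }
    }

    // Return the existing index for `key`, or assign the next one.
    fn intern(&mut self, key: K) -> u32 {
        if let Some(&index) = self.indices.get(&key) {
            return index;
        }
        let index = self.keys.len() as u32;
        self.keys.push(key.clone());
        self.indices.insert(key, index);
        index
    }

    // Recover the key for a previously assigned index (used for dispatch).
    fn key(&self, index: u32) -> &K {
        &self.keys[index as usize]
    }
}
```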

The DatabaseKeyIndex also supports a debug method that returns a value with a human-readable Debug output, so that you can do debug!("{:?}", index.debug(db)). This works by generating a fmt_debug method that is supported by the various query groups.

The various query traits are not generic over a database

Today, the Query, QueryFunction, and QueryGroup traits are generic over the database DB, which allows them to name the final database type and associated types derived from it. In the new scheme, we never want to do that, and so instead they will now have an associated type, DynDb, that maps to the dyn version of the query group trait that the query is associated with.

Therefore QueryFunction for example can become:

pub trait QueryFunction: Query {
    fn execute(db: &<Self as QueryDb<'_>>::DynDb, key: Self::Key) -> Self::Value;
    fn recover(db: &<Self as QueryDb<'_>>::DynDb, cycle: &[DatabaseKeyIndex], key: &Self::Key) -> Option<Self::Value> {
        let _ = (db, cycle, key);
        None
    }
}

Storing query results and tracking dependencies

In today's setup, we have all the data for a particular query stored in a Slot<Q, DB, MP>, and these slots hold references to one another to track dependencies. Because the type of each slot is specific to the particular query Q, the references between slots are done using a Arc<dyn DatabaseSlot<DB>> handle. This requires some unsafe hacks, including the DatabaseData associated type.

This RFC proposes to alter this setup. Dependencies will store a DatabaseKeyIndex instead. This means that validating dependencies is less efficient, as we no longer have a direct pointer to the dependency information but instead must execute three index lookups (one to find the query group, one to locate the query, and then one to locate the key). Similarly, the LRU list can become a LinkedHashMap of indices.

We may tinker with other approaches too: the key change in the RFC is that we do not need to store a DB::DatabaseKey or Slot<..DB..>, but instead can use some type for dependencies that is independent of the database type DB.

Dispatching methods from a DatabaseKeyIndex

There are a number of methods that can be dispatched through the database interface on a DatabaseKeyIndex. For example, we already mentioned fmt_debug, which emits a debug representation of the key, but there is also maybe_changed_since, which checks whether the value for a given key may have changed since the given revision. Each of these methods is a member of the DatabaseOps trait and they are dispatched as follows.

First, the #[salsa::database] procedural macro is the one which generates the DatabaseOps impl for the database. This base method simply matches on the group index to determine which query group contains the key, and then dispatches to an inherent method defined on the appropriate query group struct:

impl salsa::plumbing::DatabaseOps for DatabaseStruct {
    // We'll use the `fmt_debug` method as an example
    fn fmt_debug(&self, index: DatabaseKeyIndex, fmt: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match index.group_index() {
            0 => {
                let storage = <Self as HasQueryGroup<HelloWorld>>::group_storage(self);
                storage.fmt_debug(index, fmt)
            }

            _ => panic!("Invalid index")
        }
    }
}

The query group struct has a very similar inherent method that dispatches based on the query index and invokes a method on the query storage:

impl HelloWorldGroupStorage__ {
    // We'll use the `fmt_debug` method as an example
    fn fmt_debug(&self, index: DatabaseKeyIndex, fmt: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match index.query_index() {
            0 => self.appropriate_query_field.fmt_debug(index, fmt),
            1 => ...
            _ => panic!("Invalid index")
        }
    }
}

Finally, the query storage can use the key index to look up the appropriate data from the IndexMap.

Wrap runtime in a Storage<DB> type

The Salsa runtime is currently Runtime<DB> but it will change to just Runtime and thus not be generic over the database. This means it can be referenced directly by query storage implementations. This is very useful because it allows that type to have a number of pub(crate) details that query storage implementations make use of but which are not exposed as part of our public API.

However, the Runtime used to contain a DB::Storage, and without the DB in its type, it no longer can. Therefore, we will introduce a new Storage<DB> type, which is defined like so:

pub struct Storage<DB: DatabaseImpl> {
    query_store: Arc<DB::DatabaseStorage>,
    runtime: Runtime,
}

impl<DB: DatabaseImpl> Storage<DB> {
    pub fn query_store(&self) -> &DB::DatabaseStorage {
        &self.query_store
    }

    pub fn salsa_runtime(&self) -> &Runtime {
        &self.runtime
    }

    pub fn salsa_runtime_mut(&mut self) -> &mut Runtime {
        &mut self.runtime
    }

    /// Used for parallel queries
    pub fn snapshot(&self) -> Self {
        Storage {
            query_store: query_store.clone(),
            runtime: runtime.snapshot(),
        }
    }
}

The user is expected to include a field storage: Storage<DB> in their database definition. The salsa::database procedural macro, when it generates impls of traits like HasQueryGroup, will embed code like self.storage that looks for that field.

salsa_runtime methods move to DatabaseOps trait

The salsa_runtime methods used to be manually implemented by users to define the field that contains the salsa runtime. This was always boilerplate. The salsa::database macro now handles that job by defining them to invoke the corresponding methods on Storage.

Salsa database trait becomes dyn safe

Under this proposal, the Salsa database must be dyn safe. This implies that we have to make a few changes:

  • The query and query_mut methods move to an extension trait.
  • The DatabaseStorageTypes supertrait is removed (that trait is renamed and altered, see next section).
  • The salsa_event method changes, as described in the User's guide.

Salsa database trait requires 'static, at least for now

One downside of this proposal is that the salsa::Database trait now has a 'static bound. This is a result of the lack of GATs -- in particular, the queries expect a <Q as QueryDb<'_>>::DynDb as argument. In the query definition, we have something like type DynDb = dyn QueryGroupDatabase, which in turn defaults to dyn QueryGroupDatabase + 'static.

At the moment, this limitation is harmless, since salsa databases don't support generic parameters. But it would be good to lift it in the future, especially as we would like to support arena allocation and other such patterns. The limitation could be overcome in the future by:

  • converting to a GAT like DynDb<'a>, if those were available;
  • or by simulating GATs by introducing a trait to carry the DynDb definition, like QueryDb<'a>, where Query has the supertrait for<'a> Self: QueryDb<'a>. This would permit the DynDb type to be referenced by writing <Q as QueryDb<'a>>::DynDb.

Salsa query group traits are extended with Database and HasQueryGroup supertrait

When #[salsa::query_group] is applied to a trait, we currently generate a copy of the trait that is "more or less" unmodified (although we sometimes add additional synthesized methods, such as the set method for an input). Under this proposal, we will also introduce a HasQueryGroup<QueryGroupStorage> supertrait. Therefore the following input:

#[salsa::query_group(HelloWorldStorage)]
trait HelloWorld { .. }

will generate a trait like:

trait HelloWorld: 
    salsa::Database + 
    salsa::plumbing::HasQueryGroup<HelloWorldStorage>
{ 
    .. 
}

The Database trait is the standard salsa::Database trait and contains various helper methods. The HasQueryGroup trait is implemented by the database and defines various plumbing methods that are used by the storage implementations.

One downside of this is that salsa::Database methods become available on the trait; we might want to give internal plumbing methods more obscure names.

Bounds were already present on the blanket impl of salsa query group trait

The new bounds that are appearing on the trait were always present on the blanket impl that the salsa::query_group procedural macro generated, which looks like so (and continues unchanged under this RFC):

impl<DB> HelloWorld for DB
where
    DB: salsa::Database +
        salsa::plumbing::HasQueryGroup<HelloWorldStorage>,
{
    ...
}

The reason we generate the impl is so that the salsa::database procedural macro can simply create the HasQueryGroup impl and never needs to know the name of the HelloWorld trait, only the HelloWorldStorage type.

Storage types no longer parameterized by the database

Today's storage types, such as Derived, are parameterized over both a query Q and a DB (along with the memoization policy MP):

// Before this RFC:
pub struct DerivedStorage<DB, Q, MP>
where
    Q: QueryFunction<DB>,
    DB: Database + HasQueryGroup<Q::Group>,
    MP: MemoizationPolicy<DB, Q>,

The DB parameter should no longer be needed after the previously described changes are made, so that the signature looks like:

// After this RFC:
pub struct DerivedStorage<Q, MP>
where
    Q: QueryFunction,
    MP: MemoizationPolicy<Q>,

Alternatives and future work

The linchpin of this design is the DatabaseKeyIndex type, which allows signatures to refer to "any query in the system" without reference to the DB type. The biggest downside of the system is that this type is an integer, which then requires a tracing GC to recover index values. The primary alternative would be to use an Arc-like scheme, but this has some severe downsides:

  • Requires reference counting, allocation
  • Hashing and equality comparisons have more data to process versus an integer
  • Equality comparisons must still be deep since you may have older and newer keys co-existing
  • Requires an Arc<dyn DatabaseKey>-like setup, which then encounters the problem that this type is not Send or Sync, leading to hacks like the DB::DatabaseData we use today.