synwire-storage: Storage Layout and Project Identity

synwire-storage provides two things: a deterministic mapping from a product name to all storage paths a Synwire deployment needs (StorageLayout), and a two-level project identity (RepoId / WorktreeId) that is stable across clones and branches.

Why a centralised path library?

Without a shared layout library, paths scatter across crates and configuration files, making them difficult to migrate and easy to get wrong. StorageLayout is the single source of truth: every crate that needs a path calls layout.index_cache(worktree) rather than constructing paths ad-hoc.

Path hierarchy

graph TD
    DATA["$XDG_DATA_HOME/<product>/"]
    CACHE["$XDG_CACHE_HOME/<product>/"]

    DATA --> SESSIONS["sessions/<id>.db"]
    DATA --> EXPERIENCE["experience/<worktree_key>.db"]
    DATA --> SKILLS["skills/"]
    DATA --> LOGS["logs/"]
    DATA --> PID["daemon.pid"]
    DATA --> SOCK["daemon.sock"]
    DATA --> GLOBAL["global/"]
    GLOBAL --> GREG["registry.json"]
    GLOBAL --> GEXP["experience.db"]
    GLOBAL --> GDEP["dependencies.db"]
    GLOBAL --> GCFG["config.json"]

    CACHE --> INDICES["indices/<worktree_key>/"]
    CACHE --> GRAPHS["graphs/<worktree_key>/"]
    CACHE --> COMMUNITIES["communities/<worktree_key>/"]
    CACHE --> LSP["lsp/<worktree_key>/"]
    CACHE --> MODELS["models/"]
    CACHE --> REPOS["repos/<owner>/<repo>/"]

Durable vs cache

Data is separated by durability:

TierLocationPropertyExamples
Durable$DATA/<product>/Must survive rebootsSessions, experience, skills, logs
Cache$CACHE/<product>/Safe to delete and regenerateIndices, graphs, communities, cloned repos

This separation aligns with XDG semantics. Cache directories can be cleaned by systemd-tmpfiles, bleachbit, or manual deletion without losing durable state.

Configuration hierarchy

StorageLayout::new(product_name) resolves paths in this priority order:

  1. SYNWIRE_DATA_DIR / SYNWIRE_CACHE_DIR environment variables
  2. Programmatic override via StorageLayout::with_root(root, name)
  3. .<product>/config.json in the project root
  4. Platform default (directories::BaseDirs)

This allows per-project, per-machine, and per-container overrides without changing code.

Two-level identity

The problem

A developer may have multiple working copies of the same repository:

  • ~/projects/myapp — main branch
  • ~/projects/myapp-feature — feature branch (git worktree)
  • ~/builds/myapp — CI checkout

These should share certain data (experience pool, global dependency graph) but have independent indices (because the code differs per branch).

Solution: RepoId + WorktreeId

graph TD
    A["git rev-list --max-parents=0 HEAD"] --> R["RepoId (first-commit SHA-1)"]
    B["canonical worktree root path"] --> W["WorktreeId (RepoId + SHA-256 of path)"]
    R --> W

RepoId is the same for all clones and worktrees of one repository. It is derived from the SHA-1 of the root (first) commit. When Git is unavailable, SHA-256 of the canonical path is used as a fallback.

WorktreeId uniquely identifies a single working copy. It combines RepoId with a SHA-256 of the canonicalised worktree root path. The key() method returns a compact filesystem-safe string: <repo_id>-<worktree_hash[:12]>.

What each tier uses

DataKeyed byWhy
Vector + BM25 indicesWorktreeIdCode differs per branch
Code dependency graphWorktreeIdCall graph is branch-specific
Experience poolWorktreeIdEdit history is branch-specific
Global dependency indexSpans all repos
Global experienceSpans all repos

Path conventions

All path strings produced by StorageLayout use the WorktreeId.key() format as directory names under the cache root. This means:

  • Directory names are stable across renames of the project directory (key is based on git content, not path)
  • Two machines cloning the same repo get the same RepoId and therefore compatible cache structures
  • Branch switches produce different WorktreeIds, so index data from main is not mixed with feature-branch

Product isolation

Each product name produces fully isolated paths. Two products running on the same machine cannot see each other's data:

#![allow(unused)]
fn main() {
let a = StorageLayout::new("product-a")?;
let b = StorageLayout::new("product-b")?;
assert_ne!(a.data_home(), b.data_home());
assert_ne!(a.cache_home(), b.cache_home());
}

This enables multiple Synwire-based tools (e.g. a coding assistant and a documentation assistant) to coexist without conflicting storage.

Directory permissions

StorageLayout::ensure_dir(path) creates directories with 0o700 permissions on Unix. This prevents other users on the same machine from reading session data, experience pools, or API-adjacent artefacts.

See also