# 10 · Security and secrets

> **Audience.** Every engineer. Security is everyone's job; this page
> codifies how we practise it.
>
> ⟵ [internal index](./README.md)

## 01 · Threat model

The threats we care about, in priority order:

1. **Supply-chain compromise** of a weight bundle we ship to a customer (hostile code running in the customer's rack).
2. **Signing-key theft** enabling the above.
3. **Customer-data exfiltration** during an engagement.
4. **Runtime vulnerabilities** in our platform shipping to production.
5. **Insider threat**: an engineer turns hostile.
6. **Ransom / extortion** against our build infrastructure.

We do not treat DDoS against `bilbs.ai` (Cloudflare handles it) or classical phishing (every account is YubiKey-gated) as central concerns.

## 02 · Signing keys

This is the most valuable secret we own. If an attacker obtains the signing key, they can push a tampered bundle that customer updaters will accept.

### Hierarchy

- **Root key** (offline). Generated in a key ceremony and used only to sign intermediate keys. Stored as **Shamir 3-of-5** shares on:
  - 2× Tangem hardware wallets (in separate jurisdictions).
  - 3× printed paper shards in sealed envelopes at different physical sites, including our legal counsel.
- **Intermediate signing keys**, one per year (e.g., `prod/2026`). These live in HashiCorp Vault PKI and are available to CI through short-lived OIDC authentication.
- **Ephemeral per-release keys**, minted by Vault for each CI run, with a 30-minute TTL.

Every layer is logged in our Rekor mirror at `log.bilbs.ai`.

### Rotation

- Intermediate: yearly, with a 90-day overlap so that bundles signed late in year N still verify after year N+1 starts.
- Root: only on compromise, or otherwise every 5 years.

Rotation runbook: `ops/sops/rotate-signing-key.md` (internal).

### Verification on the customer side

- The updater ships with our Sigstore root pinned in the binary.
- On every artefact pull, the signature is verified offline.
- Tampered bundles never reach disk.

## 03 · Secrets management

**HashiCorp Vault** is the single source of truth.
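In practice, "single source of truth" means a CI job or platform service fetches a secret from Vault at runtime instead of reading it from a file on disk. The following is a minimal sketch using only the Python standard library against Vault's KV version 2 HTTP API (`GET /v1/<mount>/data/<path>`, with the token in the `X-Vault-Token` header). It assumes that `kv/` is mounted as a KV v2 engine and that the caller already holds a short-lived Vault token (in CI, one obtained via OIDC). The function names and the `ci/registry-push` path are hypothetical illustrations, not part of our tooling.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical helpers, not an existing internal library. Assumes the `kv/`
# mount is a KV version 2 secrets engine and that the caller already holds a
# short-lived Vault token (e.g., obtained by CI via OIDC).


def kv2_read_request(addr: str, mount: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a KV v2 read: GET /v1/<mount>/data/<path>."""
    url = (
        f"{addr.rstrip('/')}/v1/{mount.strip('/')}/data/"
        f"{urllib.parse.quote(path.strip('/'))}"  # '/' is kept as a path separator
    )
    return urllib.request.Request(url, headers={"X-Vault-Token": token}, method="GET")


def kv2_extract(body: bytes) -> dict:
    """KV v2 wraps the secret's key/value pairs in data.data (metadata sits beside it)."""
    return json.loads(body)["data"]["data"]


def read_secret(addr: str, mount: str, path: str, token: str, timeout: float = 5.0) -> dict:
    """Fetch one secret; raises urllib.error.HTTPError on error responses (e.g., 403, 404)."""
    req = kv2_read_request(addr, mount, path, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return kv2_extract(resp.read())
```

A CI step might call `read_secret(os.environ["VAULT_ADDR"], "kv", "ci/registry-push", os.environ["VAULT_TOKEN"])`; the returned dict stays in process memory, so nothing has to be written to a `.env` file.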
Dev secrets also live in 1Password as a convenience layer for humans.

### Secrets tiering

| Tier | Where | Rotation | Who has access |
|------|-------|---------:|----------------|
| Human-interactive dev | 1Password `Bilbs — Engineering` | 90 d | All engineers |
| CI / runtime | Vault `kv/ci/` and `kv/runtime/` | Per release (short-lived) | CI and platform services |
| Customer-specific | Vault `kv/clients//` | Per engagement | Engineers on that engagement only |
| Signing | Vault PKI `pki/signing/` | Yearly | Automation only; human access requires two-person approval |
| Root | Offline Shamir shares | Every 5 years | Key ceremony only |

### Vault policies

Per-engineer policy files live in `deploy/vault/policies/` and are reviewed quarterly.

Dead-man switch: if an engineer doesn't authenticate for 90 days, their policy expires and they re-onboard via the security process below.

### Local `.env` policy

- `.env` in any repo is gitignored and blocked by a pre-commit hook (`ripsecrets`).
- The template, `.env.example`, is committed.
- `direnv` loads `.env` into shells per project.
- We never paste secrets into Slack, even in private channels.

## 04 · SBOM and supply chain

- Every OCI image ships with an SBOM (`syft`-generated, CycloneDX).
- Every binary release gets an SBOM attached on GitHub Releases.
- `grype` runs on every PR and nightly on `main`. Any **high** or **critical** CVE in a dependency blocks merge unless it is suppressed with a documented reason and a deadline.
- `deps-update` opens a weekly automated PR that bumps minor and patch versions.
- Once a year, a human reviews every direct dependency for ownership and maintenance status (the "are they still here?" audit).

SLSA compliance:

- Platform releases aim for **SLSA Level 3**.
- Customer weight bundles that include customer data aim for Level 3, with a documented caveat about reproducibility (NCCL is non-deterministic).
- Level 4 is aspirational; we don't claim it.

## 05 · The two-person rule

Required for:

- Signing a root-level release.
- Rotating a signing key.
- Accessing a customer's production Box in a non-Support tunnel.
- Deploying to `registry.bilbs.ai` or `updates.bilbs.ai` from anywhere other than CI.
- Exporting a customer's fine-tune data.

Enforced via Vault policies that require a second approver on the relevant paths.

## 06 · Access to customer infrastructure

### Build phase

- Read access to a customer's corpus goes through a **bastion host they control**. We do not replicate their corpus in bulk to our laptops or our lab box.
- Writes to their cloud account go through a **time-bound IAM role** that we assume with their approval.
- Never: persistent service principals with long-lived keys.

### Production (post-handover)

- The Box has **no default outbound** connection to us.
- The WireGuard support tunnel is off unless:
  - the customer is on the Support plan, AND
  - an admin clicks "Open tunnel" in their UI (explicit opt-in).
- An open tunnel auto-revokes after 4 h at most.
- All actions during a tunnel session are logged in the customer's audit log, with our engineer as `actor_email`.

## 07 · Endpoint security

Every engineering laptop:

- Full-disk encryption (FileVault / LUKS).
- YubiKey-based login and Vault unlock.
- MDM (Mosyle for macOS; Fleet for Linux).
- Automatic OS updates within 7 days.
- No local admin on customer-shared projects without a ticket.
- Signed Git commits (`git config commit.gpgsign true`).

Lost-laptop runbook: `ops/sops/lost-laptop.md`. Lost laptops are wiped remotely within 15 min.

## 08 · Identity & access

- Everyone has two YubiKeys (one primary, one backup).
- Every service uses SSO via Google Workspace where possible, and direct YubiKey authentication otherwise.
- No shared accounts. Ever. If a service supports neither SSO nor per-person accounts (rare), document it and revisit.
- Google Workspace audit logs are reviewed monthly.

## 09 · Vulnerability disclosure

Public policy: `/.well-known/security.txt` on `bilbs.ai`, pointing at `security@groupebilbs.com`.

Response SLA:

- Acknowledge a report in < 24 h.
- Triage in < 72 h.
- Fix critical issues in < 7 d and high issues in < 30 d.
- Publish an advisory after the fix.

Bug bounty: none today (single-operator). We pay at our discretion for thoughtful reports; this is rare but has precedent.

## 10 · Data handling

- Customer corpus data is **processed, not stored** during an engagement. Outputs (weights, evals) are artefacts; inputs are referenced, not copied.
- When inputs must be cached locally (e.g., so the tokeniser can read them), the cache lives on the customer's bastion or on an ephemeral cloud instance that is destroyed after the run.
- Deletion obligations in the DPA are honoured at D+30 unless a retention override applies.
- We log data access at the bastion level; customers get that log stream on request.

## 11 · Physical security

- Hardware in transit between OEM → integrator → customer:
  - Sealed flight cases with tamper-evident stickers.
  - ShockWatch + TiltWatch indicators.
  - Insured freight with tracking.
- Integrator floor: alarmed, 24/7 video, limited personnel.
- Lab box: behind a locked door, on an isolated VLAN; no customer gets physical access without a handler present.

## 12 · Incident: security breach

If we believe a breach has occurred:

1. **Contain**: revoke tokens, rotate keys, isolate the host.
2. **Preserve**: take disk and memory snapshots before remediation.
3. **Notify**: tell the founder and affected customers within 72 h of reasonable certainty. We do not sit on news.
4. **Report**: file regulator notifications where legally required (Québec's CAI, GDPR supervisory authorities, state AGs, etc.).
5. **Remediate**: patch, re-issue, re-sign.
6. **Disclose**: publish a public advisory with specifics after remediation.
7. **Postmortem**: blameless, and public where appropriate.

Legal counsel is on retainer for incident response; contact details are in 1Password.

## 13 · Things you will be tempted to do that you must not

- Storing a customer's data on your laptop "just for a minute."
- Using `sudo` on a customer's Box outside a WireGuard tunnel session.
- Hard-coding a secret in a config file "temporarily."
- Committing a `.env` to a private repo because "it's only us."
- Skipping cosign verification on an artefact because "CI already checked."
- Sharing a YubiKey with anyone for any reason.
- Running a production migration without two-person approval.
- Using a personal AI tool (like an IDE assistant that calls out to a cloud model) on a file containing customer data.

Each of these has ended in an incident somewhere in the industry. We are not above the list.

## 14 · Compliance summary (for our own reference)

- **SOC 2 Type I**: attested. Type II: targeted for Q4 2026.
- **ISO 27001**: not today; revisit once the team reaches 3+.
- **GDPR / UK GDPR / Law 25**: yes; DPA template ready.
- **HIPAA BAA**: available as an add-on; not default.
- **FedRAMP**: no.
- **PCI-DSS**: not as a processor; Cluster supports PCI-adjacent data paths if configured.
- **FIPS 140-3**: supported on Cluster on request (validated cryptographic modules).

Compliance calendar and renewal cadence: `ops/compliance/calendar.md`.

---

Next: [11-engagement-playbook.md](./11-engagement-playbook.md).