The Box.

The Bilbs platform runs on infrastructure your firm owns: your own hardware on-premise, or your own AWS Bedrock account.

[ Hardware by firm size ]

Run it locally. Recommended hardware by firm size.

Prefer to run on-premise? Here’s the local hardware we’d recommend by firm size, sourced for you at cost and never marked up. Prefer no hardware at all? The same platform runs in your own AWS Bedrock account.

Foundation · 5–10 lawyers

1× NVIDIA RTX 5090
Ryzen 9 · 128 GB ECC

Tower

Practice · 10–30 lawyers

2× NVIDIA RTX 5090
Threadripper PRO · 256 GB ECC

Water-cooled desktop

Firm · 30–100 lawyers

2× RTX 6000 Ada / A100
Dual EPYC · 512 GB ECC · H200 option

4U rackmount

National · 100+ lawyers

8× A100 SXM (640 GB HBM)
H200 / B200 upgrade path

Half-rack cluster
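For an IT director sizing a deployment, the four tiers above amount to a lookup by headcount. Here is a minimal sketch of that lookup. The function name `recommend_tier` and the `Tier` type are hypothetical, and the rule that a boundary headcount (10, 30, or 100, each of which appears in two tiers on this page) stays on the smaller tier is an assumption, not something the platform exposes.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    max_lawyers: Optional[int]  # None means no upper bound
    gpus: str
    form_factor: str


# Values copied from the tier cards above.
TIERS = [
    Tier("Foundation", 10, "1x NVIDIA RTX 5090", "Tower"),
    Tier("Practice", 30, "2x NVIDIA RTX 5090", "Water-cooled desktop"),
    Tier("Firm", 100, "2x RTX 6000 Ada / A100", "4U rackmount"),
    Tier("National", None, "8x A100 SXM (640 GB HBM)", "Half-rack cluster"),
]


def recommend_tier(lawyers: int) -> Tier:
    """Return the smallest tier whose range covers the headcount.

    Assumption: a boundary headcount (10, 30, 100) stays on the smaller tier,
    and firms below 5 lawyers are mapped to Foundation.
    """
    if lawyers < 1:
        raise ValueError("lawyers must be at least 1")
    for tier in TIERS:
        if tier.max_lawyers is None or lawyers <= tier.max_lawyers:
            return tier
    raise AssertionError("unreachable: the last tier is unbounded")


if __name__ == "__main__":
    for headcount in (8, 30, 31, 250):
        print(headcount, recommend_tier(headcount).name)
```

The consultation remains the real sizing step; this only restates the published ranges.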

from $99 / lawyer / month · one line on the invoice

Recommended hardware below · billed separately, yours or sourced at cost

30 Lawyers, comfortably

9 Weeks from call to live

[ Who it fits ]

Built for firms of 10 to 30 lawyers.

Mid-size Québec firms

10 to 30 lawyers, often across two practice groups. A major client just sent the firm an AI questionnaire. The managing partner already suspects associates are using ChatGPT off the corporate network, and the partnership wants the answer to be the firm’s, not Microsoft’s.

Boutiques with major-bank clients

Smaller firms whose clients are major banks, pension funds, or insurers, clients who treat your firm like a 500-lawyer firm when it comes to confidentiality. Their AI questionnaires arrive the same week the partnership realises associates have been using ChatGPT.

Notarial offices & in-house teams

A notarial office or a mid-size in-house legal department lives under the same Loi 25 rules and the same professional-secrecy obligations as a law firm. The Practice tier handles them with the same server, the same training on their files, and the same data-stays-yours promise.

Firms growing out of Foundation

Firms that started on the smaller Foundation server (5 to 10 lawyers) and have grown past it. We recommend the larger hardware spec and move your platform and trained model onto it over a weekend, so the lawyers don’t lose a day of work.

[ Recommended hardware · local deployment ]

For the firm’s IT director.

These are the hardware specs we recommend for an on-premise (local) deployment at this firm size. Use hardware you already own, or we source it at transparent cost, never marked up. Prefer no hardware at all? Run the platform in your own AWS Bedrock account.

Form factor: Desktop/tower with rack-ear kit

GPU: 2x NVIDIA RTX 5090 · 64 GB GDDR7 combined · PCIe Gen 5

CPU: AMD Threadripper PRO · 24–32 cores (up to 96 available)

RAM: 256 GB DDR5 ECC

Cooling: EKWB custom water loop · CPU, VRM, and GPU blocks

Primary storage: RAID NVMe SSD · 8 TB usable

Model storage: 4x 4 TB enterprise SSD (RAID 10) · 8 TB usable

Networking: 2x 25 GbE SFP28 + 2x 10 GbE SFP+

Power: 2x 1600 W 80 PLUS Titanium · 1+1 redundant, hot-swap

Power & Environment

Idle power: 320 W

Inference load: 550–780 W

Peak / fine-tune: 1,050 W

Acoustic: <42 dBA

Quieter than a normal server because it’s water-cooled.

Dual-corded: 208-240V / 20A on each feed. Recommended UPS: 2000 VA on each feed for 15 min runtime.
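The power figures above can be sanity-checked with two lines of arithmetic: the current drawn if the whole peak load lands on one feed, and how heavily that loads one 2000 VA UPS. This is a rough sketch only. The UPS power factor of 0.9 is an assumption (it is not stated on this page and varies by model), the current estimate treats the power supplies as unity power factor, and actual runtime depends on the UPS battery, which this does not model.

```python
PEAK_W = 1050           # peak / fine-tune draw listed above
FEED_VOLTS_MIN = 208    # low end of the 208-240 V feed
BREAKER_AMPS = 20       # per-feed circuit rating listed above
UPS_VA = 2000           # recommended UPS size per feed
UPS_POWER_FACTOR = 0.9  # assumption: check the datasheet of the UPS you buy


def feed_current_amps(watts: float, volts: float) -> float:
    """Approximate current on one feed, assuming unity power factor."""
    return watts / volts


def ups_load_fraction(watts: float, ups_va: float, power_factor: float) -> float:
    """Fraction of the UPS's real-power capacity used by the load."""
    return watts / (ups_va * power_factor)


# Worst case for a 1+1 redundant supply: one feed carries the entire load.
worst_case_amps = feed_current_amps(PEAK_W, FEED_VOLTS_MIN)
load_fraction = ups_load_fraction(PEAK_W, UPS_VA, UPS_POWER_FACTOR)

if __name__ == "__main__":
    print(f"{worst_case_amps:.1f} A on a {BREAKER_AMPS} A feed")
    print(f"UPS loaded to {load_fraction:.0%} of its assumed capacity")
```

Compare the load fraction against the runtime chart for the specific UPS to confirm the 15-minute target.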

[ Model performance ]

Fast enough to feel like talking to a colleague.

Gemma 4 27B (FP16)

~58 tok/s

Generation · default serving model

Gemma 4 27B (INT4)

~95 tok/s

Quantised for throughput

Gemma 4 70B (INT4)

~32 tok/s

Generation · high-stakes drafting

Gemma 4 12B (FP16)

~135 tok/s

Generation · fast intake / classification

Gemma 4 Embedding

~25k tok/s

Retrieval over the DMS

Gemma 4 LoRA (firm)

~58 tok/s

Fine-tuned on the firm’s precedents

Simultaneous hosting

With 64 GB of combined VRAM across the 2x RTX 5090, the server hosts Gemma 4 27B as the serving model, Gemma 4 Embedding, and a Gemma 4 4B side agent concurrently, with comfortable KV-cache headroom, using vLLM’s dynamic partitioning.
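The VRAM budget behind simultaneous hosting can be estimated from parameter counts. The sketch below counts model weights only: KV cache, activations, and runtime overhead come on top. The helper names are hypothetical, the page does not state which precision the concurrent configuration uses, and the embedding model is left out because its size is not given here.

```python
from typing import List, Tuple

TOTAL_VRAM_GB = 64  # 2x RTX 5090, from the spec above


def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB (10^9 bytes): parameters x bytes per weight."""
    return params_billion * bits_per_weight / 8


def headroom_gb(models: List[Tuple[float, int]]) -> float:
    """VRAM left after loading weights, before KV cache and runtime overhead."""
    return TOTAL_VRAM_GB - sum(weight_gb(p, b) for p, b in models)


# (parameters in billions, bits per weight) for the serving model and the 4B side agent.
fp16_serving = headroom_gb([(27, 16), (4, 16)])
int4_serving = headroom_gb([(27, 4), (4, 16)])

if __name__ == "__main__":
    print(f"27B at FP16 + 4B at FP16: {fp16_serving:.1f} GB left")
    print(f"27B at INT4 + 4B at FP16: {int4_serving:.1f} GB left")
```

Under these assumptions, the precision chosen for the serving model largely determines how much room remains for KV cache, so it is worth confirming during scoping.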

[ Concurrency envelope ]

25 lawyers using it at the same second.

Ask the firm’s files a question: 25 users

Draft memos and opinions: 15–20 users

Review and redline a contract: 20 users

Transcribe a meeting or a dictation: 8 streams

Run discovery overnight on a big batch of files: Unlimited

The server can handle 25 lawyers using it at the exact same second. In a 30-lawyer firm, that’s every associate drafting, every partner reviewing, and the assistants transcribing, all at once, without anyone waiting in line.

[ High availability ]

A second server, in case the first one ever goes down.

Paired configuration

2x Box on adjacent RU, dual-corded to independent A+B PDUs

Cross-connected 25 GbE for weight-sync + health

Shared VIP with keepalived + shared eval weights

Snapshot cadence: 6h to primary, 12h to secondary

Deterministic failover under 30 seconds

HA pair

No platform surcharge

A second server is infrastructure, yours or sourced at OEM cost, no markup. Scoped in the consultation.

For firms where the AI ends up in front of clients and a few hours of downtime isn’t acceptable. Two servers run side by side; if one ever fails, the second takes over in under 30 seconds and nobody loses a question. The per-lawyer line stays the same: $99 covers both servers, and the only added cost is the second machine.
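Part of the 30-second failover budget is the time the standby needs to notice that the primary has gone quiet. keepalived implements VRRP, and the VRRPv3 specification (RFC 5798) defines that detection time as three advertisement intervals plus a priority-based skew. The sketch below applies that formula; the 1-second advertisement interval and the backup priority of 100 are illustrative assumptions, not the values Bilbs deploys, and the result covers VIP takeover only, not any application-level warm-up.

```python
FAILOVER_BUDGET_S = 30  # "under 30 seconds", from the text above


def master_down_interval_s(advert_interval_s: float, backup_priority: int) -> float:
    """VRRPv3 Master_Down_Interval: 3 x advert interval + skew time.

    Skew time is (256 - priority) x advert interval / 256, so a higher-priority
    backup declares the master down slightly sooner.
    """
    if not 1 <= backup_priority <= 254:
        raise ValueError("backup priority must be between 1 and 254")
    skew_s = (256 - backup_priority) * advert_interval_s / 256
    return 3 * advert_interval_s + skew_s


# Assumed values for illustration only.
detection_s = master_down_interval_s(advert_interval_s=1.0, backup_priority=100)

if __name__ == "__main__":
    print(f"standby detects a failed primary after about {detection_s:.2f} s")
    print(f"remaining budget: {FAILOVER_BUDGET_S - detection_s:.2f} s")
```

With these assumed settings detection takes a few seconds, which leaves most of the budget for the standby to begin serving.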

[ Pricing ]

One line. From $99 per lawyer.

Bilbs subscription · per lawyer, per month

from $99

/ lawyer / month · in Canadian dollars · goes up only if the firm adds optional services

Free audit · we scope the firm before you sign anything · Included

Deployment, indexing, on-site training · Included

Updates, support, audit log, 24/7 pager · Included

Hardware (this Practice spec) · Yours, or we source it
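The subscription line is simple to project. Here is a minimal sketch, assuming the $99 starting rate applies to every lawyer. The function name is hypothetical, and `optional_services_cad` is a placeholder for a flat monthly amount, since this page does not say how optional services are priced. Hardware is quoted separately and is not part of this figure.

```python
BASE_RATE_CAD = 99  # "from $99" per lawyer per month, in Canadian dollars


def monthly_subscription_cad(lawyers: int, optional_services_cad: float = 0.0) -> float:
    """Projected monthly subscription, excluding hardware.

    optional_services_cad is a hypothetical flat add-on; actual pricing of
    optional services is set during the consultation.
    """
    if lawyers < 1:
        raise ValueError("lawyers must be at least 1")
    if optional_services_cad < 0:
        raise ValueError("optional_services_cad cannot be negative")
    return lawyers * BASE_RATE_CAD + optional_services_cad


if __name__ == "__main__":
    for headcount in (10, 20, 30):
        print(f"{headcount} lawyers: ${monthly_subscription_cad(headcount):,.0f} CAD / month")
```

The same figure applies whether the firm runs one server or an HA pair, since the second machine is billed as hardware.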

Transparent pricing

The server itself: water-cooled, two NVIDIA RTX 5090 graphics cards, AMD Threadripper PRO processor, 256 GB of error-correcting memory

Nine weeks from the first call to the first lawyer using it

Trained on every contract, memo, and matter your firm has handled

Plugged into your document system, your sign-on, your practice management software

Three-year hardware warranty, a printed guidebook, two training sessions per practice group (French or English)

Hardware path A · you have one

Use your server

If the firm already runs GPU-capable hardware (or has approved budget through your usual IT channel), we deploy onto it. The asset stays on the firm’s books. Nothing extra on the Bilbs line. The Practice spec on this page is a reference for what works comfortably for 10–30 lawyers.

Hardware path B · you don’t

We source it

No IT director, no server room yet? After the audit we source the Practice-spec server, install it on-site, configure the network, and own the runbook. Hardware sits on a separate, transparent quote at OEM list price; it’s your call once you see the spec. We don’t mark up the GPU.

If the firm grows

No penalty

If the firm grows past 30 lawyers, we swap the chassis for the Firm spec and re-train on the new hardware over a weekend. No re-platforming charge. The per-lawyer line stays the same.

[ Timeline ]

From the first call to the first lawyer using it: nine weeks.

Step 1

Free 45-min call

Week 1–2

Step 2

It learns your firm

Week 3–5

Step 3

Server arrives

Week 6

(in parallel)

Step 4

We install it

Week 7

Step 5

Pilot group, then everyone

Week 8–9

[ When to upgrade ]

How you’ll know it’s time for the next tier.

Firm grew past 30 lawyers?

Move to the Firm tier

Keep your firm’s files in your firm.

A 45-minute call within one business day. We’ll tell you whether the Practice tier is the right size for your firm, and if it isn’t, we’ll tell you which one is. No pitch. No deck. No payment today.

Meet the person who installs it