Bilbs AI
Most firms pick this one [ Practice · default tier ] 10–30 lawyers

The Practice.

The server we install for most of our firms. It sits quietly in your own server room — about the size of a small filing cabinet — and serves 10 to 30 lawyers across two practice groups. We train it on every contract, memo, and matter your firm has handled. Your lawyers ask it instead of pasting client files into ChatGPT. Nothing leaves the building. Ever.

BILBS BOX · WATER-COOLED
from $29 / lawyer / month · one line on the invoice
Hardware below · audited separately, yours or sourced by us
30 Lawyers, comfortably
9 Weeks from call to live
[ Who the Practice fits ]

Built for firms of 10 to 30 lawyers.

Mid-size Québec firms

10 to 30 lawyers, often across two practice groups. A major client just sent the firm an AI questionnaire. The managing partner already suspects associates are using ChatGPT off the corporate network — and the partnership wants the answer to be the firm’s, not Microsoft’s.

Boutiques with major-bank clients

Smaller firms whose clients are major banks, pension funds, or insurers — clients who treat your firm like a 500-lawyer firm when it comes to confidentiality. Their AI questionnaires arrive the same week the partnership realises associates have been using ChatGPT.

Notarial offices & in-house teams

A notarial office or a mid-size in-house legal department lives under the same Loi 25 rules and the same professional-secrecy obligations as a law firm. The Practice tier handles them with the same server, the same training on their files, and the same nothing-leaves-the-building promise.

Firms growing out of Foundation

Firms that started on the smaller Foundation server (5 to 10 lawyers) and have grown past it. We swap the server for the price difference, re-train on the new hardware over a weekend, and the lawyers don’t lose a day of work.

[ Hardware specifications ]

For the firm’s IT director.

Form factor: Desktop/tower with rack-ear kit
GPU: 2x NVIDIA RTX 5090 · 64 GB GDDR7 combined · PCIe Gen 5
CPU: AMD Threadripper PRO · 24–32 cores (up to 96 available)
RAM: 256 GB DDR5 ECC
Cooling: EKWB custom water loop · CPU, VRM, and GPU block
Primary storage: RAID NVMe SSD · 8 TB usable
Model storage: 4x 4 TB enterprise SSD (RAID 10) · 8 TB usable
Networking: 2x 25 GbE SFP28 + 2x 10 GbE SFP+
Power: 2x 1600 W 80 PLUS Titanium · 1+1 redundant, hot-swap

Power & Environment

Idle power: 320 W
Inference load: 550–780 W
Peak / fine-tune: 1,050 W
Acoustic: <42 dBA

Quieter than a normal server because it’s water-cooled.

Dual-corded: 208–240 V / 20 A on each feed. Recommended UPS: 2000 VA on each feed for 15 min of runtime.
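
For an IT director checking the electrical figures, the arithmetic can be sketched as below. The wattage, voltage, breaker, and UPS ratings come from this page; the 80% continuous-load derating and the 0.9 UPS power factor are assumptions chosen for illustration, and the sketch treats the worst case where a single feed carries the full load. It does not verify the 15-minute runtime, which depends on the battery in the UPS model chosen.

```python
# Figures taken from the Power & Environment block.
PEAK_W = 1050        # peak / fine-tune draw, in watts
FEED_VOLTS = 208     # low end of the 208-240 V range (worst case for current)
BREAKER_AMPS = 20    # breaker rating on each feed
UPS_VA = 2000        # recommended UPS rating on each feed

# Assumptions for illustration only; confirm with your electrician and UPS datasheet.
CONTINUOUS_DERATE = 0.8   # keep continuous load at or below 80% of the breaker rating
UPS_POWER_FACTOR = 0.9    # hypothetical UPS output power factor


def feed_current_amps(watts: float, volts: float) -> float:
    """Current on one feed carrying the whole load (server power factor taken as ~1)."""
    return watts / volts


def ups_load_fraction(watts: float, ups_va: float, power_factor: float) -> float:
    """Share of the UPS's real-power capacity consumed by the load."""
    return watts / (ups_va * power_factor)


amps = feed_current_amps(PEAK_W, FEED_VOLTS)
limit = BREAKER_AMPS * CONTINUOUS_DERATE
load = ups_load_fraction(PEAK_W, UPS_VA, UPS_POWER_FACTOR)

print(f"Peak current on one feed: {amps:.1f} A (continuous limit {limit:.0f} A)")
print(f"UPS load at peak: {load:.0%}")
```

Under these assumptions, a single feed stays well inside its breaker limit even at peak, which is the point of running two independent feeds.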

[ Model performance ]

Fast enough to feel like talking to a colleague.

Gemma 4 27B (FP16) · ~58 tok/s · Generation · default serving model

Gemma 4 27B (INT4) · ~95 tok/s · Quantised for throughput

Gemma 4 70B (INT4) · ~32 tok/s · Generation · high-stakes drafting

Gemma 4 12B (FP16) · ~135 tok/s · Generation · fast intake / classification

Gemma 4 Embedding · ~25k tok/s · Retrieval over the DMS

Gemma 4 LoRA (firm) · ~58 tok/s · Fine-tuned on the firm’s precedents
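
To put the generation rates above in practical terms, the sketch below converts tokens per second into the time needed to write one answer. The 400-token answer length is an assumption chosen for illustration, and the calculation covers generation only for a single request; it ignores the time spent reading the prompt and retrieving documents.

```python
# Generation rates quoted on this page, in tokens per second.
RATES_TOK_S = {
    "Gemma 4 27B (FP16)": 58,
    "Gemma 4 27B (INT4)": 95,
    "Gemma 4 70B (INT4)": 32,
    "Gemma 4 12B (FP16)": 135,
}

ANSWER_TOKENS = 400  # assumption: hypothetical length of one drafted answer


def generation_seconds(tokens: int, tok_per_s: float) -> float:
    """Time to generate `tokens` at a steady rate, excluding prompt processing."""
    return tokens / tok_per_s


for model, rate in RATES_TOK_S.items():
    seconds = generation_seconds(ANSWER_TOKENS, rate)
    print(f"{model}: about {seconds:.1f} s for {ANSWER_TOKENS} tokens")
```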

Simultaneous hosting

With 64 GB of combined VRAM across the two RTX 5090s, the server can host Gemma 4 27B as the serving model, Gemma 4 Embedding, and a Gemma 4 4B side agent concurrently, with comfortable KV-cache headroom, using vLLM’s dynamic partitioning.
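
A rough way to sanity-check the memory budget is to add up the weight sizes and see what is left for the KV cache. The sketch below is weights-only arithmetic under stated assumptions: the page does not give the precision used for simultaneous hosting, so both FP16 and INT4 are tried for the 27B model, and the 0.5B embedding model size is a hypothetical placeholder. Runtime overhead and activations are ignored, and using the two cards as one 64 GB pool assumes the serving model is sharded across both.

```python
TOTAL_VRAM_GB = 64.0  # 2x RTX 5090, from the hardware specification


def weight_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_param / 8


def headroom_gb(models, total_gb: float = TOTAL_VRAM_GB) -> float:
    """VRAM left for KV cache after loading weights; `models` is (params_B, bits) pairs."""
    return total_gb - sum(weight_gb(p, b) for p, b in models)


EMBED_PARAMS_B = 0.5  # assumption: hypothetical embedding model size, not from this page

# 27B serving model + 4B side agent + embedding model.
fp16_serving = [(27, 16), (4, 16), (EMBED_PARAMS_B, 16)]
int4_serving = [(27, 4), (4, 16), (EMBED_PARAMS_B, 16)]

print(f"Headroom with 27B at FP16: {headroom_gb(fp16_serving):.1f} GB")
print(f"Headroom with 27B at INT4: {headroom_gb(int4_serving):.1f} GB")
```

Under these assumptions the choice of precision for the serving model dominates the budget, so it is worth confirming which configuration applies during the audit.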

[ Concurrency envelope ]

25 lawyers using it at the same second.

Ask the firm’s files a question: 25 users
Draft memos and opinions: 15–20 users
Review and redline a contract: 20 users
Transcribe a meeting or a dictation: 8 streams
Run discovery overnight on a big batch of files: Unlimited

The server can handle 25 lawyers using it at the exact same second. In a 30-lawyer firm, that’s every associate drafting, every partner reviewing, and the assistants transcribing — all at once, without anyone waiting in line.

[ High availability ]

A second server, in case the first one ever goes down.

Paired configuration

2x Box on adjacent RU, dual-corded to independent A+B PDUs
Cross-connected 25 GbE for weight-sync + health
Shared VIP with keepalived + shared eval weights
Snapshot cadence: 6h to primary, 12h to secondary
Deterministic failover under 30 seconds

HA pair add-on: +$429 / mo
Second server, 48-month lease · up to $849 / mo on a 24-month lease

For firms where the AI ends up in front of clients and a few hours of downtime isn’t acceptable. Two servers running side by side; if one ever fails, the second takes over in under 30 seconds — nobody loses a question. The per-lawyer line stays the same: $29 covers both servers.
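
Because the shared VIP is managed by keepalived, the time the backup takes to notice a dead primary follows the VRRP timing rule. The sketch below computes that detection window using the VRRPv3 definition (RFC 5798) and compares it with the 30-second budget quoted above. The advertisement interval and priority are illustrative assumptions, not Bilbs’ actual settings, and the result covers only detecting the failure, not any time the second server needs before it answers questions.

```python
FAILOVER_BUDGET_S = 30.0  # "under 30 seconds", from this page


def master_down_interval(advert_int_s: float, backup_priority: int) -> float:
    """Seconds a VRRP backup waits without advertisements before taking over.

    Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time, where
    Skew_Time = (256 - priority) / 256 * Advertisement_Interval.
    """
    if not 1 <= backup_priority <= 254:
        raise ValueError("backup priority must be between 1 and 254")
    skew = (256 - backup_priority) / 256 * advert_int_s
    return 3 * advert_int_s + skew


# Assumptions for illustration: 1 s advertisements, backup priority 100.
detect_s = master_down_interval(advert_int_s=1.0, backup_priority=100)
remaining_s = FAILOVER_BUDGET_S - detect_s

print(f"Failure detected after about {detect_s:.2f} s")
print(f"Budget left for the takeover itself: {remaining_s:.2f} s")
```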

[ Pricing ]

One line. From $29 per lawyer.

Bilbs subscription · per lawyer, per month
from $29
/ lawyer / month · in Canadian dollars · goes up only if the firm adds optional services
Free audit · we scope the firm before you sign anything: Included
Deployment, indexing, on-site training: Included
Updates, support, audit log, 24/7 pager: Included
Hardware (this Practice spec): Yours, or we source it
Below Copilot
The server itself: water-cooled, two NVIDIA RTX 5090 graphics cards, AMD Threadripper PRO processor, 256 GB of error-correcting memory
Nine weeks from the first call to the first lawyer using it
Trained on every contract, memo, and matter your firm has handled
Plugged into your document system, your sign-on, your practice management software
Three-year hardware warranty, a printed guidebook, two training sessions per practice group (French or English)
Hardware path A · you have one

Use your server

If the firm already runs GPU-capable hardware (or has approved budget through your usual IT channel), we deploy onto it. The asset stays on the firm’s books. Nothing extra on the Bilbs line. The Practice spec on this page is a reference for what works comfortably for 10–30 lawyers.

Hardware path B · you don’t

We source it

No IT director, no server room yet? After the audit we source the Practice-spec server, install it on-site, configure the network, and own the runbook. Hardware sits on a separate, transparent quote at OEM list price — your call once you see the spec. We don’t mark up the GPU.

If the firm grows

No penalty

If the firm grows past 30 lawyers, we swap the chassis for the Firm spec and re-train on the new hardware over a weekend. No re-platforming charge. The per-lawyer line stays the same.
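
The subscription arithmetic can be sketched as follows. It uses the $29 starting rate and the two HA lease prices quoted on this page, and it assumes no optional services are added. Hardware is excluded because it sits on a separate quote. The function name and structure are illustrative only.

```python
from typing import Optional

PER_LAWYER_CAD = 29                # starting rate per lawyer per month
HA_ADDON_CAD = {48: 429, 24: 849}  # HA second-server lease, keyed by term in months


def monthly_cad(lawyers: int, ha_lease_months: Optional[int] = None) -> int:
    """Monthly subscription at the starting rate, plus the optional HA add-on."""
    if lawyers < 1:
        raise ValueError("lawyers must be at least 1")
    total = lawyers * PER_LAWYER_CAD
    if ha_lease_months is not None:
        if ha_lease_months not in HA_ADDON_CAD:
            raise ValueError("only 24- and 48-month HA leases are quoted")
        total += HA_ADDON_CAD[ha_lease_months]
    return total


print(monthly_cad(25))                      # 25 lawyers, single server
print(monthly_cad(25, ha_lease_months=48))  # same firm with the HA pair
```

For a 25-lawyer firm this gives $725 a month on a single server and $1,154 a month with the HA pair on a 48-month lease.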

[ Timeline ]

From the first call to the first lawyer using it: nine weeks.

Step 1 · Free 45-min call · Week 1–2
Step 2 · It learns your firm · Week 3–5
Step 3 · Server arrives · Week 6 (in parallel)
Step 4 · We install it · Week 7
Step 5 · Pilot group, then everyone · Week 8–9

[ When to upgrade ]

How you’ll know it’s time for the next tier.

Keep your firm’s files in your firm.

A 45-minute call within one business day. We’ll tell you whether the Practice tier is the right size for your firm — and if it isn’t, we’ll tell you which one is. No pitch. No deck. No payment today.