Bilbs AI
[ Firm · full-service tier ] 30–100 lawyers

The Firm.

The bigger server. A proper rack-mount with two power supplies, two network cables, and hot-swap drives — built so a single failure doesn’t knock anyone offline. It runs the biggest model at full precision for 30 to 100 lawyers across one or more offices. Built for full-service Québec firms and the handful of bay-street-adjacent firms in the province. Nothing leaves the building.

See the other tiers
1U2U3U4U BILBS PRO · 4U GPU 1 GPU 2 NVME FIRM · 2× RTX 6000 ADA / A100 · 4U
from $29 / lawyer / month · one line on the invoice Hardware below · audited separately, yours or sourced by us
200 Concurrent users
9 Weeks from call to live
[ Who the Firm tier fits ]

For firms where AI is infrastructure, not a trial.

Full-service Quebec firms

30 to 100 lawyers across M&A, litigation, real estate, tax, and labour. The kind of firm that sees questionnaires from banks, insurers, and pension funds every week — and where the partnership wants the same answer on every one.

Multi-office practices

Montréal plus Québec City, Ottawa, or Toronto - one Firm-tier Box serves every associate over the corporate VPN. Per-office latency under 50 ms within province.

Firms with SLA clients

Where AI downtime affects a litigation deadline or a closing. Deterministic failover under 30 seconds. Optional HA pair for single-rack redundancy.

Firms growing out of Practice

Firms that started on the smaller Practice server and have grown past 30 lawyers. We swap the server for the price difference, re-train on the new hardware over a weekend, and the lawyers don’t lose a day of work.

[ Hardware specifications ]

Built on a dual EPYC 4U rackmount.

Form factor 4U rackmount (rack-native)
GPU 2× NVIDIA RTX 6000 Ada / A100
96–160 GB HBM combined · PCIe Gen 4/5
CPU Dual AMD EPYC 9354
64 cores / 128 threads total
RAM 512 GB DDR5-4800 ECC
Primary storage 4x 3.84 TB NVMe Gen 5 (RAID 10)
8 TB usable
Cold storage 8x 7.68 TB SATA SSD (RAID 10)
32 TB usable
Networking 2x 100 GbE QSFP28 + 2x 25 GbE SFP28
Power 2+2x 2600W 80-PLUS Titanium
Redundant, hot-swap
Cold spares included 1x PSU, 1x fan tray, 1x NVMe

Power & Environment

Idle power 520W
Inference load 1,100-1,600W
Peak / dual-GPU fine-tune 2,100W
Acoustic <75 dBA

Needs a proper rack room or sound-isolated closet

2x C19 (30A) on dedicated 208-240V circuits. Recommended UPS: 3500 VA on each feed for 10 min runtime.

[ Model performance ]

Run Gemma 4 70B FP16 at full precision.

Gemma 4 70B (FP16)

~48 tok/s
Tensor-parallel · default serving model · 200 concurrent

Gemma 4 70B (FP8)

~62 tok/s
250 concurrent · quantised for throughput

Gemma 4 27B (FP16)

~105 tok/s
Side serving model · lower-stakes drafting

Gemma 4 LoRA (firm)

~48 tok/s
Fine-tuned on the firm’s precedents

Simultaneous: 70B + 12B + embed

200+80
Chat users + agent users

Multi-model hosting

With up to 160 GB of GPU memory across two A100s and PCIe Gen 5 interconnect, host Gemma 4 70B FP16 + a Gemma 4 12B agent + Gemma 4 Embedding concurrently with efficient tensor parallelism. H200 NVL upgrade lifts that to 282 GB HBM3e.

[ Concurrency envelope ]

Scale to 200 simultaneous users.

Chat (Gemma 4 70B INT4, ~1k-token prompts) 200 users
Ask the firm’s files a question 160 users
Multi-step agents (6-10 tool-use) 120 users
Streaming transcription 60 streams
A mix of all of the above 80+40+40
[ High availability ]

Designed for active/passive HA pairs.

HA pair configuration

2x Pro in adjacent RUs, cross-connected via 100 GbE
Shared virtual IP managed by keepalived / VRRP
Weight sync over dedicated 25 GbE link, replicated every 30s
Snapshot: 1h rolling, 6h pinned, 24h offsite
Sub-30-second automatic failover, zero prompt loss
HA pair add-on
+ second server
Second Firm-spec server on standby · hardware quoted at OEM cost, same per-lawyer line

Recommended for production workloads. In-flight request queue replication means no lost prompts during failover. The $29 / lawyer / month subscription covers both servers — only the hardware is doubled.

[ Decision criteria ]

Pick Pro if any two are true.

The firm has 30 or more lawyers (or is on its way there)

The work has hard deadlines you cannot miss — litigation calendars, closings, regulator filings

Different practice groups need their own model, isolated from each other

A regulator or a client contract requires a second server that takes over automatically if the first one fails

You want the biggest model at full precision (the more accurate version)

Otherwise: the Practice tier may fit the firm better

[ Pricing ]

One line. From $29 per lawyer.

Bilbs subscription · per lawyer, per month
from $29
/ lawyer / month · in Canadian dollars · goes up only if the firm adds optional services
Free audit · we scope the firm before you sign anythingIncluded
Deployment, indexing, on-site training (per practice group)Included
Updates, support, audit log, 24/7 pager, OEM RMA shepherdingIncluded
Hardware (this Firm spec)Yours, or we source it
Full-service spec
4U rackmount with NVIDIA RTX 6000 Ada / A100, enterprise redundancy (dual PSU, hot-swap NVMe)
6-week deployment (fine-tune, indexing, on-site training)
Trained on every contract, memo, and matter your firm has handled
Cold spares: 1x PSU, 1x fan tray, 1x NVMe
One on-site install day (NA/UK/EU) included
Hardware path A · you have one

Use your server

Most full-service firms in this band have a server room and an IT director. If the firm already runs GPU-capable enterprise hardware (or has approved capex for it), we deploy onto it. The Firm spec on this page is the reference: 2× RTX 6000 Ada / A100, dual EPYC, 512 GB ECC, 4U rack with enterprise redundancy.

Hardware path B · you don’t

We source it

If the partnership wants to add private AI without the IT scope expanding, we source the Firm-spec server, install it on-site, configure the network, and own the runbook. Hardware sits on a separate, transparent quote at OEM list — your call once you see the spec. We don’t mark up the GPU.

If the firm grows

No penalty

If the firm grows past 100 lawyers, we swap the chassis for the National cluster and re-train on the new hardware. No re-platforming charge. The per-lawyer line stays the same.

[ Timeline ]

Contract to live in ~8 weeks.

Step 1

Contract + deposit

Same day
Step 2

Deployment

6 weeks
Step 3

Hardware

5-6 weeks
(H200 NVL lead times)
Step 4

White-glove freight

5-10 days
Step 5

On-site rack + bring-up

4-6 hours
Included

On-site install day

NA / UK / EU at no extra cost

AI infrastructure for a full-service firm.

A 45-minute call within one business day. We’ll tell you whether the Firm tier is the right size for your firm — and if it isn’t, we’ll tell you which one is. No pitch. No deck. No payment today.