Bilbs AI
Custom configuration [ National · enterprise scale ] 100+ lawyers · multi-office

The National.

A half-rack cluster of eight enterprise NVIDIA GPUs. Built for national Canadian firms across several offices and several provinces, large in-house legal departments, and major financial-institution counsel teams. Each one is custom-built for the firm — but the rule stays the same: nothing leaves the building.

[ Rack diagram: Bilbs Cluster · A100 SXM GPU modules · storage / network · rack units 1U to 21U ]
from $29 / lawyer / month · volume rates at 250+ & 500+ seats
Hardware below · audited separately, your call after
100+ Lawyers · multi-office
8x A100 / H200 / B200 GPUs
[ Who the National tier fits ]

For Canadian national firms and large in-house teams.

National law firms

100 to 1,500 lawyers across Montréal, Toronto, Vancouver, Calgary, and Ottawa — firms whose privileged material cannot leave Canadian jurisdiction, full stop.

In-house legal at national banks & insurers

Legal departments inside Canadian banks and insurers, where OSFI, the AMF, and Loi 25 already define what can leave the building. The cluster lives inside the institution itself.

Pension funds & institutional investors

CDPQ, OMERS, CPP Investments and the like — in-house legal, compliance, and deal teams handling material non-public information that cannot cross a border. The cluster keeps it inside.

Crown corporations & public sector

Federal and provincial departments with Cabinet-confidence obligations and CSE / ITSG-33 rules on how sensitive information is handled. The cluster meets them by design.

Big Four accounting, regulated

The Canadian practices of the Big Four — tax, audit, advisory — deployed on the same cluster, with each practice fully isolated from the others.

Healthcare & university counsel

Large health networks and Canadian universities running AI over patient records, research IP, and institutional files — never leaving Canadian jurisdiction.

[ Base hardware configuration ]

Built on an NVIDIA HGX platform: A100 base, H200 or B200 on request.

The Cluster is configure-to-order. This is the base spec; most clients deviate on GPU mix, rack density, and power profile. We quote against a written RFP.

Form factor: Half-rack (22U) or sealed 42U
GPU (base): 8x NVIDIA A100 SXM · 80 GB HBM2e each (640 GB total)
GPU alt: 8x H200 SXM5 (+) or 8x B200 SXM (++) · 141–192 GB HBM3e each
GPU interconnect: NVLink 3.0 · 600 GB/s per GPU on A100 (NVLink 4.0 · 900 GB/s on H200; NVLink 5 · 1.8 TB/s on B200)
CPU: 2x AMD EPYC 9554 · 64 cores each · 128 cores total
RAM: 1.5 TB DDR5-4800 ECC
Tier 0 storage: 8x 7.68 TB NVMe Gen 5 (RAID 10) · about 30.7 TB usable
Networking: 4x 400 GbE QSFP-DD · 2x 100 GbE QSFP28 · InfiniBand NDR optional
Cooling: Rear-door HX compatible · direct-liquid on H200
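
The capacity figures in the base spec follow from simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, decimal units are assumed, and filesystem or spare-capacity overhead is ignored. In decimal terabytes the RAID 10 figure comes to 30.72 TB.

```python
# Capacity arithmetic for the base spec. Drive count, drive size, and GPU
# memory come from the spec above; the helper names are illustrative.

def raid10_usable_tb(drive_count: int, drive_tb: float) -> float:
    """RAID 10 mirrors every drive, so usable space is half of raw capacity."""
    if drive_count < 4 or drive_count % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return drive_count * drive_tb / 2


def total_hbm_gb(gpu_count: int, hbm_gb_per_gpu: int) -> int:
    """Aggregate GPU memory across the node."""
    return gpu_count * hbm_gb_per_gpu


usable_tb = raid10_usable_tb(8, 7.68)   # 61.44 TB raw, halved by mirroring
a100_hbm_gb = total_hbm_gb(8, 80)       # eight 80 GB A100 modules

print(f"Tier 0 usable: {usable_tb:.2f} TB")
print(f"A100 HBM total: {a100_hbm_gb} GB")
```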

Power & Environment

Idle power: 1,800 W
Inference load: 4,500-6,500 W
Peak / training: 9,000 W sustained
Acoustic: ~85 dBA · datacenter placement only

Datacenter requirements

3-phase 208V / 60A on each A+B feed. ASHRAE Class A1 cooling. We deploy into Equinix, CoreSite, Digital Realty, or your private DC.
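
As a rough check that one feed can carry the peak load alone, the sketch below estimates the usable power of a 3-phase 208 V / 60 A circuit. Two inputs are assumptions, not figures from this page: an 80% continuous-load derating (a common North American practice) and a power factor of 1.0. The function name is hypothetical.

```python
import math

PEAK_LOAD_W = 9_000  # peak / training figure quoted above


def three_phase_usable_w(
    volts_line_to_line: float,
    amps: float,
    power_factor: float = 1.0,       # assumption: near-unity PF power supplies
    continuous_derate: float = 0.8,  # assumption: 80% continuous-load rule
) -> float:
    """Usable real power on a balanced 3-phase circuit: sqrt(3) * V * I * PF * derate."""
    apparent_va = math.sqrt(3) * volts_line_to_line * amps
    return apparent_va * power_factor * continuous_derate


usable_w = three_phase_usable_w(208, 60)
headroom_w = usable_w - PEAK_LOAD_W

print(f"Usable per feed: {usable_w / 1000:.1f} kW")
print(f"Headroom at peak on a single feed: {headroom_w / 1000:.1f} kW")
```

Under these assumptions either feed covers the 9,000 W peak alone, which is what A+B redundancy requires.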

[ Model performance ]

Run Gemma 4 70B at scale. Train while serving.

The Cluster is the only tier that can train (continued fine-tuning on the firm’s new matters) while serving production traffic, without noticeable impact on user-facing latency.

Gemma 4 70B (FP16): ~95 tok/s · 8-way tensor parallel · 1,200 concurrent
Gemma 4 70B (FP8): ~130 tok/s · 1,500 concurrent · quantised for throughput
Gemma 4 27B (FP16): ~210 tok/s · lower-stakes drafting · side serving model
Gemma 4 LoRA (firm): ~95 tok/s · fine-tuned on the firm’s precedents
Continued fine-tune (12B): ~8 hrs per epoch on a 1B-token corpus, no service impact
Gemma 4 Embedding: ~120k tok/s · retrieval across the entire DMS
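
To see why a 70B model is sharded eight ways, it helps to estimate the weight footprint per GPU. The sketch below is a back-of-envelope calculation with hypothetical helper names. It counts weights only and ignores KV cache, activations, and runtime overhead, which is where most of the remaining memory goes under high concurrency.

```python
# Bytes per parameter for the two precisions quoted above.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weights_gb(params_billion: float, precision: str) -> float:
    """Weight memory in decimal GB: 1e9 params * bytes per param / 1e9 bytes per GB."""
    return params_billion * BYTES_PER_PARAM[precision]


def per_gpu_weights_gb(params_billion: float, precision: str, tensor_parallel: int) -> float:
    """Tensor parallelism splits the weights roughly evenly across GPUs."""
    return weights_gb(params_billion, precision) / tensor_parallel


fp16_total = weights_gb(70, "fp16")
fp16_per_gpu = per_gpu_weights_gb(70, "fp16", tensor_parallel=8)
left_on_a100 = 80 - fp16_per_gpu  # 80 GB HBM2e per base-spec GPU

print(f"70B FP16 weights: {fp16_total:.0f} GB total, {fp16_per_gpu:.1f} GB per GPU")
print(f"Left per A100 for KV cache and overhead: {left_on_a100:.1f} GB")
```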
[ Concurrency envelope ]

Scale to 1,200+ simultaneous users.

Chat (Gemma 4 70B FP16, tensor-parallel): 1,200+ users
Ask the institution’s files a question: 1,000 users
Complex multi-step agents: 400 users
Mixed workload: 800+ users
Training + serving (simultaneous): 1 training job + 600 users
[ Multi-tenancy ]

Hard multi-tenancy via three layers.

Layer 1

Namespace isolation

Kubernetes-layer isolation. Each tenant gets a dedicated namespace, RBAC boundaries, and network policies.

Layer 2

SR-IOV

Network-layer isolation. Each tenant sees a dedicated Virtual Function with guaranteed bandwidth.

Layer 3

MIG (Multi-Instance GPU)

GPU-layer isolation on B200/H200. Each tenant gets at least one MIG slice (1/7 of a GPU), up to whole-GPU allocations.
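
For Layer 1, a minimal sketch of what per-tenant namespace isolation can look like is shown below. It builds a Namespace and a NetworkPolicy that admits ingress only from pods in the same namespace. The tenant name and policy name are hypothetical, RBAC bindings are omitted, and this is not necessarily how the Cluster itself is configured.

```python
import json


def tenant_manifests(tenant: str) -> list[dict]:
    """Return a Namespace plus a same-namespace-only ingress policy for one tenant."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": tenant},
    }
    same_namespace_only = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": tenant},
        "spec": {
            # An empty podSelector applies the policy to every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A bare podSelector under "from" matches pods in this namespace only,
            # so traffic from other tenants' namespaces is not admitted.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    return [namespace, same_namespace_only]


manifests = tenant_manifests("tenant-a")  # "tenant-a" is a placeholder name
for manifest in manifests:
    print(json.dumps(manifest, indent=2))
```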

Common multi-tenant layouts

7 hospitals sharing one Cluster with guaranteed 1x MIG slice each
12 business units in an enterprise with dynamic allocation + fair-share
A carrier offering "AI as a network service" to managed customers
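
The slice arithmetic behind the layouts above can be sketched as follows, assuming the eight-GPU configuration and seven MIG slices per GPU (the 1/7 minimum quoted under Layer 3). Tenant names and the function are hypothetical.

```python
SLICES_PER_GPU = 7  # the 1/7-of-a-GPU minimum allocation


def spare_slices(gpu_count: int, guaranteed: dict[str, int]) -> int:
    """Return the slices left over after every tenant's guarantee is reserved."""
    total = gpu_count * SLICES_PER_GPU
    requested = sum(guaranteed.values())
    if requested > total:
        raise ValueError(f"{requested} slices requested, only {total} available")
    return total - requested


# Seven hospitals, one guaranteed slice each, on an eight-GPU cluster.
hospitals = {f"hospital-{i}": 1 for i in range(1, 8)}
total_slices = 8 * SLICES_PER_GPU
spare = spare_slices(8, hospitals)

print(f"{total_slices} slices total, {spare} left for burst or larger allocations")
```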
[ Compliance & assurance ]

Built for regulated environments.

FIPS 140-3: Level 2 validated
Common Criteria: EAL4+ on request
PCI DSS: Level 1 isolation
HIPAA: with HITRUST r2
ITAR / CUI: US-manufactured
SC 2030: Canadian Protected B

Individual SOC 2 Type II

A SOC 2 report scoped specifically to your Cluster instance is available as an add-on ($9,999 per report).

[ Pricing ]

One line. From $29 per lawyer.

Bilbs subscription · per lawyer, per month · volume rates
from $29
$34 at 250+ seats · $29 at 500+ seats · one line on the invoice
What the subscription includes
  • Multi-office audit and infrastructure review
  • Cluster deployment, indexing, on-site training per practice group (FR / EN)
  • Fine-tune, eval harness, integrations
  • Updates, model refresh, security patches, monitoring, encrypted backups
  • 24/7 Sev-1 pager, audit logging, OEM RMA shepherding
  • Cancellable on 30 days’ notice
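
The volume rates quoted above translate into a simple lookup. The sketch below uses hypothetical function names. The rate below 250 seats is not published on this page, so the function returns `None` there instead of guessing.

```python
from typing import Optional


def rate_per_lawyer(seats: int) -> Optional[int]:
    """Monthly rate in dollars per lawyer, from the two published volume tiers."""
    if seats >= 500:
        return 29
    if seats >= 250:
        return 34
    return None  # below 250 seats: rate not stated on this page


def monthly_subscription(seats: int) -> Optional[int]:
    """Total monthly subscription, or None when the rate is unpublished."""
    rate = rate_per_lawyer(seats)
    return None if rate is None else rate * seats


for seats in (100, 250, 500):
    print(seats, monthly_subscription(seats))
```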
Legal

National-tier customers typically redline our standard MSA and DPA with their own counsel. We plan for 2–4 weeks of legal review.

Hardware · audited separately

Two paths, same line

After the audit, the firm chooses: deploy onto cluster hardware you already operate (most large customers have rack space and IT in place), or have us source and install the National-spec cluster on a separate transparent quote at OEM list price. The National spec on this page is the reference: 8× A100 base, with H200 / B200 swap options.

Premium services (optional)

Add-ons, à la carte

  • Dedicated SRE · named, on call: +$2,500 / mo
  • Second cluster on standby (HA): +quote
  • SOC 2 Type II scoped to your instance: $9,999 / yr
  • Custom fine-tune cycles (premium): +quote
Hardware swap

A100 · H200 · B200

The National reference is an 8× A100 cluster. H200 or B200 swap is available if the firm has the rack power and budget — separate OEM quote, your call.

[ Timeline ]

Contract to production in 18-24 weeks.

Step 1: RFP response · 72 hours
Step 2: Commercial close · 2-6 weeks + 33% deposit
Step 3: Hardware procurement · 10-14 weeks (B200 / H200 SXM lead times)
Step 4: Deployment · 8-16 weeks (parallel)
Step 5: DC ship + white-glove install · 2 weeks
Step 6: UAT + handover · 2 weeks
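
One way to read the step durations is as a simple critical path. The sketch below assumes that "(parallel)" means deployment work overlaps hardware procurement, and that every other step runs in sequence. That reading is an assumption, as are the helper names. The 72-hour RFP response is left out as negligible.

```python
# (min_weeks, max_weeks) for each step, taken from the timeline above.
COMMERCIAL_CLOSE = (2, 6)
PROCUREMENT = (10, 14)
DEPLOYMENT = (8, 16)
INSTALL = (2, 2)
UAT = (2, 2)


def overlap(a, b):
    """Two steps run side by side: the longer one sets the duration."""
    return (max(a[0], b[0]), max(a[1], b[1]))


def chain(*steps):
    """Steps run back to back: durations add up."""
    return (sum(s[0] for s in steps), sum(s[1] for s in steps))


after_close = chain(overlap(PROCUREMENT, DEPLOYMENT), INSTALL, UAT)
with_close = chain(COMMERCIAL_CLOSE, after_close)

print(f"After commercial close: {after_close[0]}-{after_close[1]} weeks")
print(f"Including commercial close: {with_close[0]}-{with_close[1]} weeks")
```

Under this reading the best and worst cases span 16 to 26 weeks including commercial close, and the 18-24 week headline falls inside that envelope.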

Included: dedicated deployment engineer

On-site for the install week and the first week of UAT. Covers NA, UK, EU, and select APAC locations.

AI under Canadian jurisdiction.

We respond to RFPs within 72 hours. We start with a free 45-minute call with the managing partner or the general counsel. From there, we build a custom plan for IT and procurement.

Request RFP response