System Architecture · Technical Review

True Build OS

The operating system behind the interior design studio.

A walkthrough of the ingest pipeline, the AI extraction and orchestration layer, the decision systems, and the portal architecture — from first field input to final logged decision.

Prepared for: Technical Architecture Review

Scope: Ingest · Orchestration · Decisions · Surfaces

Stack: Rails 7 · PostgreSQL + pgvector · Claude · Hotwire

Live at: truebuild.ae

What this covers

A working studio, run as software

True Build OS runs a working interior design studio end-to-end: how information comes in from the field, how it becomes structured records, how work gets routed, and how decisions get made and logged.

Everything here is built and in production — every claim maps to real code, not roadmap.
It follows one path: ingest → orchestration → decisions → surfaces, then the wider ecosystem it plugs into.
I'd genuinely value your read on it — especially different ways to think about these problems, and where it could go next.

In production

6

distinct portals / surfaces, multiple identity models

AI as plumbing

~15

background jobs call Claude for extraction, classification, suggestion

Ingest channels

6

WhatsApp · Telegram · email · voice · web/QR · contracts

Knowledge index

7

entity types embedded into pgvector for semantic + temporal recall

System at a glance

One monolith, five layers, many mouths to feed

01 · Ingest channels

WhatsApp · Telegram · Email · Voice · Web/QR · Contracts

02 · AI normalization

Claude structures every input

DocumentGeneration · classifiers · extractors · OCR

03 · Core domain — Project is the root

Records · Issues · Action items · Approvals · Decisions · FF&E · Budget

All on one append-only event spine

04a · Orchestration

Next-Best-Action · Ball-in-Court

04b · Knowledge

pgvector RAG · bi-temporal

05 · Surfaces

Backoffice · Client · Vendor · Consultant · Accountant

+ Anthropic · OpenAI · Zoho · Postmark · Forge/OpenClaw

Rails monolith · token-is-auth public surfaces (approvals, NCRs, site capture) sit outside the five authenticated portals. The whole thing is one deployable.

The three load-bearing ideas

What the architecture actually commits to

01 · Aggregate

Project is the root

Every record, issue, task, decision, cost and approval hangs off one Project. There is a single place to ask "what is the state of this job?" — no cross-system reconciliation.

02 · Event-driven

State change is a fact

Mutations emit verbs onto an append-only event spine. The spine is the trigger surface: suggestions, notifications and embeddings all react to events rather than being wired into business logic.

03 · AI as plumbing

Inference is infrastructure

Claude is not a chat feature bolted on the side. It sits inside ~15 background jobs doing extraction, classification and routing — with structured tool-use, logging and graceful fallback.

The bet: in a studio, work is information-driven (a transcript becomes tasks) rather than process-driven (step 1 must finish before step 2). So the system is an event-and-extraction engine, deliberately not a BPMN workflow tool.
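A minimal in-memory sketch of the event-spine idea, assuming nothing about the real schema: `EventSpine`, `emit` and `subscribe` are hypothetical names used only for illustration. It shows the shape of the commitment: the mutation appends a fact, and suggestions, notifications and embeddings hang off the verb rather than being called from business logic.

```ruby
# Illustrative only: the production spine lives in Postgres, not in memory.
class EventSpine
  Event = Struct.new(:verb, :subject, :payload, :at)

  def initialize
    @events = []
    @subscribers = Hash.new { |hash, verb| hash[verb] = [] }
  end

  # Register a reaction to a verb. Business logic never calls handlers directly.
  def subscribe(verb, &handler)
    @subscribers[verb] << handler
  end

  # Append the fact first, then fan it out to whoever reacts to that verb.
  def emit(verb, subject:, payload: {})
    event = Event.new(verb, subject, payload.freeze, Time.now).freeze
    @events << event
    @subscribers[verb].each { |handler| handler.call(event) }
    event
  end

  # Read-only view: the spine offers no update or delete.
  def events
    @events.dup.freeze
  end
end

spine = EventSpine.new
notified = []
to_embed = []
spine.subscribe(:record_generated) { |e| notified << e.subject }
spine.subscribe(:record_generated) { |e| to_embed << e.subject }
spine.emit(:record_generated, subject: "ProjectRecord#42", payload: { project: "MZK001" })
```

Adding a new reaction means adding a subscriber; the code that generated the record does not change.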

PART ONE

The ingest pipeline

How messy, real-world input from a building site becomes clean, structured, queryable records — usually without anyone filling in a form.

Ingest · overview

Six channels in, one structured spine out

bind once

A chat is tied to a project via /init MZK001 — then every message routes itself.

async

Webhooks verify and return fast; heavy work runs in retried Solid Queue jobs.

provenance

Every record keeps its input_source and a link back to the exact message.

Voice intake is moving onto the Plaud MCP. Claude does the structuring; OpenAI embeddings do the indexing — split by job.

Ingest · the non-obvious part

Multimodal intake with a "quiet window"

in

Photo + voice note + caption

arrive seconds apart, out of order

collect

Intake session — a "quiet window"

artifacts accumulate; the timer resets on each new one

classify

One multimodal Claude call

image(s) + text → 4 buckets + confidence

confirm

"Reply Yes to log"

→ a typed record: Issue · Action · Decision
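The quiet-window step can be sketched as a debounce over the session, under stated assumptions: `IntakeSession` is a hypothetical name, the 30-second window is an invented value, and times are passed in explicitly so the logic can be checked without waiting.

```ruby
# Sketch of a "quiet window" intake session (names and window length assumed).
class IntakeSession
  QUIET_SECONDS = 30

  attr_reader :artifacts

  def initialize
    @artifacts = []
    @last_seen_at = nil
  end

  # Each new artifact joins the session and resets the timer. Taking the
  # latest timestamp keeps out-of-order arrivals from shortening the window.
  def add(artifact, at:)
    @artifacts << artifact
    @last_seen_at = [@last_seen_at, at].compact.max
  end

  # Ready to classify once nothing new has arrived for a full window.
  def ready?(now)
    !@last_seen_at.nil? && (now - @last_seen_at) >= QUIET_SECONDS
  end
end

t0 = Time.at(0)
session = IntakeSession.new
session.add({ kind: :photo },   at: t0)
session.add({ kind: :voice },   at: t0 + 10)
session.add({ kind: :caption }, at: t0 + 25)

still_collecting = session.ready?(t0 + 40) # only 15 s since the caption
ready_to_classify = session.ready?(t0 + 55) # 30 s of quiet: classify all three
```

Only when `ready?` turns true would the single multimodal call run, with every accumulated artifact in one request.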

The WhatsApp vendor-onboarding path uses the same spirit in reverse: an LLM-driven state machine (greeting → company → trades → certs → contact → confirm) that accumulates a payload across turns and emits a VendorRegistration on completion.
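A sketch of that onboarding flow as a linear state machine. The step names come from the text; everything else is an assumption: `OnboardingConversation` is a hypothetical class, the extracted value is passed in directly where the real path has an LLM interpret each reply, and restarting after a declined summary is an invented behaviour.

```ruby
VendorRegistration = Struct.new(:company, :trades, :certs, :contact, keyword_init: true)

class OnboardingConversation
  STEPS  = %i[greeting company trades certs contact confirm].freeze
  FIELDS = %i[company trades certs contact].freeze

  attr_reader :state, :payload

  def initialize
    @state = STEPS.first
    @payload = {}
  end

  # Record the value for the current step (if it collects one) and move on.
  # Returns a VendorRegistration when the summary is confirmed, otherwise nil.
  def advance(value = nil)
    return nil if @state == :done

    if @state == :confirm
      if value == true
        @state = :done
        return VendorRegistration.new(**@payload)
      end
      # Assumed behaviour: a declined summary restarts collection.
      @payload = {}
      @state = :company
      return nil
    end

    @payload[@state] = value if FIELDS.include?(@state)
    @state = STEPS[STEPS.index(@state) + 1]
    nil
  end
end

convo = OnboardingConversation.new
convo.advance                      # greeting -> company
convo.advance("Stone & Co")        # company  -> trades
convo.advance(["stonework"])       # trades   -> certs
convo.advance(["ISO 9001"])        # certs    -> contact
convo.advance("ops@example.com")   # contact  -> confirm
registration = convo.advance(true) # confirm  -> done, emits the registration
```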

Ingest → structure

One tool-use call that does a day's admin

DocumentGenerationJob takes an unstructured transcript or email and returns a single structured object that the app commits in one transaction.

Classifies into one of 11 record types (minutes, RFI, PCN, site report, spec, defect…).
Emits a formatted document plus the side-effects: new action items, updates to existing items by ID, and closures with a resolution.
Also extracts structured decisions, open questions and change requests — each with a source quote for traceability.
It learns the studio's house style — when staff approve or edit a generated document, that final version is saved. Next time the same record type is generated for that project, a few of those approved documents are pasted into the prompt as worked examples, so the output matches how this studio actually writes. No fine-tuning, no retraining.
Advisory-lock serialized per project so two concurrent transcripts can't both mutate the same action item.

Tool-use schema · abridged

{
  record_type:        "site_visit_report",
  summary:            "…",
  decisions_structured:  [ {statement, decided_by,
                           source_quote} ],
  open_questions:        [ {question, raised_by} ],
  new_action_items:      [ {description, assignee,
                           due_date, task_type} ],
  action_item_updates:   [ {id, status, next_step} ],
  action_item_closures:  [ {id, resolution} ],
  change_requests:       [ {description, cost_impact} ]
}

Falls back to raw-JSON parsing if tool-use is unavailable. Every call is logged to AiUsageLog with tokens + cost.

The non-obvious bit isn't extraction — it's that the model reconciles against existing state (update / close by ID) instead of always creating new rows. That's what keeps the action-item ledger continuous.
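The reconcile-by-ID step might look roughly like this. It is a sketch, not the job's code: the ledger is a plain Hash keyed by id, a per-project `Mutex` stands in for the Postgres advisory lock, and `apply_extraction` is a hypothetical name. The field names follow the abridged schema above.

```ruby
# One lock per project code (illustrative stand-in for an advisory lock).
PROJECT_LOCKS = Hash.new { |hash, code| hash[code] = Mutex.new }

def apply_extraction(project_code, ledger, result)
  updates  = result.fetch(:action_item_updates, [])
  closures = result.fetch(:action_item_closures, [])
  creates  = result.fetch(:new_action_items, [])

  PROJECT_LOCKS[project_code].synchronize do
    # Validate every referenced id before mutating anything, so a bad
    # payload leaves the ledger untouched (all-or-nothing, like a transaction).
    unknown = (updates + closures).map { |row| row[:id] }.reject { |id| ledger.key?(id) }
    raise ArgumentError, "unknown action item ids: #{unknown.inspect}" unless unknown.empty?

    updates.each  { |u| ledger[u[:id]].merge!(status: u[:status], next_step: u[:next_step]) }
    closures.each { |c| ledger[c[:id]].merge!(status: "closed", resolution: c[:resolution]) }

    next_id = (ledger.keys.max || 0) + 1
    creates.each do |item|
      ledger[next_id] = item.merge(status: "open")
      next_id += 1
    end
    ledger
  end
end

ledger = {
  1 => { description: "Confirm marble delivery", status: "open" },
  2 => { description: "Sign off substrate",      status: "open" }
}
result = {
  new_action_items:     [{ description: "Move island socket 300 mm east" }],
  action_item_updates:  [{ id: 1, status: "in_progress", next_step: "Chase Stone & Co" }],
  action_item_closures: [{ id: 2, resolution: "Signed off on site" }]
}
apply_extraction("MZK001", ledger, result)
```

Item 1 moves forward, item 2 closes, and only the genuinely new work becomes item 3, which is what keeps the ledger continuous across transcripts.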

Draft refinement

Point at what's wrong

The hard part of AI editing isn't generating text — it's telling the model which part to change without it quietly rewriting everything else. The studio's answer borrows from how people already mark up documents.

Highlight any span in a generated record and leave a comment — "too formal", "wrong supplier", "merge these two". Like Google-Docs comments, except they regenerate.
A general-instruction box handles document-wide asks — "make the tone more formal, fix the dates".
Each comment is scoped to its section and its exact quoted text, so the model knows precisely where you mean — no guessing from a vague prompt.

Minutes · Site visit · 12 Mar · MZK001

Walkthrough of Level 3 with the client. The contractor confirmed the marble will arrive next week; tiling to follow once the substrate is signed off.

Client approved the brushed-brass ironmongery. Electrical position on the island to move 300 mm east before second fix.

"contractor confirmed the marble will arrive next week"

It's Stone & Co specifically, and the date is the 19th — fix both.

Cancel · Add comment

On submit, Sonnet 4.6 receives every section at once with a forced apply_revisions tool — annotated sections must come back changed, not echoed.

Draft refinement

Get a redline back, not a rewrite

Word-level diff (LCS) renders the result as tracked changes — insertions in green, deletions struck through. You see exactly what moved.
Accept or reject each change, or all at once; the preview stays editable for a manual touch-up before it lands.
Consistency propagation — rename a supplier in one comment and it's corrected in every section, not just the line you marked. This is the part that's genuinely hard to get right.
Nothing is persisted — annotations are a transient instruction, not data. The approved version then feeds the house-style learning loop.
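For reference, a word-level LCS diff fits in a few lines. This is a generic textbook sketch rather than the studio's implementation; `word_diff` and `render` are hypothetical names, and the `[-…-]` / `{+…+}` markers are just one plain-text way to show deletions and insertions.

```ruby
# Returns [[:same | :del | :ins, word], ...] in reading order.
def word_diff(before, after)
  a = before.split
  b = after.split

  # lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  lcs = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      lcs[i][j] = a[i] == b[j] ? lcs[i + 1][j + 1] + 1 : [lcs[i + 1][j], lcs[i][j + 1]].max
    end
  end

  # Walk the table from the top-left, emitting one operation per word.
  ops = []
  i = 0
  j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      ops << [:same, a[i]]
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      ops << [:del, a[i]]
      i += 1
    else
      ops << [:ins, b[j]]
      j += 1
    end
  end
  a.drop(i).each { |word| ops << [:del, word] }
  b.drop(j).each { |word| ops << [:ins, word] }
  ops
end

def render(ops)
  ops.map { |op, word| { same: word, del: "[-#{word}-]", ins: "{+#{word}+}" }[op] }.join(" ")
end

ops = word_diff("The contractor confirmed", "Stone & Co confirmed")
redline = render(ops)
# redline == "[-The-] [-contractor-] {+Stone+} {+&+} {+Co+} confirmed"
```

Accepting or rejecting a single change then amounts to keeping either the `:del` or the `:ins` side of that run.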

Summary · annotation

"contractor confirmed the marble will arrive next week"

It's Stone & Co specifically, and the date is the 19th.

3 CHANGES · 2 SECTIONS · Reject all · Accept all

[-The contractor-] {+Stone & Co+} confirmed the marble will arrive [-next week-] {+on the 19th+}; tiling to follow once the substrate is signed off.

↳ consistency: "contractor" → "Stone & Co" also updated in Decisions and Action Items

The whole loop is server-rendered: comment → background job → tracked-changes preview streamed back over Turbo → accept → committed. No SPA, no editor framework.

PART TWO

Orchestration & decisions

Once everything is a structured fact on a shared event spine, the system can route work, suggest the next move, and capture decisions with an audit trail.

Orchestration · routing

"Whose move is it?" as a first-class concept

Six different entity types — action items, issues, approvals, decisions, open questions, attention alerts — all share a Ball-in-Court concern: a polymorphic pointer to whoever currently owns the next move, plus a snooze.

A single query assembles one unified queue per person across all six types.
A shared urgency score ranks them: overdue +100, severity, staleness, age-of-pending-approval, due-soon.
The ball shifts explicitly (staff → vendor → client) and every shift is an event.

Most tools give each object type its own backlog. Here the operational question — what is waiting on me, ranked — is answered in one place regardless of object type.
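A sketch of one shared urgency score over heterogeneous items. Only the overdue +100 weight comes from the text; every other weight, the `QueueItem` struct and the `urgency` function are assumptions chosen to make the ranking concrete.

```ruby
require "date"

QueueItem = Struct.new(:kind, :title, :due_on, :severity, :last_activity_on,
                       :pending_since, keyword_init: true)

SEVERITY_POINTS = { low: 5, medium: 15, high: 30 }.freeze # assumed weights

def urgency(item, today)
  score = 0
  if item.due_on
    days_left = (item.due_on - today).to_i
    score += 100 if days_left < 0            # overdue: the one weight the text states
    score += 20 if days_left.between?(0, 2)  # due soon (assumed)
  end
  score += SEVERITY_POINTS.fetch(item.severity, 0)
  # Staleness: one point per idle day, capped (assumed).
  score += [(today - item.last_activity_on).to_i, 30].min if item.last_activity_on
  # Age of a pending approval: two points per waiting day (assumed).
  score += 2 * (today - item.pending_since).to_i if item.pending_since
  score
end

today = Date.new(2025, 3, 12) # illustrative date
items = [
  QueueItem.new(kind: :approval, title: "Approve: marble selection", pending_since: today - 4),
  QueueItem.new(kind: :issue, title: "NCR-014", due_on: today - 3, severity: :high,
                last_activity_on: today - 5),
  QueueItem.new(kind: :action_item, title: "RFI-042 response", due_on: today + 1,
                severity: :medium)
]
ranked = items.sort_by { |item| -urgency(item, today) }
```

Because every type answers the same few questions (due date, severity, last activity, pending since), one function can rank all of them in a single list.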

six sources

Action items · Issues · Approvals · Decisions · Open questions · Alerts

one ranking

Unified urgency score

overdue +100 · severity · staleness · age

my queue

One ranked list of what's waiting on you

Orchestration · the agentic core

Next-Best-Action: confidence you can govern

trigger

Event — record generated, approval requested…

Claude (Haiku)

Suggests 1–3 templated actions, each scored 0–1

blend the score

0.8 · model + 0.2 · history

history = this studio's acceptance rate for that action

≥ 0.85

Auto-execute

below

Propose to staff

↑ accept / reject feeds back into the history

Every suggestion is persisted as a SuggestedAction — even auto-executed ones — so acceptance rates are measurable per template. Low-risk templates auto-fire; the model's confidence alone never gets a vote without the studio's track record beside it.
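The blend and the threshold are simple enough to write out. The weights and the 0.85 cut-off are from the text; the function names, the `low_risk` flag and the neutral 0.5 prior for a template with no history are assumptions.

```ruby
MODEL_WEIGHT = 0.8
HISTORY_WEIGHT = 0.2
AUTO_EXECUTE_THRESHOLD = 0.85

# Share of past suggestions for this template that staff accepted.
# With no history yet, fall back to a neutral prior (assumed).
def acceptance_rate(accepted, rejected, prior: 0.5)
  total = accepted + rejected
  total.zero? ? prior : accepted.to_f / total
end

def blended_confidence(model_score, history_rate)
  MODEL_WEIGHT * model_score + HISTORY_WEIGHT * history_rate
end

# Auto-execute only low-risk templates whose blended score clears the bar.
def decide(model_score, history_rate, low_risk:)
  score = blended_confidence(model_score, history_rate)
  low_risk && score >= AUTO_EXECUTE_THRESHOLD ? :auto_execute : :propose
end

trusted   = decide(0.95, acceptance_rate(9, 1), low_risk: true)  # 0.76 + 0.18 = 0.94
distrusted = decide(0.95, acceptance_rate(1, 4), low_risk: true) # 0.76 + 0.04 = 0.80
model_only = decide(1.0, 0.0, low_risk: true)                    # 0.80 at best
```

With these weights a perfect model score and zero historical acceptance blends to 0.80, which is below the threshold: the model cannot auto-execute on its own say-so.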

Decisions · the record of truth

Approvals: zero-login, snapshot-immutable

One polymorphic ClientApproval serves three different approvable things — an FF&E item, a variation order, or a design record. The client never logs in: the secure token is the authentication.

Snapshot at send time — the approvable's state is frozen into the approval. If the item later changes, the client's decision still references exactly what they saw.
One-shot, 72-hour token, consumed under a row lock — no double-submit, no replay.
Approval drives side-effects — an approved FF&E item advances its lifecycle stage; a record flips to approved — emitted as events.
Signed proof captured — IP, user-agent, timestamp on the decision.
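A sketch of the snapshot-plus-one-shot-token idea in plain Ruby. `ApprovalToken` is a hypothetical class, a `Mutex` stands in for the database row lock, and the item and price are invented for illustration; only the 72-hour life and the consume-once rule come from the text.

```ruby
require "securerandom"

class ApprovalToken
  TTL_SECONDS = 72 * 60 * 60

  attr_reader :value, :snapshot, :decision

  def initialize(approvable, issued_at: Time.now)
    @value = SecureRandom.urlsafe_base64(32)
    # Deep copy at send time, so later edits to the item cannot leak in.
    @snapshot = Marshal.load(Marshal.dump(approvable)).freeze
    @expires_at = issued_at + TTL_SECONDS
    @consumed_at = nil
    @lock = Mutex.new # stand-in for a row lock
  end

  # Check expiry and consume in one critical section: no double-submit.
  def consume!(decision, now: Time.now)
    @lock.synchronize do
      return :expired if now > @expires_at
      return :already_used if @consumed_at

      @consumed_at = now
      @decision = decision
      :ok
    end
  end
end

t0 = Time.at(0)
item = { name: "Brushed-brass ironmongery", price: 1200 }
token = ApprovalToken.new(item, issued_at: t0)
item[:price] = 1500 # the item changes after the link was sent

first  = token.consume!(:approved, now: t0 + 3600)
second = token.consume!(:approved, now: t0 + 3601)
late   = ApprovalToken.new(item, issued_at: t0).consume!(:approved, now: t0 + 73 * 3600)
```

The client's decision stays attached to `token.snapshot`, which still shows the price they were actually sent.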

approvable

FfeItem · VariationOrder · ProjectRecord — one model, three masters

72h

token life

1×

consume

0

logins

flow

draft → snapshot + token → email link → client approves / requests-changes → side-effect + event → confirmation back to requester (and Telegram, if that's where it began)

The same token-is-auth pattern powers NCR acknowledgements, public issue reports, and QR site-capture — external parties act without an account, scoped to exactly one object.

Knowledge layer

Recall across entities — and across time

Seven entity types embed their text into pgvector on write. "Ask Arqis" runs one semantic query spanning all of them, assembles a context window, and has Claude answer with the project's own history.

No external vector DB. Embeddings live in the same Postgres as the data — HNSW cosine index, OpenAI text-embedding-3-small.
Graceful fallback to recency if the embeddings provider is unavailable.
Bi-temporal queries — reconstruct "what were the live decisions as of that date", walking supersession chains. This is the part most RAG stacks don't have.

I'll be candid later, under "Where the effort went", about which of these is genuinely differentiated and which is just 2024 table-stakes.

embedded

Events · Records · Issues · Action Items · Decisions · Open Questions · Contract Chunks

retrieval

single query → 6 entity types → 6k-token context → Claude Haiku answer, logged

bi-temporal

decisions_at(t) · events_at(t) · current_superseding(d) — state as-of any moment, not just "now"
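A sketch of the as-of reconstruction over supersession chains. It is deliberately narrower than the real thing: it uses a single time axis (when a decision was made), an in-memory array instead of Postgres, and a hypothetical `Decision` struct. The function names mirror the ones listed above, but their signatures here are assumptions.

```ruby
require "date"

Decision = Struct.new(:id, :statement, :decided_on, :supersedes_id, keyword_init: true)

# Decisions that were live on a date: made on or before it, and not yet
# superseded by another decision made on or before it.
def decisions_at(decisions, date)
  known = decisions.select { |d| d.decided_on <= date }
  superseded_ids = known.map(&:supersedes_id).compact
  known.reject { |d| superseded_ids.include?(d.id) }
end

# Follow the chain forward from a decision to whatever replaces it today.
def current_superseding(decisions, decision)
  successor = decisions.find { |d| d.supersedes_id == decision.id }
  successor ? current_superseding(decisions, successor) : decision
end

decisions = [
  Decision.new(id: 1, statement: "Island socket on west side", decided_on: Date.new(2025, 3, 1)),
  Decision.new(id: 2, statement: "Island socket moved 300 mm east", decided_on: Date.new(2025, 3, 10),
               supersedes_id: 1),
  Decision.new(id: 3, statement: "Brushed-brass ironmongery approved", decided_on: Date.new(2025, 3, 5))
]

live_on_7th  = decisions_at(decisions, Date.new(2025, 3, 7)).map(&:id)  # [1, 3]
live_on_12th = decisions_at(decisions, Date.new(2025, 3, 12)).map(&:id) # [2, 3]
latest = current_superseding(decisions, decisions.first)                # decision 2
```

The key property is that a superseded decision is never deleted, so the question "what did we believe on the 7th?" still has an answer after the 10th.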

Note: this is the studio's own embedded RAG — distinct from "gbrain", which is a personal developer knowledge tool and is not part of the product.

How the system improves itself

It tunes its own prompts

1

Staff approves / edits a generated document

2 · PromptLearning

Stores the (raw → approved) pair

3 · PromptRefinementJob

Reads recent corrections, asks Claude for a better prompt

4 · PromptSuggestion

Human reviews the diff → updates the template

↑ which changes how the next document is generated

Human approvals are treated as labelled data. They do two jobs: few-shot exemplars at generation time, and fuel for an offline meta-loop that proposes prompt rewrites for a human to accept. Both stay human-gated by design.

Surfaces

Five portals, three ways to be "logged in"

All five are live and in daily use — every portal has at least one active user right now. They sit at different stages of continuous development, but none is a prototype.

/portal

Backoffice

Staff. Queue, projects, records, FF&E grid, library, comms.

/client

Client

Approvals, brief, project story, assistant chat.

/vendor

Vendor

Assigned issues, NCRs, photos, certs.

/consultant

Consultant

Scoped project access, items, expenses.

/accountant

Accountant

Expense triage workflow only.

staff

Sign in with a password. What each person can see and do is scoped to their role — a consultant or accountant can't reach beyond their own remit.

clients & vendors

No password ever. They get a one-time email link that expires in 15 minutes — nothing to set up, nothing to forget.

public

The link is the key. Client approvals, defect sign-offs and on-site QR photo capture need no login — each secure link opens exactly one thing.

One identity record per person, keyed on email, so the same individual can't end up as two separate logins. Staff can step into a client or vendor's view to help them directly.

The interface

Made for the desk and the site

truebuild.ae/portal/queue

True Build

Dashboard

My Queue

Projects

Records

Issues

Library

Vendors

Expenses

My Queue

12 items waiting on you

3 Overdue

5 Approvals

8 Open issues

4 Due today

NCR-014 — cracked cladding, Level 3 · MZK001 · Overdue

Approve: marble selection, master bath · SIN-003 · Awaiting client

RFI-042 response from structural · MZK001 · Ready

Site-visit minutes — draft generated · VLA-009 · Review

Variation order — outlet relocation · SIN-003 · Draft

My Queue

8 on your plate

✓ Done · release to clear

MZK001 · Overdue

NCR-014 — cracked cladding, Level 3

SIN-003 · Client

Approve: marble selection, master bath

VLA-009 · Review

Site-visit minutes — draft generated

Queue

Projects

Capture

Me

Representative of the live UI, rendered in the same design system. Mobile triage is swipe-first — clear, snooze, or reassign an item without leaving the queue.

The stack & the choices behind it

Boring infrastructure, on purpose

runtime

Rails 7.1 · Ruby 3.3
single monolith, one deploy

data

PostgreSQL 17 + pgvector
relational + vectors, one DB

async + realtime

Solid Queue · Solid Cable
Postgres-backed, no Redis

front end

Hotwire (Turbo + Stimulus)
server-rendered, no SPA

The throughline

Minimise moving parts. No Redis, no external vector service, no Node build step, no SPA. One Postgres is the queue, the cache-bus, the relational store and the vector index.
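Data sovereignty. Sensitive client and contract data, including embeddings, is stored only in the app's own database; no third-party vector store holds a copy. Text still passes through the Anthropic and OpenAI APIs for inference and embedding.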
The cost: you trade horizontal-scale headroom and specialised tooling for operational simplicity. For a studio's data volumes, that's the right side of the trade — and an explicit one.

AI

Anthropic Claude — Haiku 4.5 (fast triage), Sonnet 4.6 (long-form generation)

embeddings

OpenAI text-embedding-3-small · 1536-dim · HNSW

comms

Postmark (in + out) · OpenClaw/wacli (WhatsApp) · Telegram · Zoho CRM sync

ops

Docker · Heroku (prod) + Coolify (staging) · Sentry · Backblaze B2

Where the effort went

Novelty spent only where it mattered

Genuinely non-obvious

Governed agentic execution — blending model confidence with the studio's own historical acceptance rate, with an auto-execute threshold and a measurable feedback loop.
Approvals-as-training-data — the same human approvals power both run-time few-shot and an offline prompt-refinement meta-loop.
Highlight-to-comment refinement — point at the exact span, say what's wrong; only that regenerates, with the rename propagated across every section.
Quiet-window multimodal intake — stitching a bursty voice+photo+text field report into one event and one classification, race-safe.
Unified ball-in-court queue — one ranked "whose move" list across six heterogeneous entity types.
Bi-temporal project memory — reconstructing live state as-of any past date over an event spine.
Snapshot-immutable, login-less approvals as the system of record for client decisions.

Deliberately standard

Vanilla RAG — embed → nearest-neighbour → stuff context → answer. No re-ranking, no hybrid search, no citation verification.
Receipt OCR via vision model — useful, entirely commodity.
Webhook + ActionMailbox ingestion — Rails built-ins; only the routing logic is ours.
Single-shot severity classification.
Direct API calls with retry — no exotic orchestration framework.
Templated prompts — string interpolation, not a prompt DSL.

Deciding which problems deserve a clever solution — and which deserve a boring, proven one — is most of the work.

The whole surface

Everything a studio runs on, in one system

10

module groups

6

intake channels

5

portals

~2 mo

to build

Intake

WhatsApp · Telegram
Email · Voice (Plaud)
Web / QR capture
Submissions

Project delivery

Projects · stages
Milestones
FF&E planning grid
Kickoff wizard · floor plans

Records & docs

Minutes · RFIs · PCNs
Site reports · specs
Drawings
AI document generation

Issues & quality

Snagging · defects
NCRs
Vendor sign-off
Photo evidence

Decisions

Client approvals
Decision ledger
Open questions
Change → variation orders

Knowledge

Ask Arqis (semantic)
Bi-temporal memory
Contract retrieval

Finance

Expenses + OCR
Purchase orders
Budgets · cost items
Zoho Books integration
Accountant workflow

Library

Sample catalogue
Storage locations
Pull-to-project
QR labels

People & vendors

Vendor registration
Certifications
Staff · timesheets
Zoho CRM sync

Intelligence

Next-best-action
Prompt-learning loop
AI usage dashboard

Not a prototype. All of the above is in production, on one Rails monolith, built over roughly two months.

PART THREE

Forge & OpenClaw

Forge isn't a third-party integration bolted on the side. It's an AI agent that runs locally and lives inside the same ecosystem — sharing one machine surface, the OpenClaw bridge, with the studio's own channels.

Forge · part of the same system

Not an integration — a resident

Forge is an AI agent that lives alongside the studio rather than calling in from outside. It wears three hats — all over the very same OpenClaw API the studio's own WhatsApp and Telegram channels run on.

01 · ops agent

Works inside the data

Watches the channels, drafts and creates records, issues and approvals, and runs tasks against True Build OS — a teammate operating directly on project state.

02 · comms brain

Routes the inbox

Pre-classifies inbound WhatsApp and Telegram, resolves which project a message belongs to, and decides intent before anything is written down.

03 · personal assistant

Beyond the studio

Engineering support on the codebase itself — much of this system — plus day-to-day help. The same agent, a broader remit.

Because Forge speaks the same OpenClaw API as every other channel, everything it does is held to one trust boundary, covered under "integration & trust boundary" below.

Forge · the bridge

An LLM brain between chat and the system

chat

WhatsApp / Telegram

via the wacli bridge

brain

OpenClaw

pre-classify · resolve project · decide intent (+ Forge)

api · shared-secret auth

OpenClaw API on Rails

read + full-CRUD write surface

system

True Build OS

replies queued → OpenClaw polls /outbound → back to chat

Architecturally: chat platforms speak to OpenClaw (the intelligence), OpenClaw speaks to a narrow authenticated API on True Build. The studio's data model never has to know what a WhatsApp message is.

Forge · local-first, hybrid

Local models on a Mac Studio, Claude when it counts

Runs on the Mac Studio — Apple Silicon, always-on, on the studio's own network. The same machine that runs CI.
Local stack: Ollama, Apple MLX, and LM Studio / llama.cpp, serving open-weight models (Llama · Qwen · Mistral-class).
Routing weighs three things at once — difficulty (cheap, high-volume work stays local), privacy (sensitive client data never leaves the machine), and cost (local-first to dodge API spend and rate limits).
Claude online is reserved for the hard reasoning and final-quality generation where it clearly wins.
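One way to read the routing rule, as a sketch: privacy acts as a hard constraint, difficulty is the only reason to escalate, and local is the default because it is the cheaper path. The `Task` struct, the `route` function and the two-level difficulty scale are assumptions, not Forge's actual router.

```ruby
Task = Struct.new(:name, :difficulty, :sensitive, keyword_init: true)

# Returns :local or :claude for a task.
def route(task)
  # Privacy first: sensitive client data never leaves the machine,
  # however hard the task is.
  return :local if task.sensitive

  # Cost: stay local by default; escalate only when difficulty demands it.
  task.difficulty == :hard ? :claude : :local
end

routes = [
  Task.new(name: "classify inbound message", difficulty: :easy, sensitive: false),
  Task.new(name: "draft final client report", difficulty: :hard, sensitive: false),
  Task.new(name: "summarise contract terms", difficulty: :hard, sensitive: true)
].map { |task| [task.name, route(task)] }.to_h
```

The third case is the interesting one: a hard task on sensitive data stays local and accepts the quality trade-off rather than leaving the machine.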

forge · mac studio

Always-on, on-network

router

Weighs difficulty · privacy · cost

local-first

Ollama · MLX · LM Studio

open-weight · on-device · private

escalate

Claude (online)

hard reasoning · final quality

The same hybrid instinct as the core app — Haiku for fast triage, Sonnet for the heavy lifting — pushed one step further onto local hardware.

Forge · integration & trust boundary

A real agent, on a short leash

Forge is a genuine external AI agent with live read/write access to the studio — which makes the trust boundary the interesting engineering, not the connection itself.

Shared-secret header on every call; the bridge holds one scoped credential, not a staff login.
Server-side project isolation — when an item is linked to an inbound message, the lookup is scoped to that message's project, so a token-holder cannot attach project A's work to project B's conversation.
Provenance on every write — items created this way are stamped "via Forge — {sender}", so machine-origin work is always distinguishable from human-origin.
Outbound is pull, not push: True Build OS queues replies; Forge polls. The system never has to trust an inbound socket.
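The project-isolation rule can be sketched as a scoped lookup: the item is searched for only within the inbound message's project, so an id from another project simply is not found. All names here (`InboundMessage`, `ActionItem`, `link_item_to_message`, `CrossProjectLink`) are hypothetical.

```ruby
InboundMessage = Struct.new(:id, :project_code, :sender, keyword_init: true)
ActionItem = Struct.new(:id, :project_code, :description, :source_message_id,
                        keyword_init: true)

class CrossProjectLink < StandardError; end

def link_item_to_message(items, item_id, message)
  # Scope the lookup to the message's project rather than trusting the
  # caller: holding the API credential does not widen the search.
  item = items.find { |i| i.id == item_id && i.project_code == message.project_code }
  raise CrossProjectLink, "item #{item_id} is not in project #{message.project_code}" unless item

  item.source_message_id = message.id
  item
end

items = [
  ActionItem.new(id: 1, project_code: "MZK001", description: "Fix cracked cladding"),
  ActionItem.new(id: 2, project_code: "SIN-003", description: "Relocate outlet")
]
message = InboundMessage.new(id: 77, project_code: "MZK001", sender: "Forge")

linked = link_item_to_message(items, 1, message)
refused = begin
  link_item_to_message(items, 2, message)
  false
rescue CrossProjectLink
  true
end
```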

read surface

dashboard · projects · action items · issues · approvals · expenses · vendors · staff

write surface

action items · issues (+ photos) · records · notes · client approvals · expenses

+ the human layer

Beyond the integration, Forge provides ongoing engineering support and operational help directly — a working collaboration, not just an API client.

This is the same OpenClaw API that backs the WhatsApp and Telegram channels shown in Part One — Forge is its most capable consumer.

The biggest surprise so far

The best interface turned out to be a conversation

We spent two months building polished portals. The most-used — and frankly most enjoyable — way to put data into the system has turned out to be none of them. It's messaging.

Telegram, and especially Forge, became the way entries actually get made. Talking to Forge is now my number-one way of putting things into the system.
Why it works: as an agent talking to True Build OS through its MCP, Forge understands messy, half-formed input first — cleans it, fills the gaps, structures it — before anything touches the database. It feels like handing work to a capable colleague, not filling in a form.
The honest part: Forge is unreliable in other respects. But on the "understand first, then write" path through the MCP, it's been genuinely powerful.
The question it opens: should the system be chat-first — chat as the primary surface, with the interface dynamically assembling the right widget to show information, or to ask for exactly what it's missing?

Forge · True Build OS · MCP

marble cracked on the L3 feature wall — Stone & Co to fix before Fri

Which project did you mean?

Mazaya Villa · MZK001 | Sky Penthouse · SIN003

New issue · draft

Project: MZK001 · Mazaya Villa

Type: Defect · High

Assignee: Stone & Co

Due: Fri 14 Mar

↳ I'll raise an NCR if it isn't acknowledged by Thursday.

Edit · Confirm

A natural sentence in; the system asks for the one thing it's missing, then renders a structured draft to confirm. This is the direction I'd most value your read on.

Open questions

Where we're looking next

retrieval

RAG has no provenance check

Answers list their sources, but individual claims aren't verified against the retrieved text. There is also no re-ranking and no hybrid keyword+semantic search.

graph

Relations don't cascade

Objects can link (blocks, supersedes, remediates) but there's no automatic propagation — it's manual today.

orchestration

No escalation engine

Snooze defers; staleness is detected by scan. There's no time-based auto-escalation or SLA timer.

identity

Naive entity disambiguation

People are matched by substring — "Sam" can collide with "Samira". No glossary / canonical entity resolution yet.
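The collision is easy to reproduce. In this sketch the names are invented and both functions are hypothetical; the whole-word variant is shown only as one narrow mitigation, not as the canonical entity resolution the gap actually calls for.

```ruby
PEOPLE = ["Sam Okafor", "Samira Haddad"].freeze # invented names

# Substring matching: "Sam" is contained in both names.
def match_by_substring(people, query)
  people.select { |name| name.downcase.include?(query.downcase) }
end

# Whole-word matching: "Sam" must equal one of the name's words.
def match_by_word(people, query)
  people.select { |name| name.downcase.split.include?(query.downcase) }
end

ambiguous = match_by_substring(PEOPLE, "Sam") # both people match
narrowed  = match_by_word(PEOPLE, "Sam")      # only "Sam Okafor"
```

Whole-word matching still fails on nicknames, misspellings and two people who share a first name, which is why a glossary of canonical entities remains the open item.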

consistency

Cost sync is eventually-consistent

Budget roll-ups can briefly diverge if a mutation fails mid-transaction. No reconciliation job yet.

questions I'm sitting with

Different angles welcome

Tuning the auto-execution threshold · an evaluation harness for the AI jobs · whether the event spine should become the one universal log.

These are the threads I'd most enjoy thinking through with someone who's seen more systems than I have.

Where to next

So — what would you change?

That's True Build OS as it stands today. I'd value your read on it — especially ways to push it further, and different angles on the problems and the solutions.

System: truebuild.ae

Integrated with: Forge · OpenClaw

Built in: ~ two months

Exploring: A chat-first interface