True Build OS
System Architecture · Technical Review


The operating system behind the interior design studio.

A walkthrough of the ingest pipeline, the AI extraction and orchestration layer, the decision systems, and the portal architecture — from first field input to final logged decision.

Prepared for: Technical Architecture Review
Scope: Ingest · Orchestration · Decisions · Surfaces
Stack: Rails 7 · PostgreSQL + pgvector · Claude · Hotwire
Live at: truebuild.ae
What this covers

A working studio, run as software

True Build OS runs a working interior design studio end-to-end: how information comes in from the field, how it becomes structured records, how work gets routed, and how decisions get made and logged.

  • Everything here is built and in production — every claim maps to real code, not roadmap.
  • It follows one path: ingest → orchestration → decisions → surfaces, then the wider ecosystem it plugs into.
  • I'd genuinely value your read on it — especially different ways to think about these problems, and where it could go next.

In production: 6 distinct portals / surfaces, multiple identity models
AI as plumbing: ~15 background jobs call Claude for extraction, classification, suggestion
Ingest channels: 6 (WhatsApp · Telegram · email · voice · web/QR · contracts)
Knowledge index: 7 entity types embedded into pgvector for semantic + temporal recall

System at a glance

One monolith, five layers, many mouths to feed

01 · Ingest channels
  • WhatsApp: via OpenClaw · wacli
  • Telegram: voice · photo · text
  • Email: ActionMailbox · Postmark
  • Voice (Plaud): site dictation
  • Web · QR: forms · site capture
  • Contract PDFs: chunked for retrieval
02 · AI normalization: Claude (Haiku / Sonnet) + structured tool-use
  • DocumentGenerationJob · multimodal classifier · action-item extractor · receipt OCR · severity classifier
03 · Core domain: Project is the root aggregate
  • Project · Records · Issues · Action Items · Client Approvals · Decisions / Q's · FF&E · POs · Expenses · Budget
  • Event spine: immutable append-only log; every state change emits a verb
04a · Orchestration
  • Next-Best-Action: agentic suggest → auto-execute
  • Ball-in-Court queue: unified across 6 entity types
04b · Knowledge
  • pgvector RAG: 7 entity types · OpenAI embeds
  • Bi-temporal query: reconstruct state as-of any time
05 · Surfaces
  • Backoffice: staff · password
  • Client portal: passwordless
  • Vendor portal: passwordless
  • Consultant: scoped role
  • Accountant: expenses
Integrations
  • Anthropic · OpenAI · Zoho CRM · Postmark · Backblaze B2
  • Forge / OpenClaw bridge → external AI agent
One Rails monolith, one deployable. The token-is-auth public surfaces (approvals, NCRs, site capture) sit outside the five authenticated portals.
The three load-bearing ideas

What the architecture actually commits to

01 · Aggregate

Project is the root

Every record, issue, task, decision, cost and approval hangs off one Project. There is a single place to ask "what is the state of this job?" — no cross-system reconciliation.

02 · Event-driven

State change is a fact

Mutations emit verbs onto an append-only event spine. The spine is the trigger surface: suggestions, notifications and embeddings all react to events rather than being wired into business logic.
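
A minimal sketch of the spine-as-trigger-surface idea, in plain Ruby rather than the production code. The `EventSpine` class and its method names are hypothetical, and the log is held in memory here instead of in Postgres; only the verb `record.generated` is taken from the deck.

```ruby
# Hypothetical in-memory stand-in for the append-only event spine.
class EventSpine
  Event = Struct.new(:verb, :subject, :payload, :at, keyword_init: true)

  def initialize
    @log = []                                      # append-only: nothing is updated or deleted
    @subscribers = Hash.new { |h, k| h[k] = [] }
  end

  # Register a reaction to a verb, e.g. "record.generated".
  def subscribe(verb, &handler)
    @subscribers[verb] << handler
  end

  # Append a fact, then let every subscriber for that verb react to it.
  def emit(verb, subject:, payload: {})
    event = Event.new(verb: verb, subject: subject,
                      payload: payload.dup.freeze, at: Time.now).freeze
    @log << event
    @subscribers[verb].each { |handler| handler.call(event) }
    event
  end

  # Readers get a frozen copy, so the log cannot be mutated from outside.
  def events
    @log.dup.freeze
  end
end

spine = EventSpine.new
spine.subscribe("record.generated") { |e| puts "suggest follow-ups for #{e.subject}" }
spine.emit("record.generated", subject: "ProjectRecord#42")
```

The point of the shape: the code that generates a record only emits the verb. Suggestions, notifications and embeddings are subscribers, so adding another reaction does not touch the business logic.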

03 · AI as plumbing

Inference is infrastructure

Claude is not a chat feature bolted on the side. It sits inside ~15 background jobs doing extraction, classification and routing — with structured tool-use, logging and graceful fallback.

The bet: in a studio, work is information-driven (a transcript becomes tasks) rather than process-driven (step 1 must finish before step 2). So the system is an event-and-extraction engine, deliberately not a BPMN workflow tool.

PART ONE

The ingest pipeline

How messy, real-world input from a building site becomes clean, structured, queryable records — usually without anyone filling in a form.

Ingest · overview

Six channels in, one structured spine out

In:
  • WhatsApp: OpenClaw · wacli
  • Telegram: voice · photo · text
  • Email: ActionMailbox · Postmark
  • Voice: Plaud
  • Web · QR: on-site capture
  • Contracts: PDF → chunks
Through: True Build OS (Claude structuring)
Out: Typed records · Issues · actions · Decisions · Q's

bind once

A chat is tied to a project via /init MZK001 — then every message routes itself.

async

Webhooks verify and return fast; heavy work runs in retried Solid Queue jobs.

provenance

Every record keeps its input_source and a link back to the exact message.

Voice intake is moving onto the Plaud MCP. Claude does the structuring; OpenAI embeddings do the indexing — split by job.
Ingest · the non-obvious part

Multimodal intake with a "quiet window"

in: Photo + voice note + caption
  • They arrive in a bound site group, seconds apart and out of order.
collect: Intake session with a "quiet window"
  • status: collecting; artifacts accumulate.
  • quiet_until = now + 2m; the timer resets on each new artifact.
  • Transcribe + download run async, in parallel, retried.
  • Finalization fires when the quiet window expires OR all artifacts are ready, under a race-safe claim.
classify: One multimodal Claude call
  • image(s) + text → Claude Haiku → 4 buckets + confidence + draft fields
confirm: "Reply Yes to log" in chat (the human gate)
  • → a typed record is created: Issue · Action · Decision · Feedback

Why it matters: field input is bursty and multimodal. The quiet window stitches a burst into one event, so there is one classification, not three. End-to-end: ~4–6 seconds, no forms.
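
The quiet-window mechanics can be sketched in a few lines of plain Ruby. This is an illustration under assumptions, not the production model: `IntakeSession`, `claim!` and the in-memory `Mutex` are hypothetical stand-ins (a real deployment would more likely claim with an atomic conditional update in Postgres), and time is passed in explicitly so the behaviour is easy to follow.

```ruby
# Hypothetical sketch of a quiet-window intake session.
class IntakeSession
  QUIET_SECONDS = 120 # "quiet_until = now + 2m" from the slide

  attr_reader :artifacts, :quiet_until, :status

  def initialize(now:)
    @artifacts = []
    @status = :collecting
    @quiet_until = now + QUIET_SECONDS
    @lock = Mutex.new # stand-in for a database-level claim (assumption)
  end

  # Each new artifact pushes the deadline out, so a burst stays in one session.
  def add(kind, now:, ready: false)
    raise "session already finalized" unless @status == :collecting
    @artifacts << { kind: kind, ready: ready }
    @quiet_until = now + QUIET_SECONDS
  end

  # Called when a transcription or download job finishes.
  def mark_ready(kind)
    @artifacts.each { |a| a[:ready] = true if a[:kind] == kind }
  end

  # Finalize when the window has expired OR every artifact is ready.
  def finalizable?(now:)
    return false if @artifacts.empty?
    now >= @quiet_until || @artifacts.all? { |a| a[:ready] }
  end

  # Only the first caller wins the claim; any later caller gets false.
  def claim!(now:)
    @lock.synchronize do
      return false unless @status == :collecting && finalizable?(now: now)
      @status = :finalizing
      true
    end
  end
end
```

The "all artifacts ready" branch is what keeps the common case fast: a burst does not have to wait out the full two minutes once everything has been transcribed and downloaded.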
The WhatsApp vendor-onboarding path uses the same spirit in reverse: an LLM-driven state machine (greeting → company → trades → certs → contact → confirm) that accumulates a payload across turns and emits a VendorRegistration on completion.
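
As a rough sketch of that state machine, assuming the LLM has already extracted a value from each turn: the step names follow the slide, while `VendorOnboarding`, `advance` and the literal "yes" confirmation are hypothetical. The returned hash stands in for the VendorRegistration.

```ruby
# Hypothetical sketch: accumulate a payload across turns, emit it on confirmation.
class VendorOnboarding
  STEPS = %i[greeting company trades certs contact confirm].freeze
  FIELD_STEPS = %i[company trades certs contact].freeze

  attr_reader :state, :payload

  def initialize
    @state = :greeting
    @payload = {}
  end

  # `value` is whatever was extracted from the vendor's latest message.
  # Returns the completed payload once confirmed, otherwise nil.
  def advance(value = nil)
    case @state
    when :greeting
      @state = :company
    when *FIELD_STEPS
      return nil if value.nil? || value.to_s.strip.empty? # stay put and re-ask
      @payload[@state] = value
      @state = STEPS[STEPS.index(@state) + 1]
    when :confirm
      return nil unless value.to_s.strip.downcase == "yes"
      @state = :done
      return @payload.dup.freeze
    end
    nil
  end
end
```

An empty or missing answer leaves the state unchanged, which is what lets the conversation re-ask for a field rather than skip it.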
Ingest → structure

One tool-use call that does a day's admin

DocumentGenerationJob takes an unstructured transcript or email and returns a single structured object that the app commits in one transaction.

  • Classifies into one of 11 record types (minutes, RFI, PCN, site report, spec, defect…).
  • Emits a formatted document plus the side-effects: new action items, updates to existing items by ID, and closures with a resolution.
  • Also extracts structured decisions, open questions and change requests — each with a source quote for traceability.
  • It learns the studio's house style — when staff approve or edit a generated document, that final version is saved. Next time the same record type is generated for that project, a few of those approved documents are pasted into the prompt as worked examples, so the output matches how this studio actually writes. No fine-tuning, no retraining.
  • Advisory-lock serialized per project so two concurrent transcripts can't both mutate the same action item.
Tool-use schema · abridged
{
  record_type:        "site_visit_report",
  summary:            "…",
  decisions_structured:  [ {statement, decided_by,
                           source_quote} ],
  open_questions:        [ {question, raised_by} ],
  new_action_items:      [ {description, assignee,
                           due_date, task_type} ],
  action_item_updates:   [ {id, status, next_step} ],
  action_item_closures:  [ {id, resolution} ],
  change_requests:       [ {description, cost_impact} ]
}

Falls back to raw-JSON parsing if tool-use is unavailable. Every call is logged to AiUsageLog with tokens + cost.

The non-obvious bit isn't extraction — it's that the model reconciles against existing state (update / close by ID) instead of always creating new rows. That's what keeps the action-item ledger continuous.
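
To make the reconcile-by-ID point concrete, here is a small sketch that applies one extraction result to an in-memory ledger. The field names come from the abridged schema above; `apply_extraction`, the integer IDs and the status strings ("open", "closed") are assumptions for illustration. The real job commits inside a database transaction under an advisory lock, which this sketch only imitates by validating first and building a copy.

```ruby
# Hypothetical sketch: apply one extraction result to an action-item ledger.
# `ledger` maps id => attributes. Nothing is changed if any referenced ID is unknown.
def apply_extraction(ledger, result)
  updates  = result.fetch(:action_item_updates, [])
  closures = result.fetch(:action_item_closures, [])

  # Validate first, so a hallucinated ID aborts before anything is written.
  unknown = (updates + closures).map { |c| c[:id] }.reject { |id| ledger.key?(id) }
  raise ArgumentError, "unknown action item ids: #{unknown.inspect}" unless unknown.empty?

  next_ledger = ledger.transform_values(&:dup) # work on a copy, return it whole

  updates.each do |u|
    next_ledger[u[:id]].merge!(status: u[:status], next_step: u[:next_step])
  end
  closures.each do |c|
    next_ledger[c[:id]].merge!(status: "closed", resolution: c[:resolution])
  end

  # Only genuinely new work becomes a new row.
  next_id = (next_ledger.keys.max || 0) + 1
  result.fetch(:new_action_items, []).each do |item|
    next_ledger[next_id] = item.merge(status: "open")
    next_id += 1
  end
  next_ledger
end
```

Rejecting unknown IDs up front is a design choice worth stating: an update aimed at a row that does not exist is more likely a model error than a new task, so the sketch refuses the whole result rather than guessing.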
Draft refinement

Point at what's wrong

The hard part of AI editing isn't generating text — it's telling the model which part to change without it quietly rewriting everything else. The studio's answer borrows from how people already mark up documents.

  • Highlight any span in a generated record and leave a comment — "too formal", "wrong supplier", "merge these two". Like Google-Docs comments, except they regenerate.
  • A general-instruction box handles document-wide asks — "make the tone more formal, fix the dates".
  • Each comment is scoped to its section and its exact quoted text, so the model knows precisely where you mean — no guessing from a vague prompt.
Minutes · Site visit · 12 Mar · MZK001

Walkthrough of Level 3 with the client. The contractor confirmed the marble will arrive next week; tiling to follow once the substrate is signed off.

Client approved the brushed-brass ironmongery. Electrical position on the island to move 300 mm east before second fix.

"contractor confirmed the marble will arrive next week"
It's Stone & Co specifically, and the date is the 19th — fix both.
[Cancel] [Add comment]
On submit, Sonnet 4.6 receives every section at once with a forced apply_revisions tool — annotated sections must come back changed, not echoed.
Draft refinement

Get a redline back, not a rewrite

  • Word-level diff (LCS) renders the result as tracked changes — insertions in green, deletions struck through. You see exactly what moved.
  • Accept or reject each change, or all at once; the preview stays editable for a manual touch-up before it lands.
  • Consistency propagation — rename a supplier in one comment and it's corrected in every section, not just the line you marked. This is the part that's genuinely hard to get right.
  • Nothing is persisted — annotations are a transient instruction, not data. The approved version then feeds the house-style learning loop.
Summary · annotation
Comment on: "contractor confirmed the marble will arrive next week"
  It's Stone & Co specifically, and the date is the 19th.
3 changes · 2 sections · [Reject all] [Accept all]

Tracked changes (deletions as [-…-], insertions as {+…+}):
The [-contractor-] {+Stone & Co+} confirmed the marble will arrive [-next week-] {+on the 19th+}; tiling to follow once the substrate is signed off.

↳ consistency: "contractor" → "Stone & Co" also updated in Decisions and Action Items
The whole loop is server-rendered: comment → background job → tracked-changes preview streamed back over Turbo → accept → committed. No SPA, no editor framework.
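
For reference, a word-level LCS diff of the kind the redline relies on fits in a few dozen lines of Ruby. This is a generic textbook version, not the studio's code: `word_diff` and `render_redline` are hypothetical names, words are split on whitespace only, and the plain-text `[-…-]` / `{+…+}` markers stand in for the green and struck-through rendering.

```ruby
# Word-level diff via a longest-common-subsequence table.
# Returns [[:same | :del | :ins, word], ...] in reading order.
def word_diff(before, after)
  a = before.split
  b = after.split

  # lcs[i][j] = length of the LCS of a[i..] and b[j..]
  lcs = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      lcs[i][j] = a[i] == b[j] ? lcs[i + 1][j + 1] + 1 : [lcs[i + 1][j], lcs[i][j + 1]].max
    end
  end

  # Walk the table from the start, emitting one operation per word.
  ops = []
  i = j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      ops << [:same, a[i]]
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      ops << [:del, a[i]]
      i += 1
    else
      ops << [:ins, b[j]]
      j += 1
    end
  end
  ops.concat(a[i..].map { |w| [:del, w] }) # leftover old words
  ops.concat(b[j..].map { |w| [:ins, w] }) # leftover new words
  ops
end

# Render as plain-text tracked changes.
def render_redline(ops)
  ops.map { |op, w| { same: w, del: "[-#{w}-]", ins: "{+#{w}+}" }.fetch(op) }.join(" ")
end

puts render_redline(word_diff("a b c", "a x c")) # prints "a [-b-] {+x+} c"
```

Because each operation is a separate element, accepting or rejecting a single change amounts to keeping either the `:del` or the `:ins` side of it.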
PART TWO

Orchestration & decisions

Once everything is a structured fact on a shared event spine, the system can route work, suggest the next move, and capture decisions with an audit trail.

Orchestration · routing

"Whose move is it?" as a first-class concept

Six different entity types — action items, issues, approvals, decisions, open questions, attention alerts — all share a Ball-in-Court concern: a polymorphic pointer to whoever currently owns the next move, plus a snooze.

  • A single query assembles one unified queue per person across all six types.
  • A shared urgency score ranks them: overdue +100, severity, staleness, age-of-pending-approval, due-soon.
  • The ball shifts explicitly (staff → vendor → client) and every shift is an event.

Most tools give each object type its own backlog. Here the operational question — what is waiting on me, ranked — is answered in one place regardless of object type.

six sources: Action Items · Issues · Client Approvals · Decisions · Open Questions · Attention Alerts
one ranking: unified urgency score (overdue +100 · severity · staleness · age)
my queue: one ranked list of what's waiting on you, for example:
  • ⚑ Overdue NCR: +100
  • Approval · 7d: +40
  • Decision due: +20
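
A sketch of how one score can rank heterogeneous items. Only "overdue +100" is taken from the slide; the severity points, the per-day staleness weight, its cap, the two-day "due soon" window, the owner name and the dates are placeholders I have assumed so the example runs. `QueueItem`, `urgency_score` and `my_queue` are hypothetical names.

```ruby
require "date"

# One shape for every entity type that can be "in someone's court".
QueueItem = Struct.new(:type, :title, :owner, :due_on, :severity,
                       :pending_since, :snoozed_until, keyword_init: true)

SEVERITY_POINTS = { low: 0, medium: 10, high: 30 }.freeze # assumed weights

def urgency_score(item, today:)
  score = 0
  if item.due_on
    days_left = (item.due_on - today).to_i
    score += 100 if days_left.negative?      # overdue (+100 is from the slide)
    score += 20 if days_left.between?(0, 2)  # due soon (assumed window and weight)
  end
  score += SEVERITY_POINTS.fetch(item.severity, 0)
  if item.pending_since
    # Staleness / age of a pending approval: 5 points a day, capped at 40 (assumed).
    score += [(today - item.pending_since).to_i * 5, 40].min
  end
  score
end

# One ranked list per person, across every entity type, with snoozed items hidden.
def my_queue(items, owner:, today:)
  items
    .select { |i| i.owner == owner }
    .reject { |i| i.snoozed_until && i.snoozed_until > today }
    .sort_by { |i| -urgency_score(i, today: today) }
end
```

Because the score only reads shared fields, a seventh entity type joins the queue by exposing the same few attributes rather than by getting its own backlog.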
Orchestration · the agentic core

Next-Best-Action: confidence you can govern

trigger: an event such as record.generated or approval.requested
suggest: Claude (Haiku) proposes 1–3 templated actions, each with a score 0.0–1.0 + reason
blend the score: 0.8 · model + 0.2 · history
  • history = this studio's acceptance rate for that template
threshold: is the blended score ≥ 0.85?
  • yes → auto-execute (create item · draft approval)
  • no → propose to staff (queued for human review)
feedback: accept / reject decisions feed back into the historical acceptance rate
Every suggestion is persisted as a SuggestedAction — even auto-executed ones — so acceptance rates are measurable per template. Low-risk templates auto-fire; the model's confidence alone never gets a vote without the studio's track record beside it.
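
The blend is simple enough to write down. The weights (0.8 / 0.2) and the 0.85 threshold are from the slide; the function names, the `auto_executable` flag for low-risk templates, and the choice to treat a template with no history as 0.0 are assumptions in this sketch.

```ruby
MODEL_WEIGHT = 0.8
HISTORY_WEIGHT = 0.2
AUTO_EXECUTE_THRESHOLD = 0.85

# Share of past suggestions for a template that staff accepted.
# With no history yet we return 0.0 (an assumption), so a new template cannot auto-fire.
def acceptance_rate(accepted:, rejected:)
  total = accepted + rejected
  total.zero? ? 0.0 : accepted.to_f / total
end

def blended_score(model_confidence:, history:)
  (MODEL_WEIGHT * model_confidence + HISTORY_WEIGHT * history).round(4)
end

# `auto_executable` marks low-risk templates; everything else is always proposed.
def route(model_confidence:, history:, auto_executable: true)
  score = blended_score(model_confidence: model_confidence, history: history)
  auto_executable && score >= AUTO_EXECUTE_THRESHOLD ? :auto_execute : :propose_to_staff
end

history = acceptance_rate(accepted: 9, rejected: 1)               # 0.9
puts blended_score(model_confidence: 0.95, history: history)      # 0.8*0.95 + 0.2*0.9 = 0.94
puts route(model_confidence: 0.95, history: history)              # auto_execute
puts route(model_confidence: 1.0, history: 0.0)                   # propose_to_staff
```

The last line is the governance property in miniature: with these weights a perfectly confident model tops out at 0.80 when the studio has never accepted that template, which is below the 0.85 bar.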
Decisions · the record of truth

Approvals: zero-login, snapshot-immutable

One polymorphic ClientApproval serves three different approvable things — an FF&E item, a variation order, or a design record. The client never logs in: the secure token is the authentication.

  • Snapshot at send time — the approvable's state is frozen into the approval. If the item later changes, the client's decision still references exactly what they saw.
  • One-shot, 72-hour token, consumed under a row lock — no double-submit, no replay.
  • Approval drives side-effects — an approved FF&E item advances its lifecycle stage; a record flips to approved — emitted as events.
  • Signed proof captured — IP, user-agent, timestamp on the decision.
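
A sketch of the snapshot and one-shot consumption, in memory. `ApprovalTokens` is a hypothetical class; the `Mutex` stands in for the database row lock, and timestamps are plain seconds so expiry is easy to see. The 72-hour lifetime is from the slide.

```ruby
require "securerandom"

# Hypothetical sketch: snapshot at send time, one-shot token consumed under a lock.
class ApprovalTokens
  TTL_SECONDS = 72 * 60 * 60

  def initialize
    @rows = {}
    @lock = Mutex.new # stands in for the database row lock
  end

  # Deep-copy what the client is shown, and issue a token for exactly that copy.
  def issue(approvable_state, now:)
    token = SecureRandom.urlsafe_base64(32)
    snapshot = Marshal.load(Marshal.dump(approvable_state)).freeze
    @lock.synchronize do
      @rows[token] = { snapshot: snapshot, expires_at: now + TTL_SECONDS, consumed_at: nil }
    end
    token
  end

  # Check and mark-as-used happen inside one critical section: no double-submit, no replay.
  def consume(token, decision:, now:)
    @lock.synchronize do
      row = @rows[token]
      return :invalid if row.nil?
      return :expired if now > row[:expires_at]
      return :already_used if row[:consumed_at]
      row[:consumed_at] = now
      row[:decision] = decision
      :ok
    end
  end

  def snapshot_for(token)
    @rows.fetch(token)[:snapshot]
  end
end
```

The snapshot is what makes a later dispute answerable: if the FF&E item changes after sending, the approval still points at the state the client actually saw.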
approvable

FfeItem · VariationOrder · ProjectRecord — one model, three masters

72h token life · one-shot consume · 0 logins
flow

draft → snapshot + token → email link → client approves / requests-changes → side-effect + event → confirmation back to requester (and Telegram, if that's where it began)

The same token-is-auth pattern powers NCR acknowledgements, public issue reports, and QR site-capture — external parties act without an account, scoped to exactly one object.
Knowledge layer

Recall across entities — and across time

Seven entity types embed their text into pgvector on write. "Ask Arqis" runs one semantic query spanning all of them, assembles a context window, and has Claude answer with the project's own history.

  • No external vector DB. Embeddings live in the same Postgres as the data — HNSW cosine index, OpenAI text-embedding-3-small.
  • Graceful fallback to recency if the embeddings provider is unavailable.
  • Bi-temporal queries — reconstruct "what were the live decisions as of that date", walking supersession chains. This is the part most RAG stacks don't have.
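
The retrieval step with its recency fallback can be sketched without a database. In production the nearest-neighbour search is pgvector's HNSW cosine index; here a brute-force cosine over a small array stands in for it. `retrieve`, the document hash shape and the injected `embedder` callable are hypothetical.

```ruby
# Cosine similarity between two equal-length vectors.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# docs: [{ text:, embedding:, created_at: }]
# embedder: any callable that turns text into a vector and may raise if the provider is down.
def retrieve(docs, question, embedder:, k: 3)
  query = begin
    embedder.call(question)
  rescue StandardError
    nil # provider unavailable
  end

  # Graceful fallback: no query vector means "most recent first".
  return docs.max_by(k) { |d| d[:created_at] } if query.nil?

  docs.max_by(k) { |d| cosine(query, d[:embedding]) }
end
```

Only the embedding call is wrapped in the rescue, so a bug in the ranking itself still surfaces instead of silently degrading to recency.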

I'll be candid later in the deck, under "Where the effort went", about which of these is genuinely differentiated and which is just 2024 table stakes.

embedded

Events · Records · Issues · Action Items · Decisions · Open Questions · Contract Chunks

retrieval

single query → 6 entity types → 6k-token context → Claude Haiku answer, logged

bi-temporal

decisions_at(t) · events_at(t) · current_superseding(d) — state as-of any moment, not just "now"
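
A simplified sketch of the as-of reconstruction. The method names `decisions_at` and `current_superseding` are from the slide; the `Decision` struct, its `superseded_by_id` field and the `DecisionLedger` wrapper are assumptions, and the sketch tracks a single timeline (when each decision was made) rather than a full bi-temporal model.

```ruby
Decision = Struct.new(:id, :statement, :decided_at, :superseded_by_id, keyword_init: true)

class DecisionLedger
  def initialize(decisions)
    @by_id = decisions.to_h { |d| [d.id, d] }
  end

  # Decisions that were live at time t: already made, and not yet replaced
  # by a successor that itself existed at t.
  def decisions_at(t)
    @by_id.values.select do |d|
      next false if d.decided_at > t
      successor = @by_id[d.superseded_by_id]
      successor.nil? || successor.decided_at > t
    end
  end

  # Walk the supersession chain forward to the decision that is live today.
  def current_superseding(decision)
    seen = {}
    while (successor = @by_id[decision.superseded_by_id])
      raise "supersession cycle at decision #{decision.id}" if seen[decision.id]
      seen[decision.id] = true
      decision = successor
    end
    decision
  end
end
```

Nothing is overwritten when a decision is superseded, which is the property that makes "what did we believe on that date" answerable at all.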

Note: this is the studio's own embedded RAG — distinct from "gbrain", which is a personal developer knowledge tool and is not part of the product.
How the system improves itself

It tunes its own prompts

1 · Staff approve or edit a generated document
2 · PromptLearning stores the (raw → approved) pair
3 · PromptRefinementJob reads recent corrections and asks Claude for a better prompt
4 · PromptSuggestion: a human reviews the diff; accepting it updates the template
↑ The updated template changes how the next document is generated, which staff then approve or edit, and the loop repeats.

Same exemplars, different use: DocumentGeneration injects recent approved exemplars as few-shot at run time.

Human approvals are treated as labelled data. They do two jobs: few-shot exemplars at generation time, and fuel for an offline meta-loop that proposes prompt rewrites for a human to accept. Both stay human-gated by design.
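
The run-time half of that loop, sketched: pick the most recent approved documents for the same project and record type, then paste them into the prompt as worked examples. `Exemplar`, `few_shot_exemplars`, `build_prompt` and the default of three examples are assumptions; the deck only says "a few".

```ruby
Exemplar = Struct.new(:project_id, :record_type, :approved_text, :approved_at, keyword_init: true)

# Most recent approved documents for this project and record type, newest first.
def few_shot_exemplars(exemplars, project_id:, record_type:, limit: 3)
  exemplars
    .select { |e| e.project_id == project_id && e.record_type == record_type }
    .max_by(limit) { |e| e.approved_at }
end

# Plain string assembly: instructions, then worked examples, then the new input.
def build_prompt(base_instructions, transcript, shots)
  parts = [base_instructions]
  shots.each_with_index do |e, i|
    parts << "Approved example #{i + 1}:\n#{e.approved_text}"
  end
  parts << "Transcript:\n#{transcript}"
  parts.join("\n\n")
end
```

When there is no approved history yet the prompt simply has no examples, so a new project starts from the base template and converges on the house style as approvals accumulate.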

Surfaces

Five portals, three ways to be "logged in"

All five are live and in daily use — every portal has at least one active user right now. They sit at different stages of continuous development, but none is a prototype.

/portal

Backoffice

Staff. Queue, projects, records, FF&E grid, library, comms.

/client

Client

Approvals, brief, project story, assistant chat.

/vendor

Vendor

Assigned issues, NCRs, photos, certs.

/consultant

Consultant

Scoped project access, items, expenses.

/accountant

Accountant

Expense triage workflow only.

staff

Sign in with a password. What each person can see and do is scoped to their role — a consultant or accountant can't reach beyond their own remit.

clients & vendors

No password ever. They get a one-time email link that expires in 15 minutes — nothing to set up, nothing to forget.

public

The link is the key. Client approvals, defect sign-offs and on-site QR photo capture need no login — each secure link opens exactly one thing.

One identity record per person, keyed on email, so the same individual can't end up as two separate logins. Staff can step into a client or vendor's view to help them directly.
The interface

Made for the desk and the site

Desktop · truebuild.ae/portal/queue
Sidebar (True Build): Dashboard · My Queue · Projects · Records · Issues · Library · Vendors · Expenses
My Queue: 12 items waiting on you
Counters: 3 Overdue · 5 Approvals · 8 Open issues · 4 Due today
  • NCR-014: cracked cladding, Level 3 · MZK001 · Overdue
  • Approve: marble selection, master bath · SIN-003 · Awaiting client
  • RFI-042 response from structural · MZK001 · Ready
  • Site-visit minutes: draft generated · VLA-009 · Review
  • Variation order: outlet relocation · SIN-003 · Draft

Mobile
My Queue: 8 on your plate
Swipe action: ✓ Done (release to clear)
  • MZK001 · Overdue · NCR-014: cracked cladding, Level 3
  • SIN-003 · Client · Approve: marble selection, master bath
  • VLA-009 · Review · Site-visit minutes: draft generated
Tabs: Queue · Projects · Capture · Me

Representative of the live UI, rendered in the same design system. Mobile triage is swipe-first — clear, snooze, or reassign an item without leaving the queue.
The stack & the choices behind it

Boring infrastructure, on purpose

runtime

Rails 7.1 · Ruby 3.3
single monolith, one deploy

data

PostgreSQL 17 + pgvector
relational + vectors, one DB

async + realtime

Solid Queue · Solid Cable
Postgres-backed, no Redis

front end

Hotwire (Turbo + Stimulus)
server-rendered, no SPA

The throughline

  • Minimise moving parts. No Redis, no external vector service, no Node build step, no SPA. One Postgres is the queue, the cache-bus, the relational store and the vector index.
  • Data sovereignty. Stored client and contract data, including embeddings, lives only in the app's own database. Text leaves it only in transit to the Anthropic and OpenAI APIs, for inference and embedding.
  • The cost: you trade horizontal-scale headroom and specialised tooling for operational simplicity. For a studio's data volumes, that's the right side of the trade — and an explicit one.
AI

Anthropic Claude — Haiku 4.5 (fast triage), Sonnet 4.6 (long-form generation)

embeddings

OpenAI text-embedding-3-small · 1536-dim · HNSW

comms

Postmark (in + out) · OpenClaw/wacli (WhatsApp) · Telegram · Zoho CRM sync

ops

Docker · Heroku (prod) + Coolify (staging) · Sentry · Backblaze B2

Where the effort went

Novelty spent only where it mattered

Genuinely non-obvious

  • Governed agentic execution — blending model confidence with the studio's own historical acceptance rate, with an auto-execute threshold and a measurable feedback loop.
  • Approvals-as-training-data — the same human approvals power both run-time few-shot and an offline prompt-refinement meta-loop.
  • Highlight-to-comment refinement — point at the exact span, say what's wrong; only that regenerates, with the rename propagated across every section.
  • Quiet-window multimodal intake — stitching a bursty voice+photo+text field report into one event and one classification, race-safe.
  • Unified ball-in-court queue — one ranked "whose move" list across six heterogeneous entity types.
  • Bi-temporal project memory — reconstructing live state as-of any past date over an event spine.
  • Snapshot-immutable, login-less approvals as the system of record for client decisions.

Deliberately standard

  • Vanilla RAG — embed → nearest-neighbour → stuff context → answer. No re-ranking, no hybrid search, no citation verification.
  • Receipt OCR via vision model — useful, entirely commodity.
  • Webhook + ActionMailbox ingestion — Rails built-ins; only the routing logic is ours.
  • Single-shot severity classification.
  • Direct API calls with retry — no exotic orchestration framework.
  • Templated prompts — string interpolation, not a prompt DSL.
Deciding which problems deserve a clever solution — and which deserve a boring, proven one — is most of the work.
The whole surface

Everything a studio runs on, in one system

10 module groups · 6 intake channels · 5 portals · ~2 mo to build

Intake

  • WhatsApp · Telegram
  • Email · Voice (Plaud)
  • Web / QR capture
  • Submissions

Project delivery

  • Projects · stages
  • Milestones
  • FF&E planning grid
  • Kickoff wizard · floor plans

Records & docs

  • Minutes · RFIs · PCNs
  • Site reports · specs
  • Drawings
  • AI document generation

Issues & quality

  • Snagging · defects
  • NCRs
  • Vendor sign-off
  • Photo evidence

Decisions

  • Client approvals
  • Decision ledger
  • Open questions
  • Change → variation orders

Knowledge

  • Ask Arqis (semantic)
  • Bi-temporal memory
  • Contract retrieval

Finance

  • Expenses + OCR
  • Purchase orders
  • Budgets · cost items
  • Zoho Books integration
  • Accountant workflow

Library

  • Sample catalogue
  • Storage locations
  • Pull-to-project
  • QR labels

People & vendors

  • Vendor registration
  • Certifications
  • Staff · timesheets
  • Zoho CRM sync

Intelligence

  • Next-best-action
  • Prompt-learning loop
  • AI usage dashboard
Not a prototype. All of the above is in production, on one Rails monolith, built over roughly two months.
PART THREE

Forge & OpenClaw

Forge isn't a third-party integration bolted on the side. It's an AI agent that runs locally and lives inside the same ecosystem — sharing one machine surface, the OpenClaw bridge, with the studio's own channels.

Forge · part of the same system

Not an integration — a resident

Forge is an AI agent that lives alongside the studio rather than calling in from outside. It wears three hats — all over the very same OpenClaw API the studio's own WhatsApp and Telegram channels run on.

01 · ops agent

Works inside the data

Watches the channels, drafts and creates records, issues and approvals, and runs tasks against True Build OS — a teammate operating directly on project state.

02 · comms brain

Routes the inbox

Pre-classifies inbound WhatsApp and Telegram, resolves which project a message belongs to, and decides intent before anything is written down.

03 · personal assistant

Beyond the studio

Engineering support on the codebase itself — much of this system — plus day-to-day help. The same agent, a broader remit.

Because Forge speaks the same OpenClaw API as every other channel, everything it does is held to one trust boundary, the one described under "Forge · integration & trust boundary".
Forge · the bridge

An LLM brain between chat and the system

chat: WhatsApp / Telegram
  • via the unofficial WhatsApp bridge (wacli)
brain: OpenClaw
  • LLM "brain": pre-classify · resolve project · decide intent
  • + Forge: scheduling, personal support
api (shared-secret auth): OpenClaw API, /api/openclaw/* on Rails
  • read + full-CRUD write surface
  • channels · messages · outbound queue
system: True Build OS
  • projects · action items · issues · records · approvals · expenses
return path: replies queued in TrueOS → OpenClaw polls /outbound → back to chat
Architecturally: chat platforms speak to OpenClaw (the intelligence), OpenClaw speaks to a narrow authenticated API on True Build. The studio's data model never has to know what a WhatsApp message is.
Forge · local-first, hybrid

Local models on a Mac Studio, Claude when it counts

  • Runs on the Mac Studio — Apple Silicon, always-on, on the studio's own network. The same machine that runs CI.
  • Local stack: Ollama, Apple MLX, and LM Studio / llama.cpp, serving open-weight models (Llama · Qwen · Mistral-class).
  • Routing weighs three things at once — difficulty (cheap, high-volume work stays local), privacy (sensitive client data never leaves the machine), and cost (local-first to dodge API spend and rate limits).
  • Claude online is reserved for the hard reasoning and final-quality generation where it clearly wins.
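
One way to read the three routing factors as code. This is a guess at the shape, not Forge's router: the `Task` struct, the 0.0 to 1.0 difficulty scale, the 0.7 escalation cut-off and the `:deferred` outcome are all assumptions.

```ruby
# difficulty is a 0.0..1.0 estimate; sensitive marks client data that must stay on the machine.
Task = Struct.new(:name, :difficulty, :sensitive, keyword_init: true)

ESCALATION_DIFFICULTY = 0.7 # assumed cut-off for "hard reasoning"

def route_task(task, local_available: true)
  # Privacy is a hard constraint, not a weight: sensitive work never goes online.
  if task.sensitive
    return local_available ? :local : :deferred
  end
  return :claude unless local_available

  # Cost: local-first by default. Difficulty: escalate only when the work is hard.
  task.difficulty >= ESCALATION_DIFFICULTY ? :claude : :local
end
```

Treating privacy as a gate that runs before the difficulty check means a hard but sensitive task stays on the local models and accepts the quality trade-off, rather than leaking to an online API.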
forge · Mac Studio: always-on, on-network
router: weighs difficulty · privacy · cost
  • local-first → Ollama · MLX · LM Studio / llama.cpp (open-weight LLMs · on-device · private)
  • escalate → Claude (online) for hard reasoning · final quality
The same hybrid instinct as the core app — Haiku for fast triage, Sonnet for the heavy lifting — pushed one step further onto local hardware.
Forge · integration & trust boundary

A real agent, on a short leash

Forge is a genuine external AI agent with live read/write access to the studio — which makes the trust boundary the interesting engineering, not the connection itself.

  • Shared-secret header on every call; the bridge holds one scoped credential, not a staff login.
  • Server-side project isolation — when an item is linked to an inbound message, the lookup is scoped to that message's project, so a token-holder cannot attach project A's work to project B's conversation.
  • Provenance on every write — items created this way are stamped "via Forge — {sender}", so machine-origin work is always distinguishable from human-origin.
  • Outbound is pull, not push — TrueOS queues replies; Forge polls. The system never has to trust an inbound socket.
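
The first three bullets can be sketched together in plain Ruby. `BridgeApi`, `link_item` and the in-memory hashes are hypothetical, as is the exact provenance string (adapted from the "via Forge" stamp above). The one library call, `OpenSSL.secure_compare`, is the constant-time comparison from Ruby's openssl library.

```ruby
require "openssl"

# Hypothetical sketch of the bridge-facing checks: secret, project scope, provenance.
class BridgeApi
  def initialize(secret:, messages:, items:)
    @secret = secret
    @messages = messages # id => { project_id: ... }
    @items = items       # id => { project_id: ... }
  end

  # Constant-time comparison, so response timing does not leak the secret.
  def authorized?(header_value)
    return false if header_value.nil?
    OpenSSL.secure_compare(header_value, @secret)
  end

  # Link an item to an inbound message, but only within that message's project.
  def link_item(header:, message_id:, item_id:, sender:)
    return :unauthorized unless authorized?(header)
    message = @messages.fetch(message_id) { return :not_found }
    item = @items[item_id]
    # An item from another project looks exactly like a missing item.
    return :not_found if item.nil? || item[:project_id] != message[:project_id]
    item[:linked_message_id] = message_id
    item[:provenance] = "via Forge - #{sender}"
    :ok
  end
end
```

Returning the same `:not_found` for a missing item and for a cross-project item is deliberate in this sketch: a token-holder learns nothing about what exists in other projects.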
read surface

dashboard · projects · action items · issues · approvals · expenses · vendors · staff

write surface

action items · issues (+ photos) · records · notes · client approvals · expenses

+ the human layer

Beyond the integration, Forge provides ongoing engineering support and operational help directly — a working collaboration, not just an API client.

This is the same OpenClaw API that backs the WhatsApp and Telegram channels shown in Part One — Forge is its most capable consumer.
The biggest surprise so far

The best interface turned out to be a conversation

We spent two months building polished portals. The most-used — and frankly most enjoyable — way to put data into the system has turned out to be none of them. It's messaging.

  • Telegram, and especially Forge, became the way entries actually get made. Talking to Forge is now my number-one way of putting things into the system.
  • Why it works: as an agent talking to True Build OS through its MCP, Forge understands messy, half-formed input first — cleans it, fills the gaps, structures it — before anything touches the database. It feels like handing work to a capable colleague, not filling in a form.
  • The honest part: Forge is unreliable in other respects. But on the "understand first, then write" path through the MCP, it's been genuinely powerful.
  • The question it opens: should the system be chat-first — chat as the primary surface, with the interface dynamically assembling the right widget to show information, or to ask for exactly what it's missing?
Forge · True Build OS · MCP
User: marble cracked on the L3 feature wall, Stone & Co to fix before Fri
Forge: Which project did you mean?
  [Mazaya Villa · MZK001] [Sky Penthouse · SIN003]
New issue · draft
  Project: MZK001 · Mazaya Villa
  Type: Defect · High
  Assignee: Stone & Co
  Due: Fri 14 Mar
  ↳ I'll raise an NCR if it isn't acknowledged by Thursday.
  [Edit] [Confirm]
A natural sentence in; the system asks for the one thing it's missing, then renders a structured draft to confirm. This is the direction I'd most value your read on.
Open questions

Where we're looking next

retrieval

RAG has no provenance check

Answers cite sources alongside, but claims aren't verified back against the retrieved text. No re-ranking, no hybrid keyword+semantic.

graph

Relations don't cascade

Objects can link (blocks, supersedes, remediates) but there's no automatic propagation — it's manual today.

orchestration

No escalation engine

Snooze defers; staleness is detected by scan. There's no time-based auto-escalation or SLA timer.

identity

Naive entity disambiguation

People are matched by substring — "Sam" can collide with "Samira". No glossary / canonical entity resolution yet.

consistency

Cost sync is eventually-consistent

Budget roll-ups can briefly diverge if a mutation fails mid-transaction. No reconciliation job yet.

questions I'm sitting with

Different angles welcome

Tuning the auto-execution threshold · an evaluation harness for the AI jobs · whether the event spine should become the one universal log.

These are the threads I'd most enjoy thinking through with someone who's seen more systems than I have.
Where to next

So — what would you change?

That's True Build OS as it stands today. I'd value your read on it — especially ways to push it further, and different angles on the problems and the solutions.

System: truebuild.ae
Integrated with: Forge · OpenClaw
Built in: ~ two months
Exploring: A chat-first interface