GCPS Monitoring

● LIVE Enterprise monitoring · current engagement

The dashboard that watches a school district's cloud.

A full-stack TypeScript & SQL Server platform for Azure Local / Azure Stack HCI. It pulls operational truth from several very different planes — report snapshots, live WMI faults, Windows events, Azure alerts — and turns them into one operator surface across five environments, without ever pretending a number came from somewhere it didn't.

TypeScript

React 19

SQL Server

PowerShell

Prisma

Azure

Tailwind v4

TanStack Start

TanStack Query

Zod

Motion

environments: prod, UAT, QA, dev, sandbox

152

Azure Local clusters in the latest snapshot

1,309

virtual machines tracked

5M+

live Windows events (rolling window)

103

SQL views doing real shaping

gcps-monitoring · fleet overview

// The problem

The data was scattered across planes that don't agree.

Monitoring Azure Local (renamed from Azure Stack HCI) isn't one feed — it's many, and they have different truth models. Some data comes from scheduled health reports. Some is a live WMI fault that fires and clears in seconds. Some is a firehose of forwarded Windows events. Some is an Azure API you poll. And it all lives across five domains where non-production environments can't always reach production services directly.

The lazy version flattens everything into one table and fakes the joins. This one doesn't. Report-backed data is tied to a real report; live data stays live and reportless. That single decision is what makes the dashboard trustworthy instead of just busy.

// Architecture

PowerShell gathers evidence. SQL Server does the thinking.

Data shaping lives at the database and repository boundary — never as ad-hoc reads from a component.

01 COLLECT

PowerShell collectors & listeners + Azure APIs + Windows Event Forwarding gather raw JSON, WMI events, event XML, and alert payloads — across all five domains.

02 LAND

SQL Server 2022 stores raw snapshots, live event streams, and alert lifecycle rows. Triggers and stored procedures materialize raw JSON into purpose-built tables.

03 SHAPE

103 views, 43 stored procedures, functions & triggers compute durable dashboard contracts and the full alert lifecycle — in the database, not guessed in the UI.

04 CONTRACT

Prisma + repositories + Zod schemas expose narrow, validated API contracts. Types are inferred from the schemas, never hand-duplicated.

05 RENDER

TanStack Start + React 19 + TanStack Query drive a desktop-first operator UI — with missing data shown as missing, never invented.
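
A minimal sketch of the CONTRACT step's "types are inferred from the schemas" idea. The platform uses Zod; this is a dependency-free stand-in so the example runs on its own. `Schema`, `Infer`, `clusterSummarySchema`, and its field names are all hypothetical, not taken from the real codebase.

```typescript
// Stand-in for a schema library such as Zod (illustrative only).
type Schema<T> = { parse(input: unknown): T };
type Infer<S> = S extends Schema<infer T> ? T : never;

const str: Schema<string> = {
  parse(input) {
    if (typeof input !== "string") throw new Error("expected string");
    return input;
  },
};

const int: Schema<number> = {
  parse(input) {
    if (typeof input !== "number" || !Number.isInteger(input)) {
      throw new Error("expected integer");
    }
    return input;
  },
};

// Accepts an explicit null, but still rejects a missing (undefined) value.
function nullable<T>(inner: Schema<T>): Schema<T | null> {
  return { parse: (input) => (input === null ? null : inner.parse(input)) };
}

function object<S extends Record<string, Schema<unknown>>>(
  shape: S,
): Schema<{ [K in keyof S]: Infer<S[K]> }> {
  return {
    parse(input) {
      if (typeof input !== "object" || input === null) {
        throw new Error("expected object");
      }
      const record = input as Record<string, unknown>;
      const out: Record<string, unknown> = {};
      // Only declared keys are copied, which keeps the contract narrow.
      for (const [key, schema] of Object.entries(shape)) {
        out[key] = schema.parse(record[key]);
      }
      return out as { [K in keyof S]: Infer<S[K]> };
    },
  };
}

// Hypothetical contract: the type below is derived, never written by hand.
const clusterSummarySchema = object({
  clusterName: str,
  environmentCode: str,
  vmCount: nullable(int), // null means "not collected", never a made-up 0
});

type ClusterSummary = Infer<typeof clusterSummarySchema>;

const example: ClusterSummary = clusterSummarySchema.parse({
  clusterName: "hci-01",
  environmentCode: "prod",
  vmCount: null,
});
```

With Zod the same shape would be a `z.object(...)` plus `z.infer`. Either way the schema is the single source of the type, and a nullable field lets the UI show missing data as missing.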

// How the data is captured

Seven sources, each with its own honesty.

Report-backed sources carry a real report ID. Live sources are environment-scoped and intentionally reportless.

HCI Fleet Health

REPORT-BACKED

Cluster, node, storage, network, VM, update & service-health snapshots collected inside each environment's domain context.

every 3–4 hours · the broadest source

AVHDX orphan scan

REPORT-BACKED

Finds orphaned differencing-disk files across HCI storage paths; SQL derives age, staleness, and per-server totals.

daily · per-environment scripts

HCI Health Fault Listener

LIVE

Watches the Health Service WMI class for fault create / modify / resolve events, including synthetic stale-close handling.

continuous node-local listener

Forwarded Windows Events

LIVE

WEF subscriptions feed a watcher with catch-up + live phases, SHA-256 dedupe, and a rolling retention window over 5M+ rows.

continuous · flush every 10s

Azure Monitor Alerts

LIVE

Polls the Alerts Management API via OAuth2 and upserts history with SQL MERGE — active, resolved, and recent windows.

every 5 minutes

App & pipeline telemetry

LIVE

The platform monitors itself: server/client errors, Prisma slow queries, and materialization step logs feed a pipeline view.

continuous / runtime
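
For the Forwarded Windows Events source above, SHA-256 dedupe with a periodic flush could look roughly like this. The real watcher is PowerShell and SQL; this TypeScript sketch only illustrates the technique. `ForwardedEvent`, its fields, `eventFingerprint`, and `EventBatcher` are hypothetical names, and the choice of identity fields is an assumption.

```typescript
import { createHash } from "node:crypto";

// Hypothetical event shape; the real watcher's columns are not shown in the source.
interface ForwardedEvent {
  computer: string;
  channel: string;
  eventRecordId: number;
  timeCreatedUtc: string;
  xml: string;
}

// SHA-256 over the fields assumed to identify one event occurrence.
function eventFingerprint(e: ForwardedEvent): string {
  const identity = [
    e.computer,
    e.channel,
    String(e.eventRecordId),
    e.timeCreatedUtc,
  ].join("\u001f"); // unit separator, so field boundaries cannot blur
  return createHash("sha256").update(identity, "utf8").digest("hex");
}

class EventBatcher {
  private readonly seen = new Set<string>();
  private pending: ForwardedEvent[] = [];

  // Returns true when the event is new and was queued for the next flush.
  offer(e: ForwardedEvent): boolean {
    const key = eventFingerprint(e);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(e);
    return true;
  }

  // Hands back the queued batch and starts a new one.
  flush(): ForwardedEvent[] {
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}
```

A caller would invoke `flush()` on a 10-second timer and write the batch in one round trip. The catch-up and live phases can then replay the same event without producing a duplicate row. The `seen` set here grows without bound; a real watcher would age keys out with the retention window.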

// The interesting part

Evidence-first: keep the two truth models apart.

The whole design hinges on never letting a live signal pretend it came from a report.

Report-backed

Has a real ReportDocument + ReportId
You can ask: which report produced this row?
Which environment, was it valid, what timestamp?
What raw collector JSON backs the evidence?
HCI Fleet Health & AVHDX live here

Live / reportless

Environment-scoped, never report-scoped
No synthetic ReportId is ever invented
Latest-report enrichment stays nullable, never identity
WMI faults, Windows events, Azure alerts
Prevents "this fault came from the latest report" lies
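
One way to make the split above hard to violate is a discriminated union, sketched here under assumptions. The type and field names (`ReportBackedRow`, `LiveRow`, `latestReportId`, `provenance`) and the lowercase environment codes are hypothetical; only the rule itself comes from the page: report-backed rows carry a real report ID, while live rows are environment-scoped and carry at most a nullable enrichment.

```typescript
// Assumed spelling of the five environments listed on the page.
type EnvironmentCode = "prod" | "uat" | "qa" | "dev" | "sandbox";

interface ReportBackedRow {
  kind: "report-backed";
  environmentCode: EnvironmentCode;
  reportId: number; // identity: which report produced this row
  collectedAtUtc: string;
}

interface LiveRow {
  kind: "live";
  environmentCode: EnvironmentCode;
  observedAtUtc: string;
  // Enrichment only: may point at the latest report, never used as identity.
  latestReportId: number | null;
}

type EvidenceRow = ReportBackedRow | LiveRow;

// The compiler forces every consumer to say which truth model it is reading.
function provenance(row: EvidenceRow): string {
  switch (row.kind) {
    case "report-backed":
      return `report ${row.reportId} (${row.environmentCode})`;
    case "live":
      return `live signal (${row.environmentCode}), no report`;
  }
}

const fault: LiveRow = {
  kind: "live",
  environmentCode: "qa",
  observedAtUtc: "2025-01-01T00:00:00Z",
  latestReportId: null,
};
```

Because `LiveRow` has no `reportId` field at all, code that tries to attribute a fault to a report fails to compile instead of rendering a plausible lie.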

alert lifecycle — computed in SQL, not the UI

-- A canonical key defines alert identity; an issue key defines an occurrence.
MERGE dbo.FleetIssue AS tgt
USING staged_alerts AS src
  ON tgt.IssueKey = src.IssueKey
WHEN MATCHED AND src.MonitorCondition = 'Resolved'
  THEN UPDATE SET tgt.Status = 'Resolved', tgt.ResolvedAtUtc = src.SeenUtc
WHEN NOT MATCHED
  THEN INSERT (CanonicalAlertKey, IssueKey, EnvironmentCode, Status)
       VALUES (src.CanonicalAlertKey, src.IssueKey, src.EnvironmentCode, 'Active');
-- FleetEvent then records every fired / changed / resolved transition.
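
The same lifecycle, modeled in memory as a rough TypeScript sketch so the two keys are easier to see. This is not the platform's code, since the real logic lives in SQL. `applyStagedAlerts` and the interfaces are hypothetical, the `Fired` / `Resolved` values follow Azure Monitor's monitor condition, and the "changed" transition is left out for brevity.

```typescript
type MonitorCondition = "Fired" | "Resolved";

interface StagedAlert {
  canonicalAlertKey: string; // identity of the alert itself
  issueKey: string; // identity of one occurrence of that alert
  environmentCode: string;
  monitorCondition: MonitorCondition;
  seenUtc: string;
}

interface FleetIssue {
  canonicalAlertKey: string;
  issueKey: string;
  environmentCode: string;
  status: "Active" | "Resolved";
  resolvedAtUtc: string | null;
}

interface FleetEvent {
  issueKey: string;
  transition: "fired" | "resolved";
  atUtc: string;
}

// Mirrors the MERGE above: match on IssueKey, resolve matched rows,
// insert unmatched rows as Active. Returns the transitions it caused.
function applyStagedAlerts(
  issues: Map<string, FleetIssue>,
  staged: StagedAlert[],
): FleetEvent[] {
  const events: FleetEvent[] = [];
  for (const src of staged) {
    const tgt = issues.get(src.issueKey);
    if (tgt === undefined) {
      issues.set(src.issueKey, {
        canonicalAlertKey: src.canonicalAlertKey,
        issueKey: src.issueKey,
        environmentCode: src.environmentCode,
        status: "Active",
        resolvedAtUtc: null,
      });
      events.push({ issueKey: src.issueKey, transition: "fired", atUtc: src.seenUtc });
    } else if (src.monitorCondition === "Resolved") {
      const wasActive = tgt.status === "Active";
      tgt.status = "Resolved";
      tgt.resolvedAtUtc = src.seenUtc;
      // Assumption: only a real status change is worth an event row.
      if (wasActive) {
        events.push({ issueKey: src.issueKey, transition: "resolved", atUtc: src.seenUtc });
      }
    }
  }
  return events;
}
```

When the same alert fires again later, it arrives with the same `canonicalAlertKey` and a new `issueKey`, so it becomes a second occurrence rather than reopening the first.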

// Inside the platform

The Forge and an AI assistant, built in.

The Forge — a governed automation runner inside the dashboard that lets operators launch approved PowerShell workflows: SQL-backed job queues, command manifests, runner heartbeats, artifacts, cancellation, and a full audit trail. No arbitrary shell from a browser. See the full breakdown →
Ask ALDHI — the platform's own AI chat, on TanStack AI + OpenRouter, with 38 typed tools that surface its logging and telemetry directly: SQL health, alert triage, Windows event search and XML, materialization logs, application errors, Azure Resource Health, and artifact generation. Hard boundaries — no raw SQL from the model, no exposed secrets, row caps, guarded writes.
Built like a product — TypeScript end to end, Zod-validated contracts, live DB contract tests, component tests, Playwright, and lint/typecheck/coverage gates. SQL source is checked in as views, procs, functions, and triggers.
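
To illustrate the Forge's "no arbitrary shell from a browser" boundary, here is a hedged sketch of manifest-based validation. Everything in it is hypothetical: `CommandManifest`, `JobRequest`, `validateJob`, the example command, and its script path are illustrative and are not the Forge's actual schema.

```typescript
interface ParamSpec {
  pattern: RegExp; // anchored allowlist pattern for the value
  required: boolean;
}

interface CommandManifest {
  commandId: string;
  scriptPath: string; // fixed by the manifest, never supplied by the browser
  params: Record<string, ParamSpec>;
}

interface JobRequest {
  commandId: string;
  params: Record<string, string>;
  requestedBy: string;
}

type Validation =
  | { ok: true; argv: string[] }
  | { ok: false; reason: string };

function validateJob(
  manifests: Map<string, CommandManifest>,
  req: JobRequest,
): Validation {
  const manifest = manifests.get(req.commandId);
  if (manifest === undefined) {
    return { ok: false, reason: `unknown command: ${req.commandId}` };
  }
  // Reject any parameter the manifest does not declare.
  for (const name of Object.keys(req.params)) {
    if (!Object.prototype.hasOwnProperty.call(manifest.params, name)) {
      return { ok: false, reason: `unexpected parameter: ${name}` };
    }
  }
  // Build an argument vector; values are never spliced into a shell string.
  const argv = ["-File", manifest.scriptPath];
  for (const [name, spec] of Object.entries(manifest.params)) {
    const value = req.params[name];
    if (value === undefined) {
      if (spec.required) return { ok: false, reason: `missing parameter: ${name}` };
      continue;
    }
    if (!spec.pattern.test(value)) {
      return { ok: false, reason: `invalid value for ${name}` };
    }
    argv.push(`-${name}`, value);
  }
  return { ok: true, argv };
}

const manifests = new Map<string, CommandManifest>([
  ["restart-cluster-service", {
    commandId: "restart-cluster-service",
    scriptPath: "scripts/Restart-ClusterService.ps1", // hypothetical path
    params: { ClusterName: { pattern: /^[A-Za-z0-9-]{1,32}$/, required: true } },
  }],
]);
```

The browser only ever names a command and supplies parameter values. The script path and the set of legal parameters come from the manifest, so a rejected request can be written to the audit trail without ever reaching a runner.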

// In the UI

A few more places screenshots will land.

The frames are designed so the page reads as intentional now and gets richer the moment real captures go in.

cluster detail · screenshot placeholder: cluster detail tabs

live alerts · screenshot placeholder: live alerts & WMI faults

event timeline · screenshot placeholder: Windows event timeline

the forge · screenshot placeholder: Forge runner & artifacts
