AVAILABLE FOR WORK·SUWANEE, GA
15Y+ IN & AROUND AZURELAST SHIP THIS WEEK
Work / GCPS Monitoring
● LIVE Enterprise monitoring · current engagement

The dashboard that watches a school district's cloud.

A full-stack TypeScript & SQL Server platform for Azure Local / Azure Stack HCI. It pulls operational truth from several very different planes — report snapshots, live WMI faults, Windows events, Azure alerts — and turns them into one operator surface across five environments, without ever pretending a number came from somewhere it didn't.

TypeScript React 19 SQL Server PowerShell Prisma Azure Tailwind v4
TanStack StartTanStack QueryZodMotion
5
environments: prod, UAT, QA, dev, sandbox
152
Azure Local clusters in the latest snapshot
1,309
virtual machines tracked
5M+
live Windows events (rolling window)
103
SQL views doing real shaping
gcps-monitoring · fleet overview
Azure Local operations dashboard — fleet overview, cluster health, live alerts, and an events trend
// The problem

The data was scattered across planes that don't agree.

Monitoring Azure Local (renamed from Azure Stack HCI) isn't one feed — it's many, and they have different truth models. Some data comes from scheduled health reports. Some is a live WMI fault that fires and clears in seconds. Some is a firehose of forwarded Windows events. Some is an Azure API you poll. And it all lives across five domains where non-production environments can't always reach production services directly.

The lazy version flattens everything into one table and fakes the joins. This one doesn't. Report-backed data is tied to a real report; live data stays live and reportless. That single decision is what makes the dashboard trustworthy instead of just busy.

// Architecture

PowerShell gathers evidence. SQL Server does the thinking.

Data shaping lives at the database and repository boundary — never as ad-hoc reads from a component.

01 COLLECT
PowerShell collectors & listeners + Azure APIs + Windows Event Forwarding gather raw JSON, WMI events, event XML, and alert payloads — across all five domains.
02 LAND
SQL Server 2022 stores raw snapshots, live event streams, and alert lifecycle rows. Triggers and stored procedures materialize raw JSON into purpose-built tables.
03 SHAPE
103 views, 43 stored procedures, functions & triggers compute durable dashboard contracts and the full alert lifecycle — in the database, not guessed in the UI.
04 CONTRACT
Prisma + repositories + Zod schemas expose narrow, validated API contracts. Types are inferred from the schemas, never hand-duplicated.
05 RENDER
TanStack Start + React 19 + TanStack Query drive a desktop-first operator UI — with missing data shown as missing, never invented.
// How the data is captured

Seven sources, each with its own honesty.

Report-backed sources carry a real report ID. Live sources are environment-scoped and intentionally reportless.

HCI Fleet Health

REPORT-BACKED

Cluster, node, storage, network, VM, update & service-health snapshots collected inside each environment's domain context.

every 3–4 hours · the broadest source

AVHDX orphan scan

REPORT-BACKED

Finds orphaned differencing-disk files across HCI storage paths; SQL derives age, staleness, and per-server totals.

daily · per-environment scripts

HCI Health Fault Listener

LIVE

Watches the Health Service WMI class for fault create / modify / resolve events, including synthetic stale-close handling.

continuous node-local listener

Forwarded Windows Events

LIVE

WEF subscriptions feed a watcher with catch-up + live phases, SHA-256 dedupe, and a rolling retention window over 5M+ rows.

continuous · flush every 10s

Azure Monitor Alerts

LIVE

Polls the Alerts Management API via OAuth2 and upserts history with SQL MERGE — active, resolved, and recent windows.

every 5 minutes

App & pipeline telemetry

LIVE

The platform monitors itself: server/client errors, Prisma slow queries, and materialization step logs feed a pipeline view.

continuous / runtime
// The interesting part

Evidence-first: keep the two truth models apart.

The whole design hinges on never letting a live signal pretend it came from a report.

Report-backed

  • Has a real ReportDocument + ReportId
  • You can ask: which report produced this row?
  • Which environment, was it valid, what timestamp?
  • What raw collector JSON backs the evidence?
  • HCI Fleet Health & AVHDX live here

Live / reportless

  • Environment-scoped, never report-scoped
  • No synthetic ReportId is ever invented
  • Latest-report enrichment stays nullable, never identity
  • WMI faults, Windows events, Azure alerts
  • Prevents "this fault came from the latest report" lies
alert lifecycle — computed in SQL, not the UI
-- A canonical key defines alert identity; an issue key defines an occurrence.
MERGE dbo.FleetIssue AS tgt
USING staged_alerts AS src
  ON tgt.IssueKey = src.IssueKey
WHEN MATCHED AND src.MonitorCondition = 'Resolved'
  THEN UPDATE SET tgt.Status = 'Resolved', tgt.ResolvedAtUtc = src.SeenUtc
WHEN NOT MATCHED
  THEN INSERT (CanonicalAlertKey, IssueKey, EnvironmentCode, Status)
       VALUES (src.CanonicalAlertKey, src.IssueKey, src.EnvironmentCode, 'Active');
-- FleetEvent then records every fired / changed / resolved transition.
// Inside the platform

The Forge and an AI assistant, built in.

  • The Forge — a governed automation runner inside the dashboard that lets operators launch approved PowerShell workflows: SQL-backed job queues, command manifests, runner heartbeats, artifacts, cancellation, and a full audit trail. No arbitrary shell from a browser. See the full breakdown →
  • Ask ALDHI — the platform's own AI chat, on TanStack AI + OpenRouter, with 38 typed tools that surface its logging and telemetry directly: SQL health, alert triage, Windows event search and XML, materialization logs, application errors, Azure Resource Health, and artifact generation. Hard boundaries — no raw SQL from the model, no exposed secrets, row caps, guarded writes.
  • Built like a product — TypeScript end to end, Zod-validated contracts, live DB contract tests, component tests, Playwright, and lint/typecheck/coverage gates. SQL source is checked in as views, procs, functions, and triggers.
// In the UI

A few more places screenshots will land.

Designed frames so it reads intentionally now, and richer the moment real captures go in.