How it works
Verified primary-source research at scale.
Most analysis you read is a paraphrase of a paraphrase, with citations to press releases and analyst memory. Our research is built on a programmatically verified corpus of operator language coupled to a deterministic quant signal. Here's the engine.
Axis 1
Primary-source operator corpus
We harvest the full library of theCUBE event interviews — long-form, on-the-record conversations with the people who actually build and buy AI infrastructure. CEOs, EVPs, principal engineers. The transcript for every interview is normalized, speaker-attributed, and persisted with stable identifiers we control.
Each speaker is keyed by a stable UUID that survives company changes, title changes, and event boundaries. That's the foundation of the cross-event identity layer (Axis 4 below).
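The sketch below shows one way a speaker UUID can survive company and title changes. It uses a hypothetical `SpeakerRegistry` that matches people on a normalized full name. That is a simplification: real identity resolution would also need to separate different people who share a name. The UUID is minted once with `uuid.uuid4()`. Company, title, and event are appended as history and never folded into the identifier.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Affiliation:
    company: str
    title: str
    event: str


@dataclass
class Speaker:
    speaker_id: uuid.UUID
    name: str
    affiliations: list[Affiliation] = field(default_factory=list)


class SpeakerRegistry:
    """Hypothetical registry: mints a UUID once per person, then reuses it."""

    def __init__(self) -> None:
        self._by_key: dict[str, Speaker] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Assumption: a case- and whitespace-normalized full name identifies the person.
        return " ".join(name.lower().split())

    def resolve(self, name: str, company: str, title: str, event: str) -> uuid.UUID:
        key = self._key(name)
        speaker = self._by_key.get(key)
        if speaker is None:
            # First sighting: mint a random UUID that is never recomputed.
            speaker = Speaker(speaker_id=uuid.uuid4(), name=name)
            self._by_key[key] = speaker
        # Company, title, and event are stored as history, not used as identity.
        speaker.affiliations.append(Affiliation(company, title, event))
        return speaker.speaker_id
```

Calling `resolve("Jane Doe", "Acme", "VP Engineering", "Event A")` and later `resolve("jane doe", "Globex", "CTO", "Event B")` returns the same UUID with two affiliations on record. All names here are placeholders.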
We are transitioning the harvester from a scrape adapter to direct database access. Downstream consumers (research pipeline, API surface, dashboards) are built behind the harvester abstraction, so the source change is invisible to them.
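The harvester abstraction can be sketched as a `typing.Protocol` that both sources implement. `Harvester`, `ScrapeHarvester`, `DatabaseHarvester`, and the `turns` table schema are hypothetical names chosen for illustration. The database side uses an in-memory SQLite connection so the example runs on its own.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Turn:
    speaker_id: str
    text: str


@dataclass(frozen=True)
class Transcript:
    interview_id: str
    turns: tuple[Turn, ...]


class Harvester(Protocol):
    """The only interface downstream consumers see (hypothetical name)."""

    def transcripts(self) -> Iterable[Transcript]: ...


class ScrapeHarvester:
    """Stand-in for the scrape adapter: parses pages that were already fetched."""

    def __init__(self, pages: dict[str, list[tuple[str, str]]]) -> None:
        self._pages = pages

    def transcripts(self) -> Iterable[Transcript]:
        for interview_id, rows in self._pages.items():
            yield Transcript(interview_id, tuple(Turn(s, t) for s, t in rows))


class DatabaseHarvester:
    """Stand-in for direct database access; the table schema is an assumption."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def transcripts(self) -> Iterable[Transcript]:
        rows = self._conn.execute(
            "SELECT interview_id, speaker_id, text FROM turns "
            "ORDER BY interview_id, seq"
        ).fetchall()
        grouped: dict[str, list[Turn]] = {}
        for interview_id, speaker_id, text in rows:
            grouped.setdefault(interview_id, []).append(Turn(speaker_id, text))
        for interview_id, turns in grouped.items():
            yield Transcript(interview_id, tuple(turns))


def count_turns(harvester: Harvester) -> int:
    """A downstream consumer: it never knows which source it is reading."""
    return sum(len(t.turns) for t in harvester.transcripts())
```

Because `count_turns` depends only on the protocol, swapping one harvester for the other requires no change on the consumer side.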
Axis 2
Programmatic citation verification
Every quote in every research piece is verified against the source transcript before publication. Matching runs in three tiers:
1. Exact: the normalized quote is a verbatim substring of a single transcript turn.
2. Spanning: the quote spans 2-5 consecutive turns. This handles cases where the LLM stitches text across speaker or sentence boundaries.
3. Fuzzy: token overlap of at least 70% with some turn. This is the last resort, and every match at this tier is flagged for review.
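Here is a sketch of the three tiers in Python. It assumes a simple normalization (lowercase, punctuation stripped, whitespace collapsed) and defines token overlap as the share of distinct quote tokens that also appear in a turn. Both choices are assumptions, since the exact normalization and overlap metric are not specified here. `verify_quote` and `Match` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Match:
    tier: str               # "exact", "spanning", or "fuzzy"
    turns: tuple[int, ...]  # indices of the matched transcript turns
    needs_review: bool


def normalize(text: str) -> str:
    # Assumed normalization: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def token_overlap(quote: str, turn: str) -> float:
    # Assumed metric: share of distinct quote tokens that also appear in the turn.
    q, t = set(quote.split()), set(turn.split())
    return len(q & t) / len(q) if q else 0.0


def verify_quote(quote: str, turns: list[str], fuzzy_threshold: float = 0.70) -> Optional[Match]:
    q = normalize(quote)
    norm = [normalize(t) for t in turns]
    if not q:
        return None
    # Tier 1: verbatim substring of a single turn.
    for i, turn in enumerate(norm):
        if q in turn:
            return Match("exact", (i,), False)
    # Tier 2: substring of 2-5 consecutive turns joined together.
    for width in range(2, 6):
        for start in range(len(norm) - width + 1):
            joined = " ".join(norm[start:start + width])
            if q in joined:
                return Match("spanning", tuple(range(start, start + width)), False)
    # Tier 3: token overlap with the best single turn; always flagged for review.
    best = max(range(len(norm)), key=lambda i: token_overlap(q, norm[i]), default=None)
    if best is not None and token_overlap(q, norm[best]) >= fuzzy_threshold:
        return Match("fuzzy", (best,), True)
    return None
```

The tiers run in order and stop at the first hit, so a quote only falls through to the flagged fuzzy tier when neither strict tier matches.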
When the LLM cannot extract a clean verbatim quote but the operator clearly discussed the topic, we cite the theCUBE video itself as a primary source rather than fabricate a quote. That gives two citation types: type: 'quote' and type: 'video'.
Verification status renders on every citation as a ✓ Verified or ⚠ Unverified badge. Unverified citations are explicitly excluded from the machine-readable JSON-LD citation graph.
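The two citation types and the badge can be modeled as a single tagged record. The field names and the validation rules in `__post_init__` are assumptions for illustration. A `quote` citation must carry verbatim text. A `video` citation must not carry any, which mirrors the rule of citing the video instead of fabricating a quote.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class Citation:
    type: Literal["quote", "video"]
    speaker_id: str
    video_url: str
    verified: bool
    text: Optional[str] = None  # verbatim quote; only set when type == "quote"

    def __post_init__(self) -> None:
        # A quote citation must carry text; a video citation must not invent any.
        if self.type == "quote" and not self.text:
            raise ValueError("quote citations need verbatim text")
        if self.type == "video" and self.text is not None:
            raise ValueError("video citations cite the interview, not a quote")


def badge(citation: Citation) -> str:
    """Badge text rendered next to every citation."""
    return "✓ Verified" if citation.verified else "⚠ Unverified"
```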
Axis 3
Alpha-coupled quant signals
We track 100+ AI, cloud, and semiconductor companies in a deterministic scoring engine that updates every 15 minutes. The Alpha Score is built from news flow, momentum, fundamentals, valuation, innovation velocity, and social-narrative signal. Every component is explainable; there are no black boxes.
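The actual formula lives on the Alpha Score methodology page and is not reproduced here. The function below is only a rough sketch of what "deterministic and explainable" can mean in code. It combines the six named components with hypothetical equal weights and returns each component's contribution alongside the total. It assumes every input signal is already scaled to 0-100.

```python
COMPONENTS = (
    "news_flow",
    "momentum",
    "fundamentals",
    "valuation",
    "innovation_velocity",
    "social_narrative",
)
# Hypothetical equal weighting; not the published Alpha Score weights.
WEIGHTS: dict[str, float] = {name: 1.0 for name in COMPONENTS}


def alpha_score(signals: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the total score and each component's contribution to it.

    Assumes every component signal is already scaled to 0-100. The same
    inputs always produce the same outputs, and the per-component
    contributions sum to the total, which is what makes it explainable.
    """
    missing = set(COMPONENTS) - set(signals)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    contributions = {
        name: WEIGHTS[name] * signals[name] / total_weight for name in COMPONENTS
    }
    return sum(contributions.values()), contributions
```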
Every research piece that mentions a tracked ticker links to the live score. Every cited operator whose company is tracked links to the company page. The reverse direction works too: every company page shows which research pieces have cited that company's executives.
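The reverse direction (company page to research) is an inverted index. In this sketch, `ResearchPiece`, `cited_by_company`, `tracked_tickers`, and the `speaker_company` mapping from speaker ID to ticker are hypothetical. The mapping is assumed to hold only tracked companies, so speakers from untracked companies produce no link.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchPiece:
    slug: str
    tickers_mentioned: frozenset[str]
    cited_speaker_ids: frozenset[str]


def tracked_tickers(piece: ResearchPiece, tracked: set[str]) -> list[str]:
    """Forward direction: tracked tickers a piece mentions (each links to its live score)."""
    return sorted(piece.tickers_mentioned & tracked)


def cited_by_company(
    pieces: list[ResearchPiece], speaker_company: dict[str, str]
) -> dict[str, list[str]]:
    """Reverse direction: ticker -> slugs of pieces that cite that company's executives.

    speaker_company maps a speaker ID to a tracked ticker (an assumption);
    speakers missing from the mapping are skipped.
    """
    index: dict[str, list[str]] = defaultdict(list)
    for piece in pieces:
        tickers = {speaker_company[s] for s in piece.cited_speaker_ids if s in speaker_company}
        for ticker in sorted(tickers):
            index[ticker].append(piece.slug)
    return dict(index)
```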
For the full Alpha Score derivation — formula, components, signal health, known limitations — see the Alpha Score methodology page.
Every external data source feeding the quant signal — SEC EDGAR, FRED, EIA, USAspending, US Census trade, GPU spot pricing, TSMC monthly revenue, WattTime grid carbon — is listed on the live data sources page with a health probe. If a feed is degraded or down, you see it before we do.
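A health probe can be as simple as comparing the age of the last successful fetch with the feed's expected update interval. The thresholds below are hypothetical placeholders, not the live page's actual rules: healthy within 2x the interval, degraded within 6x, and down beyond that or if the feed has never succeeded. The `Feed` and `probe` names are also placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    DOWN = "down"


@dataclass(frozen=True)
class Feed:
    name: str
    expected_interval: timedelta  # how often new data should arrive (assumed per feed)


def probe(feed: Feed, last_success: Optional[datetime], now: Optional[datetime] = None) -> Health:
    """Classify a feed by how stale its last successful fetch is (hypothetical thresholds)."""
    now = now or datetime.now(timezone.utc)
    if last_success is None:
        return Health.DOWN
    age = now - last_success
    if age <= feed.expected_interval * 2:
        return Health.HEALTHY
    if age <= feed.expected_interval * 6:
        return Health.DEGRADED
    return Health.DOWN
```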
Axis 4
Cross-event speaker identity
Stable speaker UUIDs let us answer queries no other research firm can: how has X's framing of Y evolved across N appearances? Who from a given company is most vocal about the hardware roadmap? What did this exec say three months ago that is contradicted by what they said yesterday?
The alumni graph is internal infrastructure today, exposed via /people/[speakerId] pages and the /api/platform/speakers endpoint. As the corpus deepens, position-evolution analysis becomes a primary research deliverable.
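The example queries reduce to filtering and ordering quotes by speaker UUID. This sketch assumes a flat list of `Quote` records and uses a naive case-insensitive keyword match in place of real topic tagging. `framing_timeline` and `most_vocal` are hypothetical helpers, not the `/api/platform/speakers` interface.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Quote:
    speaker_id: str  # stable speaker UUID
    company: str     # employer at the time of the interview
    event: str
    spoken_on: date
    text: str


def mentions(quote: Quote, topic: str) -> bool:
    # Assumption: a naive keyword match stands in for real topic tagging.
    return topic.lower() in quote.text.lower()


def framing_timeline(quotes: list[Quote], speaker_id: str, topic: str) -> list[Quote]:
    """Every on-topic quote from one speaker, oldest first, across all events."""
    hits = [q for q in quotes if q.speaker_id == speaker_id and mentions(q, topic)]
    return sorted(hits, key=lambda q: q.spoken_on)


def most_vocal(quotes: list[Quote], company: str, topic: str) -> list[tuple[str, int]]:
    """Speakers from one company, ranked by their number of on-topic quotes."""
    counts = Counter(
        q.speaker_id for q in quotes if q.company == company and mentions(q, topic)
    )
    return counts.most_common()
```

Because identity is the UUID rather than the company, `framing_timeline` follows a speaker across employers, while `most_vocal` filters on the company recorded at interview time.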
For AI consumers
Machine-readable by design
Every research page emits Schema.org JSON-LD with full Article and Quotation entities. Speaker metadata is structured Person with jobTitle and worksFor. Source URLs reference VideoObject entries pointing to the original theCUBE interview.
When ChatGPT, Claude, Perplexity, or any citation-aware AI assistant answers a question about AI infrastructure, our structured-data graph is what they index against. Only verified citations are included; unverified quotes are excluded from the machine-readable graph entirely.
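The sketch below shows the JSON-LD shape described above using Schema.org types and properties (`Article.citation`, `Quotation.spokenByCharacter`, `CreativeWork.isBasedOn`, `Person.jobTitle`, `Person.worksFor`). The input records and function names are hypothetical, and the real pages may lay the properties out differently. What it illustrates is that unverified citations are dropped before serialization.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CitedSpeaker:
    name: str
    job_title: str
    organization: str


@dataclass(frozen=True)
class SourceCitation:
    type: str  # "quote" or "video"
    verified: bool
    speaker: CitedSpeaker
    video_url: str
    video_title: str
    text: Optional[str] = None


def person_ld(s: CitedSpeaker) -> dict:
    return {
        "@type": "Person",
        "name": s.name,
        "jobTitle": s.job_title,
        "worksFor": {"@type": "Organization", "name": s.organization},
    }


def video_ld(c: SourceCitation) -> dict:
    return {"@type": "VideoObject", "name": c.video_title, "url": c.video_url}


def article_json_ld(headline: str, url: str, citations: list[SourceCitation]) -> str:
    graph = []
    for c in citations:
        if not c.verified:
            continue  # unverified citations never enter the machine-readable graph
        if c.type == "quote":
            graph.append({
                "@type": "Quotation",
                "text": c.text,
                "spokenByCharacter": person_ld(c.speaker),
                "isBasedOn": video_ld(c),
            })
        else:
            graph.append(video_ld(c))  # video citations point at the interview itself
    doc = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "citation": graph,
    }
    return json.dumps(doc, ensure_ascii=False, indent=2)
```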
In one sentence
We built an analyst pipeline whose every claim is attributable to a named human on the record at a specific event, programmatically verified, coupled to a live quant signal, and emitted in a format AI consumers prefer.