Last updated May 22, 2026

Methodology

GDELT Cloud structures and enriches public event data so APIs, dashboards, and agents can use it without re-deriving the pipeline. This page explains where the data comes from, how often it updates, how we process it, and where it should not be the sole source.

Source universe

GDELT Cloud's primary raw inputs are the public GDELT 2.0 datasets: Events, the Global Knowledge Graph (GKG), and the Global Entity Graph (GEG). These are sourced from Google BigQuery, which hosts the official GDELT public datasets.

Energy infrastructure data is sourced directly from Global Energy Monitor's published registries (coal plants, renewable capacity, pipelines, LNG terminals, nuclear, and related categories) and loaded into the gem schema. We obtain GEM data offline, separately from the GDELT ingest pipeline.

GDELT Cloud does not directly observe world events. We structure and enrich what reporters, public registries, and the GDELT Project's automated systems have already published.

Update frequency

The main ingest runs hourly, at the top of the hour. Supporting jobs reconcile entity links, cluster membership, and GEG entity coverage on the same cadence. Additional daily and weekly entity reconciliation passes cover a wider window.

Typical end-to-end latency from GDELT publication to availability in GDELT Cloud is about 15 minutes, which is the duration of the ingest pipeline. Variable upstream lag in GEG entity coverage can extend this for some entity-linked fields. Real-time freshness for each pipeline stage is published on the data status page.
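For integrators who want a client-side freshness check, the cadence above can be sketched as follows. This is a minimal illustration, not our monitoring code: the function names and the 10-minute grace period are hypothetical, the 15-minute pipeline duration is typical rather than guaranteed, and the last-completed timestamp is assumed to be something you read yourself from the data status page.

```python
from datetime import datetime, timedelta, timezone

INGEST_INTERVAL = timedelta(hours=1)        # main ingest runs hourly, at the top of the hour
PIPELINE_DURATION = timedelta(minutes=15)   # typical duration; treat as an estimate


def expected_latest_completion(now: datetime) -> datetime:
    """Return when the most recent hourly run should have finished, as of `now`.

    `now` must be timezone-aware; it is normalized to UTC.
    """
    now = now.astimezone(timezone.utc)
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    done = top_of_hour + PIPELINE_DURATION
    if done > now:
        # The current hour's run is probably still in progress,
        # so fall back to the previous hour's run.
        done -= INGEST_INTERVAL
    return done


def is_stale(last_completed: datetime, now: datetime,
             grace: timedelta = timedelta(minutes=10)) -> bool:
    """True if the last observed completion lags the expected one by more than `grace`.

    `grace` is a hypothetical tolerance, not a published threshold.
    """
    last_completed = last_completed.astimezone(timezone.utc)
    return last_completed + grace < expected_latest_completion(now)
```

A check like this only flags the main ingest falling behind; it does not account for the variable GEG lag on entity-linked fields.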

Event construction

Article-level signals are embedded and entity-extracted, then assembled into a daily cluster catalog. A sorting step assigns each new article to a cluster (or starts a new one) within the same UTC ingest day.
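The sorting step can be pictured with a small sketch. It assumes a greedy nearest-centroid rule over cosine similarity with a fixed threshold; the threshold value, the `Cluster` class, and the `assign` function are all hypothetical and are not a description of the production algorithm. The one property taken from the text is that an article is only ever compared against clusters from its own UTC ingest day.

```python
import math
from dataclasses import dataclass, field

SIMILARITY_THRESHOLD = 0.8  # hypothetical value, chosen only for illustration


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Cluster:
    ingest_day: str                      # UTC day, e.g. "2026-05-22"
    centroid: list
    article_ids: list = field(default_factory=list)


def assign(article_id, embedding, ingest_day, catalog):
    """Attach the article to the closest same-day cluster, or start a new one.

    `catalog` maps a UTC ingest day to that day's list of clusters.
    """
    day_clusters = catalog.setdefault(ingest_day, [])
    best = max(day_clusters, key=lambda c: cosine(c.centroid, embedding), default=None)
    if best is not None and cosine(best.centroid, embedding) >= SIMILARITY_THRESHOLD:
        n = len(best.article_ids)
        # Update the centroid as a running mean of member embeddings.
        best.centroid = [(c * n + e) / (n + 1) for c, e in zip(best.centroid, embedding)]
        best.article_ids.append(article_id)
        return best
    new_cluster = Cluster(ingest_day, list(embedding), [article_id])
    day_clusters.append(new_cluster)
    return new_cluster
```

Because the catalog is keyed by day, an article with an identical embedding on the following day starts a new cluster rather than joining the earlier one.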

Clusters are then coded in parallel: ACLED-style conflict coding produces conflict events, and CAMEO+ multi-domain coding produces structured events across ten domains (political, crime, economic, corporate, technology, infrastructure, environment, health, demographic, and information). Labels are generated last.

Taxonomy and scoring

GDELT Cloud uses two event taxonomies: ACLED-aligned conflict event types (political violence, protests, riots, strategic developments) and a CAMEO+ taxonomy that extends CAMEO across ten domain categories.

Scoring fields include the Goldstein scale (-10 to +10, populated for political and conflict events), magnitude (0-10, domain-specific severity), and the systemic_importance, propagation_potential, and market_sensitivity scores. Quad class and event-root codes are inherited from the underlying CAMEO/CAMEO+ structure.
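Consumers who validate records on their side could encode the two documented ranges as below. This is a sketch under assumptions: the `EventScores` class and its attribute names are hypothetical, and a missing Goldstein value is modeled as `None` for events outside the political and conflict domains. The other three scores are left out because their ranges are not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventScores:
    """Hypothetical container for the two scores whose ranges are documented."""
    goldstein: Optional[float]  # -10 to +10; None when not populated
    magnitude: float            # 0 to 10, domain-specific severity

    def __post_init__(self):
        if self.goldstein is not None and not -10.0 <= self.goldstein <= 10.0:
            raise ValueError(f"goldstein out of range: {self.goldstein}")
        if not 0.0 <= self.magnitude <= 10.0:
            raise ValueError(f"magnitude out of range: {self.magnitude}")
```

Constructing `EventScores(goldstein=11, magnitude=5)` raises `ValueError`, while `EventScores(goldstein=None, magnitude=5)` is accepted.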

Story scope and clustering

Each story (cluster) carries a scope: local, national, regional, or international, derived from the geographic and source distribution of articles in the cluster.

Clustering operates on a UTC ingest-day window. Currently, articles published on different days do not merge into the same cluster, even when they describe the same ongoing incident. Cross-day continuity is on the roadmap and is flagged in the known limitations section below.
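The effect of the day boundary is easy to see in a short sketch. The helper name `utc_ingest_day` is hypothetical, and which timestamp the pipeline actually keys on is not specified here; the example simply shows how any timestamp maps to a UTC day bucket.

```python
from datetime import datetime, timezone


def utc_ingest_day(ts: datetime) -> str:
    """Return the UTC calendar day (YYYY-MM-DD) for a timezone-aware timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


# Two timestamps 20 minutes apart, on either side of midnight UTC.
before_midnight = datetime(2026, 5, 21, 23, 50, tzinfo=timezone.utc)
after_midnight = datetime(2026, 5, 22, 0, 10, tzinfo=timezone.utc)

same_window = utc_ingest_day(before_midnight) == utc_ingest_day(after_midnight)
```

Here `same_window` is `False`: the two timestamps fall into different day buckets, so articles carrying them would not share a cluster even if they cover the same incident.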

Known limitations

Historical coverage is strongest from March 2026 forward. Earlier history is being backfilled but is incomplete; queries that span pre-March 2026 dates may return uneven coverage.

Source bias is real. GDELT's source universe over-indexes English-language and globally indexed outlets relative to local-language reporting, and some regions are systematically under-covered. GDELT Cloud inherits this bias.

Classification and deduplication are imperfect. CAMEO+ and ACLED coding rely on automated systems and may misclassify edge cases. Cross-cluster duplicates are flagged via a duplicate field but are not always perfectly resolved.

Cluster labels are generated by language models, and their consistency across days or topics is not guaranteed.

The story_count field is the count of distinct stories within the queried window; it is not a sum across dates. Aggregating story_count across multiple dates without deduplication will overcount.
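A small worked example shows the overcount. The story identifiers and dates are made up for illustration, and the two helper functions are hypothetical; the point is only the difference between summing per-date counts and counting the union of distinct stories.

```python
# Hypothetical per-date sets of story identifiers; two stories appear on both dates.
daily_story_ids = {
    "2026-05-20": {"s1", "s2", "s3"},
    "2026-05-21": {"s2", "s3", "s4"},
}


def naive_sum(daily):
    """Sum of per-date counts: counts a story once for every date it appears on."""
    return sum(len(ids) for ids in daily.values())


def distinct_story_count(daily):
    """Count of distinct stories across the whole window."""
    return len(set().union(*daily.values()))


overcounted = naive_sum(daily_story_ids)            # 6
correct = distinct_story_count(daily_story_ids)     # 4
```

When a total over several dates is needed, querying the full window once, or deduplicating on a story identifier as above, avoids the inflated figure.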

Appropriate use

GDELT Cloud is built for monitoring, triage, alerting, research acceleration, signal discovery, country and sector trend analysis, and agent workflows that benefit from structured event data with clean schema, entity links, and category coverage.

Not appropriate as the sole source for

Legal determinations, sanctions screening, emergency-response decisions, sole-source investment decisions, individual-level risk decisions, or any claim requiring evidentiary certainty.

Treat GDELT Cloud output as analytical signal, not ground truth. High-impact conclusions should be corroborated with primary sources, official statements, or trusted reporting.

Schema and classifier changes

We may update schemas, classifiers, scoring logic, deduplication, clustering, and enrichment methods over time. Material changes that affect integrators are documented in the changelog. Where feasible we preserve backward-compatible fields and provide migration notes.

Methodology questions?

For deeper questions about data lineage, schema changes, or appropriate use, email us.