Google’s Gemini Emissions: An Exercise in Statistical Gaming and Manipulation
Google’s recent environmental disclosure report for its Gemini AI service presents Gemini as a highly efficient service with low emissions, low water usage and low energy consumption. A full podcast-style analysis, produced using Google’s own Gemini service alongside its NotebookLM tool, is available here.
However, our initial review reveals that the report omits critical information, lacks methodological transparency, and fails to meet the standards expected of a credible sustainability disclosure in a regulated or enterprise context. Key concerns include:
Material omissions:
Training emissions are wholly excluded
No cumulative energy or water figures are provided
No breakdown is given by task, workload type, or deployment region.
Over-reliance on averages: It presents only median per-prompt values, which significantly understate the environmental cost of high-complexity or large-scale use cases.
Use of market-based accounting: Carbon emissions are reported using energy procurement certificates rather than real-time, location-based grid data, materially underestimating actual operational impact.
Lack of auditability: There is no published methodology, no disclosure of assumptions, and no independent verification of the data presented.
Dilution of Scope 3 emissions: Hardware-related emissions are included only as amortised averages per prompt, with no traceability to lifecycle data, component turnover, or vendor-supplied LCAs.
While the figures in the report cannot be dismissed as inherently wrong or inaccurate, they are heavily shaped by selective boundary-setting, averaging methods, and framing choices that mask real-world conditions and usage of the service. As a result, the report provides limited value for sustainability benchmarking, ESG reporting, procurement assessments, or regulatory engagement.
More broadly, if disclosures of this kind become the industry norm, they risk undermining environmental accountability across the AI sector. Without standardised boundaries, transparent methodologies, and auditable data, claims of AI sustainability will remain difficult to trust, and easy to misuse.
Overall, the Google report should be treated as a discussion document, not as a definitive environmental analysis whose metrics can be used to shape decisions. It highlights the urgent need for more rigorous, standardised, and independently validated emissions reporting frameworks in AI and cloud infrastructure.
1. Anchored to the Least Demanding Use Case
The entire report is centred on the "median Gemini Apps text prompt" as the reference scenario for emissions, energy, and water. There is no disclosure of what this entails – no input length, no output size, no token count, no task complexity.
The median is a statistical midpoint: by construction it says nothing about the top 50% of prompts, which include the most compute-heavy requests.
There is no distributional data – no reporting on mean, 90th percentile, or maximum energy cases.
There is no segmentation by workload type: no distinction between image generation, batch querying, RAG, or coding tasks.
This approach deliberately ignores the tail of usage patterns where infrastructure intensity and emissions peak. For a system where power-law behaviour is well established, this framing underplays the true environmental load.
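To make the point concrete, the toy calculation below is a minimal sketch in which the distribution and every per-prompt value are hypothetical. It shows how, for a heavy-tailed workload, the median sits well below the mean and the 95th percentile, which are the figures that actually drive fleet-level consumption.

```python
import random
import statistics

# Hypothetical, illustrative only: draw per-prompt energy values from a
# heavy-tailed (log-normal) distribution, loosely mimicking a workload where
# most prompts are cheap but a minority are very compute-intensive.
random.seed(42)
energy_wh = [random.lognormvariate(-1.5, 1.2) for _ in range(100_000)]

median = statistics.median(energy_wh)
mean = statistics.mean(energy_wh)
p95 = sorted(energy_wh)[int(0.95 * len(energy_wh))]

print(f"median : {median:.3f} Wh/prompt")
print(f"mean   : {mean:.3f} Wh/prompt")   # the figure that actually drives totals
print(f"p95    : {p95:.3f} Wh/prompt")
# With a skew like this, the mean sits well above the median and the tail is
# several times larger, so quoting only the median understates fleet-level use.
```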
2. No Disclosure of Total Environmental Footprint
The report provides no figures on total energy consumed, total water withdrawn, or total Scope 1, 2 or 3 emissions related to Gemini or Google’s AI estate.
No indication of how fast usage is growing.
No prompt volume or demand trajectory.
No cumulative view of environmental burden.
Efficiency per prompt is meaningless in isolation if total scale increases significantly. Google avoids this entirely.
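A back-of-envelope sketch makes the gap concrete. The per-prompt energy and daily prompt volume below are hypothetical placeholders, since neither is disclosed; the point is only that the total scales with volume, not with per-prompt efficiency alone.

```python
# Minimal sketch, with hypothetical numbers, of why per-prompt efficiency
# says nothing about total footprint once volume is included.
energy_per_prompt_wh = 0.30          # hypothetical median per-prompt energy
prompts_per_day = 1_000_000_000      # hypothetical daily prompt volume

daily_kwh = energy_per_prompt_wh * prompts_per_day / 1000
annual_gwh = daily_kwh * 365 / 1_000_000

print(f"Daily energy : {daily_kwh:,.0f} kWh")
print(f"Annual energy: {annual_gwh:,.1f} GWh")
# A "small" per-prompt figure multiplied across billions of prompts per day is
# a utility-scale load; without the volume, the per-prompt number is not
# interpretable.
```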
3. Carbon Reporting Relies on Market-Based Accounting Only
Scope 2 emissions are calculated using market-based accounting – a method that applies renewable energy credits and procurement agreements to reduce reported carbon figures. But this:
Says nothing about actual grid mix at time of use;
Masks regional variation;
Omits time-of-day or demand peaks.
Google’s market-based carbon intensity is reported as 94 gCO2e/kWh. Using a location-based figure would raise this to 345 gCO2e/kWh. Notably, Google’s own Cloud Carbon Footprint tool allows users to see both. The Gemini report chooses only the lower.
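The gap between the two accounting methods is easy to quantify. In the sketch below the per-prompt energy figure is a hypothetical placeholder, since the report does not tie its intensity factors to a reproducible per-prompt energy value; the two intensity factors are the ones quoted above.

```python
# Sketch of how the choice of carbon-intensity factor changes the reported
# per-prompt emissions. The per-prompt energy value is hypothetical.
energy_per_prompt_wh = 0.30        # hypothetical
market_based_g_per_kwh = 94        # market-based factor quoted in the report
location_based_g_per_kwh = 345     # location-based alternative

def grams_co2e(energy_wh: float, intensity_g_per_kwh: float) -> float:
    return energy_wh / 1000 * intensity_g_per_kwh

market = grams_co2e(energy_per_prompt_wh, market_based_g_per_kwh)
location = grams_co2e(energy_per_prompt_wh, location_based_g_per_kwh)

print(f"market-based  : {market:.4f} gCO2e/prompt")
print(f"location-based: {location:.4f} gCO2e/prompt")
print(f"ratio         : {location / market:.1f}x")  # ~3.7x higher on the same energy
```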
4. Hardware Scope 3 Emissions Lack Traceability
Google claims to account for hardware emissions (TPUs, CPUs, DRAM) by amortising embodied carbon over billions of prompts. However:
No disclosure of refresh rates, failure, or decommissioning;
No data on reuse, repair, or end-of-life treatment;
No component-level transparency.
Worse still, vendors such as NVIDIA have not published LCAs for their current-generation AI hardware (e.g. H100). Google’s figures are therefore speculative. Scope 3 values appear low not because the impact is minimal, but because the methodology is opaque.
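To see how amortisation shapes the outcome, consider the sketch below. Every input is a hypothetical placeholder, because the report discloses none of them; what it demonstrates is that the per-prompt result is dominated by the assumed prompt volume and service life rather than by the quality of the embodied-carbon data itself.

```python
# Sketch of how amortising embodied carbon over a very large prompt count
# produces a small-looking per-prompt figure. All inputs are hypothetical.
embodied_kg_co2e_per_accelerator = 1_500     # hypothetical embodied carbon per device
fleet_size = 100_000                         # hypothetical number of accelerators
service_life_years = 4                       # hypothetical refresh cycle
prompts_per_year = 365 * 1_000_000_000       # hypothetical annual prompt volume

total_embodied_g = embodied_kg_co2e_per_accelerator * fleet_size * 1000
amortised_g_per_prompt = total_embodied_g / (prompts_per_year * service_life_years)

print(f"Amortised embodied carbon: {amortised_g_per_prompt:.4f} gCO2e/prompt")
# The per-prompt result is driven almost entirely by the assumed prompt volume
# and service life, neither of which is disclosed, so it cannot be checked.
```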
5. Water Usage Metrics are Misleading
The report claims 0.26 mL of water is used per median prompt. This is positioned as "five drops." The comparison is made against GPT-3 figures from 2023 (~45–50 mL), but:
Google’s figure includes only onsite use;
GPT-3’s figure includes both onsite and offsite (electricity generation);
Google's data is a fleet-wide global average; GPT-3's was location-specific.
Recent independent analysis indicates Gemini’s 2024 water use exceeds GPT-3’s in at least 8 of 18 benchmark locations. If Google’s claimed 33x efficiency gains are correct, 2023 use would have been significantly higher. These figures are not disclosed.
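The boundary difference alone can move the number materially. In the sketch below the onsite figure is the one quoted in the report, while the per-prompt energy and the water intensity of electricity generation are hypothetical placeholders used only to show the effect of adding the offsite component.

```python
# Sketch of why the system boundary matters for the water comparison.
onsite_ml_per_prompt = 0.26          # onsite cooling water, as reported
energy_per_prompt_wh = 0.30          # hypothetical per-prompt energy
water_l_per_kwh_generation = 2.0     # hypothetical offsite water intensity of the grid

offsite_ml_per_prompt = energy_per_prompt_wh / 1000 * water_l_per_kwh_generation * 1000
total_ml_per_prompt = onsite_ml_per_prompt + offsite_ml_per_prompt

print(f"onsite only      : {onsite_ml_per_prompt:.2f} mL/prompt")
print(f"onsite + offsite : {total_ml_per_prompt:.2f} mL/prompt")
# Comparing an onsite-only number against a figure that includes offsite water
# from electricity generation is not a like-for-like comparison.
```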
6. Claimed Efficiency Gains Lack Transparency
Google states that energy and emissions per prompt have improved by 33x and 44x respectively in one year. However:
No baseline is provided;
No measurement methodology is disclosed;
No audit or verification has been conducted.
Improvements are attributed to speculative decoding, batching, and internal hardware changes. None are described in technical detail. These claims cannot be validated.
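Even the arithmetic implied by the claims cannot be reconstructed. The sketch below uses hypothetical current per-prompt values alongside the claimed 33x and 44x factors to show what a baseline disclosure would need to contain before the multipliers could be verified.

```python
# Sketch showing that an improvement factor is only meaningful relative to a
# disclosed baseline. The current per-prompt values are hypothetical; the
# 33x / 44x factors are the ones claimed in the report.
current_energy_wh = 0.30      # hypothetical current median per-prompt energy
current_emissions_g = 0.03    # hypothetical current median per-prompt emissions

implied_baseline_energy_wh = current_energy_wh * 33
implied_baseline_emissions_g = current_emissions_g * 44

print(f"Implied baseline energy   : {implied_baseline_energy_wh:.2f} Wh/prompt")
print(f"Implied baseline emissions: {implied_baseline_emissions_g:.2f} gCO2e/prompt")
# Without the actual baseline, its measurement boundary, and the prompt mix it
# was computed over, the multipliers cannot be checked or reproduced.
```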
7. Training Emissions Excluded Entirely
The report excludes the energy and emissions associated with model training, fine-tuning, and updates. There is:
No accounting for foundation model training;
No treatment of region-specific model variants;
No lifecycle analysis of model churn.
Training is often more resource-intensive than inference. Its exclusion materially distorts the overall footprint.
8. No Treatment of Rebound Effect or Usage Growth
Efficiency gains are presented without acknowledging the rebound effect. There is no:
Disclosure of rising user volumes;
Forecast of scaling infrastructure demand;
Recognition that falling per-unit costs often drive increased consumption.
This omission allows per-prompt gains to be interpreted as system-wide progress, when the opposite may be true.
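The interaction is simple arithmetic, as the sketch below shows with entirely hypothetical volumes: a large per-prompt improvement can coexist with a rising total footprint whenever demand grows faster than efficiency.

```python
# Sketch of the rebound effect: per-prompt efficiency improves sharply while
# total consumption still rises, because volume grows faster. All numbers
# are hypothetical.
energy_2023_wh = 9.9            # hypothetical 2023 per-prompt energy
energy_2024_wh = 0.30           # hypothetical 2024 per-prompt energy (33x better)
prompts_2023 = 10_000_000_000
prompts_2024 = 500_000_000_000  # hypothetical 50x volume growth

total_2023_mwh = energy_2023_wh * prompts_2023 / 1e6
total_2024_mwh = energy_2024_wh * prompts_2024 / 1e6

print(f"2023 total: {total_2023_mwh:,.0f} MWh")
print(f"2024 total: {total_2024_mwh:,.0f} MWh")
# A 33x per-prompt improvement is swamped by a 50x rise in volume: the
# system-wide footprint still grows.
```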
9. Reporting Averages Flatten Operational Risk
All data is presented as 12-month rolling global averages. This masks:
Seasonal cooling variation;
Grid volatility;
Peak-load stress periods;
Location-specific inefficiencies.
No site-level PUE or WUE is disclosed. No high-water mark metrics are provided. Averages create a perception of stability that does not exist.
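A simple illustration with hypothetical monthly PUE values for a single site shows how much a rolling average can conceal.

```python
# Sketch of how a 12-month average flattens seasonal variation.
# The monthly PUE values below are hypothetical placeholders for one site.
monthly_pue = [1.08, 1.07, 1.08, 1.10, 1.14, 1.22,
               1.28, 1.27, 1.18, 1.11, 1.08, 1.07]

annual_average = sum(monthly_pue) / len(monthly_pue)
summer_peak = max(monthly_pue)

print(f"12-month average PUE: {annual_average:.2f}")
print(f"Peak-month PUE      : {summer_peak:.2f}")
# The annual average hides the summer cooling peak, which is exactly when
# grid carbon intensity and water stress also tend to be highest.
```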
10. No Task or Model-Level Segmentation
Energy, emissions, and water data is not broken down by workload type. The report provides:
No comparison between chat, code, summarisation, or multimodal tasks;
No data per model family;
No insight into differential cost per task.
Without this, organisations cannot prioritise optimisation efforts or account for carbon by use case.
11. Methodology Is Not Disclosed or Auditable
There is no technical appendix, methodology paper, or reproducible dataset.
System boundaries are undefined;
Measurement tools and assumptions are unspecified;
No independent review or validation is referenced.
The report fails to meet basic transparency standards expected in regulated sustainability disclosures.
Conclusion: A Reputational Asset, Not an Environmental Report
This is not a robust emissions disclosure. It is a communications document, carefully shaped to minimise scrutiny and present a curated narrative of progress. It omits what matters, averages away what is inconvenient, and excludes the most resource-intensive elements of AI. The danger is not just in what is missing; it is that these numbers will be recycled and cited without qualification. They will appear in ESG dashboards, vendor assessments, and regulatory consultations as if they reflected real-world conditions.
Organisations should demand full-stack visibility, traceable data, lifecycle emissions, and workload segmentation. Until that becomes standard, reports like this should be treated with caution, not as credible models of environmental performance.