Future Cognitive Capital

AI As An Investment

Chapter I — COMPUTE & SYSTEMS

Hidden Market Gems
Dec 15, 2025

Welcome to the first edition of AI As An Investment.
I am very excited to share this piece with you today.
I advise you to read this edition in a calm, quiet room.
To get the most out of it, try to put yourself in the best conditions: a coffee or a tea is a good fit, and so is a glass of wine.
You might also want to keep a note open on the side so you can write down some tickers.
This is the first chapter of nine. I spent a great deal of time on it, and I hope you will enjoy it.

1. Chapter I — COMPUTE & SYSTEMS

(The Physical Engine of Intelligence)

1.0.1 Companies featured in this article

Compute & hyperscalers
NVDA, AMD, QCOM, GOOGL, AMZN, MSFT

Foundries / manufacturing
TSM, 005930.KS, GFS

Packaging / assembly
AMKR, ASX

Memory (HBM / DRAM)
000660.KS, MU, 005930.KS

Networking / interconnect
AVGO, MRVL, ANET, CSCO

Power, cooling, electrification
SU.PA, ETN, ABB, SIE.DE, GEV, VRT, JCI

Utilities / grids
NEE, DUK, D, EDF.PA, IBE.MC, ENEL.MI, NG.L

Data center operators / real estate
EQIX, DLR

System integrators (servers)
DELL, HPE, IBM, 0992.HK

1.1 Introduction

Power → GPU → HBM → Network → Servers → Datacenter → Tokens → Revenue

1.1.1 Role of Compute in the AI Economy

Compute has rapidly become the critical physical foundation underpinning the global AI economy. According to the 2025 Stanford AI Index Report, compute investment in AI infrastructure is outpacing spending in many traditional industries, contributing over $33.9 billion in private investment globally in 2024 alone, with a projected increase of nearly 19% year-on-year. Renaissance Macro Research’s analysis (August 2025) highlights that AI and data center investments drove 92% of U.S. GDP growth in the first half of 2025, an unprecedented concentration of economic growth in infrastructure investment. Harvard economist Jason Furman’s analysis reinforces this finding: “investment in information processing equipment and software is 4 percent of GDP, but it was responsible for 92 percent of GDP growth in H1 2025.”
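To make the 4%-versus-92% contrast concrete, here is a minimal sketch of how a “share of GDP growth” figure is computed. The numbers are illustrative placeholders, not the underlying national accounts data behind the 92% estimate.

```python
# Sketch: how a "share of GDP growth" figure is computed.
# All numbers are illustrative placeholders, not actual BEA data.

gdp_start = 100.0          # total GDP at the start of the period (arbitrary units)
gdp_end = 101.0            # total GDP at the end of the period
it_invest_start = 4.0      # info-processing equipment & software (~4% of GDP)
it_invest_end = 4.92       # same category at period end

total_growth = gdp_end - gdp_start                 # 1.0
it_contribution = it_invest_end - it_invest_start  # 0.92

share_of_growth = it_contribution / total_growth
print(f"IT investment share of GDP: {it_invest_start / gdp_start:.0%}")
print(f"IT investment share of GDP growth: {share_of_growth:.0%}")  # -> 92%
```

A small category can dominate growth as long as everything else is roughly flat, which is exactly the situation the Furman quote describes.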

Beyond fueling model training, AI compute serves as a bottleneck resource whose availability and performance dictate the pace of AI innovation and application scaling. This transforms compute capacity into a sovereign infrastructure asset, drawing in massive capital from both hyperscalers and sovereign wealth funds.​

What these numbers tell you is simple but crucial: AI has stopped being a pure software story and has become a physical infrastructure story. Economic growth is no longer driven by clever algorithms alone, but by the ability to build and operate massive computing systems. When a small slice of GDP explains almost all growth, it signals that capital is rushing into a single constraint. Today, that constraint is compute: chips, memory, networking, data centers, and the power that feeds them.

In practice, this means AI adoption is not limited by ideas, talent, or demand. It is limited by capacity. If there are not enough GPUs, models cannot be trained. If there is not enough memory bandwidth, those GPUs sit idle. If data centers cannot secure power or cooling, deployment timelines slip by months or years. Companies like NVIDIA and AMD address the raw compute problem by supplying accelerators. SK hynix, Micron, and Samsung solve the memory bottleneck that determines how fast those chips can actually run. Broadcom, Marvell, Arista, and Cisco deal with the networking problem that appears once clusters grow large. Equinix and Digital Realty solve the “where do we physically put all this hardware” problem by operating data centers ready for high-density AI workloads.

This is why compute is now treated as strategic infrastructure. Hyperscalers invest tens of billions to secure capacity, governments worry about sovereignty, and sovereign wealth funds allocate capital to the physical backbone of AI rather than to applications. For an investor, the takeaway is not to chase the best AI demo. It is to understand which physical problem is binding at a given moment, and which companies are paid to solve it. When growth concentrates around a hard constraint, value concentrates around the firms that remove that constraint.

1.1.2 From Energy to Compute: The Physical Conversion

AI computation fundamentally relies on converting electrical energy into mathematical operations. This conversion chain is characterized by high energy intensity: recent estimates suggest training a single large transformer model can consume as much energy as several hundred U.S. households’ annual consumption. The chain involves feeding power to GPUs or specialized accelerators where electrical energy is translated into billions of floating-point operations per second (FLOPs).​

Based on technical analysis and academic research, GPT-3’s training consumed approximately 1,287 MWh of electricity, roughly the annual electricity consumption of 120 U.S. households, the same order of magnitude as, though somewhat below, the “several hundred households” estimate above. The training of GPT-3’s 175 billion parameters also generated an estimated 552 metric tons of CO2 emissions.
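As a quick sanity check on these figures, here is a back-of-the-envelope sketch. The average U.S. household consumption of roughly 10,700 kWh per year is an assumption added for illustration; it is not taken from the studies cited above.

```python
# Back-of-the-envelope check on the GPT-3 training figures cited above.
# The ~10,700 kWh/year average U.S. household figure is an assumption for illustration.

training_energy_mwh = 1_287          # cited GPT-3 training energy
household_kwh_per_year = 10_700      # assumed average U.S. household consumption

households_equiv = training_energy_mwh * 1_000 / household_kwh_per_year
print(f"Household-years of electricity: {households_equiv:.0f}")   # ~120

emissions_t_co2 = 552                # cited training emissions
grid_intensity = emissions_t_co2 * 1_000 / training_energy_mwh
print(f"Implied grid intensity: {grid_intensity:.0f} kg CO2 per MWh")  # ~429
```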

Current-generation GPUs such as NVIDIA’s H100 deliver petaflop-scale performance per chip, and the clusters built around them draw tens of megawatts, necessitating careful energy infrastructure planning. Further, each bit’s movement through memory hierarchies and interconnects consumes incremental power, often creating secondary constraints through thermal dissipation needs and energy costs.

Think of AI compute as a factory that turns electricity into “intelligence output”. Every prompt you run, every model you train, is ultimately a conversion process: you pay for power, that power drives chips, the chips do math, and most of that energy ends up as heat that must be removed. That is why big AI is so energy-hungry. The GPT-3 example is there to anchor the scale: training one large model can burn the kind of electricity a small neighborhood uses over a year, and it produces real emissions. The key point for a beginner is this: performance is not just “how smart the model is”, it’s “how many useful operations you can buy per unit of energy and per unit of time”.

Now translate that into the real world. When you hear “H100 clusters need tens of megawatts”, it means companies are no longer buying servers, they’re buying mini power plants plus cooling systems. NVIDIA (H100) solves the “do the math fast” problem, but that immediately creates two new bottlenecks: delivering stable power and removing heat. That’s where Schneider Electric, Eaton, and ABB show up (power distribution, UPS, electrical gear), and where Vertiv becomes critical (cooling and power systems for high-density data centers). If the power is constrained, or the cooling is inadequate, the GPUs don’t run at full utilization. And if the GPUs aren’t fully utilized, the economics break: you’re sitting on extremely expensive hardware that can’t produce tokens efficiently.
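To see why utilization is the whole game, here is a minimal sketch of cost per token at different utilization levels. Every input (the all-in hourly cost of an accelerator, its peak serving throughput) is an illustrative assumption, not a vendor figure.

```python
# Sketch: why GPU utilization dominates the economics of serving tokens.
# All inputs are illustrative assumptions, not vendor figures.

gpu_hourly_cost = 3.00           # $/hour, assumed all-in cost (capex + power + cooling)
tokens_per_second_peak = 3_000   # assumed peak serving throughput of one accelerator

def cost_per_million_tokens(utilization: float) -> float:
    """Cost of producing 1M tokens at a given average utilization (0..1)."""
    tokens_per_hour = tokens_per_second_peak * 3_600 * utilization
    return gpu_hourly_cost / tokens_per_hour * 1_000_000

for u in (0.9, 0.5, 0.2):
    print(f"utilization {u:.0%}: ${cost_per_million_tokens(u):.2f} per 1M tokens")
# Halving utilization doubles the cost per token: idle GPUs still burn capital.
```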

1.1.3 Structural Industrial Constraints

The AI compute industry faces multiple structural limitations. Semiconductor manufacturing is bounded by the limits of process node shrinking (reticle size, yield constraints), pushing chipmakers toward complex packaging and integration to continue performance gains. Lead times for fabricating the newest generations of chips stretch to over 24 months, causing supply-demand imbalances that are sensitive to geopolitical disruptions.

NVIDIA’s supply chain partnerships with foundries show that the tape-out (design finalization) process takes approximately six months, with volume production scheduled to begin in 2026, suggesting an overall cycle of 18-24 months or more. Recent reporting on NVIDIA’s current supply constraints indicates that lead times have stretched beyond nine months for some customers. Comprehensive supply chain analysis puts the cumulative timeline from design through production to delivery at 12-24 months, depending on the specific process node and product maturity.

These imbalances are amplified by the concentration of critical supply in a handful of foundries such as TSMC and Samsung.

TSMC’s technology roadmap and market position show that the company has dominated 3nm FinFET manufacturing since late 2022, with subsequent node families expanded through 2024-2025 and 2nm (N2) entering mass production by late 2025. Samsung’s competitive response was to deploy 3nm GAAFET technology, based on its MBCFET design, in 2022. Market capacity analysis shows that CoWoS (Chip-on-Wafer-on-Substrate) packaging capacity, controlled predominantly by TSMC, has been fully booked through 2025, creating severe bottlenecks for AI chip production.

Thermal constraints cap the power consumption chips can sustainably handle, necessitating costly cooling solutions (including liquid cooling) and complicating data center design. Thermal engineering analysis shows that individual H100 GPUs can consume up to 700 W at peak performance, creating thermal management challenges beyond what traditional air cooling can handle. Detailed data center cooling case studies show that liquid cooling (including immersion and direct-to-chip approaches) reduces Power Usage Effectiveness (PUE) to levels as low as 1.03-1.1, compared with 1.5 or higher for air-cooled facilities. Vertiv’s research finds that direct-to-chip liquid cooling reduces facility power consumption by 18.1% and total data center power by 10.2%. Emerging best practice sees leading facilities push effective PUE toward 1.0 through advanced liquid cooling and waste heat recovery, with some operators reporting figures nominally below 1.0 once recovered heat is credited.
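To make the PUE numbers tangible, here is a small sketch of what moving from an air-cooled facility (PUE around 1.5) to a liquid-cooled one (around 1.1) means at a fixed IT load. The 50 MW IT load is an assumption chosen purely for illustration.

```python
# PUE = total facility power / IT equipment power.
# Sketch of the savings implied by the air-cooled vs liquid-cooled figures above,
# for an assumed 50 MW of IT load.

it_load_mw = 50.0

def facility_power(pue: float) -> float:
    return it_load_mw * pue

air_cooled = facility_power(1.5)
liquid_cooled = facility_power(1.1)

print(f"Air-cooled facility draw:    {air_cooled:.1f} MW")
print(f"Liquid-cooled facility draw: {liquid_cooled:.1f} MW")
print(f"Overhead avoided:            {air_cooled - liquid_cooled:.1f} MW "
      f"({(air_cooled - liquid_cooled) / air_cooled:.0%} of the air-cooled total)")
```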

Furthermore, system-level integration challenges across silicon, memory, network, and power delivery require finely tuned coordination, limiting agility in scaling compute capacity.​

This section explains why AI compute cannot scale quickly, even when demand is exploding. Making advanced chips is not like flipping a switch. Once a design is finalized, it can take one to two years before real volumes are delivered. That long cycle creates chronic mismatches between demand and supply. When demand suddenly accelerates, shortages are almost guaranteed. And because most advanced production is concentrated in a few players, small disruptions can ripple through the entire AI ecosystem.

In practical terms, companies like NVIDIA design the chips, but they depend on TSMC and Samsung to physically manufacture them. Those foundries have hard physical limits: yields, reticle size, and the sheer complexity of advanced nodes. To keep improving performance, the industry has shifted toward advanced packaging, especially technologies like CoWoS. The problem is that packaging capacity is even more constrained than chip fabrication itself. When CoWoS lines are fully booked, it doesn’t matter how strong demand is. Output is capped. This is why packaging specialists and foundries quietly gain pricing power during AI booms.
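Here is a minimal sketch of that “output is capped” logic, with made-up capacity numbers: shippable volume is the minimum across the chain, no matter how large demand is.

```python
# Sketch: why fully booked CoWoS capacity caps AI accelerator output
# regardless of demand. All capacities are made-up illustrative numbers.

monthly_capacity = {
    "wafer starts (advanced node)": 120_000,   # chips' worth
    "CoWoS packaging slots": 80_000,
    "HBM stacks matched to chips": 95_000,
    "customer demand": 200_000,
}

supply_steps = {k: v for k, v in monthly_capacity.items() if k != "customer demand"}
shippable = min(supply_steps.values())
bottleneck = min(supply_steps, key=supply_steps.get)

print(f"Shippable units/month: {shippable:,} (binding constraint: {bottleneck})")
print(f"Unmet demand: {monthly_capacity['customer demand'] - shippable:,}")
```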

Thermal limits add a second layer of constraint. Modern GPUs can draw hundreds of watts each, which turns data centers into extreme heat environments. Traditional air cooling simply cannot cope at scale. This is where liquid cooling becomes necessary, not optional. Companies like Vertiv solve this problem by enabling high-density racks to operate safely and efficiently. For investors, the logic is straightforward: when chips become more powerful, the bottleneck shifts to heat and integration. Value then flows to whoever can remove heat, manage power, and coordinate the entire system. The deeper the constraints, the stronger the moat around the companies that can solve them.

1.2 System Overview

1.2.1 General Architecture of the Compute Chain

Modern AI compute systems are structured as hierarchies of interconnected components optimized to process massive parallel workloads. At the lowest physical level, silicon compute units (GPUs, TPUs) connect to high-bandwidth memory packages (HBM3/4), themselves integrated via sophisticated packaging technologies to minimize latency and energy usage.​

System-level analyses of AI infrastructure architecture, documented across multiple disciplines, show that this hierarchical organization is validated by actual hyperscale deployments and technical specifications. NVIDIA’s cluster documentation and industry case studies confirm that the described architecture represents current best practice in distributed AI compute systems.

Compute clusters aggregate thousands of these chip packages into server nodes, networked with high-speed interconnect technologies like InfiniBand or Ethernet, designed to facilitate data-intensive machine learning workloads. These nodes in turn scale to large hyperscale data centers where compute density reaches hundreds of megawatts, with infrastructure to support physical power and thermal constraints.​

Think of an AI compute system as a layered machine, where each layer exists to keep the one above it busy. At the bottom are the chips that do the math, such as GPUs or TPUs. On their own, these chips are useless. They must be fed with data at very high speed, which is why they are tightly coupled with High Bandwidth Memory (HBM). If memory is too slow, the chip waits, burns power, and produces no useful output. Advanced packaging exists for one reason: to reduce the physical distance between compute and memory so less energy is wasted moving data.

Once you stack thousands of these chip-and-memory units together, a new problem appears: coordination. Servers have to talk to each other constantly during training and inference. That is why high-speed networking becomes essential. Technologies like InfiniBand or high-performance Ethernet allow these servers to behave like one giant machine rather than thousands of isolated boxes. Companies such as NVIDIA design reference architectures that tightly integrate compute, memory, and networking to make this work in practice.

At the top of the stack sit hyperscale data centers. This is where everything becomes physical again. Packing enough compute into one place means dealing with enormous power draw and heat. Operators like Equinix and Digital Realty solve the problem of hosting these systems at scale by providing facilities designed for high-density racks, advanced cooling, and reliable power delivery. For an investor, the key insight is that AI performance is not determined by a single component. It emerges from the weakest link in this chain. Understanding how these layers fit together tells you where bottlenecks will form and which companies are paid to remove them.

1.2.2 Interdependence Between Silicon, Memory, Network, Cooling

Performance and scalability are critically dependent on tight integration across silicon process nodes, memory subsystem bandwidth, network latency, and advanced thermal management.

High Bandwidth Memory (HBM) generations 3 and 4 underpin many AI systems by stacking multiple DRAM layers in a 3D configuration, supporting bandwidths of over 1.2 TB/s per stack in 2025.

Networking components such as DPUs (Data Processing Units), NVLink, and CXL standards enable efficient data movement between nodes and reduce bottlenecks caused by latency and bandwidth limitations. Cooling systems, increasingly based on liquid cooling, are necessary due to the power density of modern servers, with individual chips consuming over 500 watts at peak.

The documentation of High Bandwidth Memory standards and JEDEC specifications shows HBM3 supporting a maximum bandwidth of approximately 819 GB/s per stack, while HBM3E variants reach up to 1.17 TB/s. JEDEC’s official HBM4 standard, released in April 2025, supports up to 2 TB/s of bandwidth per stack through a 2048-bit interface operating at speeds of up to 8 Gb/s per pin. According to SK hynix’s product announcements, the company delivered the world’s first HBM4 samples in 2025.
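As a cross-check, the HBM4 headline number follows directly from the interface width and per-pin speed cited above.

```python
# Cross-check of the HBM4 bandwidth figure: 2048-bit interface at up to 8 Gb/s per pin.

interface_bits = 2_048
pin_speed_gbps = 8          # gigabits per second per pin

bandwidth_gb_s = interface_bits * pin_speed_gbps / 8   # convert bits to bytes
print(f"Per-stack bandwidth: {bandwidth_gb_s:.0f} GB/s (~{bandwidth_gb_s / 1_000:.0f} TB/s)")
# 2048 * 8 / 8 = 2048 GB/s, i.e. roughly the 2 TB/s headline figure
```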

For example, the rise of 3D chip stacking using CoWoS and Foveros packaging enables HBM memory to be vertically integrated on logic dies, increasing bandwidth but raising cooling complexity.​

AnySilicon’s technical documentation of CoWoS (Chip-on-Wafer-on-Substrate) technology highlights that this packaging approach enables 2.5D and 3D stacking of multiple semiconductor components, significantly reducing interconnect delays.

Intel’s Foveros technical documentation describes an advanced packaging technology that enables logic-on-logic stacking, using Through-Silicon Vias (TSVs) and micro-bump interconnects to achieve tight vertical integration. Semiconductor packaging analysis shows that both technologies provide measurable benefits in reduced latency and improved bandwidth efficiency compared with traditional interconnects.

NVIDIA’s official product specifications for the NVLink interconnect show that the 3rd generation achieved 600 GB/s per GPU, the 4th generation (H100) delivers 900 GB/s per GPU with approximately 100 nanoseconds of point-to-point latency, and the 5th generation (Blackwell) reaches 1,800 GB/s. Varidata’s technical comparison of interconnect standards shows that CXL 3.0 enables cache-coherent communication at 32-64 GB/s of bandwidth per link. Technical analyses of Data Processing Units show that DPUs relieve CPU workload by offloading network and storage processing.
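To get a feel for what these bandwidth figures mean in practice, here is a sketch of how long it would take to move a fixed payload over each link. The 100 GB payload is an arbitrary illustration (for example, a slice of model state), not a real workload measurement.

```python
# Sketch: what the cited interconnect bandwidths mean for moving data between devices.
# The 100 GB payload is an arbitrary illustration.

payload_gb = 100

links = {
    "NVLink gen3 (600 GB/s)": 600,
    "NVLink gen4 / H100 (900 GB/s)": 900,
    "NVLink gen5 / Blackwell (1800 GB/s)": 1_800,
    "CXL 3.0 link (~64 GB/s)": 64,
}

for name, gb_per_s in links.items():
    print(f"{name:35s} -> {payload_gb / gb_per_s * 1_000:7.1f} ms")
# Bandwidth gaps of 10-30x translate directly into how long GPUs wait on each other.
```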

Why is AI performance not about having “the best chip”, but about making many components work together without wasting time or energy?

A fast GPU alone does nothing if it is waiting on memory, stalled by the network, or throttled by heat. Real-world AI systems fail or succeed on integration. The goal is to keep the chip busy every second it is powered on. That is the only way the economics make sense.

High Bandwidth Memory (HBM) exists for one reason: feeding GPUs fast enough. AI workloads move enormous volumes of data. Traditional memory cannot keep up. By stacking memory vertically and placing it extremely close to the compute die, HBM reduces distance, latency, and energy loss. Companies like SK hynix, Micron, and Samsung Electronics solve this problem by supplying HBM. If HBM supply is tight, the most advanced GPUs cannot reach their advertised performance. That is why memory often becomes the real bottleneck, not compute.
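A roofline-style sketch makes this concrete: whether a chip is limited by compute or by memory depends on how many operations it performs per byte it moves. The peak throughput and bandwidth below are approximate public H100-class figures used as assumptions, and the workload intensity is invented for illustration.

```python
# Roofline-style sketch: is a workload limited by compute or by HBM bandwidth?
# Peak figures below are approximate public H100-class numbers, used as assumptions.

peak_tflops = 990            # ~dense FP16 tensor throughput, TFLOP/s
hbm_bandwidth_tb_s = 3.35    # ~HBM3 bandwidth, TB/s

# Arithmetic intensity (FLOPs per byte moved) needed to stay compute-bound:
balance_point = peak_tflops / hbm_bandwidth_tb_s
print(f"Balance point: ~{balance_point:.0f} FLOPs per byte")

# Many inference kernels do far fewer FLOPs per byte than that,
# so the GPU waits on memory and its paper FLOPs go unused.
workload_intensity = 60      # assumed FLOPs/byte for an illustrative kernel
achievable_tflops = min(peak_tflops, workload_intensity * hbm_bandwidth_tb_s)
print(f"Achievable throughput: ~{achievable_tflops:.0f} TFLOP/s "
      f"({achievable_tflops / peak_tflops:.0%} of peak)")
```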

Once many GPUs work together, data must move between them constantly. This is where networking and interconnects matter. Technologies like NVLink (from NVIDIA), DPUs, and standards like CXL exist to reduce the “coordination tax” of large clusters. Without them, GPUs spend time waiting instead of computing. Firms such as Broadcom, Marvell, Arista Networks, and Cisco are paid to solve this invisible problem: moving data fast enough so that expensive compute is not wasted.

Advanced packaging, such as CoWoS (used heavily by TSMC) or Foveros (developed by Intel), is the glue that ties compute and memory together. It boosts performance by shortening connections, but it also concentrates heat. This immediately creates a thermal problem. Chips drawing 500–700 watts cannot be cooled with air at scale. Liquid cooling becomes mandatory, not optional. This is where companies like Vertiv step in, enabling high-density systems to operate continuously without thermal throttling.
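Here is a rough sketch of the rack-level heat load behind that statement. The 8-GPU server, 4-servers-per-rack layout, and overhead factor are illustrative assumptions built on the 700 W per-GPU figure cited earlier.

```python
# Sketch: rack-level heat load implied by the per-GPU power figures above.
# The server/rack layout and overhead factor are illustrative assumptions.

gpu_power_w = 700            # cited peak draw of an H100-class GPU
gpus_per_server = 8
servers_per_rack = 4
overhead_factor = 1.3        # assumed extra for CPUs, NICs, fans, power conversion

rack_kw = gpu_power_w * gpus_per_server * servers_per_rack * overhead_factor / 1_000
print(f"Heat to remove per rack: ~{rack_kw:.0f} kW")
# Conventional air-cooled racks were typically sized for roughly 5-15 kW,
# which is why dense AI racks push operators toward liquid cooling.
```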

For an investor, the takeaway is simple: AI performance comes from coordination, not hero components. Whenever one part of this chain improves, pressure shifts to the next weakest link. Memory, interconnects, packaging, and cooling all become monetizable bottlenecks at different moments. The winners are the companies that remove friction from this chain and allow compute to run flat-out, hour after hour.

1.2.3 The Concept of “Chain Bottleneck”
