So, About Those New Data Halls: How AI Demand Is Rewriting Data Center Plans

The hallway conversation that made me look twice

So there I was, leaning against the cold metal of a server rack while a facilities manager told me, almost sheepishly, that they were adding an entire building for GPUs. Not a room. Not a row. An entire building. Ever had that moment when something you believed was big suddenly becomes bigger in your mind? That was mine. We used to treat AI like an app feature or a fancy demo in a keynote. Now it’s a reason to pour concrete, lay fiber, and negotiate with utilities for more megawatts. And it’s not just the mega cloud folks. Smaller regional players, savvy enterprises, even research labs are all sketching expansions on whiteboards the way we used to plan version upgrades.

Here’s the thing: AI doesn’t just want more compute; it wants a different shape of compute. Training loves dense clusters that drink power and prefer being left alone for days. Inference can be more polite, but it has mood swings when traffic is spiky. In this post, I’ll walk you through why data centers are expanding because of AI, what is changing inside those walls, how power and cooling went from footnote to headline, and the practical choices that will make or break your next build. I’ll share a few stories from the trenches, some real-world gotchas, and the quiet details that matter when the servers hum and the graphs start climbing.

AI isn’t just more servers; it’s different servers

In my experience, the first mistake people make is thinking AI growth is just more of the same. It’s not. Think of it like replacing a fleet of compact cars with heavy-duty trucks. The trucks still drive on the same roads, sure, but they stress the bridges, need wider parking spots, and burn fuel very differently. AI training nodes are those trucks. They demand high-density power per rack, serious cooling, and fast east-west networking that doesn’t blink when thousands of GPU-to-GPU conversations happen at once.

Inference is a different animal. It’s the storefront where customers walk in and expect instant service. When your app suddenly adds a chat assistant, image generation, or semantic search, you’re moving from periodic compute to real-time responsiveness. That means low latency, thoughtful autoscaling, and a fabric that doesn’t choke when half the internet shows up on a Tuesday afternoon. It also means your load balancers, TLS termination, and health checks need to behave like adults. If you’ve never set up clean L4/L7 routing for a bursty, stateful API, give a read to zero‑downtime HAProxy load balancing for clean TLS passthrough and smart health checks. That mindset translates directly to AI inference gateways.
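
To make the health-check point concrete, here’s a minimal sketch of a “deep” readiness endpoint for an inference node. It’s stdlib Python, and every name in it (MODEL_READY, queue_depth, the /healthz path, the port) is a placeholder I’ve made up; the idea is simply that the balancer gets a 503 and drains the node when the model isn’t loaded or the queue is deep, instead of piling more traffic onto it.

```python
# Minimal "deep" health check for an inference backend (stdlib only).
# MODEL_READY and queue_depth() are stand-ins for your own runtime state.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

MODEL_READY = True        # flip to True only once weights are loaded on the accelerator
MAX_QUEUE_DEPTH = 64      # beyond this, tell the balancer to back off

def queue_depth() -> int:
    return 12             # stand-in for your real request-queue metric

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        healthy = MODEL_READY and queue_depth() < MAX_QUEUE_DEPTH
        body = json.dumps({"ready": MODEL_READY, "queue": queue_depth()}).encode()
        self.send_response(200 if healthy else 503)   # 503 -> balancer drains this node
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```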

One of my clients learned this the hilarious way. They overbuilt training capacity and underbuilt their edge inference tier. The result? Gorgeous model checkpoints, frustrated users. We rebalanced it by treating training like a factory shift and inference like a storefront. The factory can be remote, power-hungry, and methodical. The storefront has to be close to your customers, predictable, and able to survive a weekend promo without calling you at 3 a.m.

Power and cooling: from background utilities to design stars

On paper, power and cooling are simple: bring more in, take more heat out. In real build-outs, they become a dance with utilities, physics, and sometimes zoning boards. High-density racks tip the balance quickly. It’s not unusual to see deployments that make traditional air-cooled rooms sweat. That’s why everyone is suddenly very friendly with liquid cooling, whether that’s rear-door heat exchangers or direct-to-chip loops. If you want a solid baseline of what practical looks like, the ASHRAE thermal guidelines for liquid and air cooling are a helpful compass. You don’t need to become a thermodynamics expert; you just need to appreciate the constraints before you choose a path you’ll live with for a decade.

Cooling isn’t just a technology choice; it’s an operational personality test. Air is simpler to maintain but runs out of headroom faster. Liquid is a commitment, but it keeps dense racks sane without constantly chasing hot spots. I remember a retrofit where we swapped in rear-door units onto an older row. Overnight, the facility went from nervous fan noise to a calm, steady hush. The difference wasn’t subtle. It felt like switching from a hair dryer to a heat pump.

Then there’s the power side. Utility lead times aren’t measured in days, and not all regions can hand you more megawatts without a long conversation. That’s why you’ll see operators carving out dedicated power corridors and planning staged expansions. They pre-wire for tomorrow’s capacity and light up only what they need today. It’s the closest thing to having a teenager-sized jacket your kid can grow into without tripping over the sleeves on day one.

Networking and storage: the fast lanes behind the stage

If power and cooling are the lungs and heart, networking is the nervous system that makes AI clusters feel like one brain. Training pushes east-west traffic through spines and leaves you with a simple truth: the fabric needs to be boringly reliable and fast. You don’t want excitement here. You want it to route and recover like a stoic. I’ve seen teams debate fabrics the same way musicians argue over strings. The key move is to respect latency and jitter the way you respect uptime. Both matter, and both are friendlier when you keep your design consistent.

On the storage side, training loves bandwidth and parallelism. Pulling terabytes of data from object storage feels great until you realize your dataset fetch is arguing with your model checkpoint writes. You can make both happier by tuning your network stack end-to-end. For a grounding in the non-dramatic stuff that helps under real load, I like pointing folks to a calm guide to Linux TCP tuning for high‑traffic services. The examples might be web apps, but the principles carry over: queues, buffers, and a healthy respect for the round trip.
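
If you want a quick sense of where your hosts stand before touching anything, a tiny audit script goes a long way. The sysctl paths below are standard Linux ones; the “suggested” values are illustrative starting points I’m assuming for long, fat transfers, not numbers to copy blindly.

```python
# Quick audit of a few TCP sysctls that matter for big, long-haul transfers.
# The suggested values are illustrative; tune against your own links and RTTs.
from pathlib import Path

CHECKS = {
    "net/core/rmem_max": 67108864,             # max socket receive buffer (bytes)
    "net/core/wmem_max": 67108864,             # max socket send buffer (bytes)
    "net/ipv4/tcp_congestion_control": "bbr",  # one common choice; cubic is the usual default
}

for key, target in CHECKS.items():
    path = Path("/proc/sys") / key
    current = path.read_text().strip() if path.exists() else "missing"
    print(f"{key}: current={current} suggested={target}")
```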

Inference is lighter on storage but far more sensitive to latency. You need weight distribution to be crisp and artifact shipping to be predictable. I once watched a team shave seconds off cold starts simply by cleaning up the artifact pipeline and pinning specific layers closer to the accelerator hosts. No new hardware. Just fewer detours.
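
Here’s the spirit of that kind of cleanup, sketched: a local weight cache on the inference host, verified by checksum, so a restart doesn’t mean another trip across the network. fetch_from_object_store and the cache path are placeholders; assume nothing about your actual store.

```python
# Sketch of a local artifact cache that avoids re-pulling model weights on
# every cold start. fetch_from_object_store() is a stand-in for your real
# download path (S3, GCS, an internal artifact registry, ...).
import hashlib
from pathlib import Path

CACHE_DIR = Path("/var/cache/models")   # assumed local NVMe on the inference host

def fetch_from_object_store(name: str, dest: Path) -> None:
    raise NotImplementedError("wire this to your artifact store")

def ensure_local(name: str, expected_sha256: str) -> Path:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    local = CACHE_DIR / name
    if local.exists():
        digest = hashlib.sha256(local.read_bytes()).hexdigest()
        if digest == expected_sha256:
            return local                # warm start: no network trip at all
        local.unlink()                  # corrupt or stale copy; refetch
    fetch_from_object_store(name, local)
    return local
```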

And don’t sleep on metadata stores, feature stores, and the boring databases that sit behind your AI endpoints. They’re part of the experience your users feel. If your reads spike, your DB pool will be the first to throw a tantrum. That’s where techniques like read/write split and connection pooling with ProxySQL still pay off. AI doesn’t suspend the old rules of application architecture. It just makes the consequences of ignoring them arrive faster.
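
ProxySQL does this routing at the proxy layer with its own configuration, but the idea is easy to show app-side. Here’s a hedged sketch using SQLAlchemy pools with made-up connection URLs: reads hit a replica pool, writes go to the primary inside a transaction.

```python
# App-side sketch of the read/write split idea (ProxySQL does it at the proxy
# layer; this shows the same routing decision inside the service).
from sqlalchemy import create_engine, text

primary = create_engine("mysql+pymysql://app@db-primary/appdb",
                        pool_size=10, max_overflow=5, pool_pre_ping=True)
replica = create_engine("mysql+pymysql://app@db-replica/appdb",
                        pool_size=20, max_overflow=10, pool_pre_ping=True)

def fetch(sql: str, **params):
    with replica.connect() as conn:      # reads land on the replica pool
        return conn.execute(text(sql), params).fetchall()

def execute(sql: str, **params) -> None:
    with primary.begin() as conn:        # writes go to the primary, in a transaction
        conn.execute(text(sql), params)
```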

The layout is changing: racks, rooms, and how we build

When AI rolls in, your data center layout learns new tricks. Aisle containment gets stricter. You create zones for high-density pods so you’re not mixing GPU furnaces with gentle database racks. Cable trays feel heavier, the overhead becomes more precious, and suddenly the copper versus fiber choice is about heat and weight as much as signal. I’ve seen operators standardize on OCP-style racks because they make power delivery and serviceability cleaner. If that’s a rabbit hole you’re curious about, the Open Compute Project designs are worth a look just to appreciate what’s now considered normal.

We’re also seeing modularity return as a theme. Not the blink-and-you-miss-it containers that tried to be everything to everyone, but thoughtful modular build-outs that stage cooling and power along with compute. It’s like adding rooms to the house without tearing the whole place down. You do your rough-ins for liquid cooling, make room for extra chillers, and keep the inter-row networking under control so the next pod doesn’t force a full redesign.

Then there’s the fiber story, both inside and outside. Inside, you’re meticulous about lengths and loss. Outside, you develop an appreciation for routes, peering, and physical diversity. If you’ve never traced a cable map and suddenly need to place an AI edge near end users, have a browse of the global submarine cable map and you’ll see why certain cities keep showing up in plans. Latency isn’t just numbers; it’s geography with opinions.
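
A quick back-of-the-envelope helps here. Light in fiber covers roughly 200 km per millisecond, and real routes wander, so the 1.3 route factor below is an assumption; the city distances are approximate great-circle numbers.

```python
# Rough fiber latency: ~200 km per millisecond in glass, and routes are
# never straight lines. Propagation only; queuing and processing come later.
def rtt_ms(route_km: float, route_factor: float = 1.3) -> float:
    one_way = (route_km * route_factor) / 200.0   # milliseconds
    return 2 * one_way

for city_pair, km in {"NYC-London": 5_570, "Frankfurt-Singapore": 10_260}.items():
    print(f"{city_pair}: ~{rtt_ms(km):.0f} ms RTT before a single packet is processed")
```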

Edge and latency: where the model meets the moment

Here’s a fun conversation I had with a product team. They wanted real-time personalization in their app, but their inference cluster lived a timezone away. The results were good, but the experience felt… mushy. We piloted a small edge deployment with slimmer GPUs and a caching layer for embeddings and popular prompts. Users stopped noticing latency, and support tickets about slowness quietly disappeared.

Edge isn’t a religion; it’s a tool. Put the heavy training where power and cooling are affordable. Put the quick-response inference where your customers are impatient. And make peace with the fact that you’ll be living in a hybrid world for a while. The trick is building a pipeline that lets you ship models out and roll them back without drama. Healthy blue-green patterns for AI aren’t that different from the old days of API deploys. Your gateway and your observability become the adults in the room, catching anomalies before they become phone calls.
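
The routing logic behind a calm blue-green or canary rollout is almost embarrassingly small. This sketch just shifts traffic weight between two hypothetical model versions; in practice your gateway or service mesh owns these knobs, but the shape is the same.

```python
# Minimal weighted-canary routing between two model versions. promote() and
# rollback() only move traffic weight; deciding *when* to call them is the
# job of your metrics and your nerve.
import random

weights = {"model-v1": 0.95, "model-v2": 0.05}   # start the canary small

def pick_version() -> str:
    return random.choices(list(weights), weights=list(weights.values()))[0]

def promote(step: float = 0.20) -> None:
    weights["model-v2"] = min(1.0, weights["model-v2"] + step)
    weights["model-v1"] = 1.0 - weights["model-v2"]

def rollback() -> None:
    weights.update({"model-v1": 1.0, "model-v2": 0.0})   # instant, boring, reversible
```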

Speaking of gateways, you want your front door to be calm even when the party is wild. Catch spikes early, shed load gracefully, and keep TLS behavior consistent as you scale. If you need a refresher on why good balancing saves weekends, the write-up on HAProxy with health checks and TLS passthrough stays relevant in the AI world. The actors have changed, but the script is familiar.
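
Load shedding is the unglamorous half of that calm. A minimal sketch, assuming a fixed in-flight budget: admit what you can serve well, answer the rest quickly with a 503 so clients can back off, and never let the queue become the outage.

```python
# Graceful load shedding at the front door: cap in-flight requests and reject
# the overflow fast instead of letting queues grow until everything times out.
import asyncio

MAX_IN_FLIGHT = 200
_slots = asyncio.Semaphore(MAX_IN_FLIGHT)

async def handle(request_id: str) -> tuple[int, str]:
    if _slots.locked():                 # every slot taken: shed early and cheaply
        return 503, "busy, retry with backoff"
    async with _slots:
        await asyncio.sleep(0.05)       # stand-in for real inference work
        return 200, f"ok:{request_id}"
```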

Data safety and the boring work that saves your Monday

It’s exciting to talk about GPUs. It’s less exciting to talk about backups. But I’ve had two separate mornings saved because we took model checkpoints and datasets seriously. One team lost a storage shelf, the other had a messy ACL mishap. In both cases, object storage with versioning and a proper retention policy turned a disaster into a coffee break. AI data isn’t just big; it’s expensive to recreate and painful to recrawl. If you haven’t built this muscle yet, get familiar with ransomware‑proof backups with S3 Object Lock and real restore drills. Yes, it’s a different context than AI, but the guardrails are the same: test restores like you mean it, and automate the boring parts.
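
For the checkpoint-and-dataset side, the write path can be as small as this. It’s a sketch with boto3 and made-up bucket and key names; the bucket has to be created with Object Lock enabled up front, and the restore half of the drill only counts if you actually load the file afterward and run something against it.

```python
# Sketch: write a model checkpoint with an Object Lock retention window,
# plus the restore half of the drill. Bucket and key names are placeholders.
import datetime
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "ml-checkpoints-locked", "run-042/checkpoint.pt"

def protected_upload(path: str, retain_days: int = 30) -> None:
    until = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=retain_days)
    with open(path, "rb") as f:
        s3.put_object(Bucket=BUCKET, Key=KEY, Body=f,
                      ObjectLockMode="COMPLIANCE",          # nobody shortens this, including you
                      ObjectLockRetainUntilDate=until)

def restore_drill(dest: str) -> None:
    s3.download_file(BUCKET, KEY, dest)   # then actually load it and run an eval
```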

Security, too, shifts tone with AI workloads. Your attack surface grows with more services, more artifacts, more credentials floating around CI pipelines. I’ve seen teams get clever with isolated build networks and signatures for model artifacts. Nothing fancy, just an understanding that the more valuable the output, the more interesting it is to attackers. Minimizing blast radius and keeping secrets sane isn’t optional. The fun part is how crisp your posture feels when you pair strong observability with a tight deployment process.

Sustainability isn’t a slogan when the bill shows up

Let’s talk about the elephant in the machine room: energy. AI makes us stare at efficiency with newfound honesty. PUE is back in the conversation, but it’s joined by utilization curves, workload scheduling, and renewable contracts. I’ve watched teams squeeze 20% more useful work out of the same hardware by improving queue discipline and picking smarter times to run certain jobs. The cheapest watt is the one you never burn, and the second-cheapest is the one you pull when carbon intensity is low.
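
The scheduling trick is simpler than it sounds. Given an hourly carbon-intensity forecast (from whichever source you trust; the numbers below are invented), you just pick the start hour that minimizes the average intensity over the job’s duration.

```python
# Toy version of "run it when the grid is greenest": choose the start hour
# with the lowest average carbon intensity (gCO2/kWh) over the job's length.
def best_start_hour(forecast: list[float], job_hours: int) -> int:
    windows = [
        sum(forecast[h:h + job_hours]) / job_hours
        for h in range(len(forecast) - job_hours + 1)
    ]
    return min(range(len(windows)), key=windows.__getitem__)

forecast = [420, 410, 380, 300, 220, 180, 190, 260, 340, 400, 430, 450]  # example values
print("kick off the 4-hour batch at hour", best_start_hour(forecast, 4))
```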

Cooling efficiency tricks feel simple until you quantify them. Tighten containment, reduce bypass, and your chillers get to breathe. Liquid cooling shifts your curve dramatically, but it asks for maturity and planning. What matters most is feedback loops. You make a change, you measure, you keep the wins, and you scrap the noise. Over a year, that turns into real money and a quieter conscience. Teams that treat facilities and software like one system win big here.
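
And the feedback loop is easy to put in numbers. PUE is just total facility power divided by IT power; the before-and-after figures here are illustrative only, not a promise of what containment or liquid will buy you.

```python
# The feedback loop in numbers: PUE = total facility power / IT power.
def pue(total_kw: float, it_kw: float) -> float:
    return total_kw / it_kw

# Illustrative figures: same 1,000 kW of IT load, less overhead after
# tightening containment and moving the dense pods to liquid.
before = pue(total_kw=1_600, it_kw=1_000)   # 1.60
after = pue(total_kw=1_300, it_kw=1_000)    # 1.30
print(f"PUE {before:.2f} -> {after:.2f}: "
      f"{(before - after) / before:.0%} less facility power for the same IT load")
```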

The planning playbook: what to decide and when

When someone asks me how to plan a data center expansion for AI, I resist the urge to throw a checklist at them. Instead, I ask three questions. What is your mix of training and inference for the next 12 months? How close do you need to be to your users? And what is your real power envelope, not the dream one? Those answers shape almost everything else.

If training is dominant, favor dense pods, serious cooling, and a fabric you won’t outgrow in six months. If inference is dominant, invest in edge presence, smart gateways, and good CI/CD for models. If you’re split, keep your architecture modular enough that you can grow each side independently. In any case, build in the ability to dry-run moves. That means you can re-seat a pod, swap cooling loops, or change a routing strategy without feeling like you’re defusing a bomb.

Now, here’s a small confession. I used to underplay the role of good old-fashioned capacity planning. AI cured me. We started forecasting training jobs like a construction schedule and inference like peak store hours. It made everyone calmer. The network team knew when to expect heat. The storage team prepared for checkpoint storms. The facilities crew kept the cooling curve steady. Less heroics, more repeatable wins.
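
The forecast itself doesn’t need to be fancy. Here’s the flavor of it, with invented numbers: training as a list of jobs with GPU counts and durations, inference as a peak-QPS estimate divided by measured per-GPU throughput, and a rough power figure falling out the other end.

```python
# Rough capacity forecast: training as a construction schedule, inference as
# peak store hours. Every number below is illustrative.
TRAINING_JOBS = [
    {"name": "llm-pretrain",    "gpus": 512, "days": 21},
    {"name": "vision-finetune", "gpus": 64,  "days": 5},
]
GPU_KW = 0.7        # assumed per-GPU draw including its share of the host
PEAK_QPS = 900      # forecast inference peak
QPS_PER_GPU = 30    # measured throughput of one serving replica

training_gpu_days = sum(j["gpus"] * j["days"] for j in TRAINING_JOBS)
inference_gpus = -(-PEAK_QPS // QPS_PER_GPU)     # ceiling division
print(f"training demand: {training_gpu_days:,} GPU-days this quarter")
print(f"inference fleet: {inference_gpus} GPUs (~{inference_gpus * GPU_KW:.0f} kW at peak)")
```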

People, process, and the culture that keeps lights green

The technology gets headlines, but the culture decides whether your expansion feels like a parade or a migraine. The best-run AI clusters I’ve seen share a few quiet habits. They document daily. They let facilities and platform engineering talk like close cousins. They hold small blameless reviews after changes, not just after outages. And they automate the dull parts so the team has energy for the tricky bits.

One team I admire runs tiny weekly drills: a simulated node failure here, a fake cooling alarm there. It’s not theater. It’s how they keep muscle memory alive. When an actual link flaps or a PDU goes sulky, they’re bored, not panicked. That confidence seeps into everything, from procurement to placement strategies.

Tooling that actually helps (and none of the drama)

There’s a temptation to buy your way into good practice. Tools help, but only when they serve the plan. For networking under real load, start with the fundamentals and only then reach for acceleration tricks. And yes, the low-level tweaks are still worth understanding. If you’ve ever been burned by a tiny kernel default, you’ll appreciate the sanity of tuning the TCP stack with intention, even if your packets now carry embeddings instead of HTML.

For data layers, connection discipline and query hygiene beat exotic contraptions nine times out of ten. I’ve watched teams regain stability by using ProxySQL for read/write split and efficient pooling behind an AI inference microservice that loved to stampede the database. The fix wasn’t glamorous, but the graphs got boring, and that’s what you want.

And for safety nets, your backup strategy should be able to restore not just files, but also a working posture under time pressure. There’s a world of difference between having a copy and being able to use it. Practicing restores, as in that S3 Object Lock playbook with real drills, is the grown-up version of “hope is not a strategy.”

Cost, contracts, and the unavoidable reality check

Let’s level with each other: AI hardware is expensive, and the grid power that feeds it isn’t getting cheaper. The win comes from aligning spend with value. I’ve seen organizations rush into long commitments for the wrong tier. They locked in the inference edge and then discovered that their training needs shifted to a different region with better power economics. A better move is to keep the training side flexible, scale inference close to users, and revisit both quarterly with fresh usage traces.

On the vendor front, treat every component like part of a system. Cooling isn’t separate from racks, which aren’t separate from PDUs, which aren’t separate from your deployment cadence. If any one piece forces your hand, you’ll pay for it in complexity. The best contracts I’ve seen include options, not just capacity. The right to plug in liquid cooling later. The freedom to add another feed. The ability to expand a pod without asking for a full-room rework. Options are oxygen.

What changes for developers and product teams

If you’re building the apps people touch, here’s your part in the story. You control the shape of the demand that hits the data center. Small architectural choices ripple all the way to power bills. Cache smartly. Reuse embeddings. Batch where the user won’t feel it. Timeout with kindness. And for your deploys, treat model versions like API versions with a safety rope. Canary the new thing. Measure carefully. Roll back without blame.
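
Two of those habits fit in a few lines. This sketch dedupes and caches embeddings you’ve already paid for and batches the misses; embed_batch is a stand-in for whatever model call you actually make.

```python
# Reuse embeddings you have already paid for, and batch the cache misses so
# the accelerator sees a few full requests instead of many tiny ones.
def embed_batch(texts: tuple[str, ...]) -> list[tuple[float, ...]]:
    raise NotImplementedError("call your embedding model here")

_cache: dict[str, tuple[float, ...]] = {}

def embed_many(texts: list[str], batch_size: int = 32) -> list[tuple[float, ...]]:
    misses = [t for t in dict.fromkeys(texts) if t not in _cache]   # dedupe, keep order
    for i in range(0, len(misses), batch_size):
        chunk = tuple(misses[i:i + batch_size])
        for text, vector in zip(chunk, embed_batch(chunk)):
            _cache[text] = vector
    return [_cache[t] for t in texts]
```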

And don’t be shy about asking your platform team how to be a good citizen. A simple question like, what time of day is friendliest for heavy batch jobs, can save more energy than you think. When teams coordinate, you get the feeling of a single organism that breathes in sync. That feeling is addictive in the best way.

The quiet backbone: practices that age well

I’ll leave you with a few patterns that have aged well in AI expansions. Keep your blast radius small with composable pods. Treat observability like part of the product, not an afterthought. Separate concerns: the gateway does gateway things, the fabric does fabric things, the training cluster trains, and you don’t try to turn everything into a Swiss army knife. Be predictable, not clever, at the edges where failures like to hide. And try to make your graphs boring. Boring is the north star.

Whenever someone pitches a wild new design, I ask them to walk me through failure modes first. If we can answer those calmly, we usually end up with something buildable. If not, we simplify until we can. It’s not a killjoy approach. It’s how you keep the hum of a thriving data center feeling like the calm of a library rather than the chaos of a concert.

Wrapping up: build for the sprint, plan for the marathon

We’ve covered a lot of ground, so let’s stitch it together. AI is pushing data centers to grow up fast, but not just in size. The shape is changing. Higher density. Smarter cooling. Faster fabrics. Edge presence where it counts. If you treat training and inference like different customers, you’ll make clearer design choices and avoid the trap of one-size-fits-nobody builds.

Start by mapping your next year of workloads, then shape your pods, power, and network accordingly. Keep options in your contracts. Practice the unglamorous things like restores and failovers. Borrow battle-tested patterns from the web world when they fit, like sane load balancing at L4/L7 and gentle but effective TCP tuning. And never forget the data: checkpoints, datasets, and the old devops wisdom that a backup you haven’t tested might as well not exist. That’s where practiced, enforceable backups earn their keep.

AI is a sprint inside a marathon. Build with the sprint in mind, plan with the marathon in view, and keep your team’s energy for the decisions only people can make. Hope this was helpful. See you in the next post, and if you’re standing in a chilly data hall right now, I’m right there with you in spirit.

Frequently Asked Questions

How do I decide between air and liquid cooling for AI racks?

Great question. It depends on your rack density and growth curve. If you’re running modest GPUs with room to spread out, well-tuned air with tight aisle containment can carry you a while. Once density climbs or you want to future‑proof for bigger accelerators, liquid pays off with stability and headroom. I like to rough‑in for liquid even if I don’t turn it on day one, so I’m not boxed in later.

Should training and inference live in the same facility?

Sometimes, but not always. Training wants cheap, abundant power and cooling; inference wants to be near users for snappy responses. Many teams keep training centralized in a power‑friendly region and push slimmer inference pods toward the edge. The key is a model pipeline that ships versions cleanly and can roll back without drama.

How can I cut inference latency without buying more hardware?

Start with the boring wins. Improve artifact shipping and caching, trim cold starts, and tidy your network path. Make sure your health checks and balancing aren’t adding jitter. Techniques like clean L4/L7 routing and a tuned TCP stack often shave the rough edges off latency. Only after that should you reach for new GPUs.