AI responses can seem to materialize from thin air, but that impression masks a vast, purpose-built physical backbone. This article examines a radical new data center architecture that delivers 44 exaflops of inference capacity, and explains why Canadian tech executives, IT directors, and enterprise architects should care. The facility, designed for speed and resiliency, showcases innovations in processor design, memory integration, cooling systems, and power delivery. For Canadian tech organizations planning AI deployments, the lessons from this build are immediate, strategic, and actionable.
Table of Contents
- Outline and core thesis
- Why this facility matters to Canadian tech decision makers
- Why Oklahoma City? Site selection lessons for Canadian tech
- Wafer scale engines: a fundamental rethink of processor architecture
- Memory architecture: why on-chip memory is a game changer
- Cooling at scale: liquid cooling and a 6,000-ton chiller plant
- Power strategy: batteries, generators, and near zero downtime
- Measuring compute: why output matters more than input for latency-sensitive AI
- Modular expansion: how the campus scales and what it means for procurement
- Domestic manufacturing and supply chain: why build in the United States matters
- Engineering through adversity: the production story and what it teaches operations teams
- Real-world applications: medicine, education, and the acceleration of innovation
- What Canadian tech executives should do now
- Implications for Canada’s AI ecosystem
- Security, resilience, and regulatory considerations
- Cost and sustainability tradeoffs
- Workforce and skills: who builds and operates these systems
- Competitive positioning: how Canadian tech companies can leverage high-speed inference
- The long view: what comes next in hardware and Canadian tech strategy
- Conclusion: a wake-up call for Canadian tech
- Action checklist for Canadian tech leaders
- Frequently asked questions
- Final thoughts and call to action
Outline and core thesis
- Why location matters: site selection, resilience, and operating cost tradeoffs
- Processor revolution: wafer scale engines and on-chip memory
- Cooling and thermal engineering: liquid systems and chiller design
- Power design and uptime: batteries, generators, and grid strategies
- Manufacturing choices and supply chain implications
- Real-world impact: medicine, education, and high-speed inference
- What Canadian tech firms should do now: strategy, procurement, and partnerships
- FAQ tailored for Canadian tech decision makers
Why this facility matters to Canadian tech decision makers
At a time when computational capacity determines the pace of AI-driven business transformation, the facility under discussion is not just another data center. It represents a shift in how inference compute is measured and marketed. Rather than quoting raw power consumption or facility megawatt capacity, this deployment highlights inference output directly: 44 exaflops of real-world compute. For Canadian tech leaders evaluating vendor claims, that shift in metric is essential. The difference between gigawatt rhetoric and real inference performance changes procurement criteria, benchmarking practices, and vendor selection.
Canadian tech organizations face decisions on where to host mission-critical workloads, how to assess vendor performance claims, and whether to invest in domestic compute infrastructure. The lessons from this build illustrate how a focus on compute output, memory architecture, cooling strategy, and local manufacturing can deliver orders-of-magnitude advantages for latency-sensitive AI applications.
Why Oklahoma City? Site selection lessons for Canadian tech
Location choice is not random. The facility was sited in Oklahoma City for a combination of practical reasons that ring true for any infrastructure decision: labor cost and availability, expandability of the campus, and the cost of power. The building itself is reinforced concrete, engineered to withstand local weather hazards. That is a subtle but critical point: natural hazard risk shapes construction standards, insurance economics, and ultimately operating cost and uptime.
For Canadian tech teams, particularly those in the Greater Toronto Area and other urban centers, the site selection calculus is comparable. The factors to weigh include real estate cost, availability of skilled labor, proximity to partners and customers, and the local energy mix. Canadian jurisdictions are attractive for several reasons: stable regulations, proximity to major enterprise customers, and renewable energy availability in some provinces. But the Oklahoma example highlights a practical truth: the best location often balances cost, resilience, and future expansion capacity.
Canadian tech leaders should model site selection decisions on rigorous criteria: what will be the facility’s chief risk exposures, how does the local climate influence build standards, and how will the energy source and resiliency architecture affect workload continuity? These are the same questions addressed by the Oklahoma build, and Canadian teams should treat them as mandatory inputs to any strategy that contemplates on-premises or co-located high-speed AI infrastructure.
Wafer scale engines: a fundamental rethink of processor architecture
The most striking technological departure in this data center is the wafer scale engine. This is not an incremental improvement on GPU architecture; it is a wholesale reimagining of the processor, scaling a single package to the size of a dinner plate. By comparison, a large traditional chip is approximately 750 square millimeters, while the wafer scale device measures 46,250 square millimeters. Put plainly, conventional high-end chips are closer to the size of a postage stamp or a thumbnail, while the wafer fills a dinner plate.
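For a quick sense of scale, the arithmetic below uses only the two die sizes quoted above to show how much conventional silicon one wafer-scale package spans. It is a back-of-the-envelope sketch, not a vendor specification.

```python
# Rough scale comparison using the figures quoted in the article.
# These are illustrative numbers, not vendor specifications.

CONVENTIONAL_CHIP_MM2 = 750       # large traditional chip, per the article
WAFER_SCALE_MM2 = 46_250          # wafer scale engine, per the article

ratio = WAFER_SCALE_MM2 / CONVENTIONAL_CHIP_MM2
print(f"One wafer-scale package spans the silicon area of ~{ratio:.0f} conventional dies.")
# -> roughly 62 conventional dies' worth of silicon in a single package
```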
Why does this matter? There are three principal impacts for performance and for the way enterprises should evaluate AI infrastructure:
- Reduced off-chip latency: The wafer architecture integrates very large volumes of memory directly on the chip. That eliminates the off-chip memory latency that constrains traditional GPU-based inference. The result is dramatically faster access to the working data set.
- Single-package scale: Instead of stitching many small chips together via a motherboard and a network fabric, the wafer provides a monolithic compute surface. For specific AI inference workloads, this reduces communication overhead, simplifies scheduling, and improves predictability.
- Power density and packing efficiency: Each wafer draws roughly 18 kilowatts, but the integrated approach improves end-to-end efficiency for inference throughput.
The company leading this build claims that access to on-chip data for inference tasks is roughly 2,500 times faster than in traditional GPU systems that rely on off-chip memory traffic. That acceleration does not mean GPUs will disappear overnight, but it does force enterprises and cloud architects to reconsider where latency-sensitive models should run. For Canadian tech teams building real-time customer experiences, clinical diagnostic inference, or high-frequency decisioning systems, that difference in data access time translates directly into better response times and lower total cost for production-scale inference.
Memory architecture: why on-chip memory is a game changer
Most modern accelerators balance compute and memory between an on-chip cache and off-chip DRAM. Off-chip memory introduces latency—time that chips spend idle waiting for the data they need. The wafer approach keeps large pools of memory on the package, bringing the data to the compute and not the other way around.
The practical consequence for enterprise workloads is fewer stalls, higher utilization, and more deterministic latency profiles. For Canadian tech product managers focused on customer-facing AI features—chat interfaces, real-time translation, or medical image inference—this level of predictability may deliver a better user experience and lower costs for meeting strict SLAs.
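To make the predictability argument concrete, here is a minimal latency model in which each request spends time on compute plus time stalled waiting on memory. The compute and stall times are invented for illustration; only the qualitative contrast between off-chip and on-chip memory reflects the discussion above.

```python
# Hypothetical model: per-request latency = compute time + memory stall time.
# All numbers below are illustrative assumptions, not measurements.

def effective_utilization(compute_ms: float, stall_ms: float) -> float:
    """Fraction of wall-clock time the compute units spend doing useful work."""
    return compute_ms / (compute_ms + stall_ms)

compute_ms = 4.0              # time spent in matrix math per request (assumed)
off_chip_stall_ms = 6.0       # time waiting on off-chip DRAM traffic (assumed)
on_chip_stall_ms = 0.5        # time waiting when data stays on-package (assumed)

for label, stall in [("off-chip memory", off_chip_stall_ms),
                     ("on-chip memory", on_chip_stall_ms)]:
    latency = compute_ms + stall
    util = effective_utilization(compute_ms, stall)
    print(f"{label:16s}: latency {latency:4.1f} ms, utilization {util:.0%}")
```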
Cooling at scale: liquid cooling and a 6,000-ton chiller plant
Heat management is the often invisible constraint on modern compute performance. One wafer produces significant thermal energy. To manage it, this facility has adopted full liquid cooling for servers, an approach the company began exploring in 2017. The design is intentional: cold water is piped into the machines, the heated return is sent out, and an integrated chiller plant maintains temperature and humidity within tight bounds.
The chiller plant capacity at this site is 6,000 tons, with room to expand by another 6,000 tons. The chilled water is circulated at controlled temperatures; the facility deliberately keeps the water from being too cold in order to avoid dew point issues. Servers must be kept above the temperature at which condensation could form. The practical rule of thumb the team described was maintaining roughly a 15 degree delta between inlet and ambient so that moisture does not accumulate in rack electronics.
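To illustrate the condensation constraint, the sketch below estimates the room dew point with the standard Magnus approximation and checks whether a chilled-water inlet setpoint keeps a safe margin above it. The temperatures, humidity, and safety margin are assumptions chosen for illustration, not the facility's actual setpoints.

```python
import math

def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Dew point via the Magnus approximation (a=17.27, b=237.7 C)."""
    a, b = 17.27, 237.7
    gamma = (a * temp_c) / (b + temp_c) + math.log(rel_humidity_pct / 100.0)
    return (b * gamma) / (a - gamma)

# Illustrative room conditions and inlet setpoint -- assumptions, not site data.
room_temp_c = 24.0
room_rh_pct = 45.0
inlet_water_c = 17.0
safety_margin_c = 3.0        # assumed margin above dew point before raising an alert

dp = dew_point_c(room_temp_c, room_rh_pct)
print(f"Estimated dew point: {dp:.1f} C")
if inlet_water_c < dp + safety_margin_c:
    print("Warning: inlet water is too close to the dew point; condensation risk.")
else:
    print("Inlet water temperature keeps a safe margin above the dew point.")
```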
For Canadian tech operations teams, there are three takeaways:
- Liquid cooling is now production ready: Large-scale liquid cooling delivers better thermal efficiency, higher power density, and lower long-term operational cost for compute-heavy deployments.
- Design for humidity and condensation risk: In colder climates, or in facilities where chilled water is used, attention to condensation thresholds is essential. Building-level HVAC systems must be specified to maintain safe dew point margins.
- Scalable chiller capacity matters: The ability to expand chiller capacity in modular increments allows a campus to grow without a complete HVAC retrofit—an important risk mitigation for long-lead infrastructure projects.
Operational transparency also plays a role. The site uses sensors and valve telemetry offered by third-party suppliers to continuously monitor water pressure, inlet/outlet temperatures, and other critical parameters. That enables proactive operations and reduces the risk of thermal incidents that could degrade uptime or even damage hardware.
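A simple way to operationalize that telemetry is to evaluate each reading against thresholds and raise alerts on excursions. The snippet below is a hypothetical check; the field names, threshold values, and readings are invented for illustration and do not represent any supplier's actual API.

```python
from dataclasses import dataclass

@dataclass
class CoolantReading:
    rack_id: str
    inlet_c: float        # chilled water inlet temperature
    outlet_c: float       # heated return temperature
    pressure_kpa: float   # loop pressure at the rack manifold

# Illustrative thresholds -- assumptions, not vendor-published limits.
MAX_DELTA_C = 15.0
MIN_PRESSURE_KPA = 150.0
MAX_PRESSURE_KPA = 600.0

def check(reading: CoolantReading) -> list[str]:
    alerts = []
    delta = reading.outlet_c - reading.inlet_c
    if delta > MAX_DELTA_C:
        alerts.append(f"{reading.rack_id}: delta-T {delta:.1f} C exceeds {MAX_DELTA_C} C")
    if not (MIN_PRESSURE_KPA <= reading.pressure_kpa <= MAX_PRESSURE_KPA):
        alerts.append(f"{reading.rack_id}: loop pressure {reading.pressure_kpa:.0f} kPa out of range")
    return alerts

for r in [CoolantReading("rack-01", 17.0, 29.5, 410.0),
          CoolantReading("rack-02", 17.2, 34.0, 120.0)]:
    for alert in check(r):
        print(alert)
```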
Power strategy: batteries, generators, and near zero downtime
High-performance compute demands steady, high-quality power. Primary power at the facility comes from natural gas-fired generation. To protect against grid interruptions, the site layers multiple resiliency elements.
First, batteries provide immediate bridging power. They engage the instant the primary feed is interrupted and are sized to provide roughly five minutes of power. That window is sufficient for the large standby generators to spin up. The facility employs multiple three-megawatt generators, each capable of carrying load as needed. Generators run on diesel or liquefied natural gas depending on operating strategy and fuel availability.
The three-layered approach of primary grid, batteries for bridging, and spinning generators keeps downtime to near zero. In practical terms, if the primary feed goes down, the batteries instantaneously support the load and the generators come online within minutes to assume sustained power delivery. This approach is standard for mission-critical data centers, but the scale and coordination here are worth noting. The generators at this facility were described as among the largest in the state, and multiple machines provide redundancy so that large loads can be distributed across them.
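To see how the layers are sized, the sketch below estimates the battery energy needed to bridge a five-minute gap and the number of three-megawatt generators required to carry a given critical load. The 12 MW load and the N+1 spare are illustrative assumptions, not the facility's published figures.

```python
import math

# Illustrative inputs -- assumptions, not the facility's published figures.
critical_load_mw = 12.0      # load that must ride through an outage (assumed)
bridge_minutes = 5.0         # battery bridging window described in the article
generator_unit_mw = 3.0      # generator size described in the article
redundancy_spares = 1        # extra unit for N+1 redundancy (assumed)

battery_energy_mwh = critical_load_mw * (bridge_minutes / 60.0)
generators_needed = math.ceil(critical_load_mw / generator_unit_mw) + redundancy_spares

print(f"Battery bridge energy required: {battery_energy_mwh:.1f} MWh")
print(f"Generators for N+1 coverage:    {generators_needed}")
```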
For Canadian tech buyers evaluating hosting options, the power story matters deeply. Ontario and Quebec offer strong grid resilience and abundant clean energy in many areas, but the structure of backup power, the cost of generators and fuel logistics, and the emissions profile of fallback generation must be evaluated when assessing total operational cost and environmental impact. The facility in question chose natural gas as a primary feed and relies on spinning reserves for resilience. Canadian tech teams should ensure that hosted compute contracts include clear SLAs for power continuity and that the vendor’s redundancy architecture meets enterprise requirements.
Measuring compute: why output matters more than input for latency-sensitive AI
Industry conversations often focus on input metrics: gigawatts and megawatts. Those are important but incomplete. The more relevant metric for latency-sensitive inference workloads is compute output: how many inferences can be performed per second at a given latency threshold, or, put differently, how many effective inference flops are delivered to the application.
The distinction is significant. A facility with large power capacity can underperform in inference if its architecture suffers from memory latency or poor data locality. The site in question made a deliberate choice to emphasize output—44 exaflops of inference—rather than simply reporting building-level energy capacity. The rationale is straightforward: customers care about the end result. For companies in the Canadian tech ecosystem, especially those that must meet strict SLAs, the takeaway is to evaluate vendors based on application-level metrics such as latency percentiles, inference throughput for your model families, and real-world benchmarks rather than raw power or rack count alone.
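In practice, output-based evaluation means measuring what the application sees. The sketch below derives latency percentiles and SLA attainment from a set of per-request timings; the simulated latencies and the 100 ms budget are assumptions chosen only to illustrate the reporting format.

```python
import random
import statistics

# Simulated per-request latencies in milliseconds -- illustrative values only.
random.seed(7)
latencies_ms = [random.lognormvariate(4.0, 0.35) for _ in range(10_000)]

cuts = statistics.quantiles(latencies_ms, n=100)   # 99 percentile cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

sla_ms = 100.0                                     # assumed per-request latency budget
within_sla = sum(1 for l in latencies_ms if l <= sla_ms) / len(latencies_ms)

print(f"p50 latency: {p50:6.1f} ms")
print(f"p95 latency: {p95:6.1f} ms")
print(f"p99 latency: {p99:6.1f} ms")
print(f"Requests meeting the {sla_ms:.0f} ms budget: {within_sla:.1%}")
```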
Modular expansion: how the campus scales and what it means for procurement
The campus approach is modular. The first data hall is online in production, and a second hall is being commissioned, poised to add further capacity: about 20 exaflops in the next hall alone. Racks are pre-cabled and staged; the process is to roll in racks, cable them, then insert the compute units. This modularity allows a predictable ramp of capacity and a procurement cadence that can match demand.
For Canadian tech procurement departments, modular expansion is a pragmatic way to manage capital and operational expenditure. Rather than negotiating a massive upfront purchase for a facility that will take years to fill, organizations can stagger capacity additions tied to revenue milestones or product rollouts. The facility’s ability to scale chillers, power feeds, and physical space in increments ensures that expansion does not require a full redesign at each stage.
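One way to reason about staged purchasing is to model capacity tranches against forecast demand, as in the hypothetical plan below. The tranche size echoes the roughly 20-exaflop hall increment mentioned above; the quarterly demand figures and the 80 percent expansion trigger are invented for illustration.

```python
# Hypothetical staged-capacity model: add a fixed tranche of inference capacity
# only when forecast demand approaches what is already installed.
tranche_exaflops = 20.0      # per-hall increment mentioned in the article
headroom = 0.8               # expand once demand exceeds 80% of installed (assumed)

forecast_demand = [6, 11, 15, 22, 30, 41, 55]   # exaflops needed per quarter (assumed)

installed = tranche_exaflops
for quarter, demand in enumerate(forecast_demand, start=1):
    while demand > installed * headroom:
        installed += tranche_exaflops
    print(f"Q{quarter}: demand {demand:>3.0f} EF, installed {installed:>3.0f} EF")
```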
Domestic manufacturing and supply chain: why build in the United States matters
The compute modules and final systems were assembled domestically in Milpitas, California. The company chose to manufacture in the United States deliberately, arguing that domestic manufacturing is part of being a responsible citizen in the economy. That choice shortened supply chains for complex modules, allowed closer quality control, and provided greater strategic control over packaging and final testing.
Canadian tech leaders should take notice. While Canada hosts world-class AI research and growing hardware design expertise, the manufacturing base for advanced semiconductor packaging remains concentrated in a few global regions. Supply chain resilience matters for mission-critical infrastructure. Canadian enterprises that prioritize strategic sovereignty may consider partnerships that emphasize onshore or nearshore manufacturing, or at minimum require transparent supply chain commitments from vendors.
Engineering through adversity: the production story and what it teaches operations teams
The team behind this architecture endured a crucible of engineering challenges. There was a period of 15 to 18 months when the problem appeared unsolvable, with monthly expenditure reaching eight million dollars and no success in sight. The record of that period reflects an engineering discipline: root cause analysis on every failure, iterative corrections, and discipline in avoiding repeated mistakes. The breakthrough arrived in a modest lab environment, where the founders watched their hardware finally work after incremental iterations.
There are important lessons here for Canadian tech organizations executing complex infrastructure projects:
- Expect long problem-solving cycles: Radical hardware innovation rarely succeeds on the first attempt. Budgeting, time allocation, and stakeholder communication must reflect the reality of iterative engineering.
- Commit to root cause discipline: Rather than applying band-aid fixes, invest in deep analysis so that recurring errors are eliminated rather than suppressed.
- Retain flexibility: Early-stage projects require flexible procurement and contracting terms. Fixed, inflexible vendor relationships can be catastrophic when discovery reveals new requirements.
Real-world applications: medicine, education, and the acceleration of innovation
The leaders involved in the project are explicit about where this compute advantage will produce social value. Medicine and education top the list. The potential to accelerate drug discovery is especially notable. Historically, bringing a new drug to market takes well over a decade—commonly cited in the range of 17 to 19 years. The compute acceleration offered by integrated wafer-scale inference could cut that timeline substantially. If model-driven design and simulation reduce cycles in lead candidate identification, preclinical modeling, and early-stage safety testing, the time-to-market could be reduced to under ten years for many therapeutic classes.
Education is another area where high-speed, low-latency models can transform learning. AI tutors that adapt in real time to a student’s misconceptions, curriculum gaps, and learning pace require deterministic latency and robust inference. Faster models reduce the friction in interactive instruction and make adaptive learning systems viable at scale.
For Canadian tech companies working in health tech or edtech, these developments change strategic planning. Faster inference broadens the set of feasible product offerings and shortens time-to-value. Hospitals, research universities, and edtech startups in the Canadian ecosystem should evaluate how access to such compute changes their research timelines and product roadmaps.
What Canadian tech executives should do now
The strategic implications for Canadian tech are immediate. Enterprises, public sector IT leaders, and startups must adjust procurement, architecture, and talent policies to reflect the new landscape. The following action list is designed for decision makers in the Canadian tech ecosystem.
- Reframe vendor evaluations around application performance
Ask suppliers for real-world inference benchmarks on workload families that mirror production models. Move beyond megawatt and rack-count metrics and require latency percentile reporting and throughput measurements for your class of models.
- Model total cost and time-to-value
Include the downstream operational benefits of faster inference in TCO calculators. Faster models can reduce operational costs by shortening model runtimes, enabling cheaper scaling, and improving customer retention through better experiences. A worked cost-per-request sketch follows this action list.
- Plan for liquid cooling
Liquid cooling is no longer experimental. Facilities designed for high throughput compute should include liquid cooling in initial infrastructure plans. For Canadian sites, account for climate-specific challenges such as freeze protocols and condensation control.
- Negotiate supply chain and manufacturing terms
If mission-critical, insist on clarity around manufacturing provenance, lead times, and options for nearshoring. Where possible, pursue partnerships that diversify sources and reduce single points of failure.
- Integrate power resilience into SLAs
Request detailed power architecture disclosures and incorporate generator and battery testing schedules into contractual uptime guarantees. Evaluate the vendor’s carbon strategy for fallback generation.
- Invest in skills for hardware-aware AI operations
Teams that understand both model architecture and hardware characteristics can extract far more value. Train ops and MLE teams on the implications of on-chip memory and wafer scale performance for model compilation, quantization, and runtime tuning.
- Engage with regulators on data residency and security
When offshore or out-of-country compute is considered, Canadian tech teams must factor in regulatory constraints and potential cross-border data movement. Secure and auditable compute pathways should be a prerequisite.
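Returning to the TCO point in the list above, the following is a minimal, hypothetical comparison that folds inference speed into cost per served request. Every rate, throughput figure, and volume is an assumption for illustration, not a vendor quote.

```python
# Hypothetical cost-per-request comparison -- all figures are illustrative assumptions.

options = {
    "conventional GPU hosting": {
        "hourly_cost_usd": 80.0,        # assumed fully loaded hourly rate
        "requests_per_hour": 40_000,    # assumed throughput at the target latency
    },
    "high-speed inference hosting": {
        "hourly_cost_usd": 140.0,       # assumed premium rate
        "requests_per_hour": 150_000,   # assumed throughput at the same latency target
    },
}

monthly_requests = 50_000_000           # assumed production volume

for name, o in options.items():
    cost_per_request = o["hourly_cost_usd"] / o["requests_per_hour"]
    monthly_cost = cost_per_request * monthly_requests
    print(f"{name:30s} ${cost_per_request * 1000:.2f} per 1k requests, "
          f"~${monthly_cost:,.0f}/month at {monthly_requests:,} requests")
```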
Implications for Canada’s AI ecosystem
Canada has long been a leader in AI research. From the universities of Toronto and Montreal to the growing startup ecosystems in the GTA and Vancouver, Canadian tech organizations have strong research foundations. The new class of data center architectures raises both opportunity and urgency.
Opportunity arises because faster inference reduces the barrier for startups to offer real-time AI features that were previously cost-prohibitive. Urgency comes because access to high-speed inference can become a competitive differentiator. Canadian companies that secure partnerships, invest in hardware-aware engineering, and influence procurement to emphasize inference output will enjoy a significant time-to-market advantage.
There is also public policy relevance. Federal and provincial governments that wish to anchor AI-driven economic growth should prioritize incentives for data center deployment in Canada, funding for domestic manufacturing of advanced packaging, and programs that develop the engineering talent required for hardware-aware AI operations. A coordinated approach that links academic strength to industrial capacity will position Canadian tech firms to capture the downstream economic benefits of faster AI.
Security, resilience, and regulatory considerations
Security is not an afterthought. The facility under review is built to resist local hazards and includes insurance-calibrated construction ratings. For Canadian tech organizations, resilience requirements should be mapped to regulatory obligations—particularly in healthcare, finance, and critical infrastructure sectors where data integrity and continuity are legally mandated.
Key checkpoints for procurement and risk assessments include:
- Physical construction standards and natural hazard resilience
- Redundant power feeds, battery bridging strategies, and generator capacity
- Secure telemetry and monitoring for temperature, power, and network health
- Clear documentation for manufacturing provenance and component sourcing
- Compliance attestations for data handling, audits, and data residency
Canadian tech procurement teams must incorporate these checks into vendor qualification frameworks to avoid surprises and ensure auditable risk posture.
Cost and sustainability tradeoffs
High-performance compute on wafer-scale devices delivers greater inference output per watt compared with some alternative architectures, but power density remains high. One wafer consumes approximately 18 kilowatts. The net efficiency depends on the ratio of useful inference work to the energy consumed, the effectiveness of liquid cooling, and the use of renewable energy on the grid.
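For procurement, the useful efficiency figure is work delivered per unit of power. The sketch below turns the quoted per-wafer draw and a hypothetical throughput figure into tokens per joule and energy per request; the throughput and request-length figures are assumptions for illustration only.

```python
# Energy-efficiency framing: inference work per watt -- illustrative assumptions only.

wafer_power_w = 18_000.0            # ~18 kW per wafer, as quoted in the article
tokens_per_second = 1_500_000.0     # assumed aggregate token throughput for one wafer

tokens_per_joule = tokens_per_second / wafer_power_w
avg_tokens_per_request = 600.0      # assumed response length per request
joules_per_request = avg_tokens_per_request / tokens_per_joule

print(f"Tokens per joule:   {tokens_per_joule:.1f}")
print(f"Energy per request: {joules_per_request:.1f} J "
      f"({joules_per_request / 3_600_000:.6f} kWh)")
```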
For Canadian tech organizations concerned about sustainability, there are several avenues to explore:
- Locate compute in provinces with low-carbon grids such as Quebec or British Columbia when feasible
- Negotiate renewable energy procurement or power purchase agreements for hosted capacity
- Incorporate energy-efficiency metrics into procurement decisions, focusing on inference per watt rather than raw wattage
- Plan for waste heat reclamation where possible—liquid-cooled facilities can make heat reuse more practical
Balancing cost, performance, and emissions will be a central task for Canadian tech leaders as these infrastructure choices become more consequential.
Workforce and skills: who builds and operates these systems
Operating a wafer-scale, liquid-cooled facility is a different discipline than traditional data center operations. Beyond standard systems administration and networking skills, the team requires expertise in thermal engineering, advanced power systems, and hardware-specific model tuning. The facility example includes dedicated operations managers who monitor chiller performance, water pressure, and valve telemetry, and who coordinate generator testing and fuel logistics.
For Canadian tech companies, that suggests a multi-pronged talent strategy:
- Upskill DevOps and ML engineers in hardware-aware model deployment
- Hire or contract specialists in liquid cooling and chiller plant operations
- Invest in facilities engineering capabilities to manage power and mechanical infrastructure
- Partner with local colleges and universities to build a pipeline of technicians with combined IT-mechanical skills
Competitive positioning: how Canadian tech companies can leverage high-speed inference
Being able to access or control wafer-scale inference capacity is a competitive lever. Canadian software vendors can differentiate by offering sub-second, high-quality models for domains that were previously bound by latency. Healthtech firms can provide near-real-time diagnostic tools. Fintech firms can improve fraud detection pipelines. Edtech companies can offer highly responsive adaptive tutoring at scale.
To capitalize on this, Canadian tech companies should:
- Identify use cases where latency materially improves conversion, customer satisfaction, or clinical outcomes
- Quantify the value of reduced inference time in financial terms to justify infrastructure upgrades
- Pilot critical workloads with vendors that provide performance guarantees at the application level
- Structure commercial negotiations to include performance credits for missed latency or throughput SLAs
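As an example of the last point, a performance-credit clause can be expressed as a simple schedule over measured SLA attainment. The tiers, percentages, and fee below are hypothetical contract terms, not an industry standard.

```python
# Hypothetical performance-credit schedule tied to monthly latency-SLA attainment.
# Tiers and credit percentages are invented for illustration.

CREDIT_TIERS = [          # (attainment floor, credit as % of monthly fee)
    (0.999, 0.0),
    (0.995, 5.0),
    (0.990, 10.0),
    (0.000, 25.0),
]

def monthly_credit(attainment: float, monthly_fee: float) -> float:
    for floor, credit_pct in CREDIT_TIERS:
        if attainment >= floor:
            return monthly_fee * credit_pct / 100.0
    return 0.0

fee = 60_000.0                       # assumed monthly hosting fee
for measured in (0.9995, 0.9962, 0.9871):
    print(f"SLA attainment {measured:.2%}: credit ${monthly_credit(measured, fee):,.0f}")
```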
The long view: what comes next in hardware and Canadian tech strategy
Hardware innovation is iterative and compounding. Wafer-scale integration is one major step, but the market will continue to evolve with improved packaging, better interconnects, and new memory technologies. For Canadian tech, the imperative is to remain adaptive.
Short-term actions are tactical and procurement-focused. Mid-term actions involve talent and partnerships. Long-term actions require public-private collaboration to build resilient onshore supply chains and to ensure that Canada retains an enduring capability to host and operate advanced compute resources.
Conclusion: a wake-up call for Canadian tech
The implications of a data center designed specifically for inference speed cannot be overstated. This facility demonstrates that rethinking processor scale, memory placement, cooling, and power can yield orders-of-magnitude improvements in latency-sensitive AI tasks. For Canadian tech executives, the core message is clear: infrastructure choices will increasingly determine product feasibility, operational cost, and competitive differentiation.
Canadian tech organizations must adapt procurement processes to prioritize application-level performance metrics, invest in skills for hardware-aware ML ops, and engage with government and industry partners to secure resilient supply chains and sustainable energy sources. The future of AI-enabled products in Canada will be won by those who align strategy, engineering, and operations to the new reality of high-speed inference.
Action checklist for Canadian tech leaders
- Require inference-level benchmarks in vendor RFPs that reflect your production workloads
- Compare vendors using inference-per-watt metrics rather than raw power capacity
- Plan for liquid cooling in future data center or co-location builds
- Negotiate SLAs that include detailed power-resiliency and generator testing schedules
- Invest in workforce upskilling for hardware-aware ML deployment and facilities engineering
- Pursue partnerships with domestic manufacturing partners where feasible
- Advocate to policymakers for incentives to host strategic compute capacity in Canada
Frequently asked questions
What makes wafer scale processors different from traditional GPUs?
Wafer scale processors are orders of magnitude larger in physical size than traditional GPU chips, integrating more compute cores and large pools of memory directly on the same package. This eliminates the latency introduced by off-chip memory access and reduces inter-chip communication overhead, resulting in much faster inference performance for certain AI workloads.
How does the wafer size compare to conventional chips?
A typical large GPU chip is roughly 750 square millimeters. The wafer scale processor discussed here measures about 46,250 square millimeters, roughly the size of a dinner plate. By comparison, typical large chips are closer to the size of a postage stamp or thumbnail.
Why is on-chip memory so important for inference?
On-chip memory eliminates the need to fetch data from off-chip DRAM during inference. That reduces latency dramatically and allows compute units to stay busy more consistently. For latency-sensitive applications, the improved data locality can translate into faster responses and lower effective operational cost.
Are liquid cooling systems reliable for Canadian climates?
Yes, liquid cooling systems are reliable when designed with climate-specific controls such as freeze protection and dew point management. Facilities must maintain inlet water temperatures that avoid condensation and should include redundancy in chiller and pump systems. Canadian data center designs often incorporate these protective measures for winter operation.
What are the power resilience strategies used by high-performance data centers?
Typical power resilience strategies include a layered approach: primary utility feed for standard operation, battery systems to provide immediate bridging power, and large standby generators that come online within minutes. This three-tier architecture minimizes downtime during grid interruptions and ensures continuous operation for critical workloads.
How should Canadian tech companies evaluate vendor performance?
Canadian tech companies should prioritize application-level benchmarks such as latency percentiles and inference throughput for their specific model families. Require vendors to provide real-world performance metrics, include performance credits and detailed SLA terms in contracts, and evaluate energy efficiency in terms of inference-per-watt rather than raw power capacity.
What immediate steps should Canadian tech procurement teams take?
Procurement teams should revise RFP templates to require inference-level benchmarks, request detailed power and cooling architectures, ask for manufacturing provenance, and insist on transparent SLAs that encompass uptime, generator testing, and telemetry for thermal and electrical metrics.
How does this infrastructure affect healthcare and drug discovery timelines?
Faster, lower-latency inference can accelerate simulation and design cycles in drug discovery. The leaders involved in the project estimate that such compute could help reduce drug development timelines from historically long ranges of 17 to 19 years to under ten years for many programs by enabling more rapid iteration in silico and earlier identification of viable candidates.
What workforce skills will be most valuable for operating these systems?
Valuable skills include hardware-aware machine learning operations, thermal and mechanical engineering for liquid cooling systems, facilities-level power management, and supply chain expertise for complex hardware procurement. Cross-disciplinary teams that combine ML, DevOps, and facilities engineering will be critical.
How can Canadian policymakers support domestic AI infrastructure?
Policymakers can support domestic AI infrastructure by offering incentives for data center deployment, funding research and pilot programs in advanced packaging and semiconductor manufacturing, supporting workforce development, and creating procurement frameworks that favor resilience and sustainability for strategic compute capacity.
Final thoughts and call to action
High-speed inference capacity is not a theoretical novelty. It is a practical enabler for a new generation of applications across medicine, education, finance, and beyond. Canadian tech leaders must move from passive observers to active planners: demanding application-level benchmarks, investing in the right cooling and power architectures, and building the workforce that can operate these systems at scale.
Canadian tech companies that align strategy with the realities of modern compute will unlock faster time-to-market, lower operational costs for real-world AI services, and a competitive advantage in delivering low-latency, high-quality user experiences. Is the Canadian tech ecosystem ready to make that leap? The opportunity is now.
Is your organization prepared to evaluate vendors on inference output rather than raw power? How will Canadian tech companies change procurement and operations to capture the benefits of wafer-scale, liquid-cooled compute?

