Dec 17: Alphabet's Google is working on a new initiative to make its artificial intelligence chips better at running PyTorch, the world's most widely used AI software framework, in a move aimed at weakening Nvidia's longstanding dominance of the AI computing market, according to people familiar with the matter.
The effort is part of Google's aggressive plan to make its Tensor Processing Units a viable alternative to Nvidia's market-leading GPUs. TPU sales have become a key growth engine of Google's cloud revenue as the company seeks to prove to investors that its AI investments are producing returns.
But hardware alone is not enough to spur adoption. The new initiative, known internally as “TorchTPU,” aims to remove a key barrier that has slowed adoption of TPU chips by making them fully compatible and developer-friendly for customers who have already built their tech infrastructure using PyTorch software, the sources said. Google is also considering open-sourcing parts of the software to speed uptake among customers, some of the people said.
Compared with earlier attempts to support PyTorch on TPUs, Google has devoted more organizational focus, resources and strategic importance to TorchTPU, as demand grows from companies that want to adopt the chips but view the software stack as a bottleneck, the sources said.
PyTorch, an open-source project heavily backed by Meta Platforms, is one of the most widely used tools for developers who build AI models. In Silicon Valley, very few developers write every line of code that chips from Nvidia, Advanced Micro Devices or Google will actually execute.
Instead, these developers rely on tools like PyTorch, a collection of pre-written code libraries and frameworks that automate many common tasks in developing AI software. Originally released in 2016, PyTorch has a history closely tied to Nvidia's development of CUDA, the software that some Wall Street analysts regard as the company's strongest shield against competitors.
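To make that division of labor concrete, here is a minimal, illustrative PyTorch sketch (not taken from the article): the developer composes pre-built layers and calls the framework's automatic differentiation, while PyTorch decides whether the work runs on a CPU or is dispatched to Nvidia's CUDA kernels.

```python
import torch
import torch.nn as nn

# The framework supplies the building blocks; the developer only composes them.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# The same code runs on whatever backend is available; on Nvidia hardware
# PyTorch lowers these operations to CUDA kernels behind the scenes.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(32, 784, device=device)              # a dummy input batch
target = torch.randint(0, 10, (32,), device=device)  # dummy labels

loss = nn.functional.cross_entropy(model(x), target)
loss.backward()  # autograd derives gradients; no hand-written chip code needed
```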
Nvidia's engineers have spent years ensuring that software developed with PyTorch runs as fast and efficiently as possible on its chips. Google, by contrast, has long had its internal armies of software developers use a different code framework called JAX, and its TPU chips use a tool called XLA to make that code run efficiently. Much of Google's own AI software stack and performance optimization has been built around JAX, widening the gap between how Google uses its chips and how customers want to use them.
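For contrast, a minimal JAX sketch, again illustrative rather than drawn from the article: jax.jit traces an ordinary Python function and hands it to XLA, which compiles it for whichever backend is attached, including TPUs.

```python
import jax
import jax.numpy as jnp

@jax.jit  # trace the function once, then compile it through XLA
def predict(w, b, x):
    return jnp.tanh(x @ w + b)

key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (784, 10))
b = jnp.zeros(10)
x = jnp.ones((32, 784))

y = predict(w, b, x)  # XLA targets CPU, GPU or TPU from the same source
```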
A Google Cloud spokesperson did not comment on the specifics of the project, but confirmed to Reuters that the move would provide customers with choice.
“We are seeing massive, accelerating demand for both our TPU and GPU infrastructure,” the spokesperson said. “Our focus is providing the flexibility and scale developers need, regardless of the hardware they choose to build on.”
TPU FOR CUSTOMERS
Alphabet had long reserved the lion's share of its own chips, or TPUs, for in-house use only. That changed in 2022, when Google's cloud computing unit successfully lobbied to oversee the group that sells TPUs. The move dramatically increased Google Cloud's allocation of TPUs, and as customers' interest in AI has grown, Google has sought to capitalize by ramping up production and sales of TPUs to external customers.
But the mismatch between the PyTorch framework used by most of the world's AI developers and the JAX framework that Google's chips are currently most finely tuned to run means that most developers cannot simply adopt Google's chips and get them to perform as well as Nvidia's without undertaking significant additional engineering work. Such work takes time and money in the fast-paced AI race.
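Some of that extra work is visible in the existing open-source PyTorch/XLA bridge (the torch_xla package), which the article does not mention by name; the sketch below assumes that package and shows the TPU-specific steps a PyTorch developer takes today. Whether and how TorchTPU would replace this path has not been disclosed.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # the existing PyTorch/XLA bridge

device = xm.xla_device()  # resolves to the attached TPU, not "cuda"
model = nn.Linear(784, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 784, device=device)
target = torch.randint(0, 10, (32,), device=device)

loss = nn.functional.cross_entropy(model(x), target)
loss.backward()
xm.optimizer_step(optimizer)  # steps the optimizer and flushes the XLA graph
```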
If successful, Google's “TorchTPU” initiative could significantly reduce switching costs for companies that want alternatives to Nvidia's GPUs. Nvidia's dominance has been reinforced not only by its hardware but by its CUDA software ecosystem, which is deeply embedded in PyTorch and has become the default method by which companies train and run large AI models.
Enterprise customers have been telling Google that TPUs are harder to adopt for AI workloads because they have historically required developers to switch to JAX, a machine-learning framework favored internally at Google, rather than PyTorch, which most AI developers already use, the sources said.
JOINT EFFORTS WITH META
To speed development, Google is working closely with Meta, the creator and steward of PyTorch, according to the sources. The two tech giants have been discussing deals for Meta to access more TPUs, a move first reported by The Information.
Early offerings for Meta were structured as Google-managed services, in which customers like Meta installed Google's chips designed to run Google software and models, with Google providing operational support. Meta has a strategic interest in working on software that makes it easier to run TPUs, in a bid to lower inference costs and to diversify its AI infrastructure away from Nvidia's GPUs and gain negotiating power, the people said.
Meta declined to comment.
This year, Google has begun selling TPUs directly into customers' data centers rather than limiting access to its own cloud. Amin Vahdat, a Google veteran, was named head of AI infrastructure this month, reporting directly to CEO Sundar Pichai.
Google needs that infrastructure both to run its own AI products, including the Gemini chatbot and AI-powered search, and to supply customers of Google Cloud, which sells access to TPUs to companies such as Anthropic.

