
Deploying high-performance, energy-efficient AI


In partnership with Intel

Although AI is by no means a new technology, there have been massive and rapid investments in it and in large language models. However, the high-performance computing that powers these rapidly growing AI tools — and enables record automation and operational efficiency — also consumes a staggering amount of energy. With the proliferation of AI comes the responsibility to deploy that AI responsibly, with an eye to sustainability during hardware and software R&D as well as within data centers.

"Enterprises need to be very aware of the energy consumption of their digital technologies, how big it is, and how their decisions are affecting it," says Zane Ball, corporate vice president and general manager of data center platform engineering and architecture at Intel.

One of the key drivers of more sustainable AI is modularity, says Ball. Modularity breaks down the subsystems of a server into standard building blocks, defining interfaces between those blocks so that they can work together. This approach can reduce the amount of embodied carbon in a server's hardware components and allows components of the overall ecosystem to be reused, in turn reducing R&D investments.

Downsizing infrastructure within data centers, hardware, and software can also help enterprises reach greater energy efficiency without compromising function or performance. While very large AI models require megawatts of supercompute power, smaller, fine-tuned models that operate within a specific knowledge domain can maintain high performance with far lower energy consumption.

"You give up that kind of wonderful general purpose use, like when you're using ChatGPT-4 and you can ask it everything from seventeenth century Italian poetry to quantum mechanics; if you narrow your range, these smaller models can give you equivalent or better kind of capability, but at a tiny fraction of the energy consumption," says Ball.

The opportunities for greater energy efficiency in AI deployment will only expand over the next three to five years. Ball forecasts significant strides in hardware optimization, the rise of AI factories — facilities that train AI models at large scale while modulating energy consumption based on its availability — as well as the continued growth of liquid cooling, driven by the need to cool the next generation of powerful AI innovations.

"I think making those solutions available to our customers is starting to open people's eyes to how energy efficient you can be while not really giving up a whole lot in terms of the AI use case that you're looking for."

Full Transcript

From MIT Technology Review, I'm Laurel Ruma, and this is Business Lab, the show that helps business leaders make sense of new technologies coming out of the lab and into the marketplace.

Our topic is building a better AI architecture. Going green is not for the faint of heart, but it's also a pressing need for many, if not all, enterprises. AI provides many opportunities for enterprises to make better decisions, so how can it also help them be greener?

Two words for you: sustainable AI.

My guest is Zane Ball, corporate vice president and general manager of data center platform engineering and architecture at Intel.

This podcast is produced in partnership with Intel.

Welcome, Zane.

Good morning.

So to set the stage for our conversation, let's start off with the big topic. As AI transforms businesses across industries, it brings the benefits of automation and operational efficiency, but that high-performance computing also consumes more energy. Could you give an overview of the current state of AI infrastructure and sustainability at the large enterprise level?

Absolutely. I think it helps to just kind of really zoom out big picture, and if you look at the history of IT services maybe in the last 15 years or so, obviously computing has been expanding at a very fast pace. And the good news about that history of the last 15 years or so is that while computing has been expanding fast, we've been able to contain the growth in energy consumption overall. There was a great study a couple of years ago that talked about how compute had grown by maybe 550% over a decade, but that we had increased electricity consumption by only a few percent. So those kinds of efficiency gains were really profound. So I think the way to think about it is that computing's been expanding rapidly, and that of course creates all kinds of benefits in society, many of which reduce carbon emissions elsewhere.

But we've been able to do that without growing electricity consumption all that much. And that's kind of been possible thanks to things like Moore's Law: silicon has been improving every couple of years, devices get smaller, they consume less power, things get more efficient. That's part of the story. Another big part of this story is the advent of these hyperscale data centers. So really, really large-scale computing facilities, finding all kinds of economies of scale and efficiencies, high utilization of hardware, not a lot of idle hardware sitting around. That also was a really meaningful energy efficiency gain. And then finally, the development of virtualization, which allowed much more efficient utilization of hardware. So those three things together allowed us to kind of accomplish something really remarkable. And during that time, we also had AI starting to play a role; I think since about 2015, AI workloads began to play a pretty significant role in digital services of all kinds.

But then just about a year ago, ChatGPT happens and we have a non-linear shift in the environment, and suddenly large language models, probably not news to anyone listening to this podcast, have pivoted to the center, and there's just a breakneck investment across the industry to build very, very fast. And what is also driving that is that not only is everyone rushing to take advantage of this amazing large language model kind of technology, but that technology itself is evolving very quickly. And in fact, as is also quite well known, these models are growing in size at a rate of about 10x per year. So the amount of compute required is really kind of staggering. And when you think of all the digital services in the world now being infused with AI use cases with very large models, and then those models themselves growing 10x per year, we're looking at something that's not very much like that last decade, where our efficiency gains and our greater consumption were almost penciling out.

Now we're looking at something that I think is not going to pencil out. And we're really facing a really significant growth in energy consumption in these digital services. And I think that's concerning. And I think that means we need to take some strong actions across the industry to get on top of this. And I think just the very availability of electricity at this scale is going to be a key driver. But of course many companies have net-zero goals. And I think as we pivot into some of these AI use cases, we've got work to do to square all of that together.

Yeah, as you mentioned, the challenges are trying to develop sustainable AI and making data centers more energy efficient. So could you describe what modularity is and how a modularity ecosystem can power a more sustainable AI?

Yes, I think over the last three or four years, there have been a number of initiatives, and Intel's played a big part of this as well, of re-imagining how servers are engineered into modular components. And really, modularity for servers is just exactly as it sounds. We break different subsystems of the server down into some standard building blocks, and define some interfaces between those standard building blocks so that they can work together. And that has a number of benefits. Number one, from a sustainability standpoint, it lowers the embodied carbon of those hardware components. Some of these hardware components are quite complex and very energy intensive to manufacture. So imagine a 30-layer circuit board, for example; that is a pretty carbon-intensive piece of hardware. I don't want the entire system to carry that kind of complexity if only a small part of it needs it. I can just pay the price of the complexity where I need it.

And by being intelligent about how we break up the design into different pieces, we bring that embodied carbon footprint down. The reuse of pieces also becomes possible. So when we upgrade a system, maybe to a new telemetry approach or a new security technology, there's just a small circuit board that has to be replaced versus replacing the whole system. Or maybe a new microprocessor comes out, and the processor module can be replaced without investing in new power supplies, new chassis, new everything. And so that circularity and reuse becomes a significant opportunity. And so that embodied carbon aspect, which is about 10% of the carbon footprint in these data centers, can be significantly improved. And another benefit of the modularity, aside from the sustainability, is that it just brings R&D investment down. So if I'm going to develop a hundred different kinds of servers, and if I can build those servers based on the very same building blocks just configured differently, I'm going to have to invest less money, less time. And that is a real driver of the move towards modularity as well.

So what are some of those techniques and technologies, like liquid cooling and ultra-high-density compute, that large enterprises can use to compute more efficiently? And what are their effects on water consumption, energy use, and overall performance, as you were outlining earlier as well?

Yeah, those are two I think very important opportunities. And let's just take them one at a time. In the emerging AI world, I think liquid cooling is probably one of the most important low-hanging-fruit opportunities. So in an air-cooled data center, a tremendous amount of energy goes into fans and chillers and evaporative cooling systems. And that is certainly a big part. So if you move a data center to a fully liquid-cooled solution, that's an opportunity of around 30% of energy consumption, which is kind of a wow number. I think people are often surprised just how much energy is burned. And if you walk into a data center, you almost need ear protection because it's so loud, and the hotter the components get, the higher the fan speeds get, and the more energy is being burned on the cooling side, and liquid cooling takes a lot of that off the table.
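To put that roughly 30% figure in context, here is a back-of-the-envelope comparison using power usage effectiveness (PUE), the ratio of total facility energy to IT equipment energy. The PUE values below are illustrative assumptions for the purposes of the sketch, not measured figures from any particular facility.

```python
# Back-of-the-envelope PUE comparison; all values are illustrative assumptions.
# PUE = total facility energy / IT equipment energy.
it_load_mw = 10.0        # assumed IT load of a hypothetical data center (MW)

pue_air = 1.5            # assumed air-cooled facility
pue_liquid = 1.1         # assumed fully liquid-cooled facility

total_air = it_load_mw * pue_air        # 15.0 MW total draw from the grid
total_liquid = it_load_mw * pue_liquid  # 11.0 MW total draw from the grid

saving = (total_air - total_liquid) / total_air
print(f"Facility-level energy saving: {saving:.0%}")  # ~27%, near the ~30% cited
```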

What offsets that is that liquid cooling is a bit complex. Not everyone is fully able to utilize it. There are more upfront costs, but actually it saves money in the long run. So the total cost of ownership with liquid cooling is very favorable as we're engineering new data centers from the ground up. Liquid cooling is a really exciting opportunity, and I think the faster we can move to liquid cooling, the more energy we can save. But it's a complicated world out there. There are a lot of different situations, a lot of different infrastructures to design around. So we shouldn't trivialize how hard that is for an individual enterprise. One of the other benefits of liquid cooling is that we get out of the business of evaporating water for cooling. A lot of North American data centers are in arid regions and use large quantities of water for evaporative cooling.

That is great from an energy consumption point of view, but the water consumption can be really extraordinary. I've seen numbers getting close to a trillion gallons of water per year in North American data centers alone. And then in humid climates like Southeast Asia or eastern China, for example, that evaporative cooling capability is not as effective, and so much more energy is burned. And so if you really want to get to really aggressive energy efficiency numbers, you just can't do it with evaporative cooling in those humid climates. And so those geographies are kind of the tip of the spear for moving into liquid cooling.

The other opportunity you mentioned was density, and bringing higher and higher density of computing has been the trend for decades. That's effectively what Moore's Law has been pushing us toward. And I think it's just important to realize that's not done yet. As much as we think about racks of GPUs and accelerators, we can still significantly improve energy consumption with higher and higher density traditional servers that allow us to pack what might have been a whole row of racks into a single rack of computing in the future. And those are substantial savings. And at Intel, we've announced we have an upcoming processor that has 288 CPU cores, and 288 cores in a single package enables us to build racks with as many as 11,000 CPU cores. So the energy savings there are substantial, not just because those chips are very, very efficient, but because the amount of networking equipment and ancillary things around those systems is a lot less, since you're using those resources more efficiently with those very high-density components. So continuing, maybe even accelerating, our path to this ultra-high-density kind of computing is going to help us get to the energy savings we need, perhaps to accommodate some of those larger models that are coming.

Yeah, that definitely makes sense. And that's a segue into this other part of it, which is how data centers and hardware as well as software can collaborate to create more energy-efficient technology without compromising function. So how can enterprises invest in more energy-efficient hardware, such as hardware-aware software and, as you were mentioning earlier, large language models or LLMs with smaller, downsized infrastructure, but still reap the benefits of AI?

I think there are a lot of opportunities, and maybe the most exciting one that I see right now is that even as we're pretty wowed and blown away by what these really large models are able to do, even though they require tens of megawatts of supercompute power to do it, you can actually get a lot of those benefits with far smaller models, as long as you're content to operate them within some specific knowledge domain. So we've often referred to these as expert models. So take, for example, an open source model like the Llama 2 that Meta produced. There's a 7 billion parameter version of that model. There are also, I think, 13 and 70 billion parameter versions of that model, compared to GPT-4, which is maybe something like a trillion parameter model. So it's far, far, far smaller, but then you fine-tune that model with data for a specific use case. So if you're an enterprise, you're probably working on something fairly narrow and specific that you're trying to do.

Maybe it's a customer service application or a financial services application, and you as an enterprise have a lot of data from your operations, data that you own and have the right to use to train the model. And so even though that's a much smaller model, when you train it on that domain-specific data, the domain-specific results can be quite good, in some cases even better than the large model. So you give up that kind of wonderful general purpose use, like when you're using ChatGPT-4 and you can ask it everything from seventeenth century Italian poetry to quantum mechanics; if you narrow your range, these smaller models can give you equivalent or better kind of capability, but at a tiny fraction of the energy consumption.
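To make the expert-model idea concrete, here is a minimal sketch of that kind of domain fine-tuning using the open source Hugging Face stack with a parameter-efficient LoRA adapter. The base model name is the real Llama 2 checkpoint (gated, so it requires access approval); the dataset file, field names, and hyperparameters are illustrative placeholders, not a tested recipe.

```python
# Sketch: fine-tune a small open model on domain data with LoRA.
# Dataset path, field names, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # the 7B "expert model" candidate
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains a few million adapter weights instead of all 7 billion
# parameters, a big part of why domain fine-tuning is cheap in energy.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# Hypothetical domain corpus: one JSON record per support transcript.
data = load_dataset("json", data_files="customer_support.jsonl")["train"]
data = data.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                batched=True, remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="expert-model",
                           num_train_epochs=1, per_device_train_batch_size=2),
    train_dataset=data,
    # mlm=False makes the collator produce causal-LM labels from input_ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```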

And we've demonstrated a few times that, even with just a standard Intel Xeon two-socket server with some of the AI acceleration technologies we have in those systems, you can actually deliver quite a good experience. And that's without even any GPUs involved in the system. So that's just good old-fashioned servers, and I think that's pretty exciting.
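As a sketch of that GPU-free serving path, the snippet below uses Intel Extension for PyTorch to route inference through the bfloat16 kernels that run on the AMX acceleration in recent Xeons. The model name and prompt are placeholders, and the settings shown are an assumption for illustration rather than Intel's published recipe.

```python
# Sketch: CPU-only LLM inference on a Xeon, using Intel Extension for
# PyTorch (ipex) to enable bfloat16 kernels that run on AMX hardware.
# Model choice and prompt are illustrative placeholders.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder expert model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
model.eval()

# Apply CPU-specific operator and memory-layout optimizations.
model = ipex.optimize(model, dtype=torch.bfloat16)

inputs = tokenizer("How do I reset my account password?", return_tensors="pt")
with torch.no_grad(), torch.amp.autocast("cpu", dtype=torch.bfloat16):
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```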

That also means the technology's quite accessible, right? So you may be an enterprise, you have a general purpose infrastructure that you use for a lot of things, and you can use that for AI use cases as well, if you've taken advantage of those smaller models that fit within the infrastructure we already have, or infrastructure that you can easily obtain. And so those smaller models are pretty exciting opportunities. And I think that's probably one of the first things the industry will adopt to get energy consumption under control: right-sizing the model to the activity, to the use case that we're targeting. I think there's also… you mentioned the idea of hardware-aware software. I think that the collaboration between hardware and software has always been an opportunity for significant efficiency gains.

I mentioned early on in this conversation how virtualization was one of the pillars that gave us that kind of fantastic result over the last 15 years. And that was very much exactly that. That is bringing some deep collaboration between the operating system and the hardware to do something remarkable. And a lot of the acceleration that exists in AI today actually is the same kind of thinking, but that's not really the end of the hardware-software collaboration. We can deliver quite stunning results in encryption and in memory utilization in a lot of areas. And I think that has got to be an area where the industry is prepared to invest. It is very easy to have plug-and-play hardware where everyone programs in a very high level language and nobody thinks about the impact of their software application downstream. I think that's going to have to change. We're going to have to really understand how our application designs are impacting energy consumption going forward. And it's not purely a hardware problem. It has got to be hardware and software working together.
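To make "understanding your application's energy impact" concrete: on Linux, Intel's RAPL counters are exposed through the powercap interface, so a developer can take a rough energy reading around a piece of code. This is a minimal sketch under stated assumptions; it covers only the CPU package domain and assumes the counter file exists and is readable on the machine.

```python
# Minimal sketch: measure CPU package energy around a workload via the
# Linux powercap/RAPL interface. Assumes the energy_uj file below exists
# and is readable; covers the CPU package domain only.
import time

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_uj() -> int:
    with open(RAPL) as f:
        return int(f.read())

def workload():
    # Placeholder for the application code being profiled.
    sum(i * i for i in range(10_000_000))

e0, t0 = read_uj(), time.time()
workload()
e1, t1 = read_uj(), time.time()

joules = (e1 - e0) / 1e6  # counter is in microjoules and wraps periodically
print(f"{joules:.1f} J over {t1 - t0:.2f} s (~{joules / (t1 - t0):.1f} W avg)")
```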

And you've outlined so many of these different kinds of technologies. So how can enterprise adoption of things like modularity and liquid cooling and hardware-aware software be incentivized to actually make use of all these new technologies?

A year ago, I worried a lot about that question. How do we get people who are developing new applications to simply be aware of the downstream implications? One of the benefits of this revolution in the last 12 months is I think just that availability of electricity is going to be a big challenge for many enterprises as they seek to adopt some of these energy-intensive applications. And I think the hard reality of energy availability is going to bring some very strong incentives very quickly to attack these kinds of problems.

But I do think, beyond that, like a lot of areas in sustainability, accounting is really important. There are a lot of good intentions. There are a lot of companies with net-zero goals that they're serious about. They're willing to take strong actions against those goals. But if you can't accurately measure what your impact is, either as an enterprise or as a software developer, I think you have to kind of find where the point of action is, where the rubber meets the road, where a micro decision is being made. And if the carbon impact of that is known at that point, then I think you can see people take the actions to take advantage of the tools and capabilities that are there to get a better result. And so I know there are a number of initiatives in the industry to create that kind of accounting, and especially for software development, I think that's going to be really important.
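Tooling for that point-of-action accounting already exists in open source. As one example, the CodeCarbon package estimates the emissions of a single run from measured power draw and regional grid carbon intensity; the sketch below wraps a placeholder training job, and its output is an estimate, not an audited figure.

```python
# Sketch: per-run carbon accounting with the open source CodeCarbon
# package. Reported numbers are estimates from power draw and regional
# grid intensity, not audited figures. The training job is a stub.
from codecarbon import EmissionsTracker

def run_training():
    # Placeholder for the real workload being accounted for.
    sum(i * i for i in range(50_000_000))

tracker = EmissionsTracker(project_name="expert-model-finetune")
tracker.start()
try:
    run_training()
finally:
    kg_co2 = tracker.stop()  # estimated kg of CO2-equivalent for the run
print(f"Estimated emissions: {kg_co2:.6f} kg CO2-eq")
```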

Well, it's also clear there's an imperative for enterprises that are trying to take advantage of AI to curb that energy consumption as well as meet their environmental, social, and governance, or ESG, goals. So what are the biggest challenges that come with making more sustainable AI and computing transformations?

It's a complex topic, and I think we've already touched on a couple of them. Just as I was mentioning, certainly getting software developers to understand their impact within the enterprise. And if I'm an enterprise that is procuring my applications and software, maybe cloud services, I want to make sure that accounting is part of my procurement process. In some cases that's gotten easier. In some cases, there's still work to do. If I'm operating my own infrastructure, I certainly have to look at liquid cooling, for example, and adoption of some of these more modern technologies that let us get to significant gains in energy efficiency. And of course, really looking at the use cases and finding the most energy-efficient architecture for that use case, for example, using those smaller models that I was talking about. Enterprises need to be very aware of the energy consumption of their digital technologies, how big it is, and how their decisions are affecting it.

So could you offer an example or use case of one of those energy-efficient, AI-driven architectures and how AI was subsequently deployed for it?

Yes. I think that some of the best examples I've seen in the last year were really around these smaller models, where Intel did an example that we published around financial services, and we found that something like three hours of fine-tuning training on financial services data allowed us to create a chatbot solution that performed in an impressive manner on a standard Xeon processor. And I think making those solutions available to our customers is starting to open people's eyes to how energy efficient you can be while not really giving up a whole lot in terms of the AI use case that you're looking for. And so I think we need to just continue to get those examples out there. We have a number of collaborations, such as with Hugging Face on open source models, enabling those solutions on our products; our Gaudi2 accelerator has also performed very well from a performance-per-watt point of view, as has the Xeon processor itself. So those are great opportunities.

And then how do you envision the future of AI and sustainability in the next three to five years? There seems like so much opportunity here.

I think there's going to be so much change in the next three to five years. I hope no one holds me to what I'm about to say, but I think there are some pretty interesting trends out there. One thing, I think, to think about is the trend of AI factories. So training a model is a little bit of an interesting activity that's distinct from what we normally think of as real-time digital services. You have a real-time digital service like Vinnie, the app on your iPhone that's connected somewhere in the cloud, and that's a real-time experience. And it's all about 99.999% uptime, short latencies to deliver that user experience that people expect. But AI training is different. It's a little bit more like a factory. We produce models as a product, and then the models are used to create the digital services. And that, I think, becomes an important distinction.

So I can actually build some giant gigawatt facility somewhere that does nothing but train models on a large scale. I can partner with the infrastructure of the electricity providers and utilities, much like an aluminum plant or something would do today, where I actually modulate my energy consumption with its availability. Or maybe I take advantage of solar or wind power's variability: I can modulate when I'm consuming power and when I'm not consuming power. And so I think if we can see some really large-scale kinds of efforts like that, those AI factories could be very, very efficient. They could be liquid cooled, and they could be closely coupled to the utility infrastructure. I think that's a pretty exciting opportunity, even while it's kind of an acknowledgement that there's going to be gigawatts and gigawatts of AI training going on.
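That modulate-with-availability idea can be pictured as a simple control loop: a checkpointable training job pauses when a grid signal (price, carbon intensity, or a curtailment flag) crosses a threshold and resumes when it recovers. The signal source and thresholds below are invented for illustration.

```python
# Illustrative control loop for an "AI factory" that modulates training
# with electricity availability. get_grid_signal() stands in for a real
# utility or grid-operator feed; the thresholds are invented.
import time

PAUSE_ABOVE = 400.0   # assumed grid carbon intensity, gCO2/kWh
RESUME_BELOW = 250.0  # hysteresis gap avoids rapid pause/resume flapping

def get_grid_signal() -> float:
    """Placeholder for a price, carbon-intensity, or curtailment API."""
    return 300.0

def train_one_step(step: int) -> None:
    """Placeholder for one checkpointable unit of training work."""

def training_loop(total_steps: int) -> None:
    paused = False
    step = 0
    while step < total_steps:
        signal = get_grid_signal()
        if signal > PAUSE_ABOVE:
            paused = True    # a real system would write a checkpoint here
        elif signal < RESUME_BELOW:
            paused = False   # and restore from the checkpoint here
        if paused:
            time.sleep(60)   # idle (or power-cap) while energy is scarce
            continue
        train_one_step(step)
        step += 1

training_loop(total_steps=1_000)
```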

The second opportunity, I think, in this three to five years: I do think liquid cooling will become much more pervasive. I think that will be driven by the need to cool the next generation of accelerators; GPUs will make it a requirement, but then we'll be able to build that technology out and scale it more ubiquitously for all kinds of infrastructure. And that will allow us to shave huge amounts of gigawatts out of the infrastructure, and save hundreds of billions of gallons of water annually. I think that's incredibly exciting. And if I just… the innovation on the model side as well, so much has changed in just the last five years with large language models like ChatGPT, so let's not assume there's not going to be even bigger change in the next three to five years. What are the new problems that are going to be solved, the new innovations? So I think as the costs and impact of AI are being felt more substantively, there's going to be a lot of innovation on the model side, and people will come up with new ways of cracking some of these problems, and there will be new exciting use cases that come about.

Finally, I think on the hardware side, there will be new AI architectures. From an acceleration point of view today, a lot of AI performance is limited by memory bandwidth, and by networking bandwidth between the various accelerator components. And I don't think we're anywhere near having an optimized AI training system or AI inferencing system. I think the discipline is moving faster than the hardware, and there's a lot of opportunity for optimization. So I think we'll see significant differences in networking and significant differences in memory solutions over the next three to five years, and certainly over the next 10 years, that I think can open up a considerable set of improvements.

And of course, Moore's Law itself continues to advance, with advanced packaging technologies and new transistor types that allow us to build ever more ambitious pieces of silicon, which will have substantially better energy efficiency. So all of those things I think will be important. Whether we can keep up our energy efficiency gains amid the explosion in AI functionality, I think that's the real question, and it's just going to be a super interesting time. I think it's going to be a very revolutionary time in the computing industry over the next few years.

And we'll have to see. Zane, thank you so much for joining us on the Business Lab.

Thanks.

That was Zane Ball, corporate vice president and general manager of data center platform engineering and architecture at Intel, whom I spoke with from Cambridge, Massachusetts, the home of MIT and MIT Technology Review.

That's it for this episode of Business Lab. I'm your host, Laurel Ruma. I'm the director of Insights, the custom publishing division of MIT Technology Review. We were founded in 1899 at the Massachusetts Institute of Technology, and you can also find us in print, on the web, and at events each year around the world. For more information about us and the show, please check out our website at technologyreview.com.

This show is available wherever you get your podcasts. If you enjoyed this episode, we hope you'll take a moment to rate and review us. Business Lab is a production of MIT Technology Review. This episode was produced by Giro Studios. Thank you for listening.
