The fifth annual AI Hardware Summit was back this month, and for the first time in a couple of years, it took place fully in-person in Santa Clara, California. The world’s leading experts in AI hardware came together over the course of three days to discuss some of the big challenges facing the industry, and among them was Rambus Fellow Steven Woo.
We caught up with Steven to find out all about the event and learn more about the panel discussion he led on one of the primary challenges for AI hardware and systems, the AI memory bottleneck.
Question: How would you describe the AI Hardware Summit 2022 to someone who was not there?
Steven: AI Hardware Summit is focused on AI and Machine Learning at the systems level, and brings together chip, system architecture, and software experts to discuss the biggest challenges in AI hardware. The conference includes talks, panels, workshops, exhibits, and networking sessions that give speakers and participants a chance to interact and share their thoughts on challenges and solutions for developing better AI hardware and systems in the future. I led a panel this year that discussed one of the primary challenges for AI hardware and systems, the AI memory bottleneck.
Question: What were some of your key takeaways from the AI Hardware Summit this year?
Steven: AI hardware and software have really grown in popularity over the last 10 years, and over that time we’ve seen a growing range of use cases across the industry. One of the bigger challenges is how to develop hardware and software that address this wide range of workloads. Hardware is expensive and time-consuming to develop, and while targeting hardware to a specific workload gives the best results, it’s just not practical to maintain many individual hardware designs. Reducing design costs and improving hardware flexibility (so that many workloads can be addressed by the same hardware) are growing in importance, and software is playing an ever-increasing role in making hardware and systems more flexible.

Another important challenge is providing better memory and memory systems for AI hardware. Memory and memory systems are a bottleneck in AI hardware, often limiting the speed at which models can be trained and processed. AI models are growing faster than traditional technologies can keep up with: the largest models now have trillions of parameters, requiring larger memory capacities to store them and higher memory bandwidths to move models, intermediate results, and training data between memory and AI processors.
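To put those parameter counts in perspective, here is a minimal back-of-the-envelope sketch (not from the interview). It assumes a hypothetical 1-trillion-parameter model, and the bytes-per-parameter figures are common rules of thumb rather than measurements from any specific system.

```python
# Illustrative sizing only: parameter count and bytes-per-parameter are assumptions.
def model_memory_tib(num_params: float, bytes_per_param: float) -> float:
    """Return an approximate memory footprint in TiB."""
    return num_params * bytes_per_param / 2**40

params = 1e12  # hypothetical 1-trillion-parameter model

# FP16 weights alone (roughly what inference needs to hold in memory)
weights_only = model_memory_tib(params, 2)

# A common rule of thumb for mixed-precision training: ~16 bytes per parameter
# (FP16 weights and gradients plus FP32 master weights and optimizer state)
training_state = model_memory_tib(params, 16)

print(f"FP16 weights only:        {weights_only:.1f} TiB")
print(f"Training state (approx.): {training_state:.1f} TiB")
```

Even the weights-only figure is far beyond the local memory of a single accelerator, which is why memory capacity and bandwidth dominate the discussion.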
Question: What AI developments are you personally most excited about seeing in the years to come?
Steven: AI hardware is challenging to use by itself, but software has really helped to democratize access to AI processing. The industry has done a good job with tools, libraries, and infrastructure that abstract away some of the unique details of each hardware implementation, allowing users to focus on algorithms that are automatically translated into efficient code on the hardware. There’s more work to do in this area, but as the industry matures, broader access will become available, opening up AI to an even larger base of users in the future. And while many of the techniques behind current AI hardware have been around for decades, at the time they weren’t practical to implement. Technology advances like better silicon manufacturing and higher-performance memory have made them practical now, and this has led to tremendous advances in new algorithms and domain-specific architectures. Transformers are a great example in natural language processing: they weren’t possible years ago, and have only come to fruition because of advanced hardware and larger training sets that enable better algorithm development.
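As a simple illustration of that abstraction (not part of the interview), the sketch below uses PyTorch, one example of such a framework: the same few lines of model code run on a CPU or a GPU, and the framework handles generating efficient kernels for whichever device is available.

```python
import torch
import torch.nn as nn

# The framework hides the hardware details: the same model definition works
# whether or not an accelerator is present.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)
batch = torch.randn(32, 512, device=device)

logits = model(batch)  # dispatched to efficient kernels for the chosen device
print(logits.shape, "computed on", device)
```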
Question: What are the key things that were discussed in your panel to get around the AI memory bottleneck?
Steven: I was joined on the panel by Sumti Jairath (Chief Architect at SambaNova Systems), Matt Fyles (SVP Software at Graphcore), and Euicheol Lim (Fellow at SK hynix), and we talked about the importance of memory from several different angles. Memory and memory systems are a key bottleneck in AI hardware and systems today and will continue to be a bottleneck in the future. Flexible AI hardware and systems need a flexible memory solution that can provide different memory capacities and bandwidths, so that resources can be dynamically tailored to the needs of the workloads being processed. CXL offers a great solution here, allowing memory bandwidth and capacity to be scaled as needed by the infrastructure and AI workloads, and offering further benefits by enabling memory disaggregation.

In terms of the memory components themselves, roughly two-thirds of the power to access memory and move data back and forth to an AI processor is spent simply moving the data, with the rest being used to access data in the DRAM core. Because minimizing data movement has important benefits for system power and performance, Processing-in-Memory (PIM) is seeing increasing interest in the industry, not only for AI but in other areas as well. PIM offloads some of the most important and most common processing functions directly into the memory device, minimizing data movement and reducing power while increasing performance.

Power will remain an important challenge going forward, and any power that can be saved in external components like memory can in turn be used to make processing better. Treating power efficiency as both a hardware problem and a software problem will help improve it in the future. Techniques like reduced precision, sparsity, and compression (all of which trade off accuracy for performance and power-efficiency) have been in use for long enough now that software developers understand these tradeoffs and can make appropriate choices to reduce power consumption. Although the current era of AI has been going on for about a decade, in many ways we’re still in the early days of this next phase, and we look forward to future developments in this field.
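To make the reduced-precision tradeoff concrete, here is a small sketch (an illustration, not something presented on the panel): casting activations from FP32 to FP16 halves the bytes that must move through the memory system, at the cost of a small numerical error. The array size and data are arbitrary.

```python
import numpy as np

# Arbitrary example data; real workloads would use actual activations or weights.
rng = np.random.default_rng(0)
acts_fp32 = rng.standard_normal((1024, 1024)).astype(np.float32)
acts_fp16 = acts_fp32.astype(np.float16)  # reduced precision: half the bytes to move

rel_error = (np.abs(acts_fp32 - acts_fp16.astype(np.float32)).mean()
             / np.abs(acts_fp32).mean())

print(f"FP32 traffic: {acts_fp32.nbytes / 2**20:.1f} MiB")
print(f"FP16 traffic: {acts_fp16.nbytes / 2**20:.1f} MiB")
print(f"Mean relative error from the FP16 cast: {rel_error:.2e}")
```

In practice, whether that error is acceptable depends on the workload, which is exactly the kind of accuracy-versus-efficiency choice software developers now make routinely.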