Speaker Details

LLMs, Fast and Everywhere: Acceleration for the Age of GenAI

Carole-Jean Wu


Bio

Carole-Jean Wu is a Director of AI Research Science at Meta, where she leads the Systems and Machine Learning Research team. She is a founding member and a Vice President of MLCommons, a non-profit organization that aims to accelerate machine learning for the benefit of all, and also serves on the MLCommons Board as a Director. Prior to Meta/Facebook, she was a tenured professor at ASU. She earned her M.A. and Ph.D. from Princeton and B.Sc. from Cornell.

Dr. Wu's expertise sits at the intersection of computer architecture and machine learning. Her work spans datacenter infrastructure and edge systems, with a focus on performance, energy efficiency, and sustainability. She is passionate about pathfinding and tackling system challenges to enable efficient, scalable, and environmentally sustainable AI technologies. Her work has been recognized with several awards, including IEEE Micro Top Picks and ACM/IEEE Best Paper Awards. She currently serves on the ACM SIGARCH/SIGMICRO CARES committee.

Abstract

The past 50 years have seen a dramatic increase in the amount of compute per person, in particular compute enabled by AI. On the path to Artificial General Intelligence (AGI), Large Language Models (LLMs) are a critical milestone. To deliver on this promise, systems must provide the capability to fuel the computation requirements behind LLMs and eventually AGI. This comes with significantly higher computing infrastructure and energy demands than what we are equipped with today. To scale AI sustainably, we must make AI computing efficient, and we are exploring exciting new paths to do so. Innovations across the software and hardware system stack are expected to bring orders-of-magnitude improvements in efficiency, enabling LLM deployment at scale -- an important step towards delivering on the promise of AI sustainability. I will also showcase the recent releases of Torchchat and ExecuTorch, which bring the capability to deploy LLM technologies from servers to desktops and mobile platforms at the edge.