Watch: Meta’s engineers on building network infrastructure for AI

By neub9

Meta is dedicated to building the future of AI at every level: from hardware such as MTIA v1, Meta’s first-generation AI inference accelerator, to publicly released models such as Llama 2, Meta’s next-generation large language model, and new generative AI (GenAI) tools such as Code Llama.

Delivering next-generation AI products and services at Meta’s scale requires next-generation infrastructure.

The 2023 edition of Networking at Scale featured Meta’s engineers and researchers discussing how they design and operate the network infrastructure behind Meta’s AI workloads, including ranking and recommendation workloads and GenAI models. The talks covered a wide range of topics, including physical and logical network design, custom routing and load-balancing solutions, performance tuning, debugging, benchmarking, and workload simulation and planning. The event also looked ahead to the requirements of the GenAI models arriving in the coming years.

Networking for GenAI Training and Inference Clusters

Jongsoo Park, Research Scientist, Infrastructure
Petr Lapukhov, Network Engineer

Developing new GenAI technologies and integrating them into product features is a top priority at Meta. However, the scale and complexity of GenAI models present new challenges for Meta’s network infrastructure.

Jongsoo Park and Petr Lapukhov discuss the unique requirements of new large language models and how Meta’s infrastructure is evolving for the new GenAI landscape.
