Whitepaper: NVIDIA DGX A100 System Architecture
Take an in-depth look at the architecture and design of this universal system for AI infrastructure.
Organizations of all kinds are incorporating AI into their research, development, product, and business processes. Doing so helps them meet and exceed their goals, and builds the experience and knowledge needed to take on even bigger challenges.
However, traditional compute infrastructures are poorly suited to AI: CPU-centric architectures cannot keep pace with AI's computational demands, and system requirements vary across workloads and project phases. This drives up complexity, increases cost, and limits scale.
To help organizations overcome these obstacles and succeed in a world that desperately needs the power of AI to solve big challenges, NVIDIA designed the world’s first family of systems purpose-built for AI — the NVIDIA DGX systems.
Built on powerful NVIDIA GPUs and designed from the ground up for multi-GPU and multi-node deployments, with DGX POD and DGX SuperPOD reference architectures and optimized AI software from NVIDIA NGC, DGX systems deliver unprecedented performance and scalability while eliminating integration complexity.
Built on the new NVIDIA A100 Tensor Core GPU, NVIDIA DGX A100 is the third generation of DGX systems. Featuring 5 petaFLOPS of AI performance, DGX A100 excels across all AI workloads, including analytics, training, and inference, allowing organizations to standardize on a single system that can speed through any type of AI task and dynamically adjust to changing compute needs over time. And with the fastest I/O architecture of any DGX system, NVIDIA DGX A100 is the foundational building block for large AI clusters such as NVIDIA DGX SuperPOD, the enterprise blueprint for scalable AI infrastructure that can grow to hundreds or thousands of nodes to meet the biggest challenges.
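The 5 petaFLOPS figure can be checked with back-of-envelope arithmetic from NVIDIA's published per-GPU specifications; the sketch below assumes the A100's quoted peak of 312 TFLOPS FP16 Tensor Core throughput, doubled to 624 TFLOPS with structured sparsity, across the eight GPUs in a DGX A100:

```python
# Back-of-envelope check of the DGX A100's quoted 5 petaFLOPS of AI performance.
# Assumes NVIDIA's published A100 peak of 312 TFLOPS FP16 Tensor Core throughput,
# doubled by 2:4 structured sparsity.
A100_FP16_TFLOPS = 312   # dense FP16 Tensor Core peak per GPU
SPARSITY_FACTOR = 2      # 2:4 structured sparsity doubles peak throughput
GPUS_PER_SYSTEM = 8      # a DGX A100 contains eight A100 GPUs

system_tflops = A100_FP16_TFLOPS * SPARSITY_FACTOR * GPUS_PER_SYSTEM
print(f"{system_tflops / 1000:.3f} petaFLOPS")  # 4.992 petaFLOPS, rounded to 5
```

The exact product is 4,992 TFLOPS, which the marketing figure rounds to 5 petaFLOPS.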
This white paper takes an in-depth look at the design and architecture of DGX A100, a universal system for AI infrastructure that offers the flexibility to reduce cost and increase scalability.