Excitement over generative AI has prompted a new wave of AI initiatives, reinforcing the need to leverage flash storage for AI, and vice versa.
AI systems must be able to process massive volumes of data quickly, and the amount of data generated creates enormous storage needs.
On the other hand, AI-based storage systems that can scale intelligently and automatically with demand are needed to manage these huge data sets.
“As we enter a new age of AI, the superior economics, and operational and environmental efficiencies of… all-flash competitive offerings will be more critical than ever,” said Charles Giancarlo, Chairman and CEO, Pure Storage, at the Pure//Accelerate 2023 conference held at Resorts World Las Vegas in June 2023.
AI applications have high storage demands
AI applications have high storage capacity demands that can easily start in the terabyte range and scale into hundreds of petabytes. AI training, in particular, consumes a great deal of digital storage for modelling data, as well as memory to support the processing of that data.
As deep learning (DL) training and other computationally intensive workloads become increasingly prevalent in data centers, the need to scale out GPU-accelerated computing infrastructure becomes a significant challenge that IT administrators must address.
AI/DL is a fundamentally different workload from traditional enterprise applications running on CPU-based servers, and it calls for specific networking, storage, and infrastructure management approaches proven to deliver scalability, performance, and cost-effective manageability.
Anticipating the demand for AI several years ago, Pure Storage introduced FlashBlade and its AIRI (AI-Ready Infrastructure) solution, co-developed with NVIDIA. The latest iteration, AIRI//S, is a ready-to-deploy NVIDIA DGX BasePOD reference architecture for AI built on the latest FlashBlade//S storage.
Patrick Smith, CTO, EMEA, Pure Storage, said: “The key difference AI has brought to the storage equation is based on the fact that massive amounts of data are needed for AI. Besides capacity, we need storage solutions that provide for speedy retrieval, so GPUs are not idle or wasted waiting to process the data to feed the AI applications.”
Organizations can start their AI journey at any scale and grow as needs evolve, and the current excitement over generative AI has prompted a new wave of AI initiatives, positioning data storage leaders such as Pure Storage as go-to storage partners for AI projects.
Pure Storage is currently supporting leading-edge AI projects, from autonomous vehicle development companies to Meta’s AI Research SuperCluster (RSC), which aims to be the largest AI supercomputer in the world.
Leveraging AI/ML to manage data storage
According to Eric Burgener, Director of Technical Strategy at Pure Storage and a former IDC research analyst, one of the key storage-related trends today is leveraging AI/ML to manage data storage.
Without the need for human intervention, AI-enabled storage ingests continuous real-time updates from a variety of enterprise data sources, optimizes the data, and performs other intelligent, automated tasks on it.
Businesses are starting to use AI to automate otherwise manual tasks like data capture, deduplication, anomaly detection and data validation. They are also training models to apply regulatory policies and ethical standards automatically, ensuring those principles are embedded from the beginning.
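To make the idea concrete, anomaly detection over storage telemetry can be as simple as learning a baseline and flagging samples that deviate sharply from it. The sketch below is a minimal illustration rather than any vendor's implementation; the metric, threshold, and sample data are assumptions for the example.

```python
# Minimal sketch of telemetry anomaly detection (illustrative only):
# flag read-latency samples whose z-score exceeds a chosen threshold.
import numpy as np

def flag_latency_anomalies(latencies_ms: np.ndarray, z_threshold: float = 3.0) -> np.ndarray:
    """Return a boolean mask marking samples that deviate sharply from the mean."""
    mean = latencies_ms.mean()
    std = latencies_ms.std()
    if std == 0:
        return np.zeros(latencies_ms.shape, dtype=bool)
    return np.abs(latencies_ms - mean) / std > z_threshold

# Example: a single 80 ms read stands out against a ~2 ms baseline.
rng = np.random.default_rng(0)
baseline = rng.normal(loc=2.0, scale=0.2, size=50)   # ~2 ms reads
samples = np.append(baseline, 80.0)                  # one 80 ms spike
print(np.where(flag_latency_anomalies(samples))[0])  # -> [50]
```

A production storage platform would apply far more sophisticated models across many metrics, but the principle of learning a baseline and flagging deviations is the same.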
Pure Storage’s FlashBlade hardware portfolio is GPUDirect Storage (GDS)-ready, with software enhancements delivering complete GDS support expected in the near term, further strengthening Pure Storage’s partnership with NVIDIA and enhancing its DGX BasePOD-certified solutions.
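GDS lets supported filesystems and network adapters move data by DMA directly between storage and GPU memory, bypassing the usual copy through host (CPU) memory. As a rough sketch of what that read path looks like from application code, the example below uses the open-source KvikIO bindings to NVIDIA's cuFile API; the mount path is a hypothetical FlashBlade export used only for illustration, not a documented Pure Storage configuration.

```python
# Illustrative GPUDirect Storage read via NVIDIA's cuFile API, using the
# open-source KvikIO Python bindings. The path below is a hypothetical
# mount of a FlashBlade filesystem, shown only for the example.
import cupy
import kvikio

# Destination buffer allocated directly in GPU memory (256 MiB of float32).
gpu_buffer = cupy.empty(64 * 1024 * 1024, dtype=cupy.float32)

f = kvikio.CuFile("/mnt/flashblade/training/shard-0000.bin", "r")
try:
    # With GDS active, this read is a DMA transfer from storage into GPU
    # memory, skipping the bounce buffer in host memory.
    f.read(gpu_buffer)
finally:
    f.close()

print("first values on the GPU:", gpu_buffer[:4])
```

KvikIO also provides a compatibility mode that falls back to regular POSIX I/O when GDS is unavailable, so the same application code runs either way.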
Meta’s story
Meta needed storage that could handle petabytes of data while delivering the performance demanded by its new supercomputer. The storage also needed a very low power profile, so that Meta could invest more power in the GPUs that accelerate its AI models.
Meta’s ambitious business transformation strategy includes data storage that supports its next-generation AI research trained on petabytes of data, unifies data delivery to scale across even the most data-intensive jobs, and helps the organization learn how to deliver more content that people want to see.
“We contacted a number of storage vendors of both disk and flash to evaluate their highest performance, highest density offerings,” said Vivek Pai, RSC Storage Lead, Meta. “From a combination of performance and power and cost, we ended up selecting Pure Storage.”
As a result of working with Pure Storage, Meta designed and built a first-of-its-kind AI research cluster in less than two years, leveraging reliable flash performance that requires less maintenance than disk storage, and reducing overall operational costs.
Additionally, the low power consumption allows Meta to direct more power to GPUs, improving overall performance.