
Why Purpose-Built Infrastructure is the Best Option for Scaling AI Model Development

BrandPost By Keith Shaw

Aug 04, 2022

Artificial Intelligence, IT Leadership

Do-it-yourself AI infrastructure runs the risk of wasted time, money and data-scientist talent.


Many companies that begin their AI projects in the cloud reach a point when cost and time become issues. That’s typically due to the exponential growth in dataset size and the complexity of AI models.

“In an early phase, you might submit a job to the cloud where a training run would execute and the AI model would converge quickly,” says Tony Paikeday, senior director of AI systems at NVIDIA. “But as models and datasets grow, there’s a stifling effect associated with the escalating compute cost and time. Developers find that a training job now takes many hours or even days, and in the case of some language models, it could take many weeks. What used to be fast, iterative model prototyping, grinds to a halt and creative exploration starts to get stifled.”

This inflection point, driven by longer AI model training times and rising costs around data gravity and compute cycles, spurs many companies to adopt a hybrid approach and move their AI projects from the cloud back to on-premises infrastructure or to infrastructure that’s colocated with their data lake.

But there’s an additional trap that many companies might encounter. Paikeday says it occurs if they choose to build such infrastructure themselves or repurpose existing IT infrastructure instead of going to a purpose-built architecture designed specifically for AI.

“The IT team might say, ‘We have lots of servers, let’s just configure them with GPUs and throw these jobs at them’,” he says. “But then they realize it’s not the same as a system that is designed specifically to train AI models at scale, across a cluster that’s optimized to deliver results in minutes instead of weeks.”

With AI development, companies need fast ROI, and that depends on data scientists spending their time on the right things. “You’re paying a lot of money for data-science talent,” Paikeday says. “The more time they spend not doing data science, like waiting on a training run, troubleshooting software, or talking to network, storage or server vendors to solve an issue, that’s lost money and a lot of sweat equity that has nothing to do with creating models that deliver business value.”

Avoiding that lost time is a significant benefit of a purpose-built appliance for AI models that can be installed on premises or in a colocation facility. For example, NVIDIA’s DGX A100 is meant to be unpacked, plugged in and powered up, enabling data scientists to be productive within hours instead of weeks. The DGX system offers companies five key benefits to scale AI development:

A hardware design that is optimized for AI, along with parallelism throughout the architecture to efficiently distribute computational work across all the GPUs and DGX systems connected together. It’s not just a system; it’s an infrastructure that scales to any size problem.
A field-proven, fully integrated AI software stack including drivers, libraries and AI frameworks that are optimized to work seamlessly together.
A turnkey, integrated data center solution that companies can buy from their favorite value-added reseller that brings together compute, storage, networking, software and consultants to get things up and running quickly.
A platform, not just a box, from a company that specializes in AI and has already created state-of-the-art models for natural language processing, recommender systems, autonomous systems, and more, all of which are continually being improved by the NVIDIA team and made available to every DGX customer.
“DGXperts” bring AI-fluency and know-how, giving guidance on the best way to build a model, solve a challenge, or just assist a customer that is working on an AI project.

When it’s time to move an AI project from exploration to a production application, the right choice can speed and scale the ROI of your AI investment.

Discover how NVIDIA DGX A100, powered by NVIDIA A100 Tensor Core GPUs and AMD EPYC CPUs, meets the unique demands of AI.
