Make Better AI Infrastructure Decisions: Why Hybrid Cloud Is a Solid Fit

BrandPost By Keith Shaw
May 23, 2022

The unique demands of AI workloads are making it increasingly popular to pair on-premises infrastructure with the cloud.

Credit: iStock

The traditional approach for artificial intelligence (AI) and deep learning projects has been to deploy them in the cloud. Because enterprise software development commonly leverages cloud environments, many IT groups assume the same infrastructure approach will work just as well for AI model training.

For many nascent AI projects in the prototyping and experimentation phase, the cloud works just fine. But companies often discover that as data sets grow in volume and AI models become more complex, the escalating cost of compute cycles, data movement, and storage can spiral out of control. This phenomenon, called data gravity, is the cost and workflow latency incurred when large data sets must be moved from where they are created to where compute resources reside. It has led many companies to consider moving AI training from the cloud back to a data-proximate, on-premises data center.
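To get a feel for the scale of the problem, consider the back-of-the-envelope sketch below. The egress price and link speed are hypothetical placeholders, not quotes from any provider; substitute the numbers from your own contract.

```python
# Back-of-the-envelope "data gravity" estimate. All figures below are
# hypothetical assumptions chosen for illustration only.
DATASET_TB = 100              # assumed training data set size
EGRESS_PER_GB = 0.09          # assumed cloud egress fee, $/GB
LINK_GBPS = 10                # assumed dedicated link, gigabits/sec

egress_cost = DATASET_TB * 1_000 * EGRESS_PER_GB
transfer_hours = (DATASET_TB * 8_000) / LINK_GBPS / 3_600

print(f"One-time egress cost: ${egress_cost:,.0f}")
print(f"Transfer time at line rate: {transfer_hours:.1f} hours")
# -> roughly $9,000 and ~22 hours for each full copy of the data,
#    paid again every time data and compute end up far apart.
```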

Hybrid is a perfect fit for some AI projects

There’s an alternative worth exploring, one that avoids forcing an either/or choice between cloud and on-premises. A hybrid cloud infrastructure approach enables companies to take advantage of both environments: organizations can use on-premises infrastructure for their ongoing “steady state” training demands, supplemented with cloud services for temporary spikes or unpredictable surges that exceed that capacity.

“The saying ‘Own the base, rent the spike’ captures the essence of this situation,” says Tony Paikeday, senior director of AI systems at NVIDIA. “Enterprise IT provisions on-prem infrastructure to support the steady-state volume of AI workloads and retains the ability to burst to the cloud whenever extra capacity is needed.”

This approach gives developers continuous access to compute resources while keeping the cost per training run low.
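A simple sketch shows how the arithmetic can play out. Every figure below is a hypothetical assumption chosen for the sake of the calculation; it is not NVIDIA pricing and not any cloud provider’s list price.

```python
# Illustrative "own the base, rent the spike" comparison; all numbers
# are hypothetical assumptions, not real vendor pricing.
STEADY_GPU_HOURS = 8_000 * 12   # assumed steady-state load, per year
SPIKE_GPU_HOURS = 1_500         # assumed burst demand, per year
CLOUD_RATE = 3.00               # assumed on-demand $/GPU-hour
OWNED_ANNUAL_COST = 150_000     # assumed amortized system + ops cost

all_cloud = (STEADY_GPU_HOURS + SPIKE_GPU_HOURS) * CLOUD_RATE
hybrid = OWNED_ANNUAL_COST + SPIKE_GPU_HOURS * CLOUD_RATE

print(f"All-cloud: ${all_cloud:,.0f} per year")
print(f"Hybrid:    ${hybrid:,.0f} per year")
# With these assumptions, owning the base and renting only the spike
# wins once the owned hardware is kept busy; at low utilization the
# balance tips back toward the cloud.
```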

With the rise of container orchestration platforms such as Kubernetes, enterprises can more effectively manage compute resources that straddle cloud instances and on-prem hardware, such as NVIDIA DGX A100 systems.
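As a minimal sketch of what this can look like, the snippet below uses the official Kubernetes Python client to submit a training job that prefers on-prem DGX nodes and can fall back to cloud-burst GPU nodes. The node label (tier=on-prem-dgx), namespace, and container image are hypothetical; a real cluster would define its own labeling scheme for its node pools.

```python
# A minimal sketch using the official Kubernetes Python client
# (pip install kubernetes). Labels, namespace, and image below are
# hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()  # reads cluster credentials from ~/.kube/config

# Prefer steady-state on-prem DGX nodes; the scheduler may still place
# the pod on cloud-burst GPU nodes when on-prem capacity is exhausted.
affinity = client.V1Affinity(
    node_affinity=client.V1NodeAffinity(
        preferred_during_scheduling_ignored_during_execution=[
            client.V1PreferredSchedulingTerm(
                weight=100,
                preference=client.V1NodeSelectorTerm(
                    match_expressions=[client.V1NodeSelectorRequirement(
                        key="tier", operator="In", values=["on-prem-dgx"])],
                ),
            )
        ]
    )
)

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="training-run-001"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                containers=[client.V1Container(
                    name="trainer",
                    image="registry.example.com/train:latest",
                    resources=client.V1ResourceRequirements(
                        limits={"nvidia.com/gpu": "8"},  # a full 8-GPU node
                    ),
                )],
                affinity=affinity,
                restart_policy="Never",
            )
        )
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="ml-team", body=job)
```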

For example, aerospace company Lockheed Martin runs experiments on smaller AI models using GPU-enabled cloud instances, and uses its DGX systems for training and inference on its largest projects. Although the AI team uses the cloud, the DGX systems remain its sole resource for large-scale GPU compute, as it is more difficult to conduct model and data parallelism across cloud instances, says Paikeday.
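As a rough illustration of why the interconnect matters here, the sketch below shows minimal data-parallel training with PyTorch DistributedDataParallel; the model and data are stand-ins. Gradients are all-reduced over NCCL on every step, so the same code runs far faster over NVLink inside a single DGX node than over ordinary network links between separate cloud instances.

```python
# Minimal data-parallel training sketch with PyTorch DDP.
# Launch with torchrun, e.g.: torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")     # one process per GPU
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(1024, 10).to(f"cuda:{local_rank}"),
            device_ids=[local_rank])
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(100):
    x = torch.randn(64, 1024, device=f"cuda:{local_rank}")       # fake batch
    y = torch.randint(0, 10, (64,), device=f"cuda:{local_rank}")
    loss = F.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()   # NCCL all-reduce of gradients happens here
    opt.step()

dist.destroy_process_group()
```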

He stresses that there isn’t a single answer for all companies when it comes to the question of on-premises versus cloud-only versus hybrid approaches.

“Different companies approach this from different angles, and some will naturally gravitate to cloud, based on where their data sets are created and live,” he says.

Others whose data lake resides on-prem or in a colocation facility may eventually see the growing benefit of making their training infrastructure data-proximate, especially as their AI maturity grows.

“Others who have already invested in on-prem will say that it’s a natural extension of what they’ve got,” Paikeday says. “Somewhere these two camps will meet in the middle, and both will embrace a hybrid infrastructure. Because of the nature and uniqueness of AI model development, they will realize that companies can have a balance of both infrastructure types.”

Click here to learn more about the benefits of using a hybrid infrastructure for your AI model development with NVIDIA DGX systems, powered by NVIDIA A100 Tensor Core GPUs and AMD EPYC CPUs.

About Keith Shaw:

Keith is a freelance digital journalist who has written about technology topics for more than 20 years.