The Top 5 Questions About Implementing Dataiku, Answered

Dataiku Product Catie Grasso

Our implementation and field engineering teams are regularly asked the same questions about implementing Dataiku, ranging from general enablement down to technical specifics. To gather the answers in one place, we sat down with Xavier Thierry, Director of Field Engineering, Americas. Bear in mind that his answers focus on Dataiku’s cloud stacks accelerator, the cloud-first service in which Dataiku is deployed in the customer’s own cloud (i.e., AWS or Azure).

Customers manage their cloud stacks accelerator through a point-and-click interface called Fleet Manager. It runs within the customer’s cloud environment, where they fully own it and have sole access to it. With just a few clicks, they can have a full elastic AI stack including design, MLOps, governance, elastic compute, and more.

1. How long does it take to get the platform up and running?

Internally, when Dataiku controls the infrastructure, we can spin up new environments in minutes. Externally, things usually take longer as we need to work with clients to get all of the correct information. To help give customers an idea of how long implementation will take, we bucket them into basic, intermediate, and advanced deployments:

  • Basic: Just Dataiku with a couple of data connections. This doesn’t usually take more than eight hours in total.
  • Intermediate: The customer has data connections and multiple environments allowing for a robust dev-test-prod approach, and is starting to add external compute. Depending on what’s encountered on the customer side, this typically takes anywhere between eight and 24 hours.
  • Advanced: This is an enterprise installation and typically takes 24-72 hours of total implementation time.

As noted above, none of these numbers is set in stone; they are guidelines for what we expect. Timelines are always subject to change depending on the customer environment, how interactions happen within that environment, and the customer’s processes, security, and approvals for deploying software.

2. How will we get trained?

Today, when it comes to administering a Dataiku environment, someone from field engineering does a hands-on walkthrough of the administrative panel, explaining what each element of the panel means and does, and sharing best practices along the way. In the near future, we will have a series of blog posts and self-guided learning in the Dataiku Academy so that admin training can be completely self-service.

3. What’s required to install Dataiku? Do I need prerequisites spun up? What do I have to do as a customer to get ready for a Dataiku implementation?

For the cloud stacks accelerator, we start with a virtual network to install the software into, so that Dataiku can communicate with other instances and databases within the customer environment. The next layer down is cloud permissions for Fleet Manager (the component that creates and manages the Dataiku instances) so that it can do its job on whatever cloud the customer is operating on. For example, when it creates a Dataiku instance, it has to be able to create the virtual machine, put in network security settings, etc.

This step requires two sets of permissions: one for Fleet Manager and one for the actual Dataiku instance that gets spun up. Customers typically follow up by asking, “Why do I need two sets of permissions for Dataiku to operate on my cloud environment?” The answer is that the first lets Fleet Manager do its job of creating instances, while the second delegates permissions to the Dataiku instance itself. For example, if Dataiku is managing elastic AI compute from the platform, the Dataiku instance needs permissions to create managed Kubernetes clusters.
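The split between the two permission sets can be sketched in a few lines of Python. This is only an illustration of the separation described above: the action names are hypothetical labels, not real AWS or Azure permission identifiers, and the identity names are placeholders.

```python
# Hypothetical action labels for illustration only; these are NOT real
# AWS IAM or Azure RBAC permission names.
FLEET_MANAGER_ACTIONS = frozenset({
    "create_virtual_machine",
    "configure_network_security",
})
INSTANCE_ACTIONS = frozenset({
    "create_managed_kubernetes_cluster",
})


def responsible_identity(action: str) -> str:
    """Return which identity must be granted the given action."""
    if action in FLEET_MANAGER_ACTIONS:
        return "fleet_manager"
    if action in INSTANCE_ACTIONS:
        return "dataiku_instance"
    raise KeyError(f"unknown action: {action}")
```

Keeping the two sets separate means the identity that provisions instances never needs the compute permissions that a running instance uses, and vice versa.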

Next, it’s helpful if the customer has a cloud admin who can spin up our infrastructure-as-code template (in AWS it’s CloudFormation, in Azure it’s Azure Resource Manager). Someone needs to be able to create that, fill it in with the correct information, launch Fleet Manager, and go from there. Sometimes, we will also need a network admin to answer questions about the virtual network. This depends on how the customer organizes its teams, but if networking is handled by a separate profile within the organization, that person can be helpful to us when working through the networking pieces. Ideally, the three main profiles for implementation are a cloud admin, a Dataiku admin, and a network admin.

4. Once we’ve spun up the platform, how do we maintain it, keep it live, and recover if there’s a disaster? 

Customers can use the visual interface to set their recovery point objectives and snapshot frequency for disaster recovery. Dataiku uses volume snapshots for recovery, so customers can recover and redeploy in just a few clicks without losing data or AI projects. Reprovisioning takes 10-15 minutes on average.
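To make the relationship between snapshot frequency and recovery point objective (RPO) concrete, here is a minimal arithmetic sketch in Python. It is not Dataiku code, and the function names are made up for this example. It assumes snapshots are instantaneous and always succeed, so the worst case is a failure just before the next snapshot, which loses one full interval of work.

```python
import math
from datetime import timedelta


def meets_rpo(snapshot_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case data loss stays within the RPO.

    Assumption: the worst-case loss equals one full snapshot interval.
    """
    return snapshot_interval <= rpo


def min_snapshots_per_day(rpo: timedelta) -> int:
    """Fewest evenly spaced daily snapshots that keep loss within the RPO."""
    if rpo <= timedelta(0):
        raise ValueError("rpo must be positive")
    # Dividing two timedeltas yields a float number of intervals per day.
    return math.ceil(timedelta(days=1) / rpo)
```

Under these assumptions, a four-hour RPO calls for at least six snapshots per day, and an hourly snapshot schedule would satisfy it comfortably.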

5. Dataiku offers a variety of nodes: Design, Automation, Deployer, and Govern. How do I manage the instances and take care of configurations? 

With instance templates, customers can configure all of their instances with the same setup so they stay consistent. With fleet blueprints, we can create a set of instances that all know about each other and perform their various tasks in a unified way. When going that route, customers get an administrative Design node that ingests logs from all of the instances for centralized logging, and that knows about the Govern node and how to do deployments. This way, the instances are aware of each other and know their specific functions.
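As a rough mental model of these two ideas, the sketch below uses plain Python data structures. It does not use Fleet Manager or any Dataiku API; the class names, the `build_fleet` helper, and the settings keys are all hypothetical. One shared template gives every instance the same configuration, and the blueprint step tells each instance about its peers.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceTemplate:
    """Shared configuration applied to every instance (hypothetical shape)."""
    name: str
    settings: dict


@dataclass
class Instance:
    name: str
    node_type: str
    template: InstanceTemplate
    peers: list = field(default_factory=list)


def build_fleet(template: InstanceTemplate, node_types: list) -> list:
    """Create one instance per node type, all sharing the same template,
    then record on each instance the names of the others in the fleet."""
    instances = [
        Instance(f"{node_type.lower()}-node", node_type, template)
        for node_type in node_types
    ]
    for inst in instances:
        inst.peers = [other.name for other in instances if other is not inst]
    return instances
```

Calling `build_fleet` with the four node types from the question (Design, Automation, Deployer, Govern) yields four instances that share one template object and each list the other three as peers.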
