AWS Big Data Blog

Introducing Terraform support for Amazon OpenSearch Ingestion

Today, we are launching Terraform support for Amazon OpenSearch Ingestion. Terraform is an infrastructure as code (IaC) tool that helps you build, deploy, and manage cloud resources efficiently. OpenSearch Ingestion is a fully managed, serverless data collector that delivers real-time log, metric, and trace data to Amazon OpenSearch Service domains and Amazon OpenSearch Serverless collections. In this post, we explain how you can use Terraform to deploy OpenSearch Ingestion pipelines. As an example, we use an HTTP source as input and an index on an Amazon OpenSearch Service domain as output.

Solution overview

The steps in this post deploy a publicly accessible OpenSearch Ingestion pipeline with Terraform, along with the supporting resources that the pipeline needs to ingest data into Amazon OpenSearch Service. The solution implements the tutorial Ingesting data into a domain using Amazon OpenSearch Ingestion, using Terraform.

We create the following resources with Terraform:

  • An AWS Identity and Access Management (IAM) role that the pipeline assumes, with a policy that allows it to describe and write to the domain
  • An Amazon OpenSearch Service domain
  • An Amazon CloudWatch log group that receives the pipeline logs
  • An OpenSearch Ingestion pipeline

The pipeline that you create exposes an HTTP source as input and an Amazon OpenSearch Service sink to save batches of events.
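
The pipeline definition itself is Data Prepper YAML, embedded later in the Terraform configuration. Abridged here for orientation (the hosts and aws settings are filled in from Terraform references in the full version):

version: "2"
example-pipeline:
  source:
    http:
      path: "/test_ingestion_path"
  processor:
    - date:
        from_time_received: true
        destination: "@timestamp"
  sink:
    - opensearch:
        index: "application_logs"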

Prerequisites

To follow the steps in this post, you need the following:

  • An active AWS account.
  • Terraform installed on your local machine. For more information, see Install Terraform.
  • The necessary IAM permissions required to create the AWS resources using Terraform.
  • awscurl for sending HTTPS requests through the command line with AWS SigV4 authentication. For instructions on installing this tool, see the GitHub repo or the installation sketch after this list.
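
If Python and pip are available, awscurl can typically be installed from PyPI (this assumes the package published by the project's GitHub repo):

pip install awscurl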

Create a directory

In Terraform, infrastructure is managed as code and organized into a project. A Terraform project contains various Terraform configuration files, such as main.tf, provider.tf, variables.tf, and output.tf. Let’s create a directory on the server or machine that we can use to connect to AWS services using the AWS Command Line Interface (AWS CLI):

mkdir osis-pipeline-terraform-example

Change to the directory:

cd osis-pipeline-terraform-example

Create the Terraform configuration

Create a file to define the AWS resources:

touch main.tf

Enter the following configuration in main.tf and save your file:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.36"
    }
  }

  required_version = ">= 1.2.0"
}

provider "aws" {
  region = "eu-central-1"
}

data "aws_region" "current" {}
data "aws_caller_identity" "current" {}
locals {
  account_id = data.aws_caller_identity.current.account_id
}

output "ingest_endpoint_url" {
  value = tolist(aws_osis_pipeline.example.ingest_endpoint_urls)[0]
}

resource "aws_iam_role" "example" {
  name = "exampleosisrole"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Sid    = ""
        Principal = {
          Service = "osis-pipelines.amazonaws.com"
        }
      },
    ]
  })
}

resource "aws_opensearch_domain" "test" {
  domain_name           = "osi-example-domain"
  engine_version = "OpenSearch_2.7"
  cluster_config {
    instance_type = "r5.large.search"
  }
  encrypt_at_rest {
    enabled = true
  }
  domain_endpoint_options {
    enforce_https       = true
    tls_security_policy = "Policy-Min-TLS-1-2-2019-07"
  }
  node_to_node_encryption {
    enabled = true
  }
  ebs_options {
    ebs_enabled = true
    volume_size = 10
  }
  # Domain access policy scoped to this domain for the pipeline role
  access_policies = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "${aws_iam_role.example.arn}"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:${data.aws_region.current.name}:${local.account_id}:domain/osi-example-domain/*"
    }
  ]
}
EOF

}

resource "aws_iam_policy" "example" {
  name = "osis_role_policy"
  description = "Policy for OSIS pipeline role"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
        {
          Action = ["es:DescribeDomain"]
          Effect = "Allow"
          Resource = "arn:aws:es:${data.aws_region.current.name}:${local.account_id}:domain/*"
        },
        {
          Action = ["es:ESHttp*"]
          Effect = "Allow"
          Resource = "arn:aws:es:${data.aws_region.current.name}:${local.account_id}:domain/osi-test-domain/*"
        }
    ]
})
}

resource "aws_iam_role_policy_attachment" "example" {
  role       = aws_iam_role.example.name
  policy_arn = aws_iam_policy.example.arn
}

resource "aws_cloudwatch_log_group" "example" {
  name = "/aws/vendedlogs/OpenSearchIngestion/example-pipeline"
  retention_in_days = 365
  tags = {
    Name = "AWS Blog OSIS Pipeline Example"
  }
}

resource "aws_osis_pipeline" "example" {
  pipeline_name               = "example-pipeline"
  pipeline_configuration_body = <<-EOT
            version: "2"
            example-pipeline:
              source:
                http:
                  path: "/test_ingestion_path"
              processor:
                - date:
                    from_time_received: true
                    destination: "@timestamp"
              sink:
                - opensearch:
                    hosts: ["https://${aws_opensearch_domain.test.endpoint}"]
                    index: "application_logs"
                    aws:
                      sts_role_arn: "${aws_iam_role.example.arn}"   
                      region: "${data.aws_region.current.name}"
        EOT
  max_units                   = 1
  min_units                   = 1
  log_publishing_options {
    is_logging_enabled = true
    cloudwatch_log_destination {
      log_group = aws_cloudwatch_log_group.example.name
    }
  }
  tags = {
    Name = "AWS Blog OSIS Pipeline Example"
  }
}
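
This example keeps everything in a single main.tf for brevity. In a larger project, you might split the configuration across the files mentioned earlier; a minimal sketch (same behavior, assuming the corresponding blocks are removed from main.tf) could look like the following:

# provider.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.36"
    }
  }
  required_version = ">= 1.2.0"
}

provider "aws" {
  region = var.region
}

# variables.tf — parameterizes the Region that main.tf hardcodes
variable "region" {
  type        = string
  description = "AWS Region to deploy the pipeline and domain into"
  default     = "eu-central-1"
}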

Create the resources

Initialize the directory:

terraform init

Review the plan to see what resources will be created:

terraform plan

Apply the configuration and answer yes to run the plan:

terraform apply

The process might take around 7–10 minutes to complete.

Test the pipeline

After you create the resources, you should see the ingest_endpoint_url output displayed. Copy this value and export it as an environment variable:

export OSIS_PIPELINE_ENDPOINT_URL=<Replace with value copied>
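
If you prefer not to copy the value by hand, you can read it directly from the Terraform state (the -raw flag assumes Terraform 0.14 or later):

export OSIS_PIPELINE_ENDPOINT_URL=$(terraform output -raw ingest_endpoint_url)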

Send a sample log with awscurl. Use the AWS profile that holds your credentials (awscurl accepts a --profile flag if you don't want to use the default):

awscurl --service osis --region eu-central-1 -X POST -H "Content-Type: application/json" -d '[{"time":"2014-08-11T11:40:13+00:00","remote_addr":"122.226.223.69","status":"404","request":"GET http://www.k2proxy.com//hello.html HTTP/1.1","http_user_agent":"Mozilla/4.0 (compatible; WOW64; SLCC2;)"}]' https://$OSIS_PIPELINE_ENDPOINT_URL/test_ingestion_path

You should receive a 200 OK response.

To verify that the data was ingested through the OpenSearch Ingestion pipeline and indexed in the OpenSearch Service domain, navigate to the domain on the OpenSearch Service console and copy its endpoint. Replace <OPENSEARCH ENDPOINT URL> in the following snippet and run it:

awscurl --service es --region eu-central-1 -X GET https://<OPENSEARCH ENDPOINT URL>/application_logs/_search | json_pp 

You should see your ingested log entry returned in the search results.
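
As a rough illustration of the response shape (values such as took, _id, and the generated @timestamp will differ in your output):

{
  "took": 5,
  "timed_out": false,
  "hits": {
    "total": { "value": 1, "relation": "eq" },
    "hits": [
      {
        "_index": "application_logs",
        "_id": "…",
        "_source": {
          "time": "2014-08-11T11:40:13+00:00",
          "remote_addr": "122.226.223.69",
          "status": "404",
          "request": "GET http://www.k2proxy.com//hello.html HTTP/1.1",
          "http_user_agent": "Mozilla/4.0 (compatible; WOW64; SLCC2;)",
          "@timestamp": "…"
        }
      }
    ]
  }
}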

Clean up

To destroy the resources you created, run the following command and answer yes when prompted:

terraform destroy

The process might take around 30–35 minutes to complete.

Conclusion

In this post, we showed how you can use Terraform to deploy OpenSearch Ingestion pipelines. AWS offers various resources for you to quickly start building pipelines using OpenSearch Ingestion and to deploy them with Terraform. You can use various built-in pipeline integrations to quickly ingest data from Amazon DynamoDB, Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Security Lake, Fluent Bit, and many more. OpenSearch Ingestion blueprints allow you to build data pipelines with minimal configuration changes and manage them with ease using Terraform. To learn more, check out the Terraform documentation for Amazon OpenSearch Ingestion.


About the Authors

Rahul Sharma is a Technical Account Manager at Amazon Web Services. He is passionate about data technologies that help organizations use data as a strategic asset, and is based out of New York City, New York.

Farhan Angullia is a Cloud Application Architect at AWS Professional Services, based in Singapore. He primarily focuses on modern applications with microservice software patterns, and advocates for implementing robust CI/CD practices to optimize the software delivery lifecycle for customers. He enjoys contributing to the open source Terraform ecosystem in his spare time.

Arjun Nambiar is a Product Manager with Amazon OpenSearch Service. He focuses on ingestion technologies that enable ingesting data from a wide variety of sources into Amazon OpenSearch Service at scale. Arjun is interested in large-scale distributed systems and cloud-native technologies, and is based out of Seattle, Washington.

Muthu Pitchaimani is a Search Specialist with Amazon OpenSearch Service. He builds large-scale search applications and solutions. Muthu is interested in the topics of networking and security, and is based out of Austin, Texas.