AWS Big Data Blog

Amazon OpenSearch Serverless expands support for larger workloads and collections

We recently announced new enhancements to Amazon OpenSearch Serverless that let you scan and search up to 6 TB of source data. At launch, OpenSearch Serverless supported searching one or more indexes within a collection with a total combined size of up to 1 TB. With support for 6 TB of source data, you can now scale your log analytics, machine learning, and ecommerce workloads more effectively. With OpenSearch Serverless, you get these expanded limits without having to worry about sizing, monitoring your usage, or manually scaling an OpenSearch domain. If you are new to OpenSearch Serverless, refer to Log analytics the easy way with Amazon OpenSearch Serverless to get started.

The compute capacity that OpenSearch Serverless uses for data ingestion and for search and query is measured in OpenSearch Compute Units (OCUs). To support larger datasets, we have raised the OCU limit from 50 to 100 for indexing and search, including redundancy for Availability Zone outages and infrastructure failures. These OCUs are shared among various collections, each containing one or more indexes of varying sizes. To manage costs, you can configure the maximum OCU limits for search and indexing independently using the AWS Command Line Interface (AWS CLI), an AWS SDK, or the AWS Management Console. You can also have multiple 6 TB collections. If you need higher OCU limits or collection sizes beyond 6 TB, reach out to us through AWS Support.

Set max OCU to 100

To get started, first raise the maximum OCU limits for both indexing and search to 100. You pay only for the resources you actually consume, not for the configured maximum.
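As a minimal sketch, assuming you use the AWS SDK for Python (boto3) with credentials and a Region already configured, the account-level maximums can be set through the `opensearchserverless` client's `update_account_settings` call and its `capacityLimits` parameter. The helper names `build_capacity_limits` and `set_max_ocus` and the simple positive-integer check are illustrative assumptions, not part of the service API; the service applies its own validation.

```python
def build_capacity_limits(max_indexing_ocu: int, max_search_ocu: int) -> dict:
    """Build the capacityLimits argument for UpdateAccountSettings (hypothetical helper)."""
    for name, value in (("indexing", max_indexing_ocu), ("search", max_search_ocu)):
        # Reject bools and non-positive values early; the service enforces its own bounds.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"max {name} OCU must be a positive integer, got {value!r}")
    return {
        "maxIndexingCapacityInOCU": max_indexing_ocu,
        "maxSearchCapacityInOCU": max_search_ocu,
    }


def set_max_ocus(max_indexing_ocu: int = 100, max_search_ocu: int = 100) -> dict:
    """Raise the account-level OCU maximums (hypothetical wrapper around boto3)."""
    # Imported lazily so build_capacity_limits can be used without boto3 installed.
    import boto3

    client = boto3.client("opensearchserverless")
    return client.update_account_settings(
        capacityLimits=build_capacity_limits(max_indexing_ocu, max_search_ocu)
    )


if __name__ == "__main__":
    # This makes a real AWS call and needs permission to update account settings.
    print(set_max_ocus())
```

Indexing and search are passed as separate values because, as noted earlier, the two limits can be configured independently; you could, for example, keep search lower while running a large ingestion test.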

Ingesting the data

You can generate load with the scripts shared in the following workshop, or with your own application or data generator. To create a burst of indexing requests, run multiple instances of these scripts. As seen in the following screenshot, in this test we created six indexes of roughly 1 TB or more.
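If you write your own data generator instead of using the workshop scripts, each indexing request is typically a `_bulk` call whose body is newline-delimited JSON: an action line followed by the document source, repeated per document, with a trailing newline. The following sketch only builds that payload; the document fields, `make_log_doc`, and the index name are hypothetical. Sending it (for example with the opensearch-py client signed for the `aoss` service) is left out.

```python
import json
import random
import time


def make_log_doc(i: int) -> dict:
    """Hypothetical log-style document; replace with your own schema."""
    return {
        "@timestamp": int(time.time() * 1000),
        "message": f"synthetic event {i}",
        "status": random.choice([200, 404, 500]),
    }


def build_bulk_body(index: str, docs: list) -> str:
    """Serialize docs as an NDJSON _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        # No _id is set, so the service assigns document IDs.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 7,500 documents per request, matching the batch size used in this post's test.
    docs = [make_log_doc(i) for i in range(7500)]
    body = build_bulk_body("hypothetical-logs-index", docs)
    print(len(body.splitlines()))  # two lines per document
```

Running several copies of a generator like this in parallel is one way to produce the kind of indexing burst described above.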

Auto scaling resources in OpenSearch Serverless

The highlighted points in the following figures show how OpenSearch Serverless responds to indexing traffic increasing from 2,000 to 7,000 bulk requests per second by automatically scaling out OCUs. Each bulk request includes 7,500 documents. OpenSearch Serverless uses various system signals to automatically scale out OCUs based on your workload demand.
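As a rough sense of scale, multiplying the request rate by the batch size gives the document throughput implied by these numbers (simple arithmetic from the stated figures, not a separately measured result):

```latex
\begin{aligned}
2{,}000~\tfrac{\text{requests}}{\text{s}} \times 7{,}500~\tfrac{\text{docs}}{\text{request}} &= 1.5 \times 10^{7}~\tfrac{\text{docs}}{\text{s}} \\
7{,}000~\tfrac{\text{requests}}{\text{s}} \times 7{,}500~\tfrac{\text{docs}}{\text{request}} &= 5.25 \times 10^{7}~\tfrac{\text{docs}}{\text{s}}
\end{aligned}
```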

OpenSearch Serverless also scales down indexing OCUs when your workload’s activity level decreases. The highlighted points in the following figures show indexing traffic gradually decreasing from 7,000 bulk requests to fewer than 1,000 requests per second. OpenSearch Serverless reacts to these changes in load by reducing the number of OCUs.
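To watch this scale-out and scale-in behavior for your own workload, you can query OCU consumption from Amazon CloudWatch. The sketch below assumes OpenSearch Serverless publishes account-level `IndexingOCU` and `SearchOCU` metrics in the `AWS/AOSS` namespace with a `ClientId` dimension set to your account ID; verify these names in the CloudWatch console before relying on them. The function names and the `123456789012` account ID are placeholders, and the boto3 call is the standard CloudWatch `get_metric_statistics`.

```python
from datetime import datetime, timedelta, timezone


def ocu_query(metric_name: str, account_id: str, hours: int = 3, period_s: int = 60) -> dict:
    """Build get_metric_statistics arguments for an OCU metric (metric names are assumptions)."""
    if metric_name not in ("IndexingOCU", "SearchOCU"):
        raise ValueError(f"unexpected metric: {metric_name}")
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/AOSS",  # assumed namespace for OpenSearch Serverless
        "MetricName": metric_name,
        "Dimensions": [{"Name": "ClientId", "Value": account_id}],  # assumed dimension
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_s,
        "Statistics": ["Maximum"],
    }


def fetch_ocu_series(metric_name: str, account_id: str) -> list:
    """Return (timestamp, max OCUs) pairs sorted by time (hypothetical helper)."""
    import boto3  # lazy import: only needed when actually calling CloudWatch

    cloudwatch = boto3.client("cloudwatch")
    resp = cloudwatch.get_metric_statistics(**ocu_query(metric_name, account_id))
    # Datapoints come back unordered; sort them to see the scaling trend over time.
    return sorted(
        ((p["Timestamp"], p["Maximum"]) for p in resp["Datapoints"]),
        key=lambda pair: pair[0],
    )


if __name__ == "__main__":
    for ts, ocus in fetch_ocu_series("IndexingOCU", "123456789012"):
        print(ts.isoformat(), ocus)
```

Using the `Maximum` statistic per period highlights peak OCU allocation, which is the value that matters when comparing against your configured limits.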

Conclusion

We encourage you to take advantage of the 6 TB support and put it to the test! Migrate your data, explore the improved throughput, and try out the enhanced scaling capabilities. Our goal is to deliver a seamless and efficient experience that aligns with your requirements.

To get started, refer to Log analytics the easy way with Amazon OpenSearch Serverless. To get hands-on with OpenSearch Serverless, follow the Getting started with Amazon OpenSearch Serverless workshop, which has a step-by-step guide for configuring and setting up an OpenSearch Serverless collection.

If you have feedback about this post, share it in the comments section. If you have questions about this post, start a new thread on the Amazon OpenSearch Service forum or contact AWS Support.


About the author

Prashant Agrawal is a Sr. Search Specialist Solutions Architect with Amazon OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining AWS, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. When he's not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.