AWS Big Data Blog

How Gameskraft uses Amazon Redshift data sharing to support growing analytics workloads

This post is co-written by Anshuman Varshney, Technical Lead at Gameskraft.

Gameskraft is one of India’s leading online gaming companies, offering gaming experiences across a variety of categories such as rummy, ludo, and poker under the brands RummyCulture, Ludo Culture, Pocket52, and Playship. Gameskraft holds the Guinness World Record for organizing the world’s largest online rummy tournament, and is one of India’s first gaming companies to build an ISO-certified platform.

Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. Amazon Redshift enables you to use SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning (ML) to deliver the best price-performance at scale.

In this post, we show how Gameskraft used Amazon Redshift data sharing along with concurrency scaling and WLM optimization to support its growing analytics workloads.

Amazon Redshift use case

Gameskraft used Amazon Redshift RA3 instances with Redshift Managed Storage (RMS) for its data warehouse. The upstream data pipeline is a robust system that integrates various data sources, including Amazon Kinesis and Amazon Managed Streaming for Apache Kafka (Amazon MSK) for clickstream events, Amazon Relational Database Service (Amazon RDS) for delta transactions, and Amazon DynamoDB for delta game-related information. Additionally, data related to products, marketing, and customer experience is extracted from vendor APIs. All of this diverse data is consolidated into the Amazon Simple Storage Service (Amazon S3) data lake before being loaded into the Redshift data warehouse. These upstream data sources constitute the data producer components.
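
For illustration, loading the consolidated data from the S3 data lake into Amazon Redshift can be done with a COPY command along these lines. This is a minimal sketch; the schema, table, S3 prefix, IAM role, and file format are hypothetical placeholders, not Gameskraft's actual objects.

-- Minimal sketch of loading consolidated data from the S3 data lake into Redshift.
-- Schema, table, S3 path, IAM role, and format are illustrative placeholders.
COPY analytics.clickstream_events
FROM 's3://example-data-lake/clickstream/2023/10/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy-role'
FORMAT AS PARQUET;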

Gameskraft used Amazon Redshift workload management (WLM) to manage priorities within workloads, with higher priority being assigned to the extract, transform, and load (ETL) queue that runs critical jobs for data producers. The downstream consumers consist of business intelligence (BI) tools, with multiple data science and data analytics teams having their own WLM queues with appropriate priority values.

As Gameskraft’s portfolio of gaming products grew, its dedicated data analytics and data science teams expanded approximately fivefold. Consequently, there was a fivefold rise in data integrations and a fivefold increase in ad hoc queries submitted to the Redshift cluster. These query patterns and concurrency levels were unpredictable. Over time, the number of BI dashboards (both scheduled and live) also increased, which contributed to more queries being submitted to the Redshift cluster.

With this growing workload, Gameskraft was observing the following challenges:

  • Increase in critical ETL job runtime
  • Increase in query wait time in multiple queues
  • Impact of unpredictable ad hoc query workloads across other queues in the cluster

Gameskraft was looking for a solution that would mitigate these challenges, provide the flexibility to scale ingestion and consumption workload processing independently, and accommodate unpredictable future growth.

Solution overview

Gameskraft tackled these challenges in a phased manner using Amazon Redshift concurrency scaling, Amazon Redshift data sharing, Amazon Redshift Serverless, and Redshift provisioned clusters.

Amazon Redshift concurrency scaling lets you easily support thousands of concurrent users and concurrent queries, with consistently fast query performance. As concurrency increases, Amazon Redshift automatically adds query processing power in seconds to process queries without any delays. When the workload demand subsides, this extra processing power is automatically removed, so you pay only for the time when concurrency scaling clusters are in use. Amazon Redshift offers 1 hour of free concurrency scaling credits per active cluster per day, allowing you to accumulate 30 hours of free credits per month.

Gameskraft enabled concurrency scaling in selective WLM queues to ease query wait time in those queues during peak usage, which also helped reduce ETL query runtime. In the prior setup, we maintained five specialized queues for ETL, scheduled reporting, ad hoc queries, BI tools, and data science. To prevent blockages for other processes, we imposed query timeouts using query monitoring rules (QMR). However, both the ETL and BI tools queues were persistently occupied, impacting the performance of the remaining queues.

Concurrency scaling helped alleviate query wait time in the ad hoc query queue. Still, the challenge of downstream consumption workloads (like ad hoc queries) impacting ingestion persisted, and Gameskraft was looking for a solution to manage these workloads independently.
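
To track the effect of these changes on queue wait time, a query against the STL_WLM_QUERY system table similar to the following sketch can be used. The time window and filter are illustrative; with automatic WLM, user-defined queues map to service classes 100-107.

-- Average queue wait and execution time per WLM queue over the last day.
-- With automatic WLM, user-defined queues map to service classes 100-107.
SELECT service_class,
       COUNT(*) AS query_count,
       AVG(total_queue_time) / 1000000.0 AS avg_queue_seconds,
       AVG(total_exec_time)  / 1000000.0 AS avg_exec_seconds
FROM stl_wlm_query
WHERE service_class >= 100
  AND queue_start_time > DATEADD(day, -1, GETDATE())
GROUP BY service_class
ORDER BY avg_queue_seconds DESC;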

The following table summarizes the workload management configuration prior to the solution implementation.

Queue | Usage | Concurrency Scaling Mode | Concurrency on Main / Memory % | Query Monitoring Rules
etl | For ingestion from multiple data integrations | off | auto | Stop action on: Query runtime (seconds) > 2700
report | For scheduled reporting purposes | off | auto | Stop action on: Query runtime (seconds) > 600
datascience | For data science workloads | off | auto | Stop action on: Query runtime (seconds) > 300
readonly | For ad hoc and day-to-day analysis | auto | auto | Stop action on: Query runtime (seconds) > 120
bi_tool | For BI tools | auto | auto | Stop action on: Query runtime (seconds) > 300

To achieve flexibility in scaling, Gameskraft used Amazon Redshift data sharing. Amazon Redshift data sharing allows you to extend the ease of use, performance, and cost benefits offered by a single cluster to multi-cluster deployments while being able to share data. Data sharing enables instant, granular, and fast data access across Amazon Redshift data warehouses without the need to copy or move it. Data sharing provides live access to data so that users always observe the most up-to-date and consistent information as it’s updated in the data warehouse. You can securely share live data across provisioned clusters and serverless workgroups within an AWS account, across AWS accounts, and across AWS Regions.

Data sharing builds on Redshift Managed Storage (RMS), which underpins RA3 provisioned clusters and serverless workgroups, allowing multiple warehouses to query the same data with separate, isolated compute. Queries accessing shared data run on the consumer cluster and read data from RMS directly, without impacting the performance of the producer cluster. You can rapidly onboard workloads with diverse data access patterns and SLA requirements without being concerned about resource contention.

We chose to run all ETL workloads in the primary producer cluster to manage ETL independently. We used data sharing to share read-only access to data with a data science serverless workgroup, a BI provisioned cluster, an ad hoc query provisioned cluster, and a data integration serverless workgroup. Teams using these separate compute resources could then query the same data without copying the data between the producer and consumer. Additionally, we introduced concurrency scaling to the consumer queues, prioritizing BI tools, and extended the timeout for the remaining queues. These modifications notably enhanced overall efficiency and throughput.
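
At a high level, granting a consumer access from the producer side looks like the following sketch. The datashare name, schema, and consumer namespace identifier are illustrative, not Gameskraft's actual objects.

-- On the producer cluster: create a datashare, add a schema and its tables,
-- and grant access to a consumer namespace (identifiers are illustrative).
CREATE DATASHARE analytics_share;
ALTER DATASHARE analytics_share ADD SCHEMA analytics;
ALTER DATASHARE analytics_share ADD ALL TABLES IN SCHEMA analytics;
ALTER DATASHARE analytics_share SET INCLUDENEW = TRUE FOR SCHEMA analytics;  -- include future objects
GRANT USAGE ON DATASHARE analytics_share TO NAMESPACE '11111111-2222-3333-4444-555555555555';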

The following table summarizes the new workload management configuration for the producer cluster.

Queue | Usage | Concurrency Scaling Mode | Concurrency on Main / Memory % | Query Monitoring Rules
etl | For ingestion from multiple data integrations | auto | auto | Stop action on: Query runtime (seconds) > 3600

The following table summarizes the new workload management configuration for the consumer cluster.

Queue | Usage | Concurrency Scaling Mode | Concurrency on Main / Memory % | Query Monitoring Rules
report | For scheduled reporting purposes | off | auto | Stop action on: Query runtime (seconds) > 1200; Query queue time (seconds) > 1800; Spectrum scan row count (rows) > 100000; Spectrum scan (MB) > 3072
datascience | For data science workloads | off | auto | Stop action on: Query runtime (seconds) > 600; Query queue time (seconds) > 1800; Spectrum scan row count (rows) > 100000; Spectrum scan (MB) > 3072
readonly | For ad hoc and day-to-day analysis | auto | auto | Stop action on: Query runtime (seconds) > 900; Query queue time (seconds) > 3600; Spectrum scan (MB) > 3072; Spectrum scan row count (rows) > 100000
bi_tool_live | For live BI tools | auto | auto | Stop action on: Query runtime (seconds) > 900; Query queue time (seconds) > 1800; Spectrum scan (MB) > 1024; Spectrum scan row count (rows) > 1000
bi_tool_schedule | For scheduled BI tools | auto | auto | Stop action on: Query runtime (seconds) > 1800; Query queue time (seconds) > 3600; Spectrum scan (MB) > 1024; Spectrum scan row count (rows) > 1000

Solution implementation

Gameskraft is dedicated to maintaining uninterrupted system operations, prioritizing seamless solutions over downtime. In keeping with this principle, the team took strategic measures to ensure a smooth migration to data sharing, which included the following steps:

  • Planning:
    • Replicating users and groups to the consumer, to mitigate potential access complications for analytics, data science, and BI teams.
    • Establishing a comprehensive setup within the consumers, encompassing essential components like external schemas for Amazon Redshift Spectrum.
    • Fine-tuning WLM configurations tailored to the consumer’s requirements.
  • Implementation:
    • Introducing monitoring dashboards in Grafana for CPU utilization, read/write throughput, IOPS, and latency specific to the consumer cluster, enhancing oversight capabilities.
    • Changing all interleaved sort key tables on the producer cluster to compound sort key tables to seamlessly transition data (see the SQL sketch after this list).
    • Creating an external schema from the data share database on the consumer, mirroring the producer cluster’s schema with identical names. This approach minimizes the need to make query adjustments in multiple locations.
  • Testing:
    • Conducting a week-long internal regression testing and auditing process to validate all data points by running the existing workload and then twice that workload.
  • Final changes:
    • Updating the DNS record for the cluster endpoint, which included mapping the consumer cluster’s endpoint to the same domain as the producer cluster’s endpoint, to streamline connections and avoid making changes in multiple places.
    • Ensuring data security and access control by revoking group and user privileges from the producer cluster.
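
The following sketch outlines the producer-side sort key change and the consumer-side setup described in this list. The table, columns, datashare, database, and namespace identifiers are illustrative; for the sort key change, ALTER SORTKEY is one option, and a deep copy is an alternative.

-- On the producer cluster: change an interleaved sort key table to a compound sort key
-- (one option, assuming ALTER SORTKEY applies; a deep copy is an alternative).
ALTER TABLE analytics.game_events ALTER SORTKEY (event_date, user_id);

-- On the consumer: create a local database from the datashare, then an external schema
-- that mirrors the producer schema name so existing queries need no changes.
CREATE DATABASE analytics_shared FROM DATASHARE analytics_share
OF NAMESPACE '66666666-7777-8888-9999-000000000000';  -- producer namespace (illustrative)

CREATE EXTERNAL SCHEMA analytics
FROM REDSHIFT DATABASE 'analytics_shared' SCHEMA 'analytics';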

The following diagram illustrates the Gameskraft Amazon Redshift data sharing architecture.

The following diagram illustrates the Amazon Redshift data sharing architecture with multiple consumer clusters.

With data sharing implementation, Gameskraft was able to isolate the producer and consumer workloads. Data sharing also provided the flexibility to independently scale the producer and consumer data warehouses.

The implementation of the overall solution helped Gameskraft support a more frequent data refresh (43% reduction in overall job runtime) for its ETL workload, which runs on the producer cluster, along with the ability to support a growing (fivefold increase in users, BI workloads, and ad hoc queries) and unpredictable consumer workload.

The following dashboards show some of the critical ETL pipeline runtimes before and after solution implementation.

The first shows the delta P1/P2/P3 job runs before and after solution implementation (duration in minutes).

The following shows the daily event ETL P1/P2/P3 job runs before and after solution implementation (duration in minutes).

Key considerations

Gameskraft embraces a modern data architecture, with the data lake residing in Amazon S3. To provide seamless access to the data lake, we use Amazon Redshift Spectrum, which acts as a bridge between the data warehouse (Amazon Redshift) and the data lake (Amazon S3). It allows you to perform data transformations and analysis directly on data stored in Amazon S3, without having to duplicate the data into your Redshift cluster.
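
For example, a query of the following shape can join an external (S3-backed) table with a local Redshift table. The schema, table, and column names are illustrative.

-- Join an external clickstream table (S3-backed, via Redshift Spectrum) with a local table.
-- spectrum_clickstream is an external schema; analytics.games is a local table (both illustrative).
SELECT g.game_id,
       COUNT(DISTINCT c.session_id) AS sessions
FROM spectrum_clickstream.events AS c
JOIN analytics.games AS g ON c.game_id = g.game_id
WHERE c.event_date = '2023-10-01'
GROUP BY g.game_id;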

Gameskraft had a couple of key learnings while implementing this data sharing solution:

  • First, as of this writing, Amazon Redshift data sharing doesn’t support adding external schemas, external tables, or late-binding views on external tables to a data share. To work around this, we created an external schema as a pointer to the AWS Glue database, and the same AWS Glue database is referenced in the external schema on the consumer side (see the sketch after this list).
  • Second, Amazon Redshift doesn’t support sharing tables with interleaved sort keys, or views that refer to tables with interleaved sort keys. Because interleaved sort keys were present across numerous tables and views, a prerequisite for including them in the data share was revising the sort key configuration to use compound sort keys.
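
A sketch of the external schema workaround is shown below. The external schema name (reused from the Spectrum query example earlier), AWS Glue database, and IAM role are illustrative; the same statement is run on the producer and on each consumer so that both reference the same AWS Glue database.

-- Run on the producer and on each consumer: register the same AWS Glue database
-- as an external schema (names and IAM role are illustrative).
CREATE EXTERNAL SCHEMA spectrum_clickstream
FROM DATA CATALOG
DATABASE 'example_datalake_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-spectrum-role';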

Conclusion

In this post, we saw how Gameskraft used data sharing and concurrency scaling in Amazon Redshift with a producer and consumer cluster architecture to achieve the following:

  • Reduce query wait time for all queues in the producer and consumer
  • Scale the producer and consumer independently based on workload and queue requirements
  • Improve ETL pipeline performance and the data refresh cycle to support more frequent refreshes in the producer cluster
  • Onboard more queues and workloads (BI tools queue, data integration queue, data science queue, downstream team’s queue, ad hoc query queue) in the consumer without impacting the ETL pipeline in the producer cluster
  • Gain the flexibility to use multiple consumers with a mix of provisioned Redshift clusters and Redshift Serverless workgroups

These Amazon Redshift features and architecture can help support a growing and unpredictable analytics workload.


About the Authors

Anshuman Varshney is Technical Lead at Gameskraft with a background in both backend and data engineering. He has a proven track record of leading and mentoring cross-functional teams to deliver high-performance, scalable solutions. Apart from work, he relishes moments with his family, indulges in cinematic experiences, and seizes every opportunity to explore new destinations through travel.

Prafulla Wani is an Amazon Redshift Specialist Solution Architect at AWS. He works with AWS customers on analytics architecture designs and Amazon Redshift proofs of concept. In his spare time, he plays chess with his son.

Saurov Nandy is a Solutions Architect at AWS. He works with AWS customers to design and implement solutions that solve complex business problems. In his spare time, he likes to explore new places and indulge in photography and video editing.

Shashank Tewari is a Senior Technical Account Manager at AWS. He helps AWS customers optimize their architectures to achieve performance, scale, and cost efficiencies. In his spare time, he likes to play video games with his kids. During vacations, he likes to trek on mountains and take up adventure sports.