Big Data, Data Lake, Data-driven and Metadata

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Enterprises and organizations across the globe want to harness the power of data to make better decisions by putting data at the center of every decision-making process. However, throughout history, data services have held dominion over their customers’ data.

Data Lake

Data Lake Metadata Snapshot Analytics

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

Businesses are constantly evolving, and data leaders are challenged every day to meet new requirements. Customers are using AWS and Snowflake to develop purpose-built data architectures that provide the performance required for modern analytics and artificial intelligence (AI) use cases.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. In this post, we discuss a common use case in relation to operational data processing and the solution we built using Apache Hudi and AWS Glue.

Data Lake

Data Lake Data Processing Metadata Snapshot

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

To build a data-driven business, it is important to democratize enterprise data assets in a data catalog. With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. For metadata read/write, Flink has the catalog interface.

Data Lake

Data Lake Metadata Business Analysis Data-driven

Data Lakes on Cloud & it’s Usage in Healthcare

BizAcuity

MARCH 29, 2019

Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. The power of the data lake lies in the fact that it often is a cost-effective way to store data.

Data Lake

Data Lake Unstructured Data Cost-Benefit Data Quality

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications.

Data Lake

Data Lake Data Transformation Cost-Benefit Data-driven

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.

Data Lake

Data Lake Data Warehouse Marketing Management

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue data catalog.

Metadata

Metadata Data Lake Data Processing Data-driven

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Transformation

Overcome these six data consumption challenges for a more data-driven enterprise

IBM Big Data Hub

JUNE 8, 2022

Implementing the right data strategy spurs innovation and outstanding business outcomes by recognizing data as a critical asset that provides insights for better and more informed decision-making. By taking advantage of data, enterprises can shape business decisions, minimize risk for stakeholders, and gain competitive advantage.

Data-driven

Data-driven Enterprise Data Governance Data Lake

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

Organizations often need to manage a high volume of data that is growing at an extraordinary rate. At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and to do so with consistent performance. We think of this concept as inside-out data movement. Example Corp.

Data Lake

Data Lake Analytics Dashboards Metrics

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes. Iterations of the lakehouse.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

This is a guest post co-written by Alex Naumov, Principal Data Architect at smava. smava believes in and takes advantage of data-driven decisions in order to become the market leader.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Apache Ozone and Dense Data Nodes

Cloudera

APRIL 22, 2021

Today’s enterprise data analytics teams are constantly looking to get the best out of their platforms. Storage plays one of the most important roles in the data platform strategy; it provides the basis on which all compute engines and applications are built. Separates control and data plane enabling high performance.

Data Lake

Data Lake Cost-Benefit Testing Metadata

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

FinAuto has a unique position to look across FinOps and provide solutions that help satisfy multiple use cases with accurate, consistent, and governed delivery of data and related services. These datasets can then be used to power front end systems, ML pipelines, and data engineering teams.

Finance

Finance Metadata Big Data Recreation/Entertainment

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes. Iterations of the lakehouse.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

How Fujitsu implemented a global data mesh architecture and democratized data

AWS Big Data

MAY 1, 2024

To transform Fujitsu from an IT company to a digital transformation (DX) company, and to become a world-leading DX partner, Fujitsu has declared a shift to data-driven management. The platform consists of approximately 370 dashboards, 360 tables registered in the data catalog, and 40 linked systems.

Dashboards

Dashboards Publishing Data-driven Cost-Benefit

Unlock data across organizational boundaries using Amazon DataZone – now generally available

AWS Big Data

OCTOBER 4, 2023

Amazon DataZone enables customers to discover, access, share, and govern data at scale across organizational boundaries, reducing the undifferentiated heavy lifting of making data and analytics tools accessible to everyone in the organization. This is challenging because access to data is managed differently by each of the tools.

Metadata

Metadata Data Lake Publishing Data Governance

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

Data governance is the collection of policies, processes, and systems that organizations use to ensure the quality and appropriate handling of their data throughout its lifecycle for the purpose of generating business value.

Data Lake

Data Lake Data Governance Data Architecture Data Warehouse

How Morningstar used tag-based access controls in AWS Lake Formation to manage permissions for an Amazon Redshift data warehouse

AWS Big Data

APRIL 6, 2023

In this post, Morningstar’s Data Lake Team Leads discuss how they utilized tag-based access control in their data lake with AWS Lake Formation and enabled similar controls in Amazon Redshift. In this solution, we were required to ensure that the consumers could only query the data to which they had explicit access.

Data Warehouse

Data Warehouse Data Lake Management Data-driven

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

This data is used in procuring devices’ inventory to meet Amazon customers’ demands. With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data.

Optimization

Optimization Forecasting Data Lake Metadata

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

An enterprise data catalog does all that a library inventory system does – namely streamlining data discovery and access across data sources – and a lot more. For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance.

Metadata

Metadata Data Quality Data-driven Data Governance

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. With these insights, teams have the visibility to make data integration pipelines more efficient.

Metrics

Metrics Visualization Dashboards Interactive

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

This view is used to identify patterns and trends in customer behavior, which can inform data-driven decisions to improve business outcomes. In this post, we discuss how you can use purpose-built AWS services to create an end-to-end data strategy for C360 to unify and govern customer data that address these challenges.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions and misinformed business processes, missed revenue opportunities, failed business initiatives and complex data systems can all stem from data quality issues.

Data Quality

Data Quality Data Architecture Strategy Data Lake

Build efficient ETL pipelines with AWS Step Functions distributed map and redrive feature

AWS Big Data

DECEMBER 18, 2023

AWS Step Functions is a fully managed visual workflow service that enables you to build complex data processing pipelines involving a diverse set of extract, transform, and load (ETL) technologies such as AWS Glue, Amazon EMR, and Amazon Redshift. The data is then cleansed, transformed, and uploaded to Amazon S3 for further processing.

Metadata

Metadata Visualization Data Lake Data-driven

Design a data mesh on AWS that reflects the envisioned organization

AWS Big Data

JANUARY 22, 2024

The company uses AWS Cloud services to build data-driven products and scale engineering best practices. To ensure a sustainable data platform amid growth and profitability phases, their tech teams adopted a decentralized data mesh architecture. The solution Acast implemented is a data mesh, architected on AWS.

Data-driven

Data-driven Advertising Metadata Data Architecture

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

We’re living in the age of real-time data and insights, driven by low-latency data streaming applications. The volume of time-sensitive data produced is increasing rapidly, with different formats of data being introduced across new businesses and customer use cases.

Analytics

Analytics IoT Data-driven Snapshot

Advancing AI: The emergence of a modern information lifecycle

CIO Business Intelligence

DECEMBER 4, 2023

Although less complex than the “4 Vs” of big data (velocity, veracity, volume, and variety), orienting to the variety and volume of a challenging puzzle is similar to what CIOs face with information management. Operationalizing data to drive revenue: CIOs report that their roles are rising in importance and impact.

Unstructured Data

Unstructured Data Data Lake Metadata Business Objectives

How Data Governance Protects Sensitive Data

erwin

APRIL 2, 2021

Organizations are managing more data than ever. With more companies increasingly migrating their data to the cloud to ensure availability and scalability, the risks associated with data management and protection also are growing. Data Security Starts with Data Governance. Who is authorized to use it and how?

Data Governance

Data Governance Cost-Benefit Risk Metadata

Unlocking the value of data as your differentiator

AWS Big Data

NOVEMBER 29, 2023

Today on the AWS re:Invent keynote stage, Swami Sivasubramanian, VP of Data and AI, AWS, spoke about the beneficial relationship among data, generative AI, and humans—all working together to unleash new possibilities in efficiency and creativity. This week, we launched many new tools to help you turn your data into your differentiator.

Data Warehouse

Data Warehouse Data Lake Dashboards Data Integration

Data Governance Makes Data Security Less Scary

erwin

OCTOBER 31, 2019

Do you know where your data is? What data you have? Add to the mix the potential for a data breach followed by non-compliance, reputational damage and financial penalties and a real horror story could unfold. s Information Commissioner’s Office had levied against both Facebook and Equifax for their data breaches.

Data Governance

Data Governance Metadata Risk Data Lake

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

APRIL 25, 2024

In the era of data, organizations are increasingly using data lakes to store and analyze vast amounts of structured and unstructured data. Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making.

Optimization

Optimization Data Lake Cost-Benefit Reporting

Enhance query performance using AWS Glue Data Catalog column-level statistics

AWS Big Data

NOVEMBER 22, 2023

Today, we’re making available a new capability of AWS Glue Data Catalog that allows generating column-level statistics for AWS Glue tables. Data lakes are designed for storing vast amounts of raw, unstructured, or semi-structured data at a low cost, and organizations share those datasets across multiple departments and teams.

Statistics

Statistics Data Lake Optimization Data-driven

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

FMs are multimodal; they work with different data types such as text, video, audio, and images. Large language models (LLMs) are a type of FM and are pre-trained on vast amounts of text data and typically have application uses such as text generation, intelligent chatbots, or summarization.

Data Lake

Data Lake Unstructured Data Management Modeling

A hybrid approach in healthcare data warehousing with Amazon Redshift

AWS Big Data

FEBRUARY 21, 2023

Data warehouses play a vital role in healthcare decision-making and serve as a repository of historical data. A healthcare data warehouse can be a single source of truth for clinical quality control systems. Data warehouses are mostly built using the dimensional model approach, which has consistently met business needs.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Metadata

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center

AWS Big Data

APRIL 26, 2024

To enable your workforce users for analytics with fine-grained data access controls and audit data access, you might have to create multiple AWS Identity and Access Management (IAM) roles with different data permissions and map the workforce users to one of those roles. We use Okta as the IdP for this demonstration.

Analytics

Analytics Data Lake Management Enterprise

How SumUp made digital analytics more accessible using AWS Glue

AWS Big Data

JUNE 6, 2023

SumUp is a leading global financial technology company driven by the purpose of leveling the playing field for small businesses. Unless, of course, the rest of their data also resides in the Google Cloud. AWS Glue gave us a cost-efficient option to migrate the data and we further optimized storage cost by pruning cold data.

Analytics

Analytics Data Lake Testing Optimization

Use the Amazon Redshift Data API to interact with Amazon Redshift Serverless

AWS Big Data

APRIL 28, 2023

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools. The following diagram illustrates this architecture.

Interactive

Interactive Metadata Data Warehouse Data-driven

Non-JSON ingestion using Amazon Kinesis Data Streams, Amazon MSK, and Amazon Redshift Streaming Ingestion

AWS Big Data

OCTOBER 2, 2023

Organizations are grappling with the ever-expanding spectrum of data formats in today’s data-driven landscape. From Avro’s binary serialization to the efficient and compact structure of Protobuf, the landscape of data formats has expanded far beyond the traditional realms of CSV and JSON.

Cost-Benefit

Cost-Benefit Metadata Structured Data Management

Our Next Phase of Growth: Enterprise Data Catalogs

Alation

FEBRUARY 13, 2020

Today, we’re announcing that Alation has closed a $50 million Series C funding led by Sapphire Ventures, with participation from new investor Salesforce Ventures and our existing investors Costanoa Ventures, DCVC (Data Collective), Harmony Partners and Icon Ventures. And, the data catalog market has had a year of incredible growth.

Enterprise

Enterprise Data Lake Machine Learning Data-driven
