article thumbnail

10 Years Later: Who’s the GOAT of Data Catalogs?

Alation

December 2012: Alation forms and goes to work creating the first enterprise data catalog. Later, in its inaugural report on data catalogs, Forrester Research recognizes that “Alation started the MLDC trend.”. January 2015: Alation acquires its first customer. June 2017: Yahoo Japan Corp. becomes the first customer in APAC.

article thumbnail

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

These formats enable ACID (atomicity, consistency, isolation, durability) transactions, upserts, and deletes, and advanced features such as time travel and snapshots that were previously only available in data warehouses. The output will give a count of the number of data and metadata files deleted.

Snapshot 103
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

We also used AWS Lambda for data processing. To further optimize and improve the developer velocity for our data consumers, we added Amazon DynamoDB as a metadata store for different data sources landing in the data lake. Clients access this data store with an API’s.

article thumbnail

Themes and Conferences per Pacoid, Episode 11

Domino Data Lab

In other words, using metadata about data science work to generate code. In this case, code gets generated for data preparation, where so much of the “time and labor” in data science work is concentrated. The long history and pervasiveness of SQL has helped make data-driven work much more accessible to a wider audience.

Metadata 105
article thumbnail

The Very Group adopts a data catalog to better organize and leverage its online retail capabilities

CIO Business Intelligence

It launched its first online-only brand, Very, in 2009 and finally abandoned its printed catalogs to go all-in online in 2015. He found a rich collection of data assets, including information on over 2.2 As a result, Pimblett now runs the organization’s data warehouse, analytics, and business intelligence.

IT 75
article thumbnail

Convergent Evolution

Peter James Thomas

That was the Science, here comes the Technology… A Brief Hydrology of Data Lakes. Next, rather than just being the province of Data Scientists, there were moves to use Data Lakes to support general Data Discovery and even business Reporting and Analytics as well. This required additional investments in metadata.

article thumbnail

Turning Streams Into Data Products

Cloudera

In 2015, Cloudera became one of the first vendors to provide enterprise support for Apache Kafka, which marked the genesis of the Cloudera Stream Processing (CSP) offering. The DevOps/app dev team wants to know how data flows between such entities and understand the key performance metrics (KPMs) of these entities.