10 Years Later: Who’s the GOAT of Data Catalogs?

Alation

January 2015: Alation acquires its first customer. March 2015: Alation emerges from stealth mode to launch the first official data catalog to empower people in enterprises to easily find, understand, govern and use data for informed decision making that supports the business. June 2017: Yahoo Japan Corp.

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

More data files lead to more metadata stored in manifest files, and small data files often cause an unnecessary amount of metadata, resulting in less efficient queries and higher Amazon S3 access costs. The output will give a count of the number of data and metadata files deleted. resource('s3') bucket = s3.Bucket('
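To make the small-file point in the excerpt above concrete, here is a minimal arithmetic sketch. It assumes, for illustration only, that the table format records one manifest entry per data file; the table size and file sizes are hypothetical and do not come from the article.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def data_file_count(total_bytes: int, avg_file_bytes: int) -> int:
    """Number of data files needed to hold total_bytes (ceiling division)."""
    return (total_bytes + avg_file_bytes - 1) // avg_file_bytes


# Hypothetical 100 GiB table, before and after compaction.
table_size = 100 * GIB
small_files = data_file_count(table_size, 1 * MIB)        # many tiny files
compacted_files = data_file_count(table_size, 512 * MIB)  # fewer, larger files

# Assumption: one manifest entry per data file, so metadata grows with file count.
print(f"1 MiB files:   {small_files} manifest entries")
print(f"512 MiB files: {compacted_files} manifest entries")
```

Under these assumptions the same data needs roughly 500 times more metadata entries when stored as 1 MiB files, which is the overhead that compaction is meant to remove.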

Illuminating the black box: why CIOs should consider publishing an annual IT report

CIO Business Intelligence

By 2015, the technical executives of at least one conglomerate, Intel, had figured they could enrich the firm’s perception of IT by showcasing how essentially that function contributes to business value. And don’t just rattle off project metadata. Such a report has a legacy already, if only a short one. What pains did it alleviate?

Through the Looking Glass: Data Owners and Other Fallacies

TDAN

I vividly remember reading this passage from Bob Seiner’s TDAN.com article “Things I Think I Think about Data Governance”, from August 1, 2015: If we were going to remove two words from the Data Governance vocabulary, I would choose the words “assign” and “owner.” When someone is designated as the “owner” of data, that implies […].

Speed up queries with the cost-based optimizer in Amazon Athena

AWS Big Data

Starting today, the Athena SQL engine uses a cost-based optimizer (CBO), a new feature that uses table and column statistics stored in the AWS Glue Data Catalog as part of the table’s metadata. By using these statistics, CBO improves query run plans and boosts the performance of queries run in Athena. Pathik Shah is a Sr.
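As a rough illustration of how table statistics can steer a query plan, the sketch below picks the build side of a hash join from row counts. This is a toy example, not Athena's actual optimizer: the `TableStats` type, the `choose_join_sides` helper, and the row counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical stand-in for table statistics kept in a catalog."""
    name: str
    row_count: int


def choose_join_sides(left: TableStats, right: TableStats):
    """Return (probe, build): build the hash table on the smaller input."""
    if left.row_count <= right.row_count:
        return right, left
    return left, right


# Made-up statistics for two tables in a join.
orders = TableStats("orders", 50_000_000)
customers = TableStats("customers", 200_000)

probe, build = choose_join_sides(orders, customers)
print(f"probe={probe.name}, build={build.name}")
```

Without statistics, a planner has to fall back on the order in which the tables appear in the query; with row counts it can keep the smaller table in memory regardless of how the join was written.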

The Very Group adopts a data catalog to better organize and leverage its online retail capabilities

CIO Business Intelligence

It launched its first online-only brand, Very, in 2009 and finally abandoned its printed catalogs to go all-in online in 2015. It took about nine weeks to set up the infrastructure, make the connection to the database, and index and understand the metadata. The whole company rebranded as Very in 2020, the year Pimblett joined.

A Few 2016 Technology Predictions

In(tegrate) the Clouds

Merv Adrian (@merv) December 19, 2015. What is the most important news item about a software company that occurred in 2015 that belongs in the capsule, and why? The resurgence of Microsoft as a cloud company was big news in 2015. Who was the biggest tech disruptor in 2015?