article thumbnail

How to Use Apache Iceberg in CDP’s Open Lakehouse

Cloudera

The general availability covers Iceberg running within some of the key data services in CDP, including Cloudera Data Warehouse ( CDW ), Cloudera Data Engineering ( CDE ), and Cloudera Machine Learning ( CML ). We see that as of the first snapshot ( 7445571238522489274) we had data from the years 1995 to 2005 in the table.

article thumbnail

Applying Fine Grained Security to Apache Spark

Cloudera

The introduction of “Secure Access” mode to HWC avoids these drawbacks by relying on Hive to obtain a secure snapshot of the data that is then operated upon by Spark. If you are already a user of HWC, you can continue using hive.executeQuery() or hive.sql() in your Spark application to obtain the data securely. . df.show().

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

10 Examples of How Big Data in Logistics Can Transform The Supply Chain

datapine

Use our 14-days free trial today & transform your supply chain! Welcome To The Future Of Logistics We’re on the cusp of big data transforming the nature of logistics. Big data in logistics can improve financial efficiency, provide transparency to the supply chain, and enable proactive strategic decision-making.

Big Data 275
article thumbnail

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

Specifically, the system uses Amazon SageMaker Processing jobs to process the data stored in the data lake, employing the AWS SDK for Pandas (previously known as AWS Wrangler) for various data transformation operations, including cleaning, normalization, and feature engineering.

article thumbnail

Cloudera Data Engineering 2021 Year End Review

Cloudera

Today it’s used by many innovative technology companies at petabyte scale, allowing them to easily evolve schemas, create snapshots for time travel style queries, and perform row level updates and deletes for ACID compliance. This enabled new use-cases with customers that were using a mix of Spark and Hive to perform data transformations. .

Snapshot 117
article thumbnail

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouses (such as Amazon Redshift ) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake 105
article thumbnail

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

Apache Hudi is an open source transactional data lake framework that greatly simplifies incremental data processing and the development of data pipelines. Besides demonstrating with Hudi here, we will follow up with other OTF tables with other blogs. For Stack name , enter a stack name (for example, rsv2-emr-hudi-blog ).