Get started managing partitions for Amazon S3 tables backed by the AWS Glue Data Catalog

AWS Big Data

If you simply run queries without considering the optimal data layout on Amazon S3, it results in a high volume of data scanned, long-running queries, and increased cost. Partitioning is a common technique to lay out your data optimally for distributed analytics engines. Another option is using AWS Glue APIs.
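As a rough illustration of the partitioning idea in this excerpt, the sketch below builds a Hive-style partition prefix (`key=value` path segments) and a dict shaped like the `PartitionInput` argument of the Glue `CreatePartition` API. The bucket name, the table root, and the `year`/`month`/`day` partition keys are assumptions chosen for illustration; the excerpt does not specify them.

```python
from datetime import date


def partition_prefix(table_root: str, day: date) -> str:
    """Return a Hive-style S3 prefix for one daily partition.

    The year/month/day keys are an assumed layout, not one taken from the article.
    """
    root = table_root.rstrip("/")
    return f"{root}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def glue_partition_input(table_root: str, day: date) -> dict:
    """Build a dict shaped like Glue's PartitionInput for one partition.

    Values must follow the order of the table's partition keys. A real call
    would normally also copy the table's input/output formats and SerDe
    settings into StorageDescriptor; they are left out here for brevity.
    """
    return {
        "Values": [f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}"],
        "StorageDescriptor": {"Location": partition_prefix(table_root, day)},
    }


if __name__ == "__main__":
    # "example-bucket" and "events" are hypothetical names.
    print(glue_partition_input("s3://example-bucket/events", date(2024, 1, 5)))
```

A dict like this could be passed as `PartitionInput` to the boto3 Glue client's `create_partition` call, together with `DatabaseName` and `TableName`. Query engines can then prune partitions when a filter names the partition keys, which is what reduces the volume of data scanned.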

The Lean Analytics Cycle: Metrics > Hypothesis > Experiment > Act

Occam's Razor

Sometimes, we escape the clutches of this suboptimal existence and do pick good metrics or engage in simple A/B testing. You're choosing only one metric because you want to optimize it. By late 2009, that experiment was a success, too; they'd climbed back up to 4.5. But it is not routine. That metric is tied to a KPI.

Fitting Bayesian structural time series with the bsts R package

The Unofficial Google Data Science Blog

If both variances are positive then the optimal estimator of $y_{t+1}$ winds up being "exponential smoothing," where past data are forgotten at an exponential rate determined by the ratio of the two variances. Also notice that while the state in this model is Markov (i.e.