Varada Open-Sources Its Workload Analyzer to Help Data Teams Optimize Data Lake Queries

Content by DevOps.com

Workload Analyzer gives data engineers holistic visibility into the performance of Presto® clusters, enabling resource optimization and improved service to business-wide users of Big Data analytics

 

TEL AVIV, Israel — February 2, 2021 — Varada, the data lake query acceleration innovator, today announced that it has open-sourced its Workload Analyzer for Presto, covering both Trino (formerly known as PrestoSQL) and PrestoDB, and made the source code available to everyone on GitHub. The Workload Analyzer is a free, easy-to-use tool that shows how Big Data and analytics workloads are performing and gives users insights into how to improve performance and optimize resources. The tool can be downloaded from the project's GitHub repository.

“Presto democratized Big Data, exponentially expanding the number of business users that can ask questions to a Big Data infrastructure and enlarging the number of underlying data sources they can query,” said Ori Reshef, vice president of products at Varada. “But as the number of users within an organization grows, the challenge of DataOps teams is to keep queries running quickly, delivering results in a timely way so that those users can do their jobs. Unfortunately, DataOps teams are only able to get bits and pieces of the information they need to optimize resources from Presto itself. So Varada built the Workload Analyzer to give DataOps teams deep and actionable insights.”

 

The Workload Analyzer collects details and metrics on every query, aggregates and extracts information, and delivers dozens of charts describing every facet of cluster performance. For the first time, data engineers have a holistic view of their cluster and can drill down into pain points to determine which queries to optimize and how. A sample Presto workload analysis report is also available for download.

 

The Workload Analyzer is compatible with PrestoDB and Trino. Its script runs within the Presto cluster in the user’s Virtual Private Cloud (VPC), where it collects and analyzes query statistics in JSON format. No data leaves the cluster, and the tool does not require any external resources. The Workload Analyzer has already been tested on dozens of massive-scale production clusters with zero impact on query performance.

 

Using the Workload Analyzer, data teams can:
  • Learn how resources are used on an hourly and weekly basis and define scaling rules
  • Identify heavy spenders and improve the pipeline
  • Improve predicate pushdown and significantly reduce I/O and CPU usage
  • Identify the “hottest” data
  • Improve JOIN performance
  • Provide a better production roll-out experience and identify upgrade risks upfront
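
To make the first two items above concrete, here is a minimal Python sketch of the kind of aggregation they imply: summing CPU time per hour of day and ranking users by total CPU. It is not the Workload Analyzer's code. The record layout and the field names (`user`, `created`, `cpu_seconds`) are hypothetical simplifications chosen for illustration, and the real Presto and Trino query JSON is richer and structured differently.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, simplified query records. These field names are assumptions
# made for this sketch and are not the actual Presto/Trino query JSON schema.
SAMPLE_QUERIES = [
    {"user": "etl", "created": "2021-02-01T09:15:00", "cpu_seconds": 120.0},
    {"user": "bi", "created": "2021-02-01T09:40:00", "cpu_seconds": 30.0},
    {"user": "etl", "created": "2021-02-01T10:05:00", "cpu_seconds": 200.0},
    {"user": "adhoc", "created": "2021-02-01T10:30:00", "cpu_seconds": 10.0},
]


def cpu_by_hour(queries):
    """Sum CPU seconds per hour of day (0-23) across all queries."""
    totals = defaultdict(float)
    for q in queries:
        hour = datetime.fromisoformat(q["created"]).hour
        totals[hour] += q["cpu_seconds"]
    return dict(totals)


def heavy_spenders(queries, top_n=2):
    """Return the top_n (user, total CPU seconds) pairs, largest first."""
    totals = defaultdict(float)
    for q in queries:
        totals[q["user"]] += q["cpu_seconds"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    print(cpu_by_hour(SAMPLE_QUERIES))
    print(heavy_spenders(SAMPLE_QUERIES))
```

An hourly profile like this could inform scaling rules, and the per-user ranking points at the pipelines most worth tuning. A real analysis would read the statistics exported by the cluster rather than an in-memory sample.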

 

“We’re already seeing this tool used in amazing ways,” said Reshef. “For example, one company is using it as a quality assurance tool for daily tests on large clusters. Another is using it for strategic planning to understand the best data sets to query for business users, while allocating resources effectively to significantly reduce costs. The number of use cases continues to rise.”

 

Presto: A Tool of Choice for Data-driven Companies
Presto is an open source distributed SQL query engine for running interactive analytic queries. Presto offers many benefits, most notably its ability to quickly run queries on a wide variety of data sources all at once, including ‘raw,’ unmodeled data. With this capability, as well as other unique advantages, Presto has quickly become a tool of choice for many significant data-driven companies.

 

The Varada Commitment to the Trino and PrestoDB Communities
“As part of our deep commitment to the PrestoDB and Trino communities, Varada decided to release a standalone, open source version of our Workload Analyzer tool so that any Presto user can evaluate potential performance improvements in their cluster,” said Eran Vanounou, CEO of Varada. “The tool will help PrestoDB and Trino users optimize their clusters on their own using their existing solutions. Of course, we anticipate that after discovering the existing inefficiencies within their clusters, many users will want to further evaluate how adding an indexing layer to PrestoDB or Trino can help them vastly improve performance. We will be more than happy to demonstrate how the Varada Data Platform can do just that.”

 

Varada leverages Presto in its query acceleration engine, the Varada Data Platform. A big data infrastructure solution for fast analytics on thousands of dimensions, the Varada Data Platform became generally available in December 2020. Varada’s proprietary indexing layer runs on top of Presto, improving Presto’s query response time by 10x to 100x.

 

About Varada
The Varada mission is to enable data practitioners to go beyond the traditional limitations imposed by data infrastructure and instead zero in on the data and answers they need, with complete control over performance, cost and flexibility. In Varada’s world of big data, every query can find its optimal plan, with no prior preparation and no bottlenecks, providing consistent performance at petabyte scale. Varada was founded by veterans of the Dell EMC XtremIO core team and is dedicated to leveraging the data lake architecture to take on the challenge of data and business agility. Varada has been recognized in the Cool Vendors in Data Management report by Gartner, Inc. For more information, visit: https://varada.io/

 

Presto® is a trademark of The Linux Foundation.
