Dremio Launches Data Lake Service Running on the AWS Cloud

Dremio today launched a cloud service built around an in-memory SQL engine that queries data stored in object-based storage.

The goal of the service, dubbed Dremio Cloud, is to make it easier for organizations to take advantage of a data lake without having to employ an in-house IT team to manage it, said Tomer Shiran, chief product officer for Dremio. An organization can now start using Dremio Cloud in as little as five minutes, he said.

Based on Dremio’s existing SQL Lakehouse platform, the Dremio Cloud service runs on the Amazon Web Services (AWS) public cloud. It offers all the benefits of a data warehouse on a platform that uses an object-based storage system to reduce the total cost of creating a data lake, Shiran noted.

Building Dremio Cloud

Dremio Cloud is based on a microservices architecture that includes a service mesh to make infrastructure resources available on demand through the Dremio Cloud control plane. As a result, customers incur no Dremio or AWS costs when the platform is idle, Shiran said.
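
The article does not explain how the control plane decides when to release resources. As a rough illustration only, the Python sketch below models a hypothetical engine that starts when a query arrives and shuts down after an assumed idle timeout, so compute is billed only while the engine runs. The `Engine` class, its `idle_timeout_s` parameter, and the per-second billing model are assumptions for illustration, not part of Dremio's service.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    """Hypothetical query engine whose compute is billed only while it runs."""
    idle_timeout_s: float = 300.0   # assumed idle period before shutdown
    running: bool = False
    started_at: float = 0.0
    last_activity: float = 0.0
    billed_seconds: float = 0.0

    def submit_query(self, now: float, duration_s: float) -> None:
        # Start compute on demand if the engine is currently stopped.
        if not self.running:
            self.running = True
            self.started_at = now
        # Remember when the most recent piece of work finishes.
        self.last_activity = max(self.last_activity, now + duration_s)

    def tick(self, now: float) -> None:
        # Called periodically by a (hypothetical) control-plane loop.
        if self.running and now - self.last_activity >= self.idle_timeout_s:
            # Bill the time the engine was up, then release it.
            self.billed_seconds += now - self.started_at
            self.running = False
```

In this model, the idle period between a shutdown and the next query adds nothing to `billed_seconds`; the trade-off is a cold start for the first query after an idle period.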

This approach also eliminates the need to aggregate tables, extract data, or build a separate online analytical processing (OLAP) cube to structure the data in a SQL-compatible way, he added. It also means that data stored in an object-based storage system does not have to be copied into a proprietary data warehouse before SQL-based applications can access it.

Data is encrypted both at rest and in transit, using key management tools that secure communication among clients, the control plane, and the data plane. Role-based access controls (RBAC) allow organizations to set privileges on every dataset and object in the system. Additionally, companies can bring existing user and group definitions into Dremio from identity management platforms such as Okta to enforce zero-trust security policies, Shiran said. Dremio Cloud has already achieved SOC 2 compliance, he added.
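
Role-based access control can be reduced to a simple rule: a user may perform an action on an object only if that privilege was granted to the user or to one of the user's groups. The sketch below illustrates that rule under stated assumptions: the `AccessControl` class, the group memberships (which in practice might be synchronized from an identity provider such as Okta), and privilege names such as `SELECT` are hypothetical and do not reflect Dremio's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControl:
    """Minimal role-based access control model (illustrative, not Dremio's API)."""
    # user -> groups, e.g. as synchronized from an identity provider (assumption)
    memberships: dict[str, set[str]] = field(default_factory=dict)
    # (principal, object) -> privileges granted on that object
    grants: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def grant(self, principal: str, obj: str, privilege: str) -> None:
        # A principal can be either a user name or a group name.
        self.grants.setdefault((principal, obj), set()).add(privilege)

    def is_allowed(self, user: str, obj: str, privilege: str) -> bool:
        # Deny by default: access exists only through an explicit grant
        # to the user or to one of the user's groups.
        principals = {user} | self.memberships.get(user, set())
        return any(privilege in self.grants.get((p, obj), set()) for p in principals)
```

Denying access unless an explicit grant exists mirrors the default-deny stance of zero-trust policies.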

Dremio recently launched an initiative, dubbed Dart, to improve SQL query performance by a factor of five over the next 12 months using acceleration technologies it has developed. At the heart of this effort is Gandiva, a toolkit that enables vectorized execution on modern processors using the in-memory buffers of Apache Arrow, an open source columnar data format co-created by Dremio.
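
Gandiva compiles expressions into native code that runs over Arrow buffers; the pure-Python sketch below does not reproduce that and gains none of its SIMD speedup. It only shows the columnar, batch-at-a-time evaluation model such engines build on: each column is stored in its own contiguous buffer (here a stdlib `array`), a predicate is evaluated over a whole batch to produce a selection vector, and the projection is then computed only for the selected positions. The function name and the tiny `BATCH_SIZE` are assumptions for illustration.

```python
from array import array

BATCH_SIZE = 4  # assumed; real engines process thousands of rows per batch


def filter_sum(price: array, qty: array, min_price: float) -> float:
    """Compute SUM(price * qty) WHERE price >= min_price, one batch at a time."""
    total = 0.0
    for start in range(0, len(price), BATCH_SIZE):
        # Each column is sliced independently from its own contiguous buffer.
        p = price[start:start + BATCH_SIZE]
        q = qty[start:start + BATCH_SIZE]
        # Step 1: evaluate the predicate over the whole batch -> selection vector.
        selection = [i for i, value in enumerate(p) if value >= min_price]
        # Step 2: evaluate the projection only for the selected positions.
        total += sum(p[i] * q[i] for i in selection)
    return total
```

For example, `filter_sum(array("d", [5.0, 12.0, 20.0, 3.0, 15.0]), array("l", [1, 2, 3, 4, 5]), 10.0)` returns `159.0` (12*2 + 20*3 + 15*5).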

Dremio's platform also maintains physically optimized representations of source data known as Data Reflections. The query optimizer can speed up a query by using one or more Data Reflections to partially or fully satisfy it, without having to process the raw data each time a query runs.
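
The article does not describe how the optimizer matches queries to reflections. One common way to picture an aggregate reflection is as a precomputed rollup: if a query groups by a subset of the reflection's dimensions and asks for a re-aggregable measure such as `SUM`, it can be answered from the much smaller rollup instead of the raw rows. The sketch below illustrates that idea with made-up data; the table, column names, and the simple subset-based matching rule are assumptions, not Dremio's implementation.

```python
from collections import defaultdict

# Hypothetical raw table; column order is given by COLUMNS.
COLUMNS = ("region", "product", "channel", "amount")
RAW = [
    ("east", "widget", "web", 10.0),
    ("east", "gadget", "store", 5.0),
    ("west", "widget", "web", 7.0),
    ("east", "widget", "store", 3.0),
]


def group_sum(rows, columns, group_by, measure="amount"):
    """SUM(measure) GROUP BY group_by over rows laid out as `columns`."""
    idx = [columns.index(c) for c in group_by]
    m = columns.index(measure)
    out = defaultdict(float)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[m]
    return dict(out)


# An aggregate "reflection": SUM(amount) precomputed by (region, product).
REFLECTION_DIMS = ("region", "product")
REFLECTION_COLUMNS = REFLECTION_DIMS + ("amount",)
REFLECTION = [key + (total,) for key, total in
              group_sum(RAW, COLUMNS, REFLECTION_DIMS).items()]


def answer(group_by):
    """Use the reflection when its dimensions cover the query, else scan raw data."""
    if set(group_by) <= set(REFLECTION_DIMS):
        # SUM is re-aggregable, so rolling up the subtotals gives the same result.
        return group_sum(REFLECTION, REFLECTION_COLUMNS, group_by), "reflection"
    return group_sum(RAW, COLUMNS, group_by), "raw scan"
```

Here `answer(("region",))` is served from the three-row reflection, while `answer(("channel",))` falls back to scanning the raw data because `channel` is not one of the reflection's dimensions.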

Dremio also supports query plan caching, which eliminates planning overhead and latency for repeated queries, along with a high-performance compiler that handles much larger and more complex SQL statements and uses machine learning algorithms to reduce the computational resources required to run SQL queries. In some workloads, read operations from cloud storage account for 30% to 60% of the cost of executing queries, Dremio explains, so the company is reducing the amount of data read from cloud object storage by improving the scan filter pushdown capabilities it provides.
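
The article does not detail how Dremio's scan filter pushdown works. A widely used technique in columnar file formats such as Parquet is to keep min/max statistics for each row group and skip any row group whose statistics show it cannot satisfy the filter, so its bytes are never fetched from object storage. The sketch below models that technique for a `col > threshold` predicate; the `RowGroup` structure, sizes, and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical row group with per-column min/max statistics."""
    min_value: int      # would be used to prune for "<" predicates
    max_value: int      # used to prune for ">" predicates
    size_bytes: int
    values: list


def scan_greater_than(row_groups, threshold):
    """Return (matching values, bytes read) for WHERE col > threshold."""
    matches, bytes_read = [], 0
    for rg in row_groups:
        # Pushdown: if even the maximum cannot satisfy the predicate,
        # skip the row group without reading it from storage.
        if rg.max_value <= threshold:
            continue
        bytes_read += rg.size_bytes  # pay for the read only when it may match
        matches.extend(v for v in rg.values if v > threshold)
    return matches, bytes_read
```

With three 100-byte row groups holding the ranges 1-9, 10-19, and 20-29, a filter of `col > 15` skips the first group entirely and reads 200 of 300 bytes.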

Simplify data lakes

While the concept of a data lake has been around for some time, many organizations have been hesitant to deploy one because managing data at petabyte scale has proven too difficult. Hadoop-based data lakes, for example, often quickly turned into data swamps as new data was added. "The data teams are in a tough spot," Shiran said.

Dremio is addressing this problem by integrating a range of SQL acceleration and data management tools within its platform to optimize queries against data lakes built on the object storage systems readily available in cloud computing environments. The challenge now is convincing organizations that have historically relied on a traditional data warehouse to consider a data lake approach that promises to simplify access to petabytes of data in the cloud.

VentureBeat
