A SQL-on-Hadoop Engine that Harnesses the Power of Full Indexing, a Columnar Structure and Intelligent Caching to Achieve Interactive BI Speed on Big Data.

DOWNLOAD JETHRO

Jethro is a SQL-on-Hadoop engine that accelerates your BI Tool visualizations, such as Tableau, Qlik or Microstrategy, by taking a unique approach to accessing big data on Hadoop. Jethro’s SQL-on-Hadoop engine acts as an acceleration layer and is unobtrusively sandwiched between Hadoop and a BI Tool.

The Fastest SQL-on-Hadoop Engine for BI. Jethro accelerates BI visualizations connected to multi-billion-row tables to perform at interactive speeds.

Fastest SQL-on-Hadoop

EASY TO GET STARTED ON HADOOP

Just install Jethro on one or few dedicated servers, point it to your Hadoop cluster, load some data and start querying. When working with Hadoop, Jethro is safe and easy to implement as it only connects remotely as an HDFS client, without installing new services or running resource-intensive computations inside the cluster (MapReduce, Spark or others). Download Jethro.

PERFORMANCE FROM A UNIQUE INDEXING TECHNOLOGY

The key to Jethro’s superior performance on Hadoop is its unique indexing technology. Jethro’s indexes are sorted, multi-hierarchy, compressed bitmaps. They are created automatically for every column, and are written in an efficient, append-only fashion – avoiding expensive random writes and locking. Queries use indexes to read only the data they need, instead of performing full scans, leading to faster response times.

SCALABILITY AND HIGH-AVAILABILITY

Jethro’s nodes are stateless and highly elastic, allowing easy scale-out to meet concurrency requirements. Jethro’s index and column files are stored as standard files on HDFS or Amazon S3 and benefit from their native scalability and high availability.

MINIMAL HADOOP CLUSTER LOAD

Other SQL-on-Hadoop solutions uses a brute force method – each node of the cluster scans and processes its local data for every query. In contrast, Jethro leverages its indexes to surgically fetch only the relevant data for each query, dramatically reducing the load on the shared Hadoop cluster – freeing it for other computations and supporting more concurrent queries.

Full Scan/Brute Force vs. Index Access

Full Scan/Brute Force

all SQL-on-Hadoop solutions

Read entire data set. Every time.

Impact on Hadoop Cluster

Massive # of unnecessary I/O

Huge consumption of CPU and memory

Index Access

Jethro SQL-on-Hadoop Solution

Analyze Indexes. Fetch only relevant data.

Impact on Hadoop Cluster

Drastically lower Hadoop cluster load

Low I/O, CPU, memory

High-Level SQL-on-Hadoop Architecture

SQL INTERFACE

BI tools, such as Tableau, Qlik or Microstrategy connect to Jethro using JDBC or ODBC and issue standard SQL queries. The driver automatically load-balances SQL statement across all Jethro hosts.

QUERY PROCESSING

Jethro runs on one or few dedicated, higher-end hosts optimized for SQL processing – with extra memory and CPU cores, and local SSD for caching. The query hosts are stateless, and new ones can be dynamically added to support additional concurrent users.

STORAGE LAYER

Jethro stores its files (e.g., indexes) in an existing Hadoop cluster or in an Amazon S3 bucket. With Hadoop, it uses a standard HDFS client (libhdfs) and is compatible with all common Hadoop distributions. Jethro only generates a light I/O load on HDFS – offloading SQL processing from Hadoop and enabling sharing the cluster between online users and batch processing.

DATA LOADING & INDEXES

A loader service processes input files and creates query-optimized column and index files, which are encoded, compressed and then stored on HDFS or Amazon S3. This service can run on its own host or on one of the query processing hosts.