Hadoop Summit 2016 North America in San Jose

When is Hadoop Summit 2016 North America in San Jose?

The Hadoop Summit 2016 North America in San Jose will be held from June 28th to June 30th, 2016.

Where is Hadoop Summit 2016?

The Hadoop Summit 2016 North America in San Jose will be held at the San Jose Convention Center, located at 150 W San Carlos St, San Jose, CA 95113.

About Hadoop Summit 2016

The 8th Annual Hadoop Summit in San Jose is a three-day event that will kick off on Tuesday, June 28th and bring together Apache community leaders and key players under one roof. Attendees will hear dev and admin tips and tricks, see presentations about successful Hadoop use cases, and learn how best to use Apache Hadoop as an integral component of their enterprise data architecture.

Visit Jethro & Grab Some Data Swag

Jethro will be exhibiting its SQL-on-Hadoop acceleration solution. We will discuss ways to achieve interactive speed on BI dashboards and visualizations, such as Tableau or Qlik, with direct data connections to multi-billion-row data sets without impacting Hadoop cluster load. Data architects and data scientists are invited to Jethro’s booth to see live benchmarks and demos.

We look forward to seeing you at the Hadoop Summit 2016 in San Jose!

Jethro 1.6.0 Released

The main themes of Jethro 1.6.0 are concurrency and new range-index features.

Concurrency features are:

  • Reuse of results when the same “where” clause is used by multiple queries, in order to reduce resource consumption and increase concurrency.

  • Enhanced locking infrastructure to protect against deadlocks during high load.

  • Increased maximum number of threads that the operating system allocates to the Jethro services.
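
The first item above, reusing results when several queries share the same “where” clause, can be illustrated with a minimal sketch. This is not Jethro's implementation: the class `WhereResultCache`, the sample `ROWS`, and the `PREDICATES` table are all hypothetical, and the key normalization is deliberately naive (it only collapses whitespace).

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical sample data standing in for a fact table.
ROWS = [
    {"country": "US", "amount": 120},
    {"country": "DE", "amount": 80},
    {"country": "US", "amount": 40},
]

# Hypothetical hand-written predicates, keyed by normalized WHERE text.
PREDICATES: Dict[str, Callable[[dict], bool]] = {
    "country = 'US'": lambda row: row["country"] == "US",
}


def evaluate(where_key: str) -> FrozenSet[int]:
    """The 'expensive' step: scan all rows and collect matching row ids."""
    predicate = PREDICATES[where_key]
    return frozenset(i for i, row in enumerate(ROWS) if predicate(row))


class WhereResultCache:
    """Maps a WHERE clause to the row ids it selects, computed only once."""

    def __init__(self, evaluator: Callable[[str], FrozenSet[int]]) -> None:
        self._evaluator = evaluator
        self._cache: Dict[str, FrozenSet[int]] = {}
        self.evaluations = 0  # counts how often the filter really ran

    def matching_rows(self, where: str) -> FrozenSet[int]:
        # Naive normalization: collapse whitespace only (ignores literals).
        key = " ".join(where.split())
        if key not in self._cache:
            self.evaluations += 1
            self._cache[key] = self._evaluator(key)
        return self._cache[key]


cache = WhereResultCache(evaluate)
# Two different queries that share one filter: SUM(amount) and COUNT(*).
total = sum(ROWS[i]["amount"] for i in cache.matching_rows("country = 'US'"))
count = len(cache.matching_rows("country  =  'US'"))
```

In this sketch the second query finds the row ids already cached, so the filter is evaluated once even though two queries used it.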

Range Indexes:

This new feature allows you to create special indexes for ranges of values. This significantly increases the performance of queries that filter on a wide range of column values.
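
To show why indexing ranges of values can help wide-range filters, here is a generic toy sketch. It is an assumption about how a range index can work in general, not a description of Jethro's internal design; the `RangeIndex` class and its fixed bucket `width` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


class RangeIndex:
    """Toy index that groups row ids by fixed-width value ranges (buckets)."""

    def __init__(self, values: Sequence[int], width: int) -> None:
        self.values = values
        self.width = width
        self.buckets: Dict[int, List[int]] = defaultdict(list)
        for row_id, value in enumerate(values):
            self.buckets[value // width].append(row_id)

    def between(self, low: int, high: int) -> List[int]:
        """Return the ids of rows whose value satisfies low <= value <= high."""
        first, last = low // self.width, high // self.width
        result: List[int] = []
        for bucket in range(first, last + 1):
            row_ids = self.buckets.get(bucket, [])
            if first < bucket < last:
                # Bucket lies entirely inside the range: take it whole.
                result.extend(row_ids)
            else:
                # Boundary bucket: individual values must be checked.
                result.extend(
                    r for r in row_ids if low <= self.values[r] <= high
                )
        return sorted(result)


values = [5, 12, 27, 33, 48, 51, 19]
index = RangeIndex(values, width=10)
matches = index.between(10, 40)
```

The wider the filter range, the more buckets are taken whole without inspecting individual values, which is the intuition behind the speedup for wide-range filters.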

The full list of issues addressed in this version can be found on our release notes page: http://jethro.io/release-notes/

Updated documentation: https://jethrodownload.s3.amazonaws.com/JethroReferenceGuide.pdf

Tips for Installing Jethro

You’ve already downloaded Jethro, and now you’re ready to install it and start accelerating your database performance. Follow the tips below to get started on the right foot. You can always contact us with any problems or questions along the way.

Top 13 Tips for Optimizing Jethro Performance

  1. Optimal Queries – Use as Many Filters as Possible
    1. Jethro processes queries by first evaluating the WHERE clause and determining the rows needed for the query. It then fetches column data only for those rows. The narrower the query, the faster it performs.
  2. Optimal Data Types
    1. Use numeric formats (INT/BIGINT, FLOAT/DOUBLE) whenever possible – any string column that holds only numeric values should be converted. This is especially true for high cardinality columns.
    2. Use TIMESTAMP for Date/Time columns. Jethro creates multiple date-related indexes for such columns to improve performance of date-range queries.
  3. Partitions for Large FACT Tables
    1. A TIMESTAMP column is typically the best choice for partitions, as it simplifies maintenance tasks like purging old data.
    2. Jethro recommends a total of 5-25 partitions, although it comfortably supports hundreds of partitions.
    3. A Jethro partition key can use range values. For example: PARTITION BY RANGE(ts_col) EVERY (INTERVAL '7' DAY)
  4. Cache Space
    1. Jethro uses server-side caching for metadata and frequently used file fragments. The greater the cache space, the more data it can store. Note that the benefit of the cache is realized over time, as the cache takes a while to fill.
    2. Cache space can be defined when an instance is created or updated later on by editing the local-conf.ini file.
    3. Jethro automatically enables the query-result cache. The query-result cache is stored in HDFS and does not require local disk space.
  5. Consolidate Tables When Possible
    1. While Jethro optimizes JOINs and automatically performs Star-Transformation, it is better to avoid JOINs when they are not required.
    2. Jethro’s columnar format and effective compression minimize the storage impact of such denormalization.
  6. Hardware Considerations: More Is Better!
    1. More CPU and RAM improve query speed, as Jethro takes advantage of multi-threading. They also improve concurrency, as more users and queries can be served in parallel.
    2. 10Gb network connectivity to the cluster will speed up HDFS access.
    3. Local drives for caching – SSD is preferable.
    4. Trial servers can start with as little as 64GB of RAM and 8 cores.
  7. Use a Cluster of Jethro Servers
    1. Multiple servers linearly increase Jethro’s capacity for concurrent users and queries.
    2. When performing frequent incremental loads, it is recommended to run the JethroLoader on a different server.
  8. Data Sorting Can Improve Performance
    1. If a large number of queries filter by a specific column (that is not already a partition column), it can be beneficial to pre-sort the input data by that column before it is loaded into Jethro.
  9. Join Indexes
    1. When attributes of large dimensions are often used as a filter it is recommended to define them as a JOIN INDEX on the fact table. There is no limit to the number of JOIN INDEXES that can be defined.
  10. Jethro without Hadoop
    1. Jethro is capable of using other storage systems besides Hadoop’s HDFS. These include a local filesystem, cloud storage (e.g., S3) or network storage (SAN/NAS).
    2. When the dataset used with Jethro can fit in a local filesystem it is often the best solution as it avoids Hadoop overhead.
  11. Load “Overwrite” for table update with no downtime
    1. When a dimension changes and needs to be reloaded, you can use Jethro’s Load Overwrite feature. It loads the updated table and swaps the tables only once the load has completed.
  12. Use ALTER TABLE to Add Columns on the Fly
    1. Because Jethro is column-oriented, it can dynamically add (or drop) columns without having to reload the table. The value NULL is used for the new column in existing rows.
  13. Use Jethro’s “SHOW” SQL command to learn about Jethro internals

      • SHOW [SESSION | GLOBAL] PARAM parameter | ALL   (show parameter values)
      • SHOW TABLES [EXTENDED | MAINT] (show all tables, size stats, fragmentation)
      • SHOW TABLE PARTITIONS table_name (show table’s partition stats)
      • SHOW TABLE COLUMNS [FULL] table_name (show column stats)
      • SHOW VIEWS [EXTENDED] (show views)
      • SHOW LOCAL CACHE (show local file cache usage)
      • SHOW ADAPTIVE CACHE (show query result cache)
      • SHOW ACTIVE QUERIES (show currently running and queued queries)
      • SHOW SCHEMAS (show defined schemas)
      • SHOW JOIN INDEXES (show all defined JOIN indexes)
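
Tips 2 and 8 above (numeric data types and pre-sorted input) can be applied in a small preparation step before loading. The following is a minimal sketch that uses only Python's standard library and does not call any Jethro tool; the function names, column names, and sample data are hypothetical. It reports which columns hold only integer values (candidates for INT/BIGINT) and writes the rows sorted by a chosen filter column.

```python
import csv
import io
from typing import Dict, List, Tuple


def is_integer_column(rows: List[Dict[str, str]], column: str) -> bool:
    """True when every value in the column parses as an integer."""
    try:
        for row in rows:
            int(row[column])
    except (ValueError, TypeError):
        return False
    return True


def prepare_for_load(raw_csv: str, sort_column: str) -> Tuple[List[str], str]:
    """Return (integer-only columns, CSV text sorted by sort_column)."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    rows = list(reader)
    fields = list(reader.fieldnames or [])
    numeric = [c for c in fields if rows and is_integer_column(rows, c)]

    # Sort numerically when possible so that 7 comes before 12.
    if sort_column in numeric:
        rows.sort(key=lambda r: int(r[sort_column]))
    else:
        rows.sort(key=lambda r: r[sort_column])

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return numeric, out.getvalue()


# Hypothetical input: store_id and amount arrive as text but are numeric.
RAW = "store_id,city,amount\n30,Austin,15\n7,Boston,9\n12,Chicago,20\n"
numeric_columns, sorted_csv = prepare_for_load(RAW, "store_id")
```

The list of integer-only columns is a hint for choosing column types in the table definition, and the sorted CSV is what would be handed to the loader.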

Launching Jethro for Tableau

The Jethro team is attending Tableau’s user conference for the first time this week in Vegas. It has been incredible to see the genuine sense of collaboration and community that unites the more than 10,000 attendees.

Today at the conference we are excited to announce Jethro for Tableau, which lets Tableau users work with visualizations at a natural, interactive pace. Instead of waiting minutes for their big data visualizations to render, Tableau users can now gain crucial big data insights in seconds.

Jethro for Tableau leverages Jethro’s index-based performance acceleration and combines it with the ease of use and visualization of Tableau. The combination will enable Tableau users to utilize Tableau’s great experience over big data tables that are simply too large to be extracted and loaded in Tableau’s memory.

The product is now available for beta testing: Download Jethro for Tableau.

Check out our live benchmarks so you can see Tableau run at interactive speed while live-connected to a 2.9B row dataset!

To access the live benchmarks:

  1. Navigate to: http://tableau.jethrodata.com
    User: demo Password: demo
  2. Choose the Jethro workbook
  3. For a performance comparison, choose the “Impala” or “Redshift” workbooks

Read more about Jethro for Tableau

Tableau Conference 2015 Las Vegas

We’re excited to exhibit at Tableau Conference 2015 in Vegas and we hope to see you there. Stop by our booth to grab some swag and chat about accelerating your queries on Tableau. We’ll be in town from October 19 to 23 and we would love to talk to you about big data performance, whether at the booth, over a beer or at the tables. Contact us to set up a time to meet.

Presenting at Strata + Hadoop World NYC 2015

Jethro Talks Interactive BI on Hadoop at Strata + Hadoop World NYC 2015

CTOs, data scientists, engineers and business leaders learned about the performance advantages of Jethro’s unique index-based SQL engine at the Strata + Hadoop show in NYC. The three-day convention at the Javits Center was the perfect stage for us to meet with key industry players and hear about their BI performance issues on Hadoop.


Tim Antos from Jethro Demonstrating Jethro’s Blazing-fast BI Performance on Hadoop

Of particular interest were the live benchmarks that showed off how fast Jethro truly is. From our discussions with both technical people and BI users at the show, from large financial institutions to companies that provide BI tools, we’ve witnessed a maturing of the entire Hadoop ecosystem. Companies that use Hadoop now understand the limitations and performance issues surrounding BI on Hadoop.

If you missed the show, contact us to schedule a demo and see live benchmarks.

Meet Us at Hadoop Summit 2015

Attending Hadoop Summit 2015 in San Jose? We are too, and we’d love for you to come by our booth and chat about how Jethro can help your company. You can find us at booth G9, right next door to SnapLogic.

Give us your business card at the booth and participate in a raffle for an Apple Watch. Also, some lucky folks will receive our brand new “Big Data at Biblical Scale” t-shirt.

One last thing: on Tuesday, June 9, we’ll be making an exciting announcement about our company. Stay tuned…

Jethro Raises Additional $8.1 Million in Funding

I am extremely pleased to announce that we have recently raised an $8.1 million investment round. The round was led by our new partner Square Peg Capital, with participation from our existing investor, Pitango Venture Capital. Arad Naveh of Square Peg will be joining our board of directors and I am looking forward to working with him and benefiting from his vast experience as both an entrepreneur and an investor.

While this funding round is a great validation from the investor community, what really excites us at Jethro is the tremendous feedback we’ve been getting from customers who are using our product to get the most business value out of their big data. We’ll be telling some of these customer stories in the coming weeks.

This is an important milestone for the company, and as always, it is the result of the work of many. I especially want to thank my co-founders, Ronen Ovadya and Boaz Raufman, but the entire team at Jethro deserves recognition for their dedication and hard work. And we’re just getting started!

Below is the full text of the official press release.

If you haven’t already, I encourage you to download our product and see for yourself how you can achieve truly interactive BI on big data. Oh, and if you happen to be at Hadoop Summit in San Jose this week, please drop by to say hi at booth G9.

Eli Singer, CEO, Jethro

Jethro Secures $8.1 Million in Series B Funding to Advance SQL on Hadoop

Square Peg Capital leads round to accelerate go-to-market efforts

New York, NY – June 9, 2015 — Jethro, provider of the fastest SQL-on-Hadoop solution in the market, today announced that it has closed an $8.1 million Series B financing round led by Square Peg Capital, and including existing investor Pitango Venture Capital. This latest funding round will be used to increase investment in sales, marketing and development of Jethro’s unique technology.

“Eli and the amazing team at Jethro are bringing a revolutionary technology to enterprises. Hadoop users have gained great scale and cost effective storage, largely at the expense of performance. With Jethro, enterprises quickly see improved performance, making Hadoop a viable enterprise infrastructure,” said Arad Naveh, partner with Square Peg Capital. “There is a huge market opportunity for interactive BI and fast SQL-on-Hadoop. We are proud to back Jethro and the company’s breakthrough solution.”

This funding round follows closely on the heels of the general availability of Jethro’s unique SQL-on-Hadoop software, which uses indexes to query data up to 100 times faster than alternative SQL on Hadoop solutions. Jethro works with popular business intelligence (BI) solutions, including Qlik, Tableau and MicroStrategy, to enable faster access to data and allow for true interactive BI.

“Business intelligence is still hampered in the big data world by slow access to Hadoop data. Jethro solves that issue, allowing BI on big data at a speed that users currently experience with their EDW systems,” said Eli Singer, co-founder and CEO, Jethro. “Jethro is empowering enterprises to gain insights at the speed of business. With the support of Square Peg and Pitango, we’re bringing interactive BI on big data to the enterprise.”

Jethro combines search engine indexing technology with modern column store database design to create a single solution. The resulting product addresses the growing business demand for storing vast amounts of data while providing lightning-fast queries. Jethro’s breakthrough full-indexing technology enables BI users to enjoy interactive responses with big data. Jethro works seamlessly with any BI tool through a standard ODBC/JDBC interface, and is compatible with Hadoop distributions, including Cloudera, Hortonworks, MapR and Amazon.

“Our query performance on big data residing in Hadoop has improved dramatically since we started using Jethro,” said Slava Borodovsky, Senior Director of Business Intelligence at Fiverr and a Jethro customer since 2014. “This validates our initial belief that Jethro is the superior approach to other solutions we have tried. We are now implementing Jethro in our BI infrastructure to give us better insights into Fiverr users’ behavior.”

Jethro is Now in GA

I’m thrilled to announce that JethroData, the fastest SQL-on-Hadoop solution in the market, is now in GA. After more than two years in extensive development, we are proud to release this ready-for-prime-time, stable and highly performant version of the product.

You can read the official press release with some nice quotes from customers and partners.

UPDATE: Learn about new features in JethroData Version 1.0

Our CTO, Boaz Raufman, founded JethroData with a unique vision: combine the power of search engine indexing technology with modern column store database design into a single solution. The resulting product addresses the growing business need for storing vast amounts of data while providing lightning-fast queries.

JethroData’s breakthrough full-indexing technology enables BI users to enjoy interactive responses with Big Data. JethroData works seamlessly with any BI tool, including Qlik, Tableau and MicroStrategy, through a standard ODBC/JDBC interface, and is compatible with Hadoop distributions from Cloudera, Hortonworks, MapR and Amazon EMR.

This release is the result of the tireless efforts of many people here at JethroData. I especially want to thank Yuval, Michael, Ofir, Vadim, Yori, Evgeny, Gal, Aviv, Dima and Helli. And, of course, my partners and co-founders, Ronen and Boaz.

You can see for yourself what JethroData beta customers already experience: download the product for a free trial here or view this live demo to see it in action.

If you just want to learn more: download our white paper.

P.S. In case you’re curious about our company name, JethroData, read the first blog post we ever published which explains its origin.