Core Hadoop

One of the most common, and frankly naïve, ways people approach BI with Hadoop is using core Hadoop to power BI reporting through tools such as Tableau, SSRS, or another third-party system like Business Objects. It generally ends in failure. When we talk about core Hadoop we are referring to the following services:
- MapReduce/MR2 YARN
- Hive/HiveServer 2
Retro Solutioning

Retro solutioning, which may be a coined term (not sure, to be honest), is when you don’t know enough about the Hadoop services outside of the core features and decide to stick with what’s “safe.” The resulting solution uses technologies your team has at hand and is comfortable with, and it is usually the second line of attack when initial plans don’t quite work out (see the previous section). There are valid situations that call for using existing technology, and every company and situation is unique, but more often than not it proves to be additional work and, by nature, additional processes for you to manage. Not only does this negate the purpose of Hadoop and big data, it also wastes your team’s time and resources and has a high likelihood of becoming a band-aid solution instead of a scalable one.

Having been a Microsoft SQL developer for 10+ years before getting into big data, I felt this example, derived from my own experience, was very relevant to the topic. It involves using Microsoft SQL Server and the Tabular features of Analysis Services to make up for less-than-desired performance from Hive. We’ll start by outlining the hypothetical project and then go into the implemented solution and the results. As you read through it, think about how you’d approach this problem and whether your approach lines up with ours (if it doesn’t, you may have done it right!).
The “Hypothetical” Project

Project: Reporting Dashboards

Original Systems:
- Microsoft SQL Server (DB)
- Microsoft .NET for dashboards – yikes, I know
- Microsoft Reporting Services

Project Overview: Take existing reports that have an MS SQL Server database source and upgrade them to Tableau, deprecating any use of .NET as a dashboarding interface and removing any reporting dependency on MS Reporting Services.

Project Requirements:
- All reports should load in under 10 seconds on initial browse
- All reports should respond to filter changes in under 10 seconds
- Data source should hold at least 1 year of data for users to pull into active reports/dashboards
- Solution should be able to handle a user base of 1,000 users, with 200 of them concurrent
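
It can help to write requirements like these down as explicit thresholds, so that every candidate architecture is measured against the same bar. The following is a minimal Python sketch and was not part of the original project. The `LoadTestResult` type, the `failed_requirements` function, and the sample measurements are all hypothetical, and "1 year" is approximated as 365 days.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the project requirements listed above.
MAX_INITIAL_LOAD_S = 10.0     # reports must load in under 10 seconds
MAX_FILTER_RESPONSE_S = 10.0  # filter changes must respond in under 10 seconds
MIN_HISTORY_DAYS = 365        # assumption: "1 year" approximated as 365 days
REQUIRED_CONCURRENCY = 200    # concurrent users the solution must handle


@dataclass
class LoadTestResult:
    """Measurements for one report from a hypothetical load test."""
    report: str
    initial_load_s: float
    filter_response_s: float
    history_days: int
    concurrent_users: int  # concurrency level the test was run at


def failed_requirements(result: LoadTestResult) -> List[str]:
    """Return the names of the requirements this result does not meet."""
    failures = []
    if result.initial_load_s >= MAX_INITIAL_LOAD_S:
        failures.append("initial load")
    if result.filter_response_s >= MAX_FILTER_RESPONSE_S:
        failures.append("filter response")
    if result.history_days < MIN_HISTORY_DAYS:
        failures.append("data history")
    if result.concurrent_users < REQUIRED_CONCURRENCY:
        failures.append("concurrency")
    return failures


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sample = LoadTestResult("ops_detail", 12.5, 9.9, 180, 200)
    print(sample.report, failed_requirements(sample))
```

A check like this is only as good as the measurements fed into it. The point is that a design either clears all four bars at the required concurrency or it does not, which makes trade-offs such as keeping less data visible early.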
The Pitfalls

We learned how to connect different systems to HS2, but we also learned that relying on it to feed a performance-driven solution was not going to work without doing one of the following:
- More hardware – faster and more expensive SSAS machines would certainly help – but where is the budget for that?
- Compromise on the solution – reduce the amount of data kept and displayed, assuming the customer would allow that – probably not
The Best of Both Worlds

How do we get the best of both worlds? That is the question of the day: what magical solution will let us get the benefits of Hadoop and power our BI while avoiding costly add-on tools that mimic some familiar functionality? Sticking with your current BI tool, whether that is Tableau or Qlik, is the right start, but you also need to identify your overall direction for BI and reporting within your organization. Oftentimes we jump into tools that have taglines such as “bring Hadoop to your analysts” or “data discovery at your fingertips,” and while these tools may be great at specific tasks, they ultimately fall short with BI. Keeping in mind that there may never be a “catch-all” solution (and why would there be? This is Hadoop, and it’s an ecosystem), let’s break it down by some BI needs…
Power User Pandemonium

This situation, derived from personal experience, is having an abnormal number of power users who have decent-to-good skills in SQL and databases and who really just want to get the data and manipulate it for their own reports. This BI nightmare, or dream depending on how you look at it, often includes many ad-hoc queries tapping your source system and the use of easily accessible analyst tools like Microsoft Excel and Access to pull data in and do what is needed with it.
Report Stores

This is having a bunch of reports that are maintained by a development team and used by analysts but are relatively trivial when inspected independently. We all have instances of this: “quick-hit” reports that were spun up to answer the constant inquiries of 20-30 users and then never went away.
Dashboards

These are the traditional dashboard requests that serve to answer many questions from the executive level on down (depending on your organization). Dashboard development is the best part of BI, in my opinion, because it encompasses all the challenges associated with BI (learning the business, the data, and the dashboarding tool).
What’s the Answer?

Easy… there isn’t one answer that handles it all, as I alluded to in the previous sections. But if you approach your BI challenges with an open mind to the different technologies, and the combinations of them, that can help you when solutioning, you are starting down the right path. For example, one of my favorite approaches to solving the BI needs listed above (realizing there are more than those three) is using this stack:
- Core Hadoop
Having worked as a database architect for 10+ years, beginning with Microsoft SQL Server 2000+, Analysis Services, and ultimately Big Data Hadoop, AJ Adams has come across many different challenges. He has worked with many different types of organizations covering a variety of industries including energy, government, healthcare, and sales/marketing, designing and implementing architecture for BI-related solutions using tools such as Tableau, Qlik, and others. As data sizes grew he moved into Hadoop, primarily Cloudera and the services associated with the distribution, specializing in administration, development, and overall stack setup. He believes one of the keys to success in making the move to big data is to approach it openly and focus on which sets of services can best handle the various needs, rather than battling to come up with a ‘one-solution-fits-all’ design.