Enterprise IT Context for the CTO

Bob Gourley



Hadoop Big Data Reservoirs

A look at the Hadoop Big Data Reservoir with Platfora’s Peter Schlampp

Peter Schlampp, Vice President of Products and Business Development at Platfora, explains what the Hadoop Big Data Reservoir (HDR) is and is not in a webinar that I watched today. Knowing what the HDR is and is not is key to pulling business intelligence insights and analytics out of it. Platfora arrived at its conclusions after interviewing more than 200 enterprise IT professionals who work in the big data space. Those interviews identified three key requirements: self-service access, high performance, and security.

The Hadoop Data Reservoir is the central Hadoop cluster for the entire enterprise. It stores data and serves as the source for business analytics. It also handles processing for data preparation and advanced analytics. Peter and Platfora believe that the HDR eliminates data silos, reduces costs, and makes business analytics agile. He does not see the HDR as a replacement for the enterprise data warehouse, which requires major planning and continuous attention from a large staff. An HDR, by contrast, can be both self-service and flexible, while catching data that may previously have been dropped.

Meeting the Performance Requirement

Queries must be consistently fast, because BI applications are driving an increasing number of queries. The actions of one user should not degrade performance for everyone else. Users expect sub-second responses, and when they do not get them, they think something is wrong. To that end, Platfora’s research found that most queries are straightforward and big. Platfora has solved the “big” part by creating pre-calculated summary tables, which summarize the fine-grained data. This reduces the amount of data that must be scanned to answer a question and limits redundant processing. Because these tables are stored in memory, sub-second response times are possible.
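To make the summary-table idea concrete, here is a minimal sketch in Python. It is not Platfora’s implementation; the sample rows, the field layout, and the function names (`build_summary`, `revenue_by_region`) are all assumptions made up for illustration. It only shows why answering a question from a small pre-aggregated table avoids rescanning every raw row.

```python
from collections import defaultdict

# Hypothetical raw event rows: (region, product, revenue).
RAW_EVENTS = [
    ("east", "widget", 10.0),
    ("east", "widget", 5.0),
    ("east", "gadget", 7.5),
    ("west", "widget", 2.5),
]


def build_summary(rows):
    """Pre-compute total revenue and row count per (region, product)."""
    summary = defaultdict(lambda: {"revenue": 0.0, "count": 0})
    for region, product, revenue in rows:
        cell = summary[(region, product)]
        cell["revenue"] += revenue
        cell["count"] += 1
    return dict(summary)


def revenue_by_region(summary, region):
    """Answer a query from the summary table instead of the raw rows."""
    return sum(
        cell["revenue"]
        for (row_region, _product), cell in summary.items()
        if row_region == region
    )


# Built once and kept in memory; later queries touch only this table.
SUMMARY = build_summary(RAW_EVENTS)
print(revenue_by_region(SUMMARY, "east"))
```

The summary here has one cell per distinct (region, product) pair, so its size depends on the number of distinct combinations rather than on the number of raw events.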

Meeting the Self-Service Requirement

One key to meeting the self-service requirement is making aggregate table creation automatic. Instead of having a human create the tables (and refine them over time), the system has to create them. The tables must also be maintained so that they track new additions to the data set. Sometimes the aggregate tables are not enough, and users need the ability to drill down through the table into the raw data. Often, users will also need to ingest their own data sets to augment the aggregated tables.
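The maintenance and drill-down requirements can be sketched roughly as follows. This is an illustrative Python toy, not how Platfora does it: the `AggregateTable` class, its methods, and the `region`/`revenue` fields are hypothetical. The point is that each newly ingested batch updates the aggregate incrementally, while the raw rows stay reachable behind each aggregate cell.

```python
from collections import defaultdict


class AggregateTable:
    """Per-key running totals that stay current as new rows arrive.

    Raw rows are retained per key so a user can drill down from an
    aggregate value to the records behind it.
    """

    def __init__(self, key_field, value_field):
        self.key_field = key_field
        self.value_field = value_field
        self.totals = defaultdict(float)
        self.raw = defaultdict(list)

    def ingest(self, rows):
        """Fold a new batch into the totals without recomputing old data."""
        for row in rows:
            key = row[self.key_field]
            self.totals[key] += row[self.value_field]
            self.raw[key].append(row)

    def total(self, key):
        """Return the aggregate for a key, or 0.0 if the key is unknown."""
        return self.totals.get(key, 0.0)

    def drill_down(self, key):
        """Return a copy of the raw rows behind one aggregate cell."""
        return list(self.raw.get(key, []))


table = AggregateTable(key_field="region", value_field="revenue")
table.ingest([{"region": "east", "revenue": 10.0}])
# A later batch, for example a data set a user uploaded themselves.
table.ingest([
    {"region": "east", "revenue": 5.0},
    {"region": "west", "revenue": 2.5},
])
print(table.total("east"), len(table.drill_down("east")))
```

A real system would keep the raw rows in Hadoop rather than in the aggregate itself; holding them in a Python list here just keeps the sketch self-contained.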

Meeting the Security Requirement

While the Hadoop file system has some built-in security, it focuses mainly on file- and directory-based permissions, as well as secure authentication. This does not meet most enterprise security needs, leaving a gap. According to Platfora, its Hadoop Big Data Reservoir provides the granular security enterprises need, without any performance hit.
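The webinar summary does not say how the granular security is implemented, so the following Python sketch is only a generic illustration of the gap described above. The `User` class, its `regions` attribute, and both functions are hypothetical. It contrasts an all-or-nothing file-level check with a rule that filters individual rows per user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    regions: frozenset  # hypothetical attribute: regions this user may see


def can_read_file(user, file_readers):
    """File-level check: the user sees the whole file or none of it."""
    return user.name in file_readers


def visible_rows(user, rows):
    """Row-level check: the user sees only rows for permitted regions."""
    return [row for row in rows if row["region"] in user.regions]


ROWS = [
    {"region": "east", "revenue": 10.0},
    {"region": "west", "revenue": 2.5},
]
analyst = User(name="analyst", regions=frozenset({"east"}))

print(can_read_file(analyst, file_readers={"analyst"}))
print(visible_rows(analyst, ROWS))
```

With only the file-level check, granting the analyst access exposes both rows; the row-level rule narrows that to the single region the analyst is entitled to see.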

The Platfora Integrated Platform

Platfora’s Integrated Platform combines three tiers of services into one unified package. The Web-based Business Intelligence Application is a lightweight, self-service application that uses HTML5 to create a rich visual experience for the analyst. The Scale-out, In-Memory Data Mart and Processing Engine enables rapid access to the aggregate stores while retaining the ability to scale. Lastly, the Automated Hadoop Refinery constantly creates new aggregate tables, refines the data sets, and maintains the data. It requires little maintenance from IT staff and provides a great deal of reliability.

The Demo

Peter launched into a demonstration that showed how ingesting large sets of data can quickly help BI. The data included information such as LDAP access and more. Peter demonstrated how to create unique “mount points,” for which he could immediately set access rules, increasing the security around sets of data. He also described how creating a “Lens,” an in-memory aggregate table, is easy and can be done directly by any analyst. The analyst can choose the fields within the raw data to focus the table around, as well as set security rules. Platfora’s visualization capability is called Vizboards. With it, users can choose data points to visualize why things have happened, according to the data. These Vizboards can then be shared with other users as you see fit.

This is a great webinar, not just for seeing what Platfora is doing but for learning more about Hadoop in general. Hadoop Big Data Reservoirs can be used to catch data before it trickles out into the ether and is no longer usable. Be sure to check it out if you’re looking to create functional and usable BI for your agency.

More Stories By Bob Gourley

Bob Gourley writes on enterprise IT. He is a founder and partner at Cognitio Corp and publisher of CTOvision.com.
