You may have heard some of the recent news about “big data”, and Hadoop is the big player here. Oracle has been talking about its new “SQL-less” database, and Apache Hadoop is making its way around. Dr. Halamka recently gave it a nice thumbs up too when talking about mining big data.
When you start to think about all the data we are warehousing and searching today, something has to give, and Hadoop has been around for a while. It's built on a Java framework, something most of us pretty much like to work with. The framework is used by IBM and is especially a favorite with search engines. The best part is that it is open source, over at Apache. Microsoft recently announced a forthcoming integration of Hadoop with SQL Server. In 2009 the Cleveland Clinic helped found Explorys, a company that uses the Hadoop framework, so it won't be long before Hadoop moves in with big force. The Hadoop World conference just ended, and you can view videos from some of the companies that will be working to sell to enterprise clients.
You have to love the names here as we continue on with Apache “Pig”, a platform for analyzing large data sets, so now you get to code in “Pig”. For developers, Apache Pig 0.9.1 has been released, and yes, according to the site, the language you write in is called “Pig Latin”, so the phrase has a new meaning now. If you read below, you can see where JPMorgan has gone to the pigs… well, maybe in more ways than one, so pigs are going to become pretty popular on Wall Street, along with the Java algorithms they need. Maybe we will soon have IT staff we can call High Frequency Pig Experts :) Don’t laugh, as that’s a real thing coming, and we can honestly say that banking software and hardware are all about the pigs.
There’s also HBase, for reading from and writing to big data sets. I’m sitting here looking at a comparison: adding a data table to a relational database can take a ton of time… and with Hadoop, it’s nothing by comparison. This should also tell you how much cheaper support time will be. The data is split up across the servers in a cluster, and each piece is stored as redundant copies on different machines, so losing one server doesn’t mean losing the data, and that I like a lot.
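To make the idea of data being split across a cluster and kept in redundant copies a little more concrete, here is a toy Java sketch. It is not Hadoop code and does not use any Hadoop API: it simply cuts a file into fixed-size blocks, assigns each block's copies to distinct nodes round-robin, and checks that every block still has a copy after any single node fails. The class and method names are hypothetical, and the 128 MB block size and three copies per block are illustrative values (similar to common HDFS settings); real HDFS replica placement is rack-aware and considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration of HDFS-style storage (hypothetical names, not a Hadoop API):
 * a file is cut into fixed-size blocks and each block is copied to several
 * distinct nodes so that one failed node does not lose any data.
 */
public class BlockPlacementSketch {

    /** Number of blocks needed for fileSize bytes (ceiling division). */
    public static int blockCount(long fileSize, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        return (int) ((fileSize + blockSize - 1) / blockSize);
    }

    /** For each block, the list of node indices that hold a copy of it. */
    public static List<List<Integer>> place(long fileSize, long blockSize,
                                            int replication, int nodes) {
        if (replication < 1 || replication > nodes) {
            throw new IllegalArgumentException("need 1 <= replication <= nodes");
        }
        List<List<Integer>> placement = new ArrayList<>();
        int blocks = blockCount(fileSize, blockSize);
        for (int b = 0; b < blocks; b++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                // Each block starts on a different node and walks forward,
                // so the copies of one block never share a node.
                replicas.add((b + r) % nodes);
            }
            placement.add(replicas);
        }
        return placement;
    }

    /** True if every block still has at least one copy after failedNode goes down. */
    public static boolean survivesFailure(List<List<Integer>> placement, int failedNode) {
        for (List<Integer> replicas : placement) {
            boolean hasCopy = false;
            for (int node : replicas) {
                if (node != failedNode) {
                    hasCopy = true;
                    break;
                }
            }
            if (!hasCopy) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A 1 GB file, 128 MB blocks, 3 copies of each block, spread over 5 servers.
        long mb = 1024L * 1024L;
        List<List<Integer>> placement = place(1024 * mb, 128 * mb, 3, 5);
        for (int b = 0; b < placement.size(); b++) {
            System.out.println("block " + b + " -> nodes " + placement.get(b));
        }
        for (int n = 0; n < 5; n++) {
            System.out.println("node " + n + " down, all data still there: "
                    + survivesFailure(placement, n));
        }
    }
}
```

Running it lists eight blocks, each with the three servers holding a copy, and confirms that any one server can go down without losing a block. Adding servers in a scheme like this just spreads the blocks more widely, which is the “as big or small as you want it” point.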
It can be as big or small as you want, from a single server to hundreds or even thousands of servers, so you have to stop and think… medical records… is this the next plateau? One thing’s for sure: it’s affordable, but you will need the support, and folks are just now gearing up for those jobs. BD
Those were the first words from Larry Feinsmith, managing director, office of the CIO, at JPMorgan Chase, in his Tuesday keynote address at Hadoop World in New York. Specifically, JPMorgan Chase is hiring people with Hadoop skills, so Feinsmith was in the right place. More than 1,400 people were in the audience, and attendee polls indicated that at least three quarters of their organizations are already using Hadoop, the open source big data platform.
The "and we're paying 10% more" bit was actually Feinsmith's ad-libbed follow-on to the previous keynoter, Hugh Williams, VP of search, experience, and platforms at eBay. After explaining eBay's Hadoop-based Cassini search engine project, Williams said his company is hiring Hadoop experts to help build out and run the tool.
Feinsmith's core message was that Hadoop is hugely promising, maturing quickly, and might overlap the functionality of relational databases over the next three years. In fact, Hadoop World 2011 was a coming-out party of sorts, as it's now clear that Hadoop will matter to more than just Web 2.0 companies like eBay, Facebook, Yahoo, AOL, and Twitter. A straight-laced financial giant with more than 245,000 employees, 24 million checking accounts, 5,500 branches, and 145 million credit cards in use, JPMorgan Chase lends huge credibility to that vision.
Five of JPMorgan Chase's seven lines of business now use a Hadoop shared service. They use it for extract, transform, and load (ETL) processing; high-scale Basel III regulatory liquidity analyses and reporting; data mining; transaction analysis; fraud investigation; and social media sentiment analysis. It's also a low-cost storage option for all types of data, including structured financial records, semi-structured clickstreams and Web logs, and unstructured text and social comment feeds.
The platform is headed for broad adoption, so it's a sound career path, much like SQL was 30 years ago. Want a more substantial endorsement? Consider that IBM, Microsoft, and Oracle (multibillion-dollar vendors with substantial data management software revenue at stake) have all embraced Hadoop this year.