Big Data News from Oracle OpenWorld 2013

It has been only a week since Oracle OpenWorld concluded, and I already feel like I’m hopelessly behind on posting news and impressions. Behind or not, I have news to share!

The most prominent feature announced at OpenWorld is the “In-Memory Option” for Oracle Database 12c. This option is essentially a new part of the SGA that caches tables in columnar format. This is expected to make data warehouse queries significantly faster and more efficient. I would have described the feature in more detail, but Jonathan Lewis gave a better overview in this forum discussion, so just go read his post.

Why am I excited about a feature that has nothing to do with Hadoop?

First, because I have a lot of experience with large data warehouses. So I know that big data often means large tables, with only a few columns used in each query. And I know that in order to optimize these queries and avoid expensive disk reads every time a query runs, we build indexes on those columns, which makes data loading slow. The In-Memory option will allow us to drop those indexes and just store the columns we need in memory.

Second, because I’m a huge fan of in-memory data warehouses, and am happy that Oracle is now making these feasible. A few TB of memory in a large server is no longer science fiction, which means that most of your data warehouse will soon fit in memory. Fast analytics for all! And what do you do with the data that won’t fit in memory? Perhaps store it in your Hadoop cluster.
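To illustrate the “large tables, few columns per query” point, here is a minimal Python sketch of the same aggregate run against a row layout and a column layout. The table, column names, and values are made up for illustration, and this says nothing about how Oracle actually implements its column store.

```python
# Hypothetical "orders" table; names and values are invented for illustration.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0, "notes": "gift wrap"},
    {"order_id": 2, "region": "US", "amount": 80.0, "notes": ""},
    {"order_id": 3, "region": "EU", "amount": 50.5, "notes": "expedite"},
]

# Column layout: one contiguous list per column instead of one record per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_from_rows(rows):
    # Every record is visited in full, even though only "amount" is needed.
    return sum(row["amount"] for row in rows)


def total_from_columns(columns):
    # Only the single column the query needs is touched.
    return sum(columns["amount"])


row_total = total_from_rows(rows)
column_total = total_from_columns(columns)
print(row_total, column_total)
```

Both functions return the same total. The difference is how much data has to be read to get there, which is why a query that uses a handful of columns out of a very wide table benefits from a columnar cache.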

Now that I’m done being excited about the big news, let’s talk about the small news that you probably didn’t notice but should have.

Oracle announced two cool new features for the Big Data Appliance. “Announced” may be too big a word: Larry Ellison did not stand up on stage and talk about them. Instead, the features sneaked quietly into the last upgrade and appeared in the documentation.

Perfect Balance – If you use Hadoop as often as I do, you know how data skew can mess with query performance. You run a job with several reducers, each of which aggregates data for a subset of keys. Unless you took great care in partitioning your data, the data will not be evenly distributed between the reducers, usually because it wasn’t evenly distributed between the keys. As a result, you will spend 50% of the time waiting for that one last reducer to finish already.
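To make the skew problem concrete, here is a minimal Python sketch that counts how many records each reducer receives when keys are assigned by a simple modulo partitioner (a stand-in for Hadoop's default hash partitioning). The key distribution and the function names are made up for illustration; this is not how Perfect Balance works, only a picture of the problem it targets.

```python
def partition(key, num_reducers):
    # Stand-in for hash partitioning: each key always maps to one reducer.
    return key % num_reducers


def reducer_loads(keys, num_reducers):
    # Count the records each reducer would have to process.
    loads = [0] * num_reducers
    for key in keys:
        loads[partition(key, num_reducers)] += 1
    return loads


# Hypothetical input: 1000 records, 600 of which share the "hot" key 7.
skewed_keys = [7] * 600 + list(range(400))

loads = reducer_loads(skewed_keys, 4)
print(loads)  # [100, 100, 100, 700]

# The job finishes only when the busiest reducer does.
slowest_vs_average = max(loads) / (sum(loads) / len(loads))
print(slowest_vs_average)  # 2.8
```

In this made-up case three reducers finish early and sit idle while the fourth works through seven times as many records, because all records for a given key must go to the same reducer.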

Oracle’s Perfect Balance makes the “took great care in partitioning your data” part much, much easier. This post is just a quick overview, not an in-depth one, so I won’t go into the details of how this works (wait for my next post on this topic!). I’ll just mention that Perfect Balance can be used without any change to the application code, so if you are using BDA, there is no excuse not to use it.

And no excuse to play solitaire while waiting for the last reducer to finish.

Oracle XQuery for Hadoop – Announced but not officially released yet, which is why I’m pointing you at an Amis blog post. For now that’s the best source of information about this feature. This feature, combined with the existing Oracle Loader for Hadoop, will allow running XQuery operations on XML documents stored in Hadoop, pushing down the entire data processing bit to MapReduce on the Hadoop cluster. Anyone who knows how slow, painful and CPU-intensive XML processing can be on an Oracle database server will appreciate this feature. I wish I had it a year ago when I had to ingest XML at a very high rate. It is also so cool that I’m a bit sorry that we never developed more awesome XQuery capabilities for Hive and Impala. Can’t wait for the release so I can try it!

During OpenWorld there was also additional exposure for existing but perhaps not very well-known Oracle Big Data features – Hadoop for ODI, Hadoop for OBIEE and using GoldenGate with Hadoop. I’ll try to write more about those soon.

Meanwhile, let me know what you think of In-Memory, Perfect Balance and OXH.


6 Comments on “Big Data News from Oracle OpenWorld 2013”

  1. Ofir Manor says:

    Hi Chen,
    interesting to read about Perfect Balance! I thought BDA was vanilla CDH + Oracle connectors… The docs that you linked have some fine print though. I wonder if anyone tried it with Pig and Hive to see if it makes a difference.
    Regarding the In-Memory option – it could be huge for Oracle customers when it is released (likely next year in 12c Release 2), but it may also cannibalize Exadata – less need for super-fast storage and scan rate… See my post on it:

    Oracle In-Memory Option: the good, the bad, the ugly

    • prodlife says:

      Yep, Perfect Balance was a nice surprise. I hope to play with it a bit more in the future, and will try it with Hive.

      Exadata is being sold as an everything-for-everyone machine, so I doubt anything can cannibalize it. I can see Oracle telling you that in-memory is for DWH while Exadata is for OLTP. Or that Exadata is for everything that won’t fit in memory (long-term reports and such). Or that Exadata is the best way to get enough memory to enjoy in-memory column store. Or maybe in-memory column store won’t be available without Exadata at all. Just like columnar compression. Who knows 🙂

      • Ofir Manor says:

        I’m sure Oracle will continue to push Exadata… but consider this:
        – The main DB message of OW is “buy in-memory option and have 100x faster analytics with 2x faster OLTP”.
        – If Oracle keeps pumping the message for a year, and repeats it next OW when the option is GA, some people will take it seriously.
        – If you have a DB with a mixed workload, why buy both Exadata and the In-Memory option? 100x + 2x is more than enough for your EBS / Siebel / what have you. Exadata was supposed to solve the analytics bottleneck, and In-Memory is even faster than I/O (both random access and sequential scans…)
        – As a bonus, with the In-Memory Option, you may even run your production apps DB in a big VM in your VMware cluster – just give it 500GB and 32 vCPU, no need for a proprietary cluster. Some may be tempted to try it.

        OK, the last point seems a bit adventurous, but why not…

  2. > let me know what you think of In-Memory

    I’ll love it ❤
    How much does it cost? When is it available?

