Why Oozie?Posted: December 9, 2013 Filed under: Hadoop, oozie Leave a comment
Thats a really frequently asked question. Oozie is a workflow manager and scheduler. Most companies already have a workflow schedulers – Activebatch, Autosys, UC4, HP Orchestration. These workflow schedulers run jobs on all their existing databases – Oracle, Netezza, MySQL. Why does Hadoop need its own special workflow scheduler?
As usual, it depends. In general, you can keep using any workflow scheduler that works for you. No need to change, really.
However, Oozie does have some benefits that are worth considering:
- Oozie is designed to scale in a Hadoop cluster. Each job will be launched from a different datanode. This means that the workflow load will be balanced and no single machine will become overburdened by launching workflows. This also means that the capacity to launch workflows will grow as the cluster grows.
- Oozie is well integrated with Hadoop security. This is especially important in a kerberized cluster. Oozie knows which user submitted the job and will launch all actions as that user, with the proper privileges. It will handle all the authentication details for the user as well.
- Oozie is the only workflow manager with built-in Hadoop actions, making workflow development, maintenance and troubleshooting easier.
- Oozie UI makes it easier to drill down to specific errors in the data nodes. Other systems would require significantly more work to correlate jobtracker jobs with the workflow actions.
- Oozie is proven to scale in some of the world’s largest clusters. The white paper discusses a deployment at Yahoo! that can handle 1250 job submissions a minute.
- Oozie gets callbacks from MapReduce jobs so it knows when they finish and whether they hang without expensive polling. No other workflow manager can do this.
- Oozie Coordinator allows triggering actions when files arrive at HDFS. This will be challenging to implement anywhere else.
- Oozie is supported by Hadoop vendors. If there is ever an issue with how the workflow manager integrates with Hadoop – you can turn to the people who wrote the code for answers.
So, should you use Oozie? If you find these benefits compelling, then yes. Step out of your comfort zone and learn another new tool. It will be worth it.