Parameterizing Hive Actions in Oozie Workflows

Very common request I get from my customers is to parameterize the query executed by a Hive action in their Oozie workflow.
For example, the dates used in the query depend on a result of a previous action. Or maybe they depend on something completely external to the system – the operator just decides to run the workflow on specific dates.

There are many ways to do this, including using EL expressions, capturing output from shell action or java action.
Here’s an example of how to pass the parameters through the command line. This assumes that whoever triggers the workflow (Human or an external system) has the correct value and just needs to pass it to the workflow so it will be used by the query.

Here’s what the query looks like:

insert into test select * from test2 where dt=${MYDATE}

MYDATE is the parameter that allows me to run the query on a different date each time. When running this query in hive, I’d use something like “set MYDATE=’2011-10-10′” before running the query. But when I run it from Oozie, I need to pass the parameter to the Hive action.

Lets assume I saved the query in a file hive1.hql. Here’s what the Oozie workflow would look like:

<workflow-app name="cmd-param-demo" xmlns="uri:oozie:workflow:0.4">
	<start to="hive-demo"/>
	<action name="hive-demo">
		<hive xmlns="uri:oozie:hive-action:0.2">
		<ok to="end"/>
		<error to="kill"/>
	<kill name="kill">
		<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
	<end name="end"/>

The important line is “MYDATE=${MYDATE}”. Here I translate an Oozie parameter to a parameter that will be used by the Hive script. Don’t forget to copy hive-site.xml and hive1.hql to HDFS! Oozie actions can run on any datanode and will not read files on local file system.

And here’s how you call Oozie with the commandline parameter:
oozie job -oozie http://myserver:11000/oozie -config ~/workflow/ -run -verbose -DMYDATE=’2013-11-15′

Thats it!