Hadoop – How to Build a Workflow Using Oozie

Develop a sample workflow using Oozie:-


Maven is used to build the application bundle, and it is assumed that Maven is installed and on your path.

To build the application, simply run:

Cmd # mvn package

The Maven assembly plugin is used to generate a .tar.gz file that contains all of the workflow and configuration files in the required layout:

oozie-examples-[VERSION]-bundle.tar.gz
/config-default.xml
/conf/(job config)
/lib/(*.jar; *.so)


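To deploy the bundle, unpack it locally and remove any previous copy from HDFS:

rm -rf oozie-examples
tar -xvzf oozie-examples-0.0.1-SNAPSHOT-bundle.tar.gz
hadoop fs -rmr /workflows/oozie-examples


To run the workflow, point the CLI at the Oozie server and submit the job:

export OOZIE_URL=http://hostname:11000/oozie
oozie job -config oozie-examples/job.properties -run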

Run parallel MapReduce jobs in a sub-workflow:-

oozie job -config oozie-examples/job.properties \
  -D jump.to=parallel-run


The Oozie Coordinator system allows you to define and execute recurrent and interdependent workflow jobs (data application pipelines).

A data application pipeline is a chain of coordinator workflow jobs that can run at regular intervals, at different intervals, or be triggered by some external event (such as data availability).

For example, the output of the last 4 runs of a workflow that runs every 15 minutes becomes the input of another workflow that runs every 60 minutes.

The coordinator job bundled with this example simply runs the workflow at 5-minute intervals between the given start and end dates.

To deploy the coordinator job, run the following command:

oozie job -config oozie-examples/coordinator/word.properties \
  -D start=$(date -u +"%FT%H:%MZ") \
  -D end=$(date -u -d "+1 hour" +"%FT%H:%MZ") -D mode=single -run
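
The five-minute frequency and the one-hour window in the command above imply a fixed number of runs. The following is a minimal sketch of that arithmetic only, assuming the end time is exclusive (the last run starts before the end). `coord_offsets` is a hypothetical helper written for illustration; it is not an Oozie command and does not contact the Oozie server.

```shell
# Hypothetical helper: print the minute offsets (counted from the start time)
# at which a job with the given frequency would run inside a window.
# Assumption: the end of the window is exclusive.
coord_offsets() {
    freq=$1      # frequency in minutes, e.g. 5
    window=$2    # window length in minutes, e.g. 60
    if [ "$freq" -le 0 ]; then
        echo "frequency must be positive" >&2
        return 1
    fi
    t=0
    while [ "$t" -lt "$window" ]; do
        echo "$t"
        t=$((t + freq))
    done
}
```

Calling `coord_offsets 5 60` lists twelve offsets, from 0 to 55, so under this assumption the example coordinator would start the workflow twelve times in its one-hour window.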

To stop the coordinator job, run:

oozie job -kill [coordinator job id]

Oozie scheduler execution using Pig and MapReduce:-

A Workflow Engine:-

Oozie executes workflows defined as a DAG (directed acyclic graph) of jobs.

The job types include MapReduce, Pig, Hive, any script, custom Java code, etc.


Oozie executes workflows based on:

  • Time Dependency (Frequency)
  • Data Dependency


Command-line Tool in Oozie:-

Oozie provides a command-line utility, oozie, to perform job and admin tasks.

All operations are done via sub-commands of the oozie CLI.

The oozie CLI interacts with Oozie via its web services (WS) API.


To show the client version of Oozie:

Cmd # oozie version

For job operations:

# oozie job <options>

For jobs status:

# oozie jobs <options>

For admin operations:

# oozie admin <options>

To validate a workflow XML file:

# oozie validate <ARGS>

To submit a Pig job (everything after '-X' is passed through as parameters to Pig):

# oozie pig <options> -X <ARGS>

Oozie URL:-

All Oozie CLI sub-commands expect the -oozie OOZIE_URL option, which indicates the URL of the Oozie system to run the command against.

If the -oozie option is not specified, the Oozie CLI looks for the OOZIE_URL environment variable and uses it if set.

If the option is not provided and the environment variable is not set, the Oozie CLI will fail.
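
The lookup order described above can be sketched as a small shell function. This is only an illustration of the precedence, not the CLI's actual implementation; `resolve_oozie_url` is a hypothetical helper name, and the error message is made up for the example.

```shell
# Hypothetical helper mirroring the lookup order described above.
# $1: the value given with the -oozie option (empty if the option was omitted).
resolve_oozie_url() {
    if [ -n "${1:-}" ]; then
        # 1. An explicit -oozie option wins.
        printf '%s\n' "$1"
    elif [ -n "${OOZIE_URL:-}" ]; then
        # 2. Otherwise fall back to the OOZIE_URL environment variable.
        printf '%s\n' "$OOZIE_URL"
    else
        # 3. Neither is available: report an error and fail.
        echo "error: no Oozie URL given and OOZIE_URL is not set" >&2
        return 1
    fi
}
```

With OOZIE_URL exported as in the earlier deployment step, `resolve_oozie_url ""` would print that value, while passing a non-empty argument overrides it.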

Time Zone:-

The -timezone TIME_ZONE_ID option in the job and jobs sub-commands allows you to specify the time zone to use in the output of those sub-commands.

The TIME_ZONE_ID should be one of the standard Java time zone IDs; you can get a list of available time zones with the command oozie info -timezones.



© Mindmajix.com. All rights reserved.