Controlling Hadoop Jobs using Oozie


Oozie is a server-based workflow engine specialized in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.

Oozie is a Java web application that runs in a Java servlet container.

For the purposes of Oozie, a workflow is a collection of actions (i.e., Hadoop MapReduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph).

A control dependency from one action to another means that the second action cannot run until the first action has completed.

Oozie workflow actions start jobs in remote systems.

Upon action completion, the remote systems call back Oozie to notify it of the action completion; at this point, Oozie proceeds to the next action in the workflow.

Oozie workflows contain control flow nodes and action nodes.

Control flow nodes define the beginning and the end of a workflow (start, end, and fail nodes) and provide a mechanism to control the workflow execution path (decision, fork, and join nodes).

Action nodes are the mechanism by which a workflow triggers the execution of a computation or processing task.

Oozie provides support for different types of actions, such as Hadoop MapReduce, Hadoop file system, Pig, SSH, HTTP, email, and Oozie sub-workflow.

Oozie workflows can be parameterized using variables (such as ${inputDir}) within the workflow definition.

When submitting a workflow job, the values for the parameters must be provided.
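Parameter values are typically supplied in a job.properties file passed at submission time. A minimal sketch is shown below; the host names, ports, and paths are illustrative assumptions, not values from this article:

```properties
# Illustrative values - adjust to your cluster
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
# Parameters referenced as ${inputDir} / ${outputDir} in the workflow definition
inputDir=${nameNode}/user/${user.name}/input
outputDir=${nameNode}/user/${user.name}/output
# HDFS path where the workflow application (workflow.xml) is deployed
oozie.wf.application.path=${nameNode}/user/${user.name}/wordcount-wf
```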

Workflow Diagram (Word count workflow example):-
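The word-count workflow above could be expressed as a workflow.xml along the following lines. This is a minimal sketch: the application name and the mapper/reducer class names (org.myorg.WordCount.Map, org.myorg.WordCount.Reduce) are placeholder assumptions, and ${jobTracker}, ${nameNode}, ${inputDir}, and ${outputDir} are parameters supplied at submission time:

```xml
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.1">
    <start to="wordcount"/>
    <action name="wordcount">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.myorg.WordCount.Map</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.myorg.WordCount.Reduce</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Word count failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The start, end, and kill elements are the control flow nodes described earlier; the map-reduce action is an action node, with ok/error transitions forming the control dependencies of the DAG.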


Installation and configuring Oozie:-

Oozie can be installed and run using an embedded Tomcat server and an embedded Derby database.

System Requirements:-

Unix (tested on Linux and Mac OS X)

Java 1.6+

Hadoop 0.20 and 1.0.0

ExtJS library, version 2.2 (optional, to enable the Oozie web console)

Note:- The Java 1.6+ bin directory should be in the command path.

Server Installation:-

Oozie ignores any set value for OOZIE_HOME; Oozie computes its home automatically.

Download or build an Oozie binary distribution.

Download a Hadoop binary distribution.

Download the ExtJS library (version 2.2).


The ExtJS library is not bundled with Oozie because it uses a different license. It is recommended to use a dedicated oozie Unix user for the Oozie server.

Expand the Oozie distribution tar.gz.

Expand the Hadoop distribution tar.gz (as the oozie Unix user).

Oozie is bundled without the Hadoop JAR files and the ExtJS library.

The Hadoop JARs are required to run Oozie.

The ExtJS library is only required for the Oozie web console to work.

All Oozie server scripts, i.e.

oozie-setup.sh
oozie-start.sh
oozie-run.sh
oozie-stop.sh

run only under the Unix user that owns the Oozie installation directory.

Use the oozie-setup.sh script to add the Hadoop JARs and the ExtJS library to Oozie:

     $ bin/oozie-setup.sh -hadoop 0.20.200 ${HADOOP_HOME} -extjs /tmp/ext-2.2.zip

To start Oozie as a daemon process, run:

$ bin/oozie-start.sh

To start Oozie as a foreground process, run:

$ bin/oozie-run.sh

Check the oozie.log file (logs/oozie.log) to ensure Oozie started properly.

To check the status of Oozie using the Oozie command line tool:

$ bin/oozie admin -oozie http://localhost:11000/oozie -status

The Oozie status should be NORMAL; the status can also be checked using the Oozie web console.

Client Installation:-

Copy and expand the oozie-client tar.gz bundled with the distribution.

Add the bin/ directory to the PATH.

Note: The Oozie server installation already includes the Oozie client; the standalone client needs to be installed on remote machines only.
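From a machine with the client installed, a workflow job can be submitted and monitored along the following lines. This is an illustrative sketch assuming a local Oozie server and a job.properties file in the current directory:

```
# Submit and start a workflow job (prints the job ID on success)
$ oozie job -oozie http://localhost:11000/oozie -config job.properties -run

# Check the status of the workflow job using the returned job ID
$ oozie job -oozie http://localhost:11000/oozie -info <job-id>
```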

Oozie sharelib installation:-

Expand the oozie-sharelib tar.gz file bundled with the distribution.

The share directory must be copied to the Oozie HOME directory in HDFS:

$ hadoop fs -put share share


This must be done using the oozie Unix user against the Oozie Hadoop (HDFS) cluster, and if the share directory already exists in HDFS, it must be deleted before copying it again.

