
Controlling Hadoop Jobs Using Oozie

by Ravindra Savaram
Last modified: October 8th 2020


  • Oozie is a server-based workflow engine specialized in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Oozie is a Java web application that runs in a Java servlet container.
  • For the purposes of Oozie, a workflow is a collection of actions (i.e. Hadoop MapReduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph).
  • Control dependency from one action to another means that the second action can’t run until the first action has completed.
  • Oozie workflow actions start jobs in remote systems.
  • Upon action completion, the remote systems call back Oozie to notify it that the action has completed; at this point, Oozie proceeds to the next action in the workflow.
  • Oozie workflows contain control flow nodes and action nodes.
  • Control flow nodes define the beginning and the end of a workflow (start, end and kill nodes) and provide a mechanism to control the workflow execution path (decision, fork and join nodes).
  • Action nodes are the mechanism by which a workflow triggers the execution of computation or processing tasks.
  • Oozie provides support for different types of actions, such as Hadoop map-reduce, Hadoop file system, Pig, SSH, HTTP, email and Oozie sub-workflow.
  • Oozie workflows can be parameterized using variables such as ${inputDir} within the workflow definition.
  • When submitting a workflow job, values for the parameters must be provided.
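The control-dependency rule can be illustrated outside Oozie. The following is a minimal Python sketch, not Oozie's implementation: it models a workflow as a DAG that maps each hypothetical action name to the actions it depends on, and it starts an action only after all of its predecessors have completed. Actions that become ready together correspond loosely to the branches of a fork node, and an action that waits on several branches corresponds loosely to a join. The `run_workflow` helper and the action names are assumptions made for illustration; real Oozie workflows are defined in XML and their actions run on remote systems.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(dependencies, run_action):
    """Run actions so that each one starts only after its predecessors finish.

    dependencies maps an action name to the set of action names it depends on.
    run_action(name) performs the action and returns True on success.
    A cycle in dependencies raises graphlib.CycleError, since a workflow
    must be acyclic.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    completed = []
    while sorter.is_active():
        # Every action in this batch has all of its predecessors completed,
        # so the batch could run in parallel, much like fork branches.
        for action in sorter.get_ready():
            if not run_action(action):
                # Stop scheduling anything else, similar to an error transition.
                raise RuntimeError(f"action {action!r} failed; stopping workflow")
            completed.append(action)
            # Successors become ready only once ALL their predecessors are done;
            # an action with several predecessors therefore behaves like a join.
            sorter.done(action)
    return completed


# Hypothetical workflow: two Pig actions both depend on an import action,
# and a MapReduce action waits for both Pig actions.
workflow = {
    "import-data": set(),
    "pig-clean-logs": {"import-data"},
    "pig-clean-users": {"import-data"},
    "mr-aggregate": {"pig-clean-logs", "pig-clean-users"},
}

order = run_workflow(workflow, run_action=lambda name: True)
print(order)
```

Because an action is handed out only after `done()` has been called for every one of its predecessors, `mr-aggregate` can never appear before both Pig actions; the relative order of the two Pig actions is not guaranteed.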

Workflow Diagram (Word count workflow example):
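The diagram for this example is not reproduced here. As a textual stand-in, the minimal Python sketch below generates a word-count workflow definition with a start node, one map-reduce action, a kill node and an end node, and then checks that every plain `${...}` parameter used in it has a value, since values must be provided when the job is submitted. The schema version `uri:oozie:workflow:0.2`, the mapper and reducer class names, and the property values (`localhost:8021`, `hdfs://localhost:8020`) are assumptions to adjust to your Oozie and Hadoop setup. The helpers `build_wordcount_workflow` and `missing_parameters` are hypothetical and not part of Oozie.

```python
import re
import xml.etree.ElementTree as ET

# Assumed schema version; check which workflow schema your Oozie release supports.
WF_NS = "uri:oozie:workflow:0.2"


def build_wordcount_workflow(mapper_class, reducer_class):
    """Return a minimal word-count workflow definition as an XML string."""
    app = ET.Element("workflow-app", {"xmlns": WF_NS, "name": "wordcount-wf"})
    ET.SubElement(app, "start", {"to": "wordcount"})  # control flow node

    action = ET.SubElement(app, "action", {"name": "wordcount"})  # action node
    mr = ET.SubElement(action, "map-reduce")
    ET.SubElement(mr, "job-tracker").text = "${jobTracker}"
    ET.SubElement(mr, "name-node").text = "${nameNode}"
    conf = ET.SubElement(mr, "configuration")
    for name, value in [
        ("mapred.mapper.class", mapper_class),
        ("mapred.reducer.class", reducer_class),
        ("mapred.input.dir", "${inputDir}"),    # workflow parameter
        ("mapred.output.dir", "${outputDir}"),  # workflow parameter
    ]:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Success transitions to the end node, failure to the kill node.
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "kill"})

    kill = ET.SubElement(app, "kill", {"name": "kill"})
    ET.SubElement(kill, "message").text = "Word count failed: ${wf:errorCode('wordcount')}"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")


def missing_parameters(workflow_xml, job_properties):
    """List plain ${name} parameters that have no value in job_properties.

    EL function calls such as ${wf:errorCode(...)} do not match the pattern.
    """
    used = set(re.findall(r"\$\{([A-Za-z_][\w.]*)\}", workflow_xml))
    return sorted(used - set(job_properties))


xml_text = build_wordcount_workflow(
    "org.myorg.WordCount.Map", "org.myorg.WordCount.Reduce"  # assumed class names
)
job_properties = {
    "jobTracker": "localhost:8021",       # assumed JobTracker address
    "nameNode": "hdfs://localhost:8020",  # assumed NameNode URI
    "inputDir": "input-data",
}
print(missing_parameters(xml_text, job_properties))  # ['outputDir']
```

The regular expression is only an approximation of parameter resolution; it is meant as a pre-submission sanity check, not a replacement for Oozie's own handling.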

Installing and Configuring Oozie:

These instructions install and run Oozie using an embedded Tomcat server and an embedded Derby database.

System Requirements:

Unix (tested in Linux and Mac OS X)

Java 1.6+

Hadoop 0.20 and 1.0.0

ExtJS library, version 2.2 (optional, to enable the Oozie web console)

Note: The Java 1.6+ bin directory should be in the command path.
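A quick way to confirm this prerequisite before installing is sketched below in Python. It is an illustration, not part of Oozie: `check_java` looks up `java` on the `PATH` with `shutil.which`, runs `java -version` (which prints its banner on standard error), and compares the major version with 6, i.e. Java 1.6. The banner formats handled (the `1.6.0_45` style and the Java 9+ style such as `11.0.2`) are assumptions about typical JVMs.

```python
import re
import shutil
import subprocess


def java_major_version(banner):
    """Parse the major Java version from `java -version` output.

    Java 8 and older report e.g. "1.6.0_45" (major 6);
    Java 9 and newer report e.g. "11.0.2" (major 11).
    Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def check_java(min_major=6):
    """Return True if `java` is on the PATH with major version >= min_major."""
    java = shutil.which("java")
    if java is None:
        return False
    # `java -version` writes its banner to standard error, not standard output.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    major = java_major_version(result.stderr + result.stdout)
    return major is not None and major >= min_major
```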

Server Installation:

  • Oozie ignores any set value for OOZIE_HOME, and … Oozie computes its home automatically.
  • Download or build an Oozie binary distribution.
  • Download a Hadoop binary distribution.
  • Download the ExtJS library (version 2.2).


The ExtJS library is not bundled with Oozie because it uses a different license. It is recommended to use an Oozie Unix user for the Oozie server.


  • Expand the Oozie distribution tar.gz.
  • Expand the Hadoop distribution tar.gz (as the Oozie Unix user).
  • Oozie is bundled without the Hadoop JAR files and the ExtJS library.
  • Hadoop JARs are required to run Oozie.
  • The ExtJS library is only required for the Oozie web console to work.
  • All Oozie server scripts (oozie-setup.sh, oozie-start.sh, oozie-run.sh and oozie-stop.sh) run only under the Unix user that owns the Oozie installation directory.
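Because the server scripts run only under the Unix user that owns the installation directory, it can help to check the current user before invoking them. The minimal Python sketch below is for POSIX systems; `oozie_home` is a hypothetical path argument, and the helpers are illustrative, not Oozie tools.

```python
import os
import pwd  # POSIX-only module


def installation_owner(oozie_home):
    """Return the user name that owns the Oozie installation directory."""
    return pwd.getpwuid(os.stat(oozie_home).st_uid).pw_name


def running_as_owner(oozie_home):
    """Return True if this process's effective user owns oozie_home."""
    return os.stat(oozie_home).st_uid == os.geteuid()


def require_owner(oozie_home):
    """Raise PermissionError unless the current user owns oozie_home."""
    if not running_as_owner(oozie_home):
        raise PermissionError(
            f"run the Oozie server scripts as {installation_owner(oozie_home)!r}"
        )
```

Calling `require_owner` at the top of a wrapper script makes a wrong-user invocation fail early with a message naming the expected user.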

Use the oozie-setup.sh script to add the Hadoop JARs and the ExtJS library to Oozie:

$ bin/oozie-setup.sh -hadoop 0.20.200 ${HADOOP_HOME} -extjs /tmp/ext-2.2.zip

To start Oozie as a daemon process, run:

$ bin/oozie-start.sh

To start Oozie as a foreground process, run:

$ bin/oozie-run.sh

Check the Oozie log file logs/oozie.log to ensure that Oozie started properly.

To check the status of Oozie using the Oozie command line tool, run:

$ bin/oozie admin -oozie http://localhost:11000/oozie -status

The reported status should be NORMAL. The same status can also be checked from the Oozie web console.
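The status check can also be scripted against Oozie's Web Services (REST) API. The minimal Python sketch below assumes a server at `http://localhost:11000/oozie` that exposes the `v1` admin endpoint `GET /v1/admin/status`, which answers with JSON such as `{"systemMode":"NORMAL"}`; other releases may use a different API version. `oozie_system_mode`, `parse_system_mode` and `is_normal` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed server location, the same URL passed to the CLI's -oozie option.
OOZIE_URL = "http://localhost:11000/oozie"


def parse_system_mode(body):
    """Return the systemMode field (e.g. "NORMAL") from an admin/status body."""
    return json.loads(body)["systemMode"]


def oozie_system_mode(base_url=OOZIE_URL, timeout=10):
    """GET <base_url>/v1/admin/status and return the reported system mode."""
    url = base_url.rstrip("/") + "/v1/admin/status"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_system_mode(response.read().decode("utf-8"))


def is_normal(base_url=OOZIE_URL):
    """True when the server reports NORMAL, the state expected after startup."""
    return oozie_system_mode(base_url) == "NORMAL"
```

Calling `is_normal()` against a running server returns a boolean; if the server is unreachable, `urlopen` raises `urllib.error.URLError`, which a monitoring script can treat as "not started".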

Client Installation:

Copy and expand the oozie-client tar.gz file bundled with the distribution.

Add the bin/ directory to the PATH.

Note: The Oozie server installation already includes the Oozie client; the separate client package should be installed on remote machines only.

Oozie Share Lib Installation:

Expand the oozie-sharelib tar.gz file bundled with the distribution.

The share/ directory must be copied to the Oozie HOME directory in HDFS:

$ hadoop fs -put share share

This must be done as the Oozie Hadoop (HDFS) user. If the share directory already exists in HDFS, it must be deleted before copying it again.
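The upload, including the delete-before-copy rule, can be wrapped in a small script. The minimal Python sketch below assumes that the `hadoop` command is on the `PATH` and that the script runs as the Oozie HDFS user; relative HDFS paths such as `share` resolve against that user's HDFS home directory. The `-rmr` flag is Hadoop 1.x syntax (newer releases use `-rm -r`), and the helper names are hypothetical.

```python
import subprocess


def sharelib_upload_commands(local_dir="share", hdfs_dir="share", exists=False):
    """Return the hadoop fs commands that (re)install the Oozie share lib.

    An existing HDFS copy is deleted first, because it must be removed
    before the directory is copied again.
    """
    commands = []
    if exists:
        # Hadoop 1.x syntax; newer Hadoop releases use "-rm -r" instead.
        commands.append(["hadoop", "fs", "-rmr", hdfs_dir])
    commands.append(["hadoop", "fs", "-put", local_dir, hdfs_dir])
    return commands


def hdfs_path_exists(path):
    """Return True if `hadoop fs -test -e path` reports that the path exists."""
    return subprocess.run(["hadoop", "fs", "-test", "-e", path]).returncode == 0


def install_sharelib(local_dir="share", hdfs_dir="share"):
    """Upload the share lib; run this as the Oozie HDFS user."""
    exists = hdfs_path_exists(hdfs_dir)
    for command in sharelib_upload_commands(local_dir, hdfs_dir, exists):
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
```

Keeping command construction separate from execution makes it easy to print the commands for review (a dry run) before touching HDFS.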
