How to Control and Schedule Hadoop Jobs Using Oozie

Oozie:

Oozie is a server-based workflow engine specialized in running workflow jobs whose actions run Hadoop MapReduce and Pig jobs.
Oozie is a Java web application that runs in a Java servlet container.
For the purposes of Oozie, a workflow is a collection of actions (for example, Hadoop MapReduce jobs and Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph).
A control dependency from one action to another means that the second action can't run until the first action has completed.
Oozie workflow actions start jobs in remote systems.
Upon action completion, the remote system calls back Oozie to notify it that the action has completed; at this point, Oozie proceeds to the next action in the workflow.
Oozie workflows contain control flow nodes and action nodes.

Control flow nodes define the beginning and the end of a workflow (start, end, and fail nodes) and provide a mechanism to control the workflow execution path (decision, fork, and join nodes).
Action nodes are the mechanism by which a workflow triggers the execution of a computation or processing task.
Oozie supports different types of actions: Hadoop MapReduce, Hadoop file system, Pig, SSH, HTTP, email, and Oozie sub-workflow.
Oozie workflows can be parameterized using variables such as ${inputDir} within the workflow definition.
When submitting a workflow job, values for the parameters must be provided.

Workflow Diagram (Word count workflow example):
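
A word count workflow of this kind has a start node, a single map-reduce action, a kill node, and an end node. The following is a minimal sketch of how its definition and parameter file might be written out from the shell. The directory name wordcount-wf, the mapper and reducer class names, the host names and ports, and the HDFS paths are all placeholders chosen for illustration, not values from this article.

```shell
# Hypothetical application directory; any name works.
APP_DIR="wordcount-wf"
mkdir -p "$APP_DIR"

# The quoted heredoc delimiter ('EOF') keeps ${...} literal,
# so Oozie (not the shell) resolves the parameters at submission time.
cat > "$APP_DIR/workflow.xml" <<'EOF'
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.1">
    <!-- Control flow node: where execution begins -->
    <start to="wordcount"/>
    <!-- Action node: one MapReduce job -->
    <action name="wordcount">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.myorg.WordCount.Map</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.myorg.WordCount.Reduce</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <!-- Control dependency: the next node runs only after this action finishes -->
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Something went wrong: ${wf:errorCode('wordcount')}</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF

# Values for the workflow parameters, supplied when the job is submitted.
# Hosts, ports, and paths below are assumptions for a local single-node setup.
cat > "$APP_DIR/job.properties" <<'EOF'
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
inputDir=/user/${user.name}/input-data
outputDir=/user/${user.name}/output-data
oozie.wf.application.path=${nameNode}/user/${user.name}/wordcount-wf
EOF
```

With the application directory copied to HDFS, a job like this would typically be submitted with `oozie job -oozie http://localhost:11000/oozie -config wordcount-wf/job.properties -run`.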

Installing and configuring Oozie:

The following steps install and run Oozie using an embedded Tomcat server and an embedded Derby database.

System Requirements:

Unix (tested on Linux and Mac OS X)

Java 1.6+

Hadoop 0.20 and 1.0.0

ExtJS library (optional, to enable the Oozie web console)

ExtJS 2.2

Note: The Java 1.6+ bin directory should be in the command path.
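
The Java requirement can be checked before installing. The sketch below is one possible way to do it: the function names java_version_ok and check_java are made up for this example, and the parsing assumes that `java -version` prints a line containing `version "x.y.z"`, which is the usual but not guaranteed format.

```shell
# Succeeds when a version string such as "1.6.0_45" or "11.0.2" is 1.6 or newer.
java_version_ok() {
    major="${1%%.*}"            # text before the first dot
    rest="${1#*.}"              # text after the first dot
    minor="${rest%%.*}"         # second version component
    major="${major%%[!0-9]*}"   # drop any non-numeric suffix
    minor="${minor%%[!0-9]*}"
    [ -n "$major" ] || return 1
    if [ "$major" -gt 1 ]; then
        return 0                # newer scheme: 9, 11, 17, ...
    fi
    [ "$major" -eq 1 ] && [ "${minor:-0}" -ge 6 ]
}

# Checks that java is on the PATH and reports whether it is recent enough.
check_java() {
    if ! command -v java >/dev/null 2>&1; then
        echo "java is not on the PATH" >&2
        return 1
    fi
    version=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    if java_version_ok "$version"; then
        echo "Found Java $version"
    else
        echo "Java $version is older than 1.6" >&2
        return 1
    fi
}
```

Running `check_java` in the shell that will start Oozie shows whether that shell sees a suitable Java.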

Server Installation:

Oozie ignores any set value for OOZIE_HOME, and … Oozie computes its home automatically.
Download or build an Oozie binary distribution.
Download a Hadoop binary distribution.
Download the ExtJS library (version 2.2).

Note:

The ExtJS library is not bundled with Oozie because it uses a different license. It is also recommended to use a dedicated oozie Unix user for the Oozie server.

Expand the Oozie distribution tar.gz.
Expand the Hadoop distribution tar.gz (as the oozie Unix user).
Oozie is bundled without the Hadoop JAR files and the ExtJS library.
The Hadoop JARs are required to run Oozie.
The ExtJS library is only required for the Oozie web console to work.
All Oozie server scripts (oozie-setup.sh, oozie-start.sh, oozie-run.sh, and oozie-stop.sh) run only under the Unix user that owns the Oozie installation directory.

Use the oozie-setup.sh script to add the Hadoop JARs and the ExtJS library to Oozie:

$ bin/oozie-setup.sh -hadoop 0.20.200 ${HADOOP_HOME} -extjs /tmp/ext-2.2.zip

To start Oozie as a daemon process, run:

$ bin/oozie-start.sh

To start Oozie as a foreground process, run:

$ bin/oozie-run.sh

Check the Oozie log file, logs/oozie.log, to ensure that Oozie started properly.

To check the status of Oozie using the Oozie command-line tool, run:

$ bin/oozie admin -oozie http://localhost:11000/oozie -status

The Oozie status should be NORMAL. The same status should appear in the Oozie web console.
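
Because the server can take a few seconds to come up, the status check is sometimes wrapped in a retry loop. The following is a sketch under a few assumptions: the function names, the OOZIE_BIN and OOZIE_URL variables, and the retry defaults are invented for this example, and the check only looks for the word NORMAL in the output of the status command.

```shell
# Assumed defaults; adjust to your installation.
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"
OOZIE_BIN="${OOZIE_BIN:-bin/oozie}"

# Succeeds if the status command output mentions NORMAL.
oozie_is_normal() {
    "$OOZIE_BIN" admin -oozie "$OOZIE_URL" -status 2>/dev/null | grep -q "NORMAL"
}

# Polls the server until it reports NORMAL or the attempts run out.
# $1 = number of attempts (default 10), $2 = seconds between attempts (default 3)
wait_for_oozie() {
    attempts="${1:-10}"
    delay="${2:-3}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        if oozie_is_normal; then
            echo "Oozie is up (NORMAL)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Oozie did not report NORMAL after $attempts attempts" >&2
    return 1
}
```

For example, `bin/oozie-start.sh && wait_for_oozie 20 3` would start the daemon and then wait up to about a minute for it to become ready.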

Client Installation:

Copy and expand the oozie-client tar.gz file bundled with the distribution.

Add the bin/ directory to the PATH.

Note: The Oozie server installation already includes the Oozie client, so the client needs to be installed separately only on remote machines.
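
Adding the client to the PATH might look like the sketch below. OOZIE_CLIENT_HOME and its default location are hypothetical, and add_to_path is a small helper written for this example so that repeated runs (for instance from a shell profile) do not append the same directory twice.

```shell
# Hypothetical location of the expanded client; adjust to your machine.
OOZIE_CLIENT_HOME="${OOZIE_CLIENT_HOME:-$HOME/oozie-client}"

# Appends a directory to PATH only if it is not already present.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on the PATH, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

add_to_path "$OOZIE_CLIENT_HOME/bin"
export PATH
```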

Oozie share lib installation:

Expand the oozie-sharelib tar.gz file bundled with the distribution.

The share directory must be copied to the Oozie HOME directory in HDFS:

$ hadoop fs -put share share

Note:

This must be done as the Oozie Hadoop (HDFS) user. If a share directory already exists in HDFS, it must be deleted before copying it again.
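
The delete-then-copy rule can be folded into one small helper. This is a sketch, not part of the Oozie distribution: refresh_sharelib and HADOOP_BIN are names invented here, and `-rmr` is the recursive delete of the Hadoop 0.20/1.0 line (newer Hadoop releases use `-rm -r` instead). Run it as the Oozie Hadoop (HDFS) user from the directory that contains the expanded share directory.

```shell
# Assumed to be on the PATH; override if hadoop lives elsewhere.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# Replaces the share directory in the user's HDFS home with the local copy.
refresh_sharelib() {
    # Remove the old copy first, if there is one.
    if "$HADOOP_BIN" fs -test -e share; then
        "$HADOOP_BIN" fs -rmr share
    fi
    # Upload the freshly expanded local share directory.
    "$HADOOP_BIN" fs -put share share
}
```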

Last updated: 02 May 2023

About Author

Ravindra Savaram

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
