Hadoop – How To Build A Work Flow Using Oozie

Develop a sample workflow using Oozie

Build:

Maven is used to build the application bundle, and it is assumed that Maven is installed and available on your path.

To build the application, simply run:

# mvn package

The Maven assembly plug-in is used to generate a .tar.gz file that contains all of the workflow and configuration files in the required layout:

oozie-examples-[VERSION]-bundle.tar.gz
/workflow.xml
/config-default.xml
/conf/(job config)
/lib/(*.jar; *.so)
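
After unpacking the bundle, it can be useful to confirm that the layout above is actually present before deploying. The following is a minimal sketch, not part of Oozie or of the example project: `check_bundle_layout` is a hypothetical helper, and the scratch directory it is demonstrated on is created only for illustration.

```shell
# Hypothetical helper: verify that an unpacked bundle directory contains
# the files and directories listed in the layout above.
check_bundle_layout() {
  # $1: directory the bundle was unpacked into
  for entry in workflow.xml config-default.xml; do
    [ -f "$1/$entry" ] || { echo "missing file: $entry" >&2; return 1; }
  done
  for entry in conf lib; do
    [ -d "$1/$entry" ] || { echo "missing directory: $entry" >&2; return 1; }
  done
  echo "layout ok: $1"
}

# Demonstration on a scratch directory that imitates the layout.
demo=$(mktemp -d)
mkdir -p "$demo/conf" "$demo/lib"
touch "$demo/workflow.xml" "$demo/config-default.xml"
check_bundle_layout "$demo"
rm -rf "$demo"
```

In practice the argument would be the directory produced by extracting the .tar.gz file.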

Deploy:

rm -rf oozie-examples
tar -xvzf oozie-examples-0.0.1-SNAPSHOT-bundle.tar.gz
hadoop fs -rmr /workflows/oozie-examples

Run:

export OOZIE_URL=https://hostname:11000/oozie
oozie job -config oozie-examples/job.properties -run

Run parallel MapReduce jobs in a sub-workflow:

oozie job -config oozie-examples/job.properties \
  -D jump.to=parallel-run

Coordinator:

  • The Oozie Coordinator system allows you to define and execute recurrent and interdependent workflow jobs (data application pipelines).
  • A data application pipeline is a chain of coordinator workflow jobs that can run at regular intervals, at different intervals, or be triggered by an external event such as data availability.
  • For example, the output of the last 4 runs of a workflow that runs every 15 minutes can become the input of another workflow that runs every 60 minutes.
  • The coordinator job bundled with this example simply runs the workflow at 5-minute intervals between the given start and end dates.
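
To make the 15-minute and 60-minute example concrete, the sketch below lists which four 15-minute runs would feed a given hourly run. It only illustrates the arithmetic and is not how Oozie resolves dependencies internally. `hourly_inputs` is a hypothetical helper, and the choice of T-60, T-45, T-30 and T-15 minutes as the "last 4 runs" is an assumption. It relies on GNU date.

```shell
# Print the nominal times (UTC) of the four 15-minute runs that feed the
# hourly run whose nominal time is $1 (seconds since the epoch).
# Assumption: the "last 4 runs" are those at T-60, T-45, T-30 and T-15 minutes.
# Requires GNU date for the -d "@SECONDS" form.
hourly_inputs() {
  for back in 3600 2700 1800 900; do
    date -u -d "@$(($1 - back))" +%Y-%m-%dT%H:%MZ
  done
}

# Example: the hourly run scheduled for 1970-01-01T01:00Z (epoch second 3600).
hourly_inputs 3600
```

In a real coordinator this relationship is declared in the coordinator definition rather than computed by hand.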

To deploy the coordinator job, run the following command:

oozie job -config oozie-examples/coordinator/word.properties \
  -D start=$(date -u +"%FT%H:%MZ") \
  -D end=$(date -u -d "+1 hour" +"%FT%H:%MZ") -D mode=single -run
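
The start and end values are UTC timestamps in the form yyyy-MM-ddTHH:mmZ. A small sketch of how such a window could be computed separately and inspected before submitting the job is shown below. `coord_window` is a hypothetical helper, not an Oozie command, and it assumes GNU date.

```shell
# Hypothetical helper: print "START END" as UTC timestamps in the
# yyyy-MM-ddTHH:mmZ form used for the start and end properties.
# Requires GNU date for the -d "@SECONDS" form.
coord_window() {
  # $1: start time as seconds since the epoch; $2: window length in seconds
  cw_start=$1
  cw_end=$((cw_start + $2))
  printf '%s %s\n' \
    "$(date -u -d "@$cw_start" +%Y-%m-%dT%H:%MZ)" \
    "$(date -u -d "@$cw_end" +%Y-%m-%dT%H:%MZ)"
}

# A one-hour window starting now; both values derive from one clock reading.
set -- $(coord_window "$(date -u +%s)" 3600)
START=$1
END=$2
echo "start=$START end=$END"
```

The resulting values could then be passed as `-D start="$START" -D end="$END"`. Deriving both from a single clock reading avoids the two timestamps straddling a minute boundary.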

To stop the coordinator job, run:

oozie job -kill [coord job id]

Oozie scheduler execution using Pig and MapReduce:

A Workflow Engine:

  • Oozie executes a workflow defined as a DAG (directed acyclic graph) of jobs.
  • The job types include MapReduce, Pig, Hive, arbitrary scripts, custom Java code, and so on.
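
The idea of a DAG of jobs can be illustrated with the standard `tsort` utility, which prints the nodes of a graph in an order that respects every dependency. This is only an illustration of dependency ordering, not how Oozie schedules actions, and the job names are hypothetical.

```shell
# Each input line is "upstream downstream": the downstream job may start
# only after the upstream job has finished. Job names are hypothetical.
dag_order() {
  tsort <<'EOF'
ingest pig-clean
ingest mr-count
pig-clean hive-report
mr-count hive-report
EOF
}

# Prints one valid execution order: ingest first, hive-report last.
# The two middle jobs are independent of each other and could run in parallel.
dag_order
```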

Oozie executes a workflow based on:

1. Time Dependency(Frequency)
2. Data Dependency

Command line tool in Oozie:

  • Oozie provides a command line utility, oozie, to perform job and admin tasks.
  • All operations are done via sub-commands of the oozie CLI.
  • The oozie CLI interacts with Oozie via its web services (WS) API.

Commands:

To show the client version of Oozie

# oozie version

For job operations

# oozie job

For jobs status

# oozie jobs

For admin operations

# oozie admin

To validate a workflow XML file

# oozie validate

To submit a Pig job (everything after '-X' is passed through as parameters to Pig)

# oozie pig -X

Oozie URL:

  • All Oozie CLI sub-commands expect the -oozie OOZIE_URL option, which indicates the URL of the Oozie system to run the command against.
  • If the -oozie option is not specified, the Oozie CLI looks for the OOZIE_URL environment variable and uses it if set.
  • If the option is not provided and the environment variable is not set, the Oozie CLI fails.
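
The lookup order described above can be sketched as a small shell function. This mimics the documented behaviour for illustration only. `resolve_oozie_url` is a hypothetical helper, not part of the Oozie CLI, and the host names are placeholders.

```shell
# Mimic the lookup order described above: an explicit option value wins,
# then the OOZIE_URL environment variable, otherwise the call fails.
resolve_oozie_url() {
  # $1: value that would be given to the -oozie option (may be empty)
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  elif [ -n "${OOZIE_URL:-}" ]; then
    printf '%s\n' "$OOZIE_URL"
  else
    echo "error: no -oozie option given and OOZIE_URL is not set" >&2
    return 1
  fi
}

# Placeholder host names, for illustration only.
OOZIE_URL=http://localhost:11000/oozie
resolve_oozie_url ""                                # falls back to OOZIE_URL
resolve_oozie_url "http://other-host:11000/oozie"   # explicit value wins
```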

Time Zone:

  • The -timezone TIME_ZONE_ID option in the job and jobs sub-commands allows you to specify the time zone to use in the output of those sub-commands.
  • The TIME_ZONE_ID should be one of the standard Java time zone IDs. You can get a list of available time zones with the command oozie info -timezones.
