Hadoop – How To Build A Work Flow Using Oozie
Develop a sample workflow using Oozie
Maven is used to build the application bundle, and it is assumed that Maven is installed and on your path.
To build the application, simply run:
mvn package
The Maven assembly plug-in is used to generate a .tar.gz file which contains all of the workflow and configuration files in the required layout:
/config-default.xml
/conf/(job config)
rm -rf oozie-examples
tar -xvzf oozie-examples-0.0.1-SNAPSHOT-bundle.tar.gz
hadoop fs -rmr /workflows/oozie-examples
export OOZIE_URL=https://hostname:11000/oozie
oozie job -config oozie-examples/job.properties -run
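The deployment steps above can be collected into one function. This is only a sketch: the `run` helper, the `DRY_RUN` variable, and the `deploy_examples` name are illustrative, and the `hadoop fs -put` upload step does not appear in the listing above; it is an assumption about how the unpacked files reach HDFS.

```shell
# Sketch of the deployment steps as a single function.
# With DRY_RUN=1 (the default here) each command is only printed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy_examples() {
  bundle="oozie-examples-0.0.1-SNAPSHOT-bundle.tar.gz"
  hdfs_dir="/workflows/oozie-examples"

  run rm -rf oozie-examples              # clear any previous unpacked copy
  run tar -xvzf "$bundle"                # unpack the Maven-built bundle
  run hadoop fs -rmr "$hdfs_dir"         # remove the old copy from HDFS
  # Assumption: upload step, not part of the original listing.
  run hadoop fs -put oozie-examples "$hdfs_dir"
  run oozie job -config oozie-examples/job.properties -run
}
```

Calling `deploy_examples` prints the five commands; `DRY_RUN=0 deploy_examples` would execute them, provided OOZIE_URL is exported first.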
Run parallel MapReduce jobs in a sub-workflow:
oozie job -config oozie-examples/job.properties \
  -D jump.to=parallel-run
- The Oozie Coordinator system allows you to define and execute recurrent and interdependent workflow jobs (data application pipelines).
- A data application pipeline is a chain of coordinator workflow jobs that can run at regular intervals, at different intervals, or be triggered by some external event (data availability).
- For example, the output of the last 4 runs of a workflow that runs every 15 minutes becomes the input of another workflow that runs every 60 minutes.
- The coordinator job bundled with this example simply runs the workflow at 5-minute intervals between the given start and end dates.
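The interval arithmetic in these bullets can be checked with a small helper. This is a sketch with a hypothetical function name, and it assumes the end of the window does not itself trigger a run.

```shell
# Number of runs that fit in a window, given the run frequency.
# $1 = window length in minutes, $2 = frequency in minutes
runs_in_window() {
  echo $(( $1 / $2 ))
}

runs_in_window 60 15   # 15-minute workflow feeding a 60-minute one: prints 4
runs_in_window 60 5    # 5-minute coordinator over a one-hour window: prints 12
```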
To deploy the coordinator job, run the following command:
oozie job -config oozie-examples/coordinator/word.properties \
  -D start=$(date -u +"%FT%H:%MZ") \
  -D end=$(date -u -d "+1 hour" +"%FT%H:%MZ") -D mode=single -run
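The start and end values passed to the coordinator are UTC timestamps in the form yyyy-MM-ddTHH:mmZ. The sketch below computes them on their own so the values can be inspected before submitting; it assumes GNU date, since the `-d "+1 hour"` syntax is not available in BSD or macOS date.

```shell
# Build the coordinator window: now (UTC) until one hour from now.
# %F expands to %Y-%m-%d; -u forces UTC; -d requires GNU date.
start=$(date -u +"%FT%H:%MZ")
end=$(date -u -d "+1 hour" +"%FT%H:%MZ")

echo "start=$start"
echo "end=$end"
```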
To stop the coordinator job, run:
oozie job -kill [coordinator job id]
Oozie scheduler execution using Pig and MapReduce:
A Workflow Engine:
- Oozie executes workflows defined as a DAG (directed acyclic graph) of jobs.
- Job types include MapReduce, Pig, Hive, any script, custom Java code, etc.
Oozie executes a workflow based on:
1. Time dependency (frequency)
2. Data dependency
Command-line tool in Oozie:
- Oozie provides a command-line utility, oozie, to perform job and admin tasks.
- All operations are done via sub-commands of the oozie CLI.
- The oozie CLI interacts with Oozie via its web services (WS) API.
To show the client version of Oozie
For job operations
For job status
For admin operations
To validate a workflow XML file
To submit a Pig job (everything after '-X' is passed through as parameters to Pig)
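The tasks listed above map onto the oozie sub-commands version, job, jobs, admin, validate, and pig. The sketch below shows one illustrative invocation of each; the function name, the job ID argument, workflow.xml, script.pig, and the INPUT parameter are hypothetical placeholders, and OOZIE_URL is assumed to be exported already.

```shell
# One example invocation per sub-command.
# $1 = ID of an existing job (placeholder supplied by the caller)
oozie_cli_tour() {
  job_id="$1"
  oozie version                     # show the client version
  oozie job -info "$job_id"         # job operations: inspect one job
  oozie jobs -jobtype wf -len 10    # jobs status: list up to 10 workflow jobs
  oozie admin -status               # admin operations: system status
  oozie validate workflow.xml       # validate a workflow XML file
  # Everything after -X is handed to Pig unchanged.
  oozie pig -file script.pig -config job.properties -X -param INPUT=/data/in
}
```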
- All oozie CLI sub-commands expect the -oozie OOZIE_URL option, which indicates the URL of the Oozie system to run the command against.
- If the -oozie option is not specified, the oozie CLI will look for the OOZIE_URL environment variable and use it if set.
- If the option is not provided and the environment variable is not set, the oozie CLI will fail.
- The -timezone TIME_ZONE_ID option in the job and jobs sub-commands allows you to specify the time zone to use in the output of those sub-commands.
- The TIME_ZONE_ID should be one of the standard Java time zone IDs, and you can get a list of available time zones with the command oozie info -timezones.
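The URL lookup order described in the first three bullets can be sketched as a shell function. This mirrors the described behavior only; it is not the CLI's own code, and the function name is hypothetical.

```shell
# Resolve the Oozie URL the way the bullets above describe:
# explicit option first, then the OOZIE_URL environment variable, else fail.
# $1 = value given with -oozie (empty if the option was not used)
resolve_oozie_url() {
  if [ -n "$1" ]; then
    echo "$1"
  elif [ -n "${OOZIE_URL:-}" ]; then
    echo "$OOZIE_URL"
  else
    echo "error: no -oozie option given and OOZIE_URL is not set" >&2
    return 1
  fi
}
```

For example, `resolve_oozie_url ""` prints the value of OOZIE_URL when it is exported, and returns a non-zero status when it is not.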