Develop a sample workflow using Oozie
Build:
Maven is used to build the application bundle, and it is assumed that Maven is installed and on your path.
To build the application, simply run:
Cmd # mvn package
The Maven assembly plug-in is used to generate a .tar.gz file that contains all of the workflow and configuration files in the required layout:
oozie-examples-[VERSION]-bundle.tar.gz
  /workflow.xml
  /config-default.xml
  /conf/(job config)
  /lib/(*.jar; *.so)
Deploy:
rm -rf oozie-examples
tar -xvzf oozie-examples-0.0.1-SNAPSHOT-bundle.tar.gz
hadoop fs -rmr /workflows/oozie-examples
Run:
export OOZIE_URL=https://hostname:11000/oozie
oozie job -config oozie-examples/job.properties -run
Run parallel map-reduce jobs in sub-workflow:
oozie job -config oozie-examples/job.properties -D jump.to=parallel -run
Coordinator:
- The Oozie Coordinator system allows you to define and execute recurrent and interdependent workflow jobs (data application pipelines).
- A data application pipeline is a chain of coordinator workflow jobs that can run at regular intervals, different intervals or be triggered by some external event (data availability).
- For example, the output of the last 4 runs of a workflow that runs every 15 minutes becomes the input of another workflow that runs every 60 minutes.
- The coordinator job bundled with this example simply runs the workflow at 5-minute intervals between the given start and end dates.
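The 15-minute/60-minute example above comes down to simple timing arithmetic, which can be sketched in shell. This only illustrates which runs line up with which hourly run; it is not how Oozie resolves inputs internally. The function name input_runs_for_hour is hypothetical, and GNU date is assumed (for -d @SECONDS).

```shell
# Hypothetical helper: given the nominal time of an hourly run (epoch seconds),
# print the nominal times of the last four 15-minute runs that feed it.
input_runs_for_hour() {
  local hourly_epoch="$1"
  local offset
  # Walk back 45, 30, 15 and 0 minutes from the hourly nominal time.
  for offset in 45 30 15 0; do
    date -u -d "@$((hourly_epoch - offset * 60))" '+%Y-%m-%dT%H:%MZ'
  done
}

# 3600 seconds after the epoch is 1970-01-01T01:00Z.
input_runs_for_hour 3600
```

For the hourly run at 01:00Z this lists the 15-minute runs at 00:15Z, 00:30Z, 00:45Z and 01:00Z.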
To deploy the coordinator job, run the following command:
oozie job -config oozie-examples/coordinator/coord.properties -D start=$(date -u "+%FT%H:%MZ") -D end=$(date -u -d "+1 hour" "+%FT%H:%MZ") -D mode=single -run
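The start and end values passed with -D are UTC timestamps of the form yyyy-MM-ddTHH:mmZ. As a minimal sketch, the window can be computed separately and inspected before it is handed to the oozie command. The helper name coord_window is hypothetical, and GNU date is assumed.

```shell
# Hypothetical helper: print start and end timestamps for a coordinator window.
# $1 = window start in epoch seconds, $2 = window length in minutes.
coord_window() {
  local start_epoch="$1"
  local length_min="$2"
  local end_epoch=$((start_epoch + length_min * 60))
  printf 'start=%s end=%s\n' \
    "$(date -u -d "@${start_epoch}" '+%Y-%m-%dT%H:%MZ')" \
    "$(date -u -d "@${end_epoch}" '+%Y-%m-%dT%H:%MZ')"
}

# One-hour window starting now.
coord_window "$(date -u +%s)" 60
```

Working from epoch seconds keeps the arithmetic explicit and avoids relying on relative date strings such as "+1 hour".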
To stop the coordinator job, run:
oozie job -kill [coord job id]
Oozie scheduler execution using Pig and MapReduce:
A Workflow Engine:
- Oozie executes a workflow defined as a DAG (directed acyclic graph) of jobs.
- Job types include MapReduce, Pig, Hive, arbitrary scripts, custom Java code, and so on.
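To make the DAG idea concrete, the sketch below orders a small fork/join graph with the standard tsort utility. It illustrates dependency ordering only and is not Oozie's scheduler; the job names (ingest, pig-clean, mr-count, report) are made up for the example.

```shell
# Each line is "upstream downstream": the downstream job may start only after
# the upstream job has finished. All job names here are made up.
workflow_edges() {
  cat <<'EOF'
ingest pig-clean
ingest mr-count
pig-clean report
mr-count report
EOF
}

# tsort prints the jobs in an order that respects every edge, so "ingest"
# comes first and "report" comes last; the two middle jobs are independent.
workflow_edges | tsort
```

Because pig-clean and mr-count do not depend on each other, either may appear first in the output, which is the same property that lets a workflow engine run them in parallel.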
Oozie executes a workflow based on:
1. Time Dependency (Frequency)
2. Data Dependency
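The two kinds of dependency can be pictured as a combined readiness check. The following is a rough sketch under stated assumptions, not Oozie's implementation: the function ready_to_run, the use of a flag file to signal that input data is available, and the example path are all hypothetical.

```shell
# Hypothetical check: a run is ready when
#   (a) at least frequency_min minutes have passed since the last run, and
#   (b) the flag file marking the input data as available exists.
ready_to_run() {
  local last_run_epoch="$1"
  local now_epoch="$2"
  local frequency_min="$3"
  local done_flag="$4"
  # Time dependency: enough time has elapsed.
  [ $((now_epoch - last_run_epoch)) -ge $((frequency_min * 60)) ] || return 1
  # Data dependency: the input has been published.
  [ -e "$done_flag" ] || return 1
  return 0
}

# Example: 5-minute frequency, 300 seconds elapsed, made-up flag path.
if ready_to_run 0 300 5 /tmp/input/_DONE; then echo "ready"; else echo "waiting"; fi
```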
Command line Tool in Oozie:
- Oozie provides a command line utility, oozie, to perform job and admin tasks.
- All operations are done via sub-commands of the oozie CLI.
- The oozie CLI interacts with Oozie via its web services (WS) API.
Commands:
To show the client version of Oozie
# oozie version
For job operations
# oozie job
For jobs status
# oozie jobs
For admin operations
# oozie admin
To validate a workflow XML file
# oozie validate
To submit a Pig job (everything after '-X' is passed through as parameters to Pig)
# oozie pig -X
Oozie URL:
- All oozie CLI sub-commands expect the -oozie OOZIE_URL option, which indicates the URL of the Oozie system to run the command against.
- If the -oozie option is not specified, the oozie CLI will look for the OOZIE_URL environment variable and use it if set.
- If the option is not provided and the environment variable is not set, the Oozie CLI will fail.
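The lookup order described above can be sketched as a small shell function. This mirrors the documented behavior for illustration only; resolve_oozie_url is a hypothetical helper, not part of the oozie CLI.

```shell
# Hypothetical helper mirroring the lookup order described above:
# explicit option value first, then the OOZIE_URL environment variable,
# otherwise fail.
resolve_oozie_url() {
  local option_value="${1:-}"
  if [ -n "$option_value" ]; then
    printf '%s\n' "$option_value"
  elif [ -n "${OOZIE_URL:-}" ]; then
    printf '%s\n' "$OOZIE_URL"
  else
    echo "no Oozie URL: pass -oozie or set OOZIE_URL" >&2
    return 1
  fi
}

# An explicit value always wins over the environment.
resolve_oozie_url "https://hostname:11000/oozie"
```

Exporting OOZIE_URL once per shell session, as in the Run step, avoids repeating the option on every command.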
Time Zone:
- The -timezone TIME_ZONE_ID option in the job and jobs sub-commands allows you to specify the time zone to use in the output of those sub-commands.
- The TIME_ZONE_ID should be one of the standard Java time zone IDs, and you can get a list of available time zones with the command oozie info -timezones.