Hadoop Job Operations

 

Submitting a workflow, coordinator, or Bundle Job:

Bundle submission is supported only in Oozie 3.0 or later. Similarly, all the bundle operations below are supported only in Oozie 3.0 or later.

Example:

  • $ oozie job -oozie http://localhost:11000/oozie -config job.properties -submit
  • job: 14-20090525161321-oozie-joe
  • The properties for the job must be provided in a file, either a Java properties file (.properties) or a Hadoop XML configuration file (.xml), and the file must be specified with the -config option.
  • The workflow application path must be specified in the file with the oozie.wf.application.path property.
  • The coordinator application path must be specified in the file with the oozie.coord.application.path property.
  • The bundle application path must be specified in the file with the oozie.bundle.application.path property, and the specified path must be an HDFS path.
  • The job will be created, but it will not be started; it will be in PREP status.
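The property-file rules above can be scripted before calling the CLI. The following Python sketch is illustrative only: the helper names (`render_job_properties`, `submit_argv`, `parse_job_id`) are hypothetical, and the check that a single job file carries exactly one application-path property is an assumption, not something stated above. It renders the file contents, builds the `oozie job ... -submit` argument list, and extracts the ID from the CLI's `job: ...` line.

```python
"""Hypothetical helpers for preparing an Oozie -submit call (illustrative sketch)."""

# Application-path properties named in the text above.
APP_PATH_KEYS = (
    "oozie.wf.application.path",
    "oozie.coord.application.path",
    "oozie.bundle.application.path",
)


def render_job_properties(props: dict) -> str:
    """Render key=value lines for a Java-style .properties file.

    Assumption: a single job file carries exactly one application path.
    """
    present = [key for key in APP_PATH_KEYS if key in props]
    if len(present) != 1:
        raise ValueError(f"expected exactly one application path, got {present}")
    bundle_path = props.get("oozie.bundle.application.path")
    if bundle_path is not None and not bundle_path.startswith("hdfs://"):
        # The text requires the bundle application path to be an HDFS path.
        raise ValueError("bundle application path must be an HDFS path")
    return "".join(f"{key}={value}\n" for key, value in props.items())


def submit_argv(oozie_url: str, config_file: str) -> list:
    """Argument list for `oozie job -oozie URL -config FILE -submit`."""
    return ["oozie", "job", "-oozie", oozie_url, "-config", config_file, "-submit"]


def parse_job_id(cli_output: str) -> str:
    """Extract the ID from a line such as 'job: 14-20090525161321-oozie-joe'."""
    for line in cli_output.splitlines():
        if line.strip().lower().startswith("job:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no job ID found in CLI output")
```

To submit for real, one could write the rendered text to `job.properties`, pass `submit_argv(...)` to `subprocess.run(..., capture_output=True, text=True, check=True)`, and call `parse_job_id` on the result's `stdout`.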


Starting a workflow, coordinator, or Bundle job:

Example:

  • $ oozie job -oozie http://localhost:11000/oozie -start 14-20090525161321-oozie-joe
  • The start option starts a previously submitted workflow job, coordinator job, or bundle job that is in PREP status.
  • After the command is executed, the workflow job, coordinator job, or bundle job will be in RUNNING status.
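The start, suspend, resume, and kill options in this article share one command shape: an option followed by an existing job ID. A minimal Python sketch of that shape, plus the PREP-to-RUNNING rule for start, is below; the function names are hypothetical, and the code only builds arguments without contacting an Oozie server.

```python
"""Hypothetical argument builder for Oozie job lifecycle options (sketch)."""

# Lifecycle options described in this article; each acts on an existing job ID.
LIFECYCLE_OPTIONS = ("start", "suspend", "resume", "kill")


def lifecycle_argv(oozie_url: str, option: str, job_id: str) -> list:
    """Argument list for `oozie job -oozie URL -<option> JOB_ID`."""
    if option not in LIFECYCLE_OPTIONS:
        raise ValueError(f"unsupported option: {option}")
    return ["oozie", "job", "-oozie", oozie_url, f"-{option}", job_id]


def status_after_start(current_status: str) -> str:
    """-start applies only to a job in PREP status; the job then becomes RUNNING."""
    if current_status != "PREP":
        raise ValueError(f"cannot start a job in {current_status} status")
    return "RUNNING"
```

Keeping the option name as data means the same helper covers every lifecycle command in the following sections.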

Running a workflow, coordinator, or Bundle job:

Example:

  • $ oozie job -oozie http://localhost:11000/oozie -config job.properties -run
  • job: 15-20090525161321-oozie-joe
  • The run option creates and starts a workflow job, coordinator job, or bundle job.
  • The parameters for the job and the workflow, coordinator, and bundle application paths are specified the same way as for the submit option.


Suspending a workflow, coordinator or Bundle job:

Example:

  • $ oozie job -oozie http://localhost:11000/oozie -suspend 14-20090525161321-oozie-joe
  • The suspend option suspends a workflow job in RUNNING status.
  • After the command is executed, the workflow job will be in SUSPENDED status.
  • The suspend option also suspends a coordinator or bundle job in RUNNING, RUNNINGWITHERROR, or PREP status.
  • When the coordinator job is suspended, running coordinator actions stay in RUNNING and their workflows are suspended.

If the coordinator job is in RUNNING status, it transitions to SUSPENDED status; if it is in RUNNINGWITHERROR status, it transitions to SUSPENDEDWITHERROR; and if it is in PREP status, it transitions to PREPSUSPENDED status.

When the bundle job is suspended, its running coordinators are also suspended. If the bundle job is in RUNNING status, it transitions to SUSPENDED status; if it is in RUNNINGWITHERROR status, it transitions to SUSPENDEDWITHERROR; and if it is in PREP status, it transitions to PREPSUSPENDED status.
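The suspend transitions listed above fit naturally in a lookup table. This Python sketch encodes only the transitions stated in this section (the table and function names are hypothetical); it is a client-side model for reasoning about status changes, not how the Oozie server implements suspension.

```python
"""Client-side model of the suspend transitions described above (sketch)."""

_COORD_OR_BUNDLE = {
    "RUNNING": "SUSPENDED",
    "RUNNINGWITHERROR": "SUSPENDEDWITHERROR",
    "PREP": "PREPSUSPENDED",
}

SUSPEND_TRANSITIONS = {
    "workflow": {"RUNNING": "SUSPENDED"},
    "coordinator": _COORD_OR_BUNDLE,
    "bundle": _COORD_OR_BUNDLE,
}


def status_after_suspend(job_type: str, status: str) -> str:
    """Return the status a job moves to after -suspend, or raise if not allowed."""
    try:
        return SUSPEND_TRANSITIONS[job_type][status]
    except KeyError:
        raise ValueError(f"cannot suspend a {job_type} job in {status} status") from None
```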


Resuming a workflow, coordinator, or Bundle job:

Example:

  • $ oozie job -oozie http://localhost:11000/oozie -resume 14-20090525161321-oozie-joe
  • The resume option resumes a workflow job in SUSPENDED status.
  • After the command is executed, the workflow job will be in RUNNING status.

Killing a workflow, coordinator or Bundle job:

Example:

  • $ oozie job -oozie http://localhost:11000/oozie -kill 14-20090525161321-oozie-joe
  • The kill option kills a workflow job in PREP, SUSPENDED, or RUNNING status, and a coordinator or bundle job in PREP, RUNNING, PREPSUSPENDED, SUSPENDED, or PAUSED status.
  • After the command is executed, the job will be in KILLED status.
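The kill rules above can be checked before issuing the command. This Python sketch encodes only the statuses named in this section; the names are hypothetical, and your Oozie version may accept additional states, so treat it as a client-side guard rather than the server's authoritative rule.

```python
"""Client-side check of the kill preconditions listed above (sketch)."""

_COORD_OR_BUNDLE_KILLABLE = frozenset({"PREP", "RUNNING", "PREPSUSPENDED", "SUSPENDED", "PAUSED"})

KILLABLE_STATUSES = {
    "workflow": frozenset({"PREP", "RUNNING", "SUSPENDED"}),
    "coordinator": _COORD_OR_BUNDLE_KILLABLE,
    "bundle": _COORD_OR_BUNDLE_KILLABLE,
}


def can_kill(job_type: str, status: str) -> bool:
    """True if the text above lists `status` as killable for `job_type`."""
    return status in KILLABLE_STATUSES.get(job_type, frozenset())


def status_after_kill(job_type: str, status: str) -> str:
    """Every successful kill ends in KILLED status."""
    if not can_kill(job_type, status):
        raise ValueError(f"cannot kill a {job_type} job in {status} status")
    return "KILLED"
```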

Changing end time/concurrency/pause time of a coordinator job:

Example:

  • $ oozie job -oozie http://localhost:11000/oozie -change 14-20090525161321-oozie-joe -value endtime=2011-12-01T05:…
  • The change option changes a coordinator job that is not in KILLED status.
  • Valid value names are endtime, concurrency, and pausetime.
  • Repeated value names are not allowed.
  • The new end time must not be before the job's start time and last action time.
  • The new concurrency value has to be a valid integer.
  • After the command is executed, the job's end time, concurrency, or pause time should be changed.
  • If the end time of an already-SUCCEEDED job is changed, its status becomes RUNNING.
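The validation rules for -change can be applied before the command is sent. This Python sketch is hypothetical: the function name, the timestamp layout (taken from the `2011-12-01T05:00Z` style used in this article), and the semicolon separator between name=value pairs are assumptions. Quote the resulting string on a shell command line, because `;` is a shell metacharacter.

```python
"""Hypothetical validator for the -value argument of `oozie job -change` (sketch)."""
from datetime import datetime

VALID_CHANGE_NAMES = ("endtime", "concurrency", "pausetime")
# Assumed timestamp layout, matching examples such as 2011-12-01T05:00Z.
OOZIE_TIME_FORMAT = "%Y-%m-%dT%H:%MZ"


def build_change_value(changes, start_time, last_action_time=None):
    """Validate (name, value) pairs and join them as name=value;name=value."""
    seen = set()
    for name, value in changes:
        if name not in VALID_CHANGE_NAMES:
            raise ValueError(f"invalid value name: {name}")
        if name in seen:
            raise ValueError(f"repeated value name: {name}")
        seen.add(name)
        if name == "concurrency":
            int(value)  # raises ValueError if not a valid integer
        elif name == "endtime":
            end = datetime.strptime(value, OOZIE_TIME_FORMAT)
            if end < start_time:
                raise ValueError("new end time is before the job's start time")
            if last_action_time is not None and end < last_action_time:
                raise ValueError("new end time is before the last action time")
        # pausetime is passed through unchecked in this sketch.
    return ";".join(f"{name}={value}" for name, value in changes)
```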

Changing pause time of a Bundle job:

Example:

  • $ oozie job -oozie http://localhost:11000/oozie -change 14-20090525161321-oozie-joe -value pausetime=2011-12-01T05:00Z
  • The change option changes a bundle job that is not in KILLED status.
  • The only valid value name is pausetime, the pause time of the bundle job.
  • Repeated value names are not allowed.
  • After the command is executed, the job's pause time should be changed.

Rerunning a workflow job:

Example:

  • $ oozie job -oozie http://localhost:11000/oozie -config job.properties -rerun 14-20090525161321-oozie-joe
  • The rerun option reruns a completed (SUCCEEDED, FAILED, or KILLED) job, skipping the specified nodes.
  • After the command is executed, the job will be in RUNNING status.

Rerunning a coordinator Action or Multiple Actions:

Example:

  • $ oozie job -rerun <coord_job_id> [-nocleanup] [-refresh] [-action 1,3-4,7-40] [-date 2009-01-01T01:00Z::2009-05-31T23:59Z,2009-11-10T01:00Z,2009-12-31T22:00Z]
  • Either -action or -date is required to rerun; if neither is given, an exception is thrown.
  • The rerun option reruns a terminated (TIMEDOUT, SUCCEEDED, KILLED, or FAILED) coordinator action when the coordinator job is not in FAILED or KILLED state.
  • After the command is executed, the rerun coordinator action will be in WAITING status.
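The -action argument mixes single action numbers and inclusive ranges. A small Python sketch that expands such a list is below; the function name is hypothetical, and the code is only a local way to see which actions a spec like `1,3-4,7-40` covers, not the Oozie server's parser.

```python
"""Hypothetical parser for the -action list used by coordinator -rerun and -log (sketch)."""


def expand_action_numbers(spec: str) -> list:
    """Expand a spec such as '1,3-4,7-40' into sorted, unique action numbers."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"descending range: {part}")
            numbers.update(range(low, high + 1))  # ranges are inclusive
        else:
            numbers.add(int(part))
    return sorted(numbers)
```

For the article's example, `expand_action_numbers("1,3-4,7-40")` yields 37 action numbers: 1, then 3 and 4, then 7 through 40.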

Rerunning a Bundle job:

Example:

  • $ oozie job -rerun <bundle_job_id> [-nocleanup] [-refresh] [-coordinator c1,c3,c4] [-date 2009-01-01T01:00Z::2009-05-31T23:59Z,2009-11-10T01:00Z,2009-12-31T22:00Z]
  • Either -coordinator or -date is required to rerun; if neither is given, an exception is thrown.
  • The rerun option reruns the coordinator actions belonging to the specified coordinators within the specified date range.
  • After the command is executed, the rerun coordinator actions will be in WAITING status.
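The -date argument combines `start::end` ranges and single timestamps, separated by commas. This Python sketch parses that form and tests whether an action's nominal time is selected; the function names and the inclusive range bounds are assumptions made for illustration.

```python
"""Hypothetical parser for the -date argument of coordinator/bundle -rerun (sketch)."""
from datetime import datetime

# Assumed layout of each timestamp, e.g. 2009-01-01T01:00Z.
OOZIE_TIME_FORMAT = "%Y-%m-%dT%H:%MZ"


def parse_date_spec(spec: str) -> list:
    """Parse 'A::B,C,...' into (start, end) pairs; a single date C becomes (C, C)."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "::" in part:
            start_text, end_text = part.split("::", 1)
            start = datetime.strptime(start_text.strip(), OOZIE_TIME_FORMAT)
            end = datetime.strptime(end_text.strip(), OOZIE_TIME_FORMAT)
        else:
            start = end = datetime.strptime(part, OOZIE_TIME_FORMAT)
        if start > end:
            raise ValueError(f"range ends before it starts: {part}")
        ranges.append((start, end))
    return ranges


def selects(nominal_time: datetime, ranges: list) -> bool:
    """True if a nominal time falls in any range (bounds assumed inclusive)."""
    return any(start <= nominal_time <= end for start, end in ranges)
```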

Checking the status of a workflow, coordinator or Bundle Job or a coordinator Action:

Example:

  • $ oozie job -oozie http://localhost:11000/oozie -info 14-20090525161321-oozie-joe

    Workflow Name : map-reduce-wf
    App Path      : hdfs://localhost:8020/user/joe/workflows/mapreduce
    Status        : SUCCEEDED
    Run           : 0
    User          : joe
    Group         : users
    Created       : 2009-05-26 05:01 +0000
    Started       : 2009-05-26 05:01 +0000
    Ended         : 2009-05-26 05:01 +0000
    Actions
    Action Name  Type        Status  Transition  External Id           External Status  Error Code  Start                   End
    hadoop1      map-reduce  OK      end         job_20090428135_0524  SUCCEEDED        -           2009-05-26 05:01 +0000  2009-05-26 05:01 +0000

  • The info option can display information about a workflow job, coordinator job, or coordinator action.
  • The info command may time out if the number of coordinator actions is very high.
  • In that case, info should be used with the -offset and -len options.
  • The -offset and -len options specify the starting offset and the number of actions to display when checking a workflow job or coordinator job.
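Paging through a large job with -offset and -len means issuing one -info call per window. This Python sketch computes the windows and builds each call's arguments; the names are hypothetical, and the 1-based offset is an assumption you should verify against your Oozie version (the `first_offset` parameter lets you change it).

```python
"""Hypothetical pager for `oozie job -info` on jobs with many actions (sketch)."""


def page_windows(total_actions: int, page_size: int, first_offset: int = 1):
    """Yield (offset, length) pairs that together cover total_actions actions.

    Assumption: offsets are 1-based; pass first_offset=0 if your server differs.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    end = first_offset + total_actions
    offset = first_offset
    while offset < end:
        yield offset, min(page_size, end - offset)
        offset += page_size


def info_argv(oozie_url: str, job_id: str, offset: int, length: int) -> list:
    """Argument list for one page of `oozie job -info JOB_ID -offset N -len M`."""
    return ["oozie", "job", "-oozie", oozie_url, "-info", job_id,
            "-offset", str(offset), "-len", str(length)]
```

A caller would loop over `page_windows(...)` and run each `info_argv(...)` in turn, so no single request has to return every action.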

Checking the server logs of a workflow, coordinator, or Bundle job:

Example:

  • $ oozie job -oozie http://localhost:11000/oozie -log 14-20090525161321-oozie-joe

Checking the server logs of a particular action of a coordinator job:

Example:

  • $ oozie job -log <coord_job_id> [-action 1,3-4,7-40]
  • The -action option is optional.

Checking the status of multiple workflow jobs:

Example:

  • $ oozie jobs -oozie http://localhost:11000/oozie -localtime -len 2 -filter status=RUNNING
  • A filter can be specified after all options.
  • The filter option syntax is: [NAME=VALUE][;NAME=VALUE]*
  • Valid filter names are:
    • name: the workflow application name
    • user: the user that submitted the job
    • group: the group for the job
    • status: the status of the job
    • frequency: the frequency of the coordinator job
    • unit: the time unit, which takes months, days, hours, or minutes values
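The filter syntax above is a semicolon-separated list of NAME=VALUE pairs. This Python sketch builds and parses such strings while rejecting names outside the list above; the function names are hypothetical, and refusing `;` or `=` inside values is a conservative choice of this sketch, not a documented rule.

```python
"""Hypothetical builder/parser for the `oozie jobs -filter` string (sketch)."""

VALID_FILTER_NAMES = ("name", "user", "group", "status", "frequency", "unit")


def build_filter(pairs) -> str:
    """Join (name, value) pairs into NAME=VALUE;NAME=VALUE form."""
    for name, value in pairs:
        if name not in VALID_FILTER_NAMES:
            raise ValueError(f"invalid filter name: {name}")
        if ";" in value or "=" in value:
            raise ValueError(f"separator character in value: {value!r}")
    return ";".join(f"{name}={value}" for name, value in pairs)


def parse_filter(text: str) -> list:
    """Split a filter string back into (name, value) pairs, validating names."""
    pairs = []
    for chunk in text.split(";"):
        if not chunk:
            continue
        name, sep, value = chunk.partition("=")
        if not sep or name not in VALID_FILTER_NAMES:
            raise ValueError(f"malformed filter element: {chunk!r}")
        pairs.append((name, value))
    return pairs
```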

Checking the status of multiple coordinator jobs:

Example:

  • $ oozie jobs -oozie http://localhost:11000/oozie -jobtype coordinator
  • Output columns: Job ID, App Name, Status, Freq, Unit, Started, Next Materialized
  • Sample row values (partial): Status SUCCEEDED, Freq 1440, Unit MINUTE

Checking the status of multiple Bundle jobs:

Example:

  • $ oozie jobs -oozie http://localhost:11000/oozie -jobtype bundle
  • Output columns: Job ID, Bundle Name, Status, Kickoff, Created, User, Group
  • Sample row: 0000027-110…-oozie-chao-B  BUNDLE-TEST  RUNNING  2012-01-15 00:24  2011-03…  joe  users

Admin Operations:

Checking the status of the Oozie system:

Example:

  • $ oozie admin -oozie http://localhost:11000/oozie -status
  • Safemode: OFF
  • It returns the current status of the Oozie system.

Changing the system mode of the Oozie system (in Oozie 2.0 or later):

Example:

  • $ oozie admin -oozie http://localhost:11000/oozie -systemmode [NORMAL|NOWEBSERVICE|SAFEMODE]
  • Safemode: ON
  • It returns the current status of the Oozie system.


Displaying the build version of the Oozie system:

Example:

  • $ oozie admin -oozie http://localhost:11000/oozie -version
  • Oozie server build version: 2.0.2.1-0.20.1.3092118008--
  • It returns the Oozie server build version.

Validate Operations

Example:

  • $ oozie validate myApp/workflow.xml
  • It performs an XML schema validation on the specified workflow XML file.
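Before running `oozie validate`, a quick local check can catch files that are not even well-formed XML. This Python sketch uses the standard library's `xml.etree.ElementTree`; it is not a substitute for the Oozie schema validation described above, and the helper names are hypothetical.

```python
"""Local well-formedness pre-check before running `oozie validate` (sketch)."""
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML; this is NOT an Oozie schema check."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def root_local_name(xml_text: str) -> str:
    """Root element name without its '{namespace}' prefix, e.g. 'workflow-app'."""
    tag = ET.fromstring(xml_text).tag
    return tag.rsplit("}", 1)[-1]
```

Checking that the root element is `workflow-app` is a cheap sanity test; element order, required attributes, and namespace versions are still left to `oozie validate`.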

Pig Operations

Submitting a pig job through HTTP:

Example:

  • $ oozie pig -oozie http://localhost:11000/oozie -file pigScriptFile -config job.properties -X -param_file params
  • job: 14-20090525161321-oozie-joe-W
  • $ cat job.properties
    fs.default.name=hdfs://localhost:8020
    mapreduce.jobtracker.kerberos.principal=ccc
    dfs.namenode.kerberos.principal=ddd
    oozie.libpath=hdfs://localhost:8020/user/oozie/pig/lib/
  • The parameters for the job must be provided in a Java properties file (.properties).
  • The job tracker, NameNode, and lib path must be specified in this file.
  • The Pig script file is a local file.
  • All JAR files (including the Pig JAR) and all other files needed by the Pig job need to be uploaded to HDFS under the lib path beforehand.
  • The workflow.xml will be created internally in the Oozie server.
  • The job will be created and run right away.

Map-reduce Operations:

Submitting a map-reduce job:

Example:

  • $ oozie mapreduce -oozie http://localhost:11000/oozie -config job.properties
  • The parameters must be in a Java properties file, and this file must be specified for a map-reduce job.
  • The properties file must specify mapred.mapper.class, mapred.…

Re-Run:

  • Reloads the config.
  • Creates a new workflow instance with the same ID.
  • Deletes the actions that are not skipped from the DB and copies data from the old workflow instance to the new one for skipped actions.
  • The action handler skips the nodes given in the config, taking the same exit transition as before.
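The re-run bookkeeping above can be pictured with plain dictionaries. This Python sketch is a toy model, not Oozie's implementation: the data shape (an `id` plus an `actions` map of node name to a record holding `status` and `transition`) and the function names are hypothetical. It keeps copies of only the skipped actions under the same ID and lets a skipped node reuse its recorded exit transition.

```python
"""Toy model of the re-run bookkeeping described above (sketch, not Oozie's code)."""


def rerun_instance(old_instance: dict, skip_nodes: set) -> dict:
    """New instance with the same ID that keeps copies of only the skipped actions."""
    kept = {
        name: dict(record)  # copy, so the old instance's data is not shared
        for name, record in old_instance["actions"].items()
        if name in skip_nodes
    }
    return {"id": old_instance["id"], "actions": kept}


def skipped_transition(instance: dict, node: str):
    """Exit transition recorded for a skipped node, or None if the node must rerun."""
    record = instance["actions"].get(node)
    return record["transition"] if record is not None else None
```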

Workflow Re-Run:

  • Config
  • Pre-conditions
  • Reruns

Config:

  • oozie.wf.application.path
  • Only one of the following two configurations is mandatory, and both should not be defined at the same time:
    • oozie.wf.rerun.skip.nodes
    • oozie.wf.rerun.failnodes
  • Skip nodes are a comma-separated list of action names; they can be any action nodes, including decision nodes.
  • The valid value of oozie.wf.rerun.failnodes is either true or false.
  • If a secured Hadoop version is used, the following two properties need to be specified as well:
    • dfs.namenode.kerberos.principal
    • mapreduce.jobtracker.kerberos.principal
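The re-run configuration rules can be checked on the client before calling `-rerun`. This Python sketch encodes exactly the rules listed above (one of the two rerun properties, a boolean failnodes value, and the Kerberos properties on secured clusters); the function names and the `secured` flag are hypothetical.

```python
"""Hypothetical client-side check of workflow re-run properties (sketch)."""

SKIP_NODES = "oozie.wf.rerun.skip.nodes"
FAIL_NODES = "oozie.wf.rerun.failnodes"
SECURE_KEYS = ("dfs.namenode.kerberos.principal", "mapreduce.jobtracker.kerberos.principal")


def validate_rerun_config(props: dict, secured: bool = False) -> None:
    """Raise ValueError if props break the re-run rules listed above."""
    if "oozie.wf.application.path" not in props:
        raise ValueError("oozie.wf.application.path is required")
    if (SKIP_NODES in props) == (FAIL_NODES in props):
        raise ValueError(f"set exactly one of {SKIP_NODES} or {FAIL_NODES}")
    if FAIL_NODES in props and props[FAIL_NODES] not in ("true", "false"):
        raise ValueError(f"{FAIL_NODES} must be true or false")
    if secured:
        missing = [key for key in SECURE_KEYS if key not in props]
        if missing:
            raise ValueError(f"missing Kerberos properties: {missing}")


def skip_node_names(props: dict) -> list:
    """Comma-separated skip list as a list of action names."""
    return [name.strip() for name in props.get(SKIP_NODES, "").split(",") if name.strip()]
```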

Pre-conditions:

  • The workflow with ID WFID should exist.
  • The workflow with ID WFID should be in SUCCEEDED, KILLED, or FAILED status.
  • If specified, the nodes in the config oozie.wf.rerun.skip.nodes must have completed successfully.
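The three re-run pre-conditions can be expressed as a single check. This Python sketch works on a hypothetical in-memory snapshot of job data rather than a real Oozie query; the data shape and the use of "OK" as the status of a successfully completed action (as in the -info output earlier) are assumptions.

```python
"""Hypothetical pre-condition check before a workflow re-run (sketch)."""

RERUNNABLE_STATUSES = frozenset({"SUCCEEDED", "KILLED", "FAILED"})


def check_rerun_preconditions(workflows: dict, wf_id: str, skip_nodes: list) -> None:
    """Raise ValueError unless wf_id exists, is finished, and skipped nodes succeeded.

    Assumed data shape: {wf_id: {"status": str, "actions": {node: action_status}}},
    where an action that completed successfully has status "OK".
    """
    workflow = workflows.get(wf_id)
    if workflow is None:
        raise ValueError(f"workflow {wf_id} does not exist")
    if workflow["status"] not in RERUNNABLE_STATUSES:
        raise ValueError(f"workflow {wf_id} is {workflow['status']}, not finished")
    for node in skip_nodes:
        if workflow["actions"].get(node) != "OK":
            raise ValueError(f"skipped node {node} did not complete successfully")
```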

Last updated: 09 Apr 2022
About Author

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
