Hadoop MapReduce is a software framework for writing applications that process huge amounts of data in parallel on large clusters. It is essentially a programming model for data processing. Hadoop can run MapReduce programs written in various languages, including Java, Ruby, C++, and Python.
The parallel processing that Hadoop provides can be used by expressing a query in the form of a MapReduce job.
MapReduce works by dividing the processing into two phases:
1) the Map phase and
2) the Reduce phase.
Each phase takes key-value pairs as input and output, the data types of which can be chosen by the programmer. A MapReduce program, when expressed in code, requires three different things:
1) a map function,
2) a reduce function, and
3) code to run the job.
The MapReduce framework consists of a single JobTracker and one TaskTracker per cluster node. The JobTracker acts as the master and the TaskTrackers act as slaves. The master is responsible for scheduling the jobs' component tasks on the slaves, monitoring them, and re-executing any failed tasks; the slaves execute the tasks as instructed by the master.
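To make the two phases concrete, the classic word-count example can be mimicked with a Unix pipeline. This is only a rough analogy, not Hadoop itself: tr plays the map function by emitting one word per line, sort stands in for the shuffle that groups equal keys together, and uniq -c plays the reduce function by aggregating a count per key.

```shell
# Map phase analogy: emit one word (key) per line
# Shuffle analogy: sort brings identical keys together
# Reduce phase analogy: uniq -c aggregates a count per key
printf 'the quick fox the fox\n' | tr ' ' '\n' | sort | uniq -c
```

The pipeline prints each distinct word with its count, which is exactly the shape of the word-count MapReduce job's output.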
Creating a Single-Node Hadoop cluster on Windows using Cygwin
The steps to create a single-node Hadoop cluster are given below:
1. Install Eclipse Europa 3.3.2
Download URL: https://eclipse.org/downloads/package/release/europa/winter
2. Install Cygwin, including the openssh and openssl packages.
Download URL: https://cygwin.com/ml/cygwin-announce/2008-08/msg00001.html
We need to make sure that the packages 'openssh' and 'openssl' are selected while installing. Both packages are required for the proper functioning of the Hadoop cluster and the Eclipse plug-in.
3. Set Environment Variables
Once Cygwin is installed, the next step is to set the environment variables.
Go to Start Menu -> My Computer -> (right-click) Properties -> "Advanced" tab -> Environment Variables.
Set JAVA_HOME: create a new variable named "JAVA_HOME" and assign it the complete Java installation path.
Example: if Java is installed under C:\java\jdk1.6.24, set that location as the value.
Add java\bin and the Cygwin directories to the PATH variable:
Edit the variable called "Path" under "System Variables" and append C:\Cygwin\bin;C:\Cygwin\usr\sbin to its value.
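Inside the Cygwin shell, the equivalent settings can be checked or set per session with export statements. This is only a sketch using the example paths above; substitute the paths from your actual install:

```shell
# Example values only -- adjust to your actual install locations.
export JAVA_HOME='C:\java\jdk1.6.24'
# Cygwin exposes the Windows C:\ drive as /cygdrive/c, so the PATH
# entries added above look like this from inside the Cygwin shell:
export PATH="$PATH:/cygdrive/c/java/jdk1.6.24/bin:/cygdrive/c/cygwin/bin:/cygdrive/c/cygwin/usr/sbin"
echo "JAVA_HOME is $JAVA_HOME"
```

Settings made this way last only for the current shell session; the Windows Environment Variables dialog makes them permanent.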
4. Configure SSH
Both the Hadoop scripts and the Eclipse plug-in need passwordless SSH to operate. To configure the SSH daemon, open the Cygwin command prompt, execute the ssh-host-config command, and answer the prompts as follows:
>> Should privilege separation be used? >> no
>> Install sshd as a service? >> yes
>> Value of the CYGWIN environment variable >> ntsec (the default value)
Check whether a new service called "CYGWIN sshd" has been created under Services and Applications, then start the service:
Go to Start Menu -> Run -> services.msc -> CYGWIN sshd.
Start the service by clicking the Start button.
5. Generate Authorization Keys and Verify
Once the service is started, open the Cygwin command prompt and execute the commands for setting up the authorization keys.
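The key-generation commands themselves are not reproduced in this copy of the guide; the usual passwordless-SSH sequence (an assumption based on standard OpenSSH setup, not taken verbatim from the original) looks like the following. In the real setup the key directory is ~/.ssh; a scratch directory is used here so the sketch is safe to re-run:

```shell
# KEYDIR would be "$HOME/.ssh" in the real setup; a scratch dir keeps this re-runnable.
KEYDIR="$(mktemp -d)"
# Generate a key pair with an empty passphrase (-P '') so no password prompt appears.
ssh-keygen -q -t rsa -P '' -f "$KEYDIR/id_rsa"
# Authorize the public key for logins to this machine.
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
```

Afterwards, verify with `ssh localhost`: it should log you in without asking for a password.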
6. Format the NameNode
Execute the following command at the Cygwin prompt, from the Hadoop installation directory:
bin/hadoop namenode -format
Once the NameNode is formatted, the file system has been created and we can proceed with the next set of actions. (The Hadoop daemons would then typically be started with bin/start-all.sh so that Eclipse can connect to the cluster later.)
7. Install the Hadoop plug-in for Eclipse
Execute the following command at the Cygwin prompt:
cd eclipse-plugin
Then navigate to the Eclipse location and copy the file hadoop-0.19.1-eclipse-plugin.jar from the Hadoop eclipse-plugin directory to the Eclipse plugins directory.
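Sketched as shell commands, the copy step looks like this. The directory layout is an assumption based on the Hadoop 0.19 distribution, and stand-in temporary directories are used so the sketch runs anywhere; substitute your real Hadoop and Eclipse paths:

```shell
# Stand-ins for the real install paths -- replace with your actual directories.
HADOOP_HOME="$(mktemp -d)"
ECLIPSE_HOME="$(mktemp -d)"
# Simulate the plug-in jar shipped under Hadoop's contrib directory (assumed layout).
mkdir -p "$HADOOP_HOME/contrib/eclipse-plugin" "$ECLIPSE_HOME/plugins"
touch "$HADOOP_HOME/contrib/eclipse-plugin/hadoop-0.19.1-eclipse-plugin.jar"
# The actual step: copy the jar into Eclipse's plugins directory.
cp "$HADOOP_HOME/contrib/eclipse-plugin/hadoop-0.19.1-eclipse-plugin.jar" "$ECLIPSE_HOME/plugins/"
```

Eclipse picks up jars placed in its plugins directory the next time it starts.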
Start Eclipse and open the workspace, then look for the Open Perspective icon located at the top right of the Eclipse window. Select "Other" from the Open Perspective menu.
From the list of perspectives, select Map/Reduce and press the "OK" button.
As a result, the IDE should display the new perspective, which looks similar to the image shown below.
8. Set Up the Hadoop Location in Eclipse
Next, select the Map/Reduce Locations tab at the bottom of your Eclipse environment. Right-click the blank space in that tab and select "New Hadoop location" from the context menu.
When the dialog box pops up, fill in all the required details as shown in the screenshot below.
Once we finish all the above settings, the new location appears in the "Map/Reduce Locations" tab.
In the Project Explorer tab, on the left-hand side of the Eclipse window, we can see the DFS Locations item.
Once we expand it, we see the localhost location, marked with the blue elephant icon. Expand the items in the tree to the last level.
Once the Hadoop cluster is installed and configured and the Eclipse plug-in is available, we can start writing and testing Hadoop applications in Eclipse.
9. Create and run a Hadoop Eclipse project
To create and run a Hadoop project, go to File -> New -> Project -> Map/Reduce Project -> Next, and name the new Map/Reduce project.
In the next step, click the Configure Hadoop Installation link displayed on the right side of the project configuration window; the project preferences window is shown in the image below. Fill in the location of the Hadoop directory under Hadoop Installation Directory in the preferences, click OK, then click Finish and close the project window.