Hadoop Configuration with Eclipse on Windows
Hadoop MapReduce is a software framework for writing applications that process huge amounts of data in parallel on large clusters. It is essentially a programming model for data processing. Hadoop can run MapReduce programs written in various languages such as Java, Ruby, C++ and Python.
The parallel processing that Hadoop provides is used by expressing a query in the form of a MapReduce job.
MapReduce works by dividing the processing into two phases:
1) Map phase and
2) Reduce phase.
Each phase takes key-value pairs as input and produces key-value pairs as output; the programmer chooses their data types. A MapReduce program, when expressed in code, requires three things:
1) Map function
2) Reduce function and
3) Code to run the job.
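The three pieces can be illustrated without a cluster. The following is a minimal sketch in plain shell, not the Hadoop API: a word count in which sort stands in for the grouping that Hadoop performs between the two phases. The function names map_words, reduce_counts and run_job are hypothetical and chosen only for this illustration.

```shell
#!/bin/sh
# Map function: emit one "word<TAB>1" pair for every word on standard input.
map_words() {
  tr -cs '[:alnum:]' '\n' | tr '[:upper:]' '[:lower:]' | awk 'NF { print $1 "\t1" }'
}

# Reduce function: sum the values of each key. Input must be sorted by key.
reduce_counts() {
  awk -F '\t' '
    $1 != key { if (key != "") print key "\t" sum; key = $1; sum = 0 }
    { sum += $2 }
    END { if (key != "") print key "\t" sum }
  '
}

# Code to run the job: map, group identical keys by sorting, then reduce.
run_job() {
  map_words | LC_ALL=C sort | reduce_counts
}

# Sample run on two short lines of text.
printf 'Hadoop runs map tasks\nthen Hadoop runs reduce tasks\n' | run_job
```

The sample run prints each distinct word followed by its count, so hadoop, runs and tasks are each reported with 2 and the remaining words with 1. The key and value types here are both plain text; in a real Hadoop job the programmer declares them explicitly.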
The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node. The master schedules a job's component tasks on the slaves, monitors them, and re-executes failed tasks. The slaves execute the tasks as directed by the master.
Creating a Single-Node Hadoop cluster on Windows using Cygwin
The steps to create a single-node Hadoop cluster are as follows:
- Install Java 1.6
- Install Eclipse Europa 3.3.2
- Install Cygwin with openssl and openssh packages.
- Set Environment Variables.
- Configure SSH
- Download and Install Hadoop
- Configure Hadoop
- Format the Name Node.
- Install Hadoop Eclipse plug-in
- Setup Hadoop Location in Eclipse
- Create and run Hadoop Eclipse project
Install Java 1.6
Download URL: http://java.com/en/download/index.jsp
Install Eclipse Europa 3.3.2
Download URL: http://eclipse.org/downloads/package/release/europa/winter
Install Cygwin with openssl and openssh packages.
Download URL: http://cygwin.com/ml/cygwin-announce/2008-08/msg00001.html
Make sure that the "openssh" and "openssl" packages are selected during installation. Both packages are required for the Hadoop cluster and the Eclipse plug-in to function properly.
4. Set Environment Variables
- Once Cygwin is installed, the next step is to set the environment variables.
- Go to Start Menu > My Computer, open its context menu and choose Properties, then click Environment Variables on the "Advanced" tab.
- Set JAVA_HOME: create a new variable named "JAVA_HOME" and assign it the complete path of the Java installation.
- Example: if Java is installed under C:\java\jdk1.6.24, set that path as the value.
- Add Java's bin directory to the PATH variable.
- Edit the variable called "Path" under "System Variables" and append the following to its value: C:\cygwin\bin;C:\cygwin\usr\sbin
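After changing the variables, open a new Cygwin prompt so that it picks up the new values. The following is a small sketch of a check that can be pasted into that prompt. The function name check_env is hypothetical, and the tools it looks for (java and ssh) are an assumption based on what the later steps use.

```shell
# Report which of the settings from this step are missing in the current shell.
check_env() {
  status=0
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
    status=1
  fi
  # Each of these tools must be reachable through PATH for the later steps.
  for tool in java ssh; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool not found on PATH"
      status=1
    fi
  done
  return $status
}

# Usage: prints one line per problem and returns non-zero if any were found.
#   check_env && echo "environment looks fine"
```

If the check reports a missing tool even though it is installed, the PATH entry for its directory is the first thing to re-examine.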
5. Configure SSH
Both the Hadoop scripts and the Eclipse plug-in need passwordless SSH to operate. To configure the SSH daemon, open the Cygwin command prompt, execute the ssh-host-config command, and answer the prompts as follows.
- Should privilege separation be used: no.
- Install sshd as a service: yes.
- Value of the CYGWIN environment variable: ntsec (default value).
- Check whether a new service called "CYGWIN sshd" has been created under Services and Applications, then start it.
- Go to Start Menu > Run > services.msc > CYGWIN sshd
- Start the service by clicking the Start button.
Generating Authorization Keys and Verifying:
Once the service is started, open the Cygwin command prompt and execute the commands below to set up the authorization keys.
- ssh-keygen: generates the keys. When prompted for file names and passphrases, press ENTER to accept the default values. Once the key is generated, enter cd ~/.ssh
- ls -l: lists the directory, which should contain two files, id_rsa.pub and id_rsa, with recent creation dates. These files contain the authorization keys. To register the new key, run cat id_rsa.pub >> authorized_keys
- ssh localhost: verifies that the keys are set up correctly.
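Running cat id_rsa.pub >> authorized_keys more than once appends the same key again each time. The sketch below shows one way to make that step repeatable, appending the key only when it is not already present. The function name authorize_key is hypothetical, and restricting the file permissions with chmod 600 is an added precaution rather than part of the steps above.

```shell
# Append a public key to an authorized_keys file unless that exact line exists.
authorize_key() {
  pub_key=$1
  auth_file=$2
  touch "$auth_file"
  key_line=$(cat "$pub_key")
  # -x matches whole lines only, -F treats the key as a fixed string.
  grep -qxF -e "$key_line" "$auth_file" || printf '%s\n' "$key_line" >> "$auth_file"
  # Keep the file private to its owner (an assumption, see the note above).
  chmod 600 "$auth_file"
}

# Usage, after ssh-keygen has written the default key pair:
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
#   ssh localhost
```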
6. Download and Install Hadoop
- Download Hadoop 0.19.1.
- Open a Cygwin command prompt window and change to the home directory using the cd command.
- Execute explorer . at the Cygwin command prompt, which opens the current location in an Explorer window.
- Copy the downloaded Hadoop archive into that Explorer window.
- Execute tar -xzf hadoop-0.19.1.tar.gz at the Cygwin prompt to unpack the Hadoop distribution. Once this is done, a newly created directory called hadoop-0.19.1 is present.
- Verify that the unpacking succeeded by executing cd hadoop-0.19.1 and then ls -l, which produces the output shown below and confirms that everything was unpacked correctly.
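The unpack and verify steps can also be combined into one small helper. This is a sketch only: the function name unpack_hadoop is hypothetical, and it checks just for the bin/hadoop script and the conf directory because those are the two parts of the distribution that the later steps use.

```shell
# Unpack a Hadoop .tar.gz archive into the current directory and check the result.
unpack_hadoop() {
  archive=$1
  # The archive unpacks into a directory named after the file, minus .tar.gz.
  dir=$(basename "$archive" .tar.gz)
  tar -xzf "$archive" || return 1
  # Succeeds only if the launcher script and the conf directory both exist.
  [ -f "$dir/bin/hadoop" ] && [ -d "$dir/conf" ]
}

# Usage, from the Cygwin home directory:
#   unpack_hadoop hadoop-0.19.1.tar.gz && echo "unpacked correctly"
```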
7. The next step is to configure Hadoop. Execute the following commands at the Cygwin command prompt.
- cd hadoop-0.19.1
- cd conf
- explorer . (screenshot mentioned below)
- Edit hadoop-site.xml using EditPlus and insert the following lines between the <configuration> and </configuration> tags.
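As an illustration of what a single-node hadoop-site.xml commonly contains, the sketch below writes one from the Cygwin prompt. The property names fs.default.name, mapred.job.tracker and dfs.replication are standard settings in Hadoop of this generation. The port numbers 9100 and 9101 are assumptions, and any free local ports can be used instead. The function name write_site_config is hypothetical.

```shell
# Write a minimal single-node hadoop-site.xml into the given conf directory.
# Note: this overwrites any existing hadoop-site.xml in that directory.
write_site_config() {
  conf_dir=$1
  cat > "$conf_dir/hadoop-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Address of the HDFS name node (port is an assumption) -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9100</value>
  </property>
  <!-- Address of the job tracker (port is an assumption) -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9101</value>
  </property>
  <!-- A single node can hold only one copy of each block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}

# Usage, from the hadoop-0.19.1 directory:
#   write_site_config conf
```

Whatever ports are chosen here have to match the ones entered later when the Hadoop location is defined in Eclipse.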
8. Format the Name Node.
- The next step is to format the Name Node, which creates the Hadoop Distributed File System (HDFS).
- Execute the following command at the Cygwin command prompt, from the hadoop-0.19.1 directory.
bin/hadoop namenode -format
- Once the Name Node is formatted, the file system exists and we can proceed with the next set of actions.
9. The next step is to install the Hadoop plug-in for Eclipse.
- Execute the following command at the Cygwin prompt.
cd eclipse-plugin
- Navigate to the Eclipse location and then,
- Copy the file "hadoop-0.19.1-eclipse-plugin.jar" from the Hadoop eclipse-plugin directory to the Eclipse plugins directory.
- Start Eclipse and open the workspace. Look for the Open Perspective icon at the top right of the Eclipse window and select "Other" from the Open Perspective menu.
- From the list of perspectives, select Map/Reduce and press the "OK" button.
- As a result, the IDE should display the new perspective, which looks similar to the image shown below.
10. Setup Hadoop Location in Eclipse
- Next, select the Map/Reduce Locations tab at the bottom of the Eclipse environment. Right-click the blank space in that tab and select New Hadoop location from the context menu.
- When the dialog box pops up, fill in all the required details as shown in the screenshot below.
- Once all the above settings are complete, the new location appears in the "Map/Reduce Locations" tab.
- In the Project Explorer tab on the left-hand side of the Eclipse window, we can see the DFS Locations item.
- Once we expand it, we see the localhost location, marked with a blue elephant icon. Expand the items in the tree down to the last level.
- Once the Hadoop cluster is installed and configured and the Eclipse plug-in is available, we can start writing and testing Hadoop applications in Eclipse.
- To create and run the Hadoop project, go to File > New > Project > Map/Reduce Project > Next, which opens the New Map/Reduce Project window.
11. Create and run Hadoop Eclipse project
In the next step, click the Configure Hadoop Installation link displayed on the right side of the project configuration window. The project preferences window is shown in the image below. Enter the location of the Hadoop directory in the Hadoop Installation Directory field of the preferences and click OK, then click Finish to close the project window.