Mindmajix

Hadoop Configuration with Eclipse on Windows

Introduction

Hadoop MapReduce is a software framework for writing applications that process huge amounts of data in parallel on large clusters. It is essentially a programming model for data processing. Hadoop can run MapReduce programs written in various languages, such as Java, Ruby, C++, and Python.

To take advantage of the parallel processing that Hadoop provides, a query is expressed in the form of a MapReduce job.

MapReduce works by dividing the processing into two phases:

1) Map phase and

2) Reduce phase.

Each phase takes key-value pairs as input and produces key-value pairs as output; the programmer chooses their data types. A MapReduce program, when expressed in code, requires three things:

1) Map function

2) Reduce function and

3) Code to run the job.
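As a minimal sketch of those three pieces, the word count below is written in plain Python rather than against the Hadoop API. The names map_function, reduce_function, and run_job are illustrative, and the grouping step inside run_job stands in for the shuffle that the framework performs between the two phases.

```python
from collections import defaultdict


def map_function(_key, line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_function(word, counts):
    """Reduce phase: sum all the counts collected for one word."""
    return word, sum(counts)


def run_job(lines):
    """Code to run the job: map every line, group by key, then reduce."""
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in map_function(offset, line):
            grouped[key].append(value)  # stand-in for the shuffle step
    return dict(reduce_function(key, values)
                for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["hadoop runs map reduce", "map then reduce"]))
```

Both phases consume and produce key-value pairs here: the map input is (line offset, line text), the map output is (word, 1), and the reduce output is (word, total).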

The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node. The master schedules a job's component tasks on the slaves, monitors them, and re-executes failed tasks; the slaves execute the tasks as directed by the master.
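The master and slave roles can be illustrated with a toy scheduler. This is a simplified simulation, not Hadoop's actual scheduling logic: run_with_retries, flaky_task, the worker names, and the round-robin assignment are all assumptions made for illustration.

```python
def run_with_retries(tasks, workers, max_attempts=3):
    """Assign tasks to workers in round-robin order and retry failures.

    tasks:   dict mapping a task name to a callable that takes a worker name.
    workers: list of worker names (hypothetical stand-ins for TaskTrackers).
    Returns (results, attempts_log).
    """
    results, attempts_log = {}, []
    slot = 0
    for name, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            worker = workers[slot % len(workers)]
            slot += 1
            attempts_log.append((name, worker, attempt))
            try:
                results[name] = task(worker)
                break  # task succeeded, move on to the next one
            except RuntimeError:
                continue  # the master re-schedules the failed task
        else:
            raise RuntimeError(f"task {name} failed {max_attempts} times")
    return results, attempts_log


def flaky_task(worker):
    """Simulated task that fails on one particular node."""
    if worker == "node1":
        raise RuntimeError("simulated task failure")
    return f"done on {worker}"


if __name__ == "__main__":
    print(run_with_retries({"map-0": flaky_task}, ["node1", "node2"]))
```

In this sketch the first attempt lands on node1 and fails, so the task is re-executed on node2.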

Creating a Single-Node Hadoop cluster on Windows using Cygwin

The steps to create a single-node Hadoop cluster are as follows:

  1. Install Java 1.6
  2. Install Eclipse Europa 3.3.2
  3. Install Cygwin with the openssl and openssh packages
  4. Set environment variables
  5. Configure SSH
  6. Download and install Hadoop
  7. Configure Hadoop
  8. Format the NameNode
  9. Install the Hadoop Eclipse plug-in
  10. Set up the Hadoop location in Eclipse
  11. Create and run a Hadoop Eclipse project

Detailed description:

  1. Install Java 1.6

Download URL : http://java.com/en/download/index.jsp

  2. Install Eclipse Europa 3.3.2

Download URL: http://eclipse.org/downloads/package/release/europa/winter

  3. Install Cygwin with the openssl and openssh packages

Download URL: http://cygwin.com/ml/cygwin-announce/2008-08/msg00001.html

Make sure that the ‘openssh’ and ‘openssl’ packages are selected during installation. Both packages are required for the Hadoop cluster and the Eclipse plug-in to function properly.


  4. Set Environment Variables
  • Once Cygwin is installed, the next step is to set the environment variables.
  • Go to Start Menu > My Computer > context menu > Properties, then click Environment Variables on the ”Advanced” tab.
  • Set JAVA_HOME: create a new variable named ”JAVA_HOME” and assign it the complete path of the Java installation.
  • Example: if Java is installed under C:\java\jdk1.6.24, use that path as the value.


  • Add Java's bin directory to the PATH variable.
  • Edit the variable called ”Path” under ”System Variables” and append C:\cygwin\bin;C:\cygwin\usr\sbin to its value.
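These settings can be double-checked with a small script. The sketch below assumes Windows conventions (PATH entries separated by semicolons, paths compared without regard to case); check_environment is a hypothetical helper, and the directories passed to it are examples to be replaced with your own installation paths.

```python
def check_environment(env, required_path_entries):
    """Return a list of problems found in env, a dict of variable names to values."""
    problems = []
    if not env.get("JAVA_HOME", "").strip():
        problems.append("JAVA_HOME is not set")
    # Normalise PATH entries: trim spaces and trailing backslashes, ignore case.
    entries = {
        entry.strip().rstrip("\\").lower()
        for entry in env.get("PATH", "").split(";")
        if entry.strip()
    }
    for required in required_path_entries:
        if required.rstrip("\\").lower() not in entries:
            problems.append(f"PATH is missing {required}")
    return problems


if __name__ == "__main__":
    sample_env = {
        "JAVA_HOME": r"C:\java\jdk1.6.24",
        "PATH": r"C:\Windows\system32;C:\Cygwin\usr\sbin",
    }
    # An empty list means every checked setting is in place.
    print(check_environment(sample_env, [r"C:\cygwin\usr\sbin"]))
```

To check a real machine, pass dict(os.environ) (after importing os) instead of the sample dictionary.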


5. Configure SSH

Both the Hadoop scripts and the Eclipse plug-in need passwordless SSH to operate. To configure the SSH daemon, open the Cygwin command prompt, execute the ssh-host-config command, and answer the prompts as follows:

  • When asked whether privilege separation should be used, answer no.
  • When asked whether to install sshd as a service, answer yes.
  • When asked for the value of the CYGWIN environment variable, enter ntsec (the default value).


  • Check whether a new service called “CYGWIN sshd” has been created under Services and Applications.
  • Go to Start Menu > Run > services.msc > CYGWIN sshd.
  • Start the service by clicking the Start button.


Generating Authorization Keys and Verifying:

Once the service is started, open the Cygwin command prompt and execute the following commands to set up the authorization keys:

  • ssh-keygen: generates the keys. When prompted for filenames and passphrases, press ENTER to accept the default values. Once the keys are generated, change to the key directory with cd ~/.ssh
  • ls -l: lists the directory, which should contain two files, id_rsa.pub and id_rsa, with recent creation dates. These files contain the authorization keys. Register the public key with cat id_rsa.pub >> authorized_keys
  • ssh localhost: verifies that the keys are set up correctly.
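If ssh localhost still asks for a password, one thing worth checking is whether the public key really ended up in authorized_keys. The sketch below is one way to do that; key_is_authorized is a hypothetical helper, and it assumes the default file names id_rsa.pub and authorized_keys in the ~/.ssh directory.

```python
from pathlib import Path


def key_is_authorized(public_key_text, authorized_keys_text):
    """Return True if the public key appears as a line of authorized_keys."""
    public_key = public_key_text.strip()
    authorized = [line.strip() for line in authorized_keys_text.splitlines()]
    return bool(public_key) and public_key in authorized


if __name__ == "__main__":
    ssh_dir = Path.home() / ".ssh"
    pub_file = ssh_dir / "id_rsa.pub"
    auth_file = ssh_dir / "authorized_keys"
    if pub_file.is_file() and auth_file.is_file():
        print(key_is_authorized(pub_file.read_text(), auth_file.read_text()))
    else:
        print("key files not found in", ssh_dir)
```

The comparison is done line by line because authorized_keys may hold several keys, one per line.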


  6. Download and Install Hadoop
  • Download Hadoop 0.19.1.
  • Open a Cygwin command prompt window and change to the home directory using the cd command.
  • Execute explorer . in the Cygwin command prompt, which opens an Explorer window at the current location.
  • Copy the downloaded Hadoop archive into the Explorer window.


  • Execute tar -xzf hadoop-0.19.1.tar.gz in the Cygwin prompt to unpack the Hadoop distribution. Once this is done, a newly created directory called hadoop-0.19.1 appears.
  • Verify that unpacking succeeded by executing cd hadoop-0.19.1 and then ls -l; the listing of the unpacked files confirms that everything was unpacked correctly.


  7. Configure Hadoop. Execute the following commands in the Cygwin command prompt.
  • cd hadoop-0.19.1
  • cd conf
  • explorer . (opens the conf directory in an Explorer window)


  • Edit hadoop-site.xml using a text editor such as EditPlus and insert the following lines between the <configuration> and </configuration> tags.

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9100</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9101</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
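A stray space inside a property name is easy to miss and makes Hadoop ignore the setting, so it can help to read the file back and inspect what was actually configured. The sketch below uses Python's standard xml.etree.ElementTree module; read_properties and SAMPLE_CONFIG are illustrative names, and the host and port values in the sample are placeholders to be replaced with the ones from your own hadoop-site.xml.

```python
import xml.etree.ElementTree as ET

# Sample content; the values are placeholders for illustration only.
SAMPLE_CONFIG = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9100</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9101</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the config text."""
    root = ET.fromstring(xml_text)
    properties = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        # Reject empty names and names with embedded or surrounding spaces.
        if not name or name != name.strip() or " " in name:
            raise ValueError(f"suspicious property name: {name!r}")
        properties[name] = value.strip()
    return properties


if __name__ == "__main__":
    for key, value in read_properties(SAMPLE_CONFIG).items():
        print(key, "=", value)
```

To check a real file, pass its contents to read_properties instead of SAMPLE_CONFIG.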

  8. Format the NameNode
  • The next step is to format the NameNode, which creates the Hadoop Distributed File System (HDFS).
  • Execute the following commands in the Cygwin command prompt:
  • cd hadoop-0.19.1
  • mkdir logs
  • bin/hadoop namenode -format
  • Once the NameNode is formatted, the file system is created and ready for the next set of actions.


  9. Install the Hadoop plug-in for Eclipse
  • Execute the following commands in the Cygwin prompt:
  • cd hadoop-0.19.1
  • cd contrib
  • cd eclipse-plugin
  • explorer .


  • Navigate to the Eclipse installation directory.
  • Copy the file hadoop-0.19.1-eclipse-plugin.jar from the Hadoop eclipse-plugin directory to the Eclipse plugins directory.


  • Start Eclipse and open the workspace. Look for the Open Perspective icon at the top right of the Eclipse window and select ”Other” from the Open Perspective menu.
  • From the list of perspectives, select Map/Reduce and press the ”OK” button.
  • As a result, the IDE should display the new Map/Reduce perspective.


  10. Set up the Hadoop Location in Eclipse

  • Next, select the Map/Reduce Locations tab at the bottom of the Eclipse environment. Right-click the blank space in that tab and select New Hadoop Location from the context menu.
  • When the dialog box pops up, fill in the required details as shown in the screenshot below.

[Screenshot: Hadoop location settings dialog]

  • Once all the above settings are complete, the new location appears in the “Map/Reduce Locations” tab.
  • In the Project Explorer tab on the left-hand side of the Eclipse window, the DFS Locations item is now visible.
  • Expanding it shows the localhost location, marked with a blue elephant icon. Expand the items in the tree down to the last level.


  • Once the Hadoop cluster is installed and configured and the Eclipse plug-in is available, we can start writing and testing Hadoop applications in Eclipse.
  • To create and run a Hadoop project, go to File > New > Project > Map/Reduce Project > Next > New Map/Reduce Project.


  11. Create and run a Hadoop Eclipse project

In the next step, click the Configure Hadoop Installation link, displayed on the right side of the project configuration window, to open the project preferences window. Enter the location of the Hadoop directory in the Hadoop Installation Directory field and click OK. Then click Finish and close the project window.
