Mindmajix

Introduction to Apache Spark Streaming

Apache Spark Streaming

Spark streaming is a feature which provides us with a fault tolerant, and highly scalable streaming process.

Setting up the system

In this section, you will learn how to set up the system ready for streaming in both Scala and Java. For Scala users, this should be as follows:

* scala/sbt: This is the directory containing the SBT tools.
* scala/build.sbt: this is the project file for SBT.
* scala/Section.scala: this is the Main Scala program that that is to be used for editing, compiling and running.
* scala/Helper.scala: This is the scala file which has the helper functions for Section.scala

For Java users, this should be as follows:

* java/sbt: this is the directory having the SBT tool
* java/build.sbt: this is the project file for SBT
* java/Section.java this is the main program to edit, compile and run
* java/Heler.java: this is a java file which has few helper functions
* java/ScalaHelper.java: this is a Scala file which has few helper functions

We now need to have a look at the main file which is to be edited, compiled, and then run. This is the Section.java or the Section. Scala file. Its code is given below:

For Scala, it is as follows:

import spark._
import StreamingContext._
import spark.streaming._
import TutorialHelper._
object Section {
def main(args: Array[String]) {
// The Location of our Spark directory
val spHome = “/root/spark”
// The URL for the Spark cluster
val spUrl = getSparkUrl()
// where the JAR files are located
val jFile = “target/scala- 2.9.3/section_2.9.3-0.1-SNAPSHOT.jar”
//Our HDFS directory for checkpointing purpose
val chpointDir = Helper.getHdfsUrl() + “/checkpoint/”
// use twitter.txt for configuring the credentials of Twitter
}
}

For Java, this should be as follows:

import java.util.Arrays;
import spark.api.java.function.*;
import spark.api.java.*;
import scala.Tuple2;
import spark.streaming.*;
import spark.streaming.api.java.*;
import twitter4j.*;
public class Section {
public static void main(String[] args) throws Exception {
//The Location of our Spark directory
String spHome = “/root/spark”;
//The URL for the Spark cluster
String spUrl = Helper.getSparkUrl();
//The Location having the required JAR files
String jFile = “target/scala- 2.9.3/section_2.9.3-0.1-SNAPSHOT.jar”;
// The HDFS directory for purpose of checkpointing
String chpointDir = Helper.getHdfsUrl() + “/checkpoint/”;
// The login.txt text for twitter credentials
// The code should be added here
}
}

Note the use of the helper functions in the code which will help in adding the parameters which are needed for our exercises. The function “getSparkUrl()” will help in fetching the Spark cluster URL which is contained in the file “/root/spark-ec2/cluster-url.”The function “configureTwitterCredential()” is also a helper function, and it will help in the configuration of Twitter authentication details by use of the file “/root/streaming/twitter.txt.”

We should also configure the OAuth authentication for our Twitter account. For you to do this, a consumer key+secret pair has to be set up by use of the Twitter account.

You should begin by creating a new and a temporary application. To do this, just click on the blue button labeled “Create a new application.” The following window will appear:

apache spark streaming

You can then provide the details about the application which are needed in the above form. In the case of the name, it has to be unique. After creation of the project, you will be asked to confirm it. At the window for doing this, you will see the consumer key and the consumer secret which the system has generated for you. For you to generate the access token and the access token secret, just navigate to the bottom and then click on the blue button labeled “Create my access token.” You need to know that a confirmation will be available at the top which will tell you that it has been generated.

Screenshot_45

If you need to obtain the keys and the secrets which are required for the purpose of authentication, click on the tab labeled “OAuth Tool” at the menu located at the top of the page. A page with all of the details will be presented to you.

You can then open the text editor of your choice so as to edit the configuration details of the Twitter. This is shown below:

cd /root/streaming/
vim twitter.txt

After execution of the above commands, the following template will be observed:

consumerKey =
consumerSecret =
accessToken =
accessTokenSecret =

The above should then be filled with the appropriate keys. These can be copied from the previous web page, and then pasted in the appropriate fields. Once you are done, it is good for you to perform a double check so as to ensure that the right keys have been provided in the right fields. You can then save your configuration settings before proceeding to write the program for streaming purpose.

Now, we want to write our program for streaming purpose. This should print the tweets that it receives each second. First, begin by opening the file with a text editor as shown below:

In Scala:

cd /root/streaming/scala/
vim Section.scala

In Java, this will be as follows:

cd /root/streaming/java/
vim Section.java

The following should be the next part for Scala users:

val tts = ssc.twitterStream()

For Java users, the above should be as follows:

JavaDStream<Status> tts = ssc.twitterStream();

To print some of the tweets which are available, let’s go as follows:

Scala:

JavaDStream<Status> tts = ssc.twitterStream();

For Java, this will be as follows:

JavaDStream<String> st = tts.map(
new Function<Status, String>() {
public String call(Status st) { return st.getText(); }
}
);
st.print();

In the case of the intermediate data, we have to set some checkpointing operation. This can be done as follows:

Scala:

ssc.checkpoint(chpointDir)

Java:

ssc.checkpoint(chpointDir);

Our final step should be to inform the context to start the computation which we have just created. This can be done as follows:

Scala:

ssc.start()

Java:

ssc.start();

That is how the process can be done.

0 Responses on Introduction to Apache Spark Streaming"

Leave a Message

Your email address will not be published. Required fields are marked *

Copy Rights Reserved © Mindmajix.com All rights reserved. Disclaimer.
Course Adviser

Fill your details, course adviser will reach you.