Mindmajix

Using Java in Talend

Java is a hugely popular and incredibly rich programming language. Talend is a Java code generator that makes use of many open source Java libraries, so Talend functionality can easily be extended by integrating Java code into Talend jobs.

The Java representation allows you to transform Java object instances. You can import Java classes individually, from a folder, or in a JAR file, and the Java importer will create structure definitions from each class. At runtime, you can provide Java object(s) as the source(s) of a transformation or accept them as the result(s).

This section contains recipes that show some of the techniques for making use of Java within Talend jobs.

Introduction

For many data integration requirements, the standard Talend components provide the means to process the data from start to end without any Java code other than the expressions used in tMap.

For more complex requirements, it is often necessary to add Java logic to a job. In other cases, custom Java code may simply provide a simpler, more elegant, or more efficient solution than the standard components.

Performing one-off pieces of logic using tJava

Function

tJava enables you to enter custom code and integrate it into a Talend program. This code is executed only once.

Purpose

tJava makes it possible to extend the functionality of a Talend Job by using Java commands.

The tJava component allows one-off logic to be added to a job. Common uses of tJava include setting global or context variables prior to the main data processing stages and printing logging messages.

Getting ready

Open the job jo_cook_ch05_0000_tJava.

How to achieve it…

  • Open the tJava component.
  • Type in the following code:
System.out.println("Executing job "+jobName+" at "+TalendDate.getDate("CCYY-MM-dd HH:mm:ss"));
  • Run the job. You will see that a message is printed showing the job name and the date and time of execution.

How it works…

If you examine the code, you will see that the Java code is simply added to the generated code as it is. This is why you must remember to add ; to the end of the line to avoid compilation errors.
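To make the idea of code being "added as it is" more concrete, the following is a minimal sketch in plain Java rather than real Talend-generated code. The class name TJavaSketch, the buildMessage method, and the hard-coded job name are assumptions made for this example, and java.time stands in for TalendDate so that the sketch runs outside Talend Studio.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-in for a generated job class; real generated code is far larger.
public class TJavaSketch {

    // In a real job, jobName is supplied by the generated code; here it is hard-coded.
    static final String jobName = "jo_cook_ch05_0000_tJava";

    // Builds the same kind of message that the tJava statement prints.
    static String buildMessage(LocalDateTime now) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        return "Executing job " + jobName + " at " + now.format(fmt);
    }

    public static void main(String[] args) {
        // A statement typed into tJava ends up inside a method body like this one,
        // which is why it needs its terminating semicolon.
        System.out.println(buildMessage(LocalDateTime.now()));
    }
}
```

Because the statement is placed inside an ordinary Java method, anything that would not compile there, such as a missing semicolon, will stop the whole job from compiling.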

Setting the context and globalMap variables using tJava

Although this recipe is centered on the use of tJava, it also acts as a convenient means of illustrating how the context and globalMap variables can be directly referenced from within the majority of Talend components.

Getting ready

Open jo_cook_ch05_0010_tJavaContextGlobalMap, then open the context panel, and you should see a variable named testValue.


How to achieve it…

  • Open tJava_1 and type in the following code:
System.out.println("tJava_1");
context.testValue = "testValue is now initialized";
globalMap.put("gmTestValue", "gmTestValue is now initialized");
  • Open tJava_2 and type in the following code:
System.out.println("tJava_2");
System.out.println("context.testValue is: " + context.testValue);
System.out.println("gmTestValue is: " + (String) globalMap.get("gmTestValue"));
  • Run the job. You will see that the variables initialized in the first tJava are printed correctly in the second.

How it works…

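The context and globalMap variables are globally available within the job and hold keyed values. globalMap is a Java Map whose values are stored as Object, which is why the (String) cast is needed when reading from it, while context variables are referenced by name, as in context.testValue. This enables these values to be referenced within any of the other components, such as tMap, tFixedFlowInput, and tFileInputDelimited.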
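The sharing mechanism can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the Context class, the two method names, and the class ContextGlobalMapSketch are hypothetical stand-ins, not the code Talend generates.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextGlobalMapSketch {

    // Hypothetical stand-in for the job's context object: one field per context variable.
    static class Context {
        String testValue;
    }

    static final Context context = new Context();

    // Keyed store shared by all components; values are held as Object.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Mirrors the code typed into the first tJava.
    static void firstComponent() {
        context.testValue = "testValue is now initialized";
        globalMap.put("gmTestValue", "gmTestValue is now initialized");
    }

    // Mirrors the second tJava: values come back as Object, so a cast is needed.
    static String secondComponent() {
        String gm = (String) globalMap.get("gmTestValue");
        return context.testValue + " / " + gm;
    }

    public static void main(String[] args) {
        firstComponent();
        System.out.println(secondComponent());
    }
}
```

The second method sees the values only because the first one ran earlier, which is the same ordering dependency that exists between the two tJava components in the job.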

There’s more…

This recipe shows variables being set in a one-off fashion using tJava. It is worth noting that the same principles apply to tJavaRow. Because tJavaRow is called for every row processed, it is possible to create a global variable for a row that can be referenced by all components in a flow. This can be useful when pre and post field values are required for comparison purposes later in the flow. Storing in the globalMap variables avoids the need to create additional schema columns.
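As a rough sketch of that per-row technique, the plain Java below stores the original value of a field under a key before changing it, so that a later step can compare the two. The key nameBefore, the two step methods, and the trim-and-uppercase cleansing rule are all assumptions chosen for illustration; the sketch also assumes rows pass through the steps one at a time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerRowGlobalMapSketch {

    // Stand-in for the job-wide globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Stand-in for a tJavaRow step: remembers the incoming value, then changes it.
    static String cleanseStep(String name) {
        globalMap.put("nameBefore", name); // hypothetical key, overwritten for each row
        return name.trim().toUpperCase();
    }

    // Stand-in for a later component in the same flow: compares before and after.
    static boolean changedStep(String cleansed) {
        String before = (String) globalMap.get("nameBefore");
        return !before.equals(cleansed);
    }

    // Processes rows one at a time and reports which rows were changed.
    static List<Boolean> run(List<String> names) {
        List<Boolean> changed = new ArrayList<>();
        for (String name : names) {
            String cleansed = cleanseStep(name);
            changed.add(changedStep(cleansed));
        }
        return changed;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add(" jo smith ");
        names.add("JAMES BROWN");
        System.out.println(run(names));
    }
}
```

No extra column is carried through the flow here: the "before" value travels alongside the row in the map rather than in the schema.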

Adding complex logic into a flow using tJavaRow

Function

tJavaRow allows you to enter custom code and integrate it into a Talend program. With tJavaRow, the Java code you enter is applied to each row of the flow.

Purpose

tJavaRow allows you to broaden the functionality of Talend Jobs, using the Java language.

The tJavaRow component allows Java logic to be performed for every record within a flow.

Getting ready

Open the job jo_cook_ch05_0020_tJavaRow.

How to achieve it…

  • Add the tJavaRow and tLogRow components.
  • Link the flows as shown in the following screenshot:

Screenshot_1932

  • Open the schema and you will see that there are no fields in the output. Highlight name, dateOfBirth, and age, and click on the single arrow.
  • Use the + button to add new columns cleansedName (String) and rowCount (Integer), so that the schema looks like the following:

Screenshot_1933

  • Close the schema by pressing OK and then press the Generate code button in the main tJavaRow screen. The generated code will be as follows:
//Code generated according to input schema and output schema
output_row.name = input_row.name;
output_row.dateOfBirth = input_row.dateOfBirth; 
output_row.age = input_row.timestamp; 
output_row.cleansedName = input_row.age;
output_row.rowCount = input_row.age;
  • Change the line output_row.age = input_row.timestamp to read output_row.age = input_row.age.
  • Remove the lines for output_row.cleansedName and output_row.rowCount, and replace them with the following code:
// Default to the original name so that unmatched rows are not left null
output_row.cleansedName = input_row.name;
if (input_row.name.startsWith("J ")) {
    output_row.cleansedName = StringHandling.EREPLACE(input_row.name, "J ", "James ");
} else if (input_row.name.startsWith("Jo ")) {
    output_row.cleansedName = StringHandling.EREPLACE(input_row.name, "Jo ", "Joanne ");
}
// Number each row using the sequence named s1, starting at 1 and stepping by 1
output_row.rowCount = Numeric.sequence("s1", 1, 1);
  • Run the job. You will see that “J ” and “Jo ” have been replaced, and each row now has a rowCount value.

How it works…

The tJavaRow component is much like a tMap with one input and one output, in that input columns can be ignored and new columns can be added to the output.

Once the output fields have been defined, the Generate code button will create a Java mapping for every output field. If the names are the same, then it will map correctly. If input fields are not found or are named differently, then it will automatically map the field in the same position in the input or the last known input field, so be careful when using this option if you have removed fields. In some cases, it is best to propagate all fields, generate the mappings and then remove unwanted fields and mappings.

Tip

Also, be aware that the Generate code button will remove all code in the window. If you have code that you wish to keep, then ensure that you copy it into a text editor before regenerating the code.

As you can also see from the code that was added, it is possible to use Talend’s own functions (StringHandling.EREPLACE, Numeric.sequence) in the Java components along with any other normal Java syntax, like the if statement and startsWith String method.
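The per-row behaviour can be approximated outside Talend with the plain Java sketch below. The InputRow and OutputRow classes, the row helper, and the sequence method are hypothetical stand-ins for input_row, output_row, and Numeric.sequence, and String.replaceAll is used on the assumption that it behaves like StringHandling.EREPLACE for these simple patterns. Passing unmatched names through unchanged is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class TJavaRowSketch {

    // Hypothetical row structures standing in for input_row and output_row.
    static class InputRow { String name; Integer age; }
    static class OutputRow { String name; Integer age; String cleansedName; Integer rowCount; }

    // Simple named counters, standing in for Numeric.sequence.
    static final Map<String, Integer> sequences = new HashMap<>();

    static int sequence(String seqName, int start, int step) {
        Integer current = sequences.get(seqName);
        int next = (current == null) ? start : current + step;
        sequences.put(seqName, next);
        return next;
    }

    // Convenience helper for building an input row.
    static InputRow row(String name, Integer age) {
        InputRow r = new InputRow();
        r.name = name;
        r.age = age;
        return r;
    }

    // The body of this method plays the role of the code typed into tJavaRow;
    // it is called once for every row in the flow.
    static OutputRow process(InputRow input_row) {
        OutputRow output_row = new OutputRow();
        output_row.name = input_row.name;
        output_row.age = input_row.age;
        output_row.cleansedName = input_row.name; // unmatched names pass through
        if (input_row.name.startsWith("J ")) {
            output_row.cleansedName = input_row.name.replaceAll("J ", "James ");
        } else if (input_row.name.startsWith("Jo ")) {
            output_row.cleansedName = input_row.name.replaceAll("Jo ", "Joanne ");
        }
        output_row.rowCount = sequence("s1", 1, 1);
        return output_row;
    }

    public static void main(String[] args) {
        OutputRow first = process(row("J Smith", 30));
        OutputRow second = process(row("Jo Bloggs", 25));
        System.out.println(first.cleansedName + " " + first.rowCount);
        System.out.println(second.cleansedName + " " + second.rowCount);
    }
}
```

Each call to process corresponds to one row passing through the component, and the counter keeps its state between calls in the same way a named sequence does across rows.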

Importing JAR files to allow use of external Java classes

Talend has a rich set of functions and libraries available within its suite. However, you may want to use libraries provided by a data vendor or API vendor. For example, if you want to fetch data from Google AdWords, you may want to import the library or JAR files provided by Google into Talend.

Occasionally, during development, it is necessary (or simpler) to make use of Java classes that aren’t already included within Talend. These may be pre-existing Java code, such as financial calculations, or open source libraries, such as those provided by The Apache Software Foundation.

In this example, we will make use of a simple Java class ExternalValidations and its ExternalValidateCustomerName method. This class performs the following simple validation:

if (customerName.startsWith("J ")) {
    return customerName.replace("J ", "James ");
} else if (customerName.startsWith("Jo ")) {
    return customerName.replace("Jo ", "Joanne ");
} else {
    return customerName;
}
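For context, the fragment above could sit inside a class along the following lines. This is a sketch, not the contents of the supplied JAR: the static modifier and the talendExternalClass package are inferred from the way the recipe imports and calls the class, the package line is commented out so the sketch compiles on its own, and the main method is added purely for demonstration.

```java
// package talendExternalClass; // inferred from the recipe's import; commented out for standalone use

public class ExternalValidations {

    // Expands the abbreviated first names "J " and "Jo " at the start of a customer name.
    public static String ExternalValidateCustomerName(String customerName) {
        if (customerName.startsWith("J ")) {
            return customerName.replace("J ", "James ");
        } else if (customerName.startsWith("Jo ")) {
            return customerName.replace("Jo ", "Joanne ");
        } else {
            return customerName;
        }
    }

    // Demonstration only; not needed when the class is called from a Talend job.
    public static void main(String[] args) {
        System.out.println(ExternalValidateCustomerName("J Smith"));
        System.out.println(ExternalValidateCustomerName("Jo Bloggs"));
        System.out.println(ExternalValidateCustomerName("Anne Jones"));
    }
}
```

Compiling a class like this and packaging it as a JAR is what produces the kind of file the recipe goes on to attach to a routine.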

Getting ready

Open job jo_cook_ch05_0050_externalClasses.

How to achieve it…

  • Create a code routine called externalValidations.
  • Right-click the routine and select the option Edit routine Libraries.


  • In the next dialogue, click on New.
  • Select the option Browse a library file, and browse to the cookbookData folder, which contains a sub-folder named externalJar. Select the JAR file, then click OK to confirm. The import dialogue should now look like the following:

[Screenshot of the import dialogue]

  • Return to the job and open tJavaRow, and click on the Advanced settings tab.
  • Add the following code:
import talendExternalClass.ExternalValidations;
  • Return to the Basic settings tab and add the following code:
output_row.validatedName = ExternalValidations.ExternalValidateCustomerName(input_row.name);
  • Run the job. You will see that the validations have taken place, and the customer names have been changed.

Note

If you get an error when running this job, then it is possibly because the new class has not been set up as a dependency automatically.

How it works…

The code routine externalValidations is a dummy routine used to attach the external jar file and make it available for all jobs in the project.

In order to use the classes in the JAR file, it is necessary to add an import statement within the tJavaRow so that the code knows where to find the methods.

There’s more…

An alternative method of achieving this for just a single job is to use a tLibraryLoad component at the start of the job to define the location of each external library or JAR file required.

 


 

 
