Using Java In Talend

Java is a hugely popular and incredibly rich programming language. Talend is a Java code generator that makes use of many open-source Java libraries, so its functionality can easily be extended by integrating Java code into Talend jobs.

The Java representation allows you to transform Java object instances. You can import Java classes individually, from a folder, or in a JAR file, and the Java importer will create structure definitions from each class. At runtime, you can provide Java object(s) as the source(s) of a transformation or accept them as the result(s).

This section contains recipes that show some of the techniques for making use of Java within Talend jobs.

Introduction to How to Use Java In Talend

For many data integration requirements, the standard Talend components can process the data from start to end without any Java code other than the expressions used in tMap.

For more complex requirements, it is often necessary to add Java logic to a job. In other cases, custom Java code may simply provide a simpler, more elegant, or more efficient solution than the standard components.

Performing one-off pieces of logic using tJava

Function: tJava enables you to enter custom code and integrate it into a Talend job. The code is executed only once.

Purpose: tJava makes it possible to extend the functionality of a Talend Job by using Java commands.

The tJava component allows one-off logic to be added to a job. Common uses of tJava include setting global or context variables prior to the main data processing stages and printing logging messages.

Getting ready

Open the job jo_cook_ch05_0000_tJava.

How to achieve it…

  • Open the tJava component.
  • Type in the following code:
System.out.println("Executing job "+jobName+" at "+TalendDate.getDate("CCYY-MM-dd HH:mm:ss"));
  • Run the job. You will see that the message is printed showing the job name and the date and time of execution.

How it works…

If you examine the generated code, you will see that the Java code is simply added to it as is. This is why you must remember to add a semicolon (;) to the end of each statement to avoid compilation errors.
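
To see why the semicolon matters, it can help to picture the snippet as ordinary Java. The following is a minimal standalone sketch, not Talend's generated code: the class name TJavaSketch and the buildMessage helper are hypothetical, the job name is passed in as a parameter rather than supplied by Talend, and java.text.SimpleDateFormat stands in for TalendDate.getDate.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical standalone stand-in for the tJava snippet in this recipe.
class TJavaSketch {

    // Builds the same kind of message the recipe prints.
    // Assumption: the caller supplies jobName; inside a tJava component
    // the recipe uses the jobName variable directly.
    static String buildMessage(String jobName, Date now) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        return "Executing job " + jobName + " at " + format.format(now);
    }

    public static void main(String[] args) {
        // Each statement ends with a semicolon, exactly as it must in tJava.
        System.out.println(buildMessage("jo_cook_ch05_0000_tJava", new Date()));
    }
}
```

Leaving out either semicolon in main would stop the class from compiling, which is the same failure you would see when the job is built.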

Setting the context and globalMap variables using tJava

Although this recipe is centered on the use of tJava, it also acts as a convenient means of illustrating how the context and globalMap variables can be directly referenced from within the majority of Talend components.

Getting ready

Open jo_cook_ch05_0010_tJavaContextGlobalMap, then open the context panel, and you should see a variable named testValue.

context panel

How to achieve it…

  • Open tJava_1 and type in the following code:
System.out.println("tJava_1");
context.testValue = "testValue is now initialized";
globalMap.put("gmTestValue", "gmTestValue is now initialized");
  • Open tJava_2 and type in the following code:
System.out.println("tJava_2");
System.out.println("context.testValue is: " + context.testValue);
System.out.println("gmTestValue is: " + (String) globalMap.get("gmTestValue"));
  • Run the job. You will see that the variables initialized in the first tJava are printed correctly in the second.

How it works…

The globalMap variables are stored in a globally available Java HashMap, meaning that they are keyed values, and the context variables are held in a globally available context object. This enables these values to be referenced within any of the other components, such as tMap, tFixedFlowInput, and tFileInputDelimited.
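
The keyed-value behaviour can be sketched in plain Java. This is only an illustration under stated assumptions: GlobalMapSketch, firstComponent, and secondComponent are hypothetical names, and the static map merely plays the role of Talend's globalMap, which is assumed here to behave like a Map<String, Object>.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in showing how keyed values behave in a shared map.
class GlobalMapSketch {

    // Plays the role of globalMap (assumed to be a Map<String, Object>).
    static final Map<String, Object> globalMap = new HashMap<>();

    // Equivalent of the first tJava: store a value under a key.
    static void firstComponent() {
        globalMap.put("gmTestValue", "gmTestValue is now initialized");
    }

    // Equivalent of the second tJava: read the value back. The cast is
    // needed because the map stores plain Objects.
    static String secondComponent() {
        return (String) globalMap.get("gmTestValue");
    }

    public static void main(String[] args) {
        firstComponent();
        System.out.println("gmTestValue is: " + secondComponent());
    }
}
```

Because lookups are by string key, a mistyped key yields null at runtime rather than a compilation error, so it is worth keeping key names short and consistent.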

There’s more…

This recipe shows variables being set in a one-off fashion using tJava. It is worth noting that the same principles apply to tJavaRow. Because tJavaRow is called for every row processed, it is possible to create a global variable for a row that can be referenced by all components in a flow. This can be useful when the values of a field before and after a change are required for comparison later in the flow. Storing the values in globalMap avoids the need to create additional schema columns.
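
A rough sketch of that per-row pattern in plain Java follows. All names here (RowPreValueSketch, trimStep, wasChanged, the "preName" key) are hypothetical, and the sketch assumes each row passes through both steps before the next row is read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep a field's original value in a shared map so
// that a later step can compare it with the changed value.
class RowPreValueSketch {

    // Plays the role of globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Plays the role of a tJavaRow: remembers the incoming value, then changes it.
    static String trimStep(String name) {
        globalMap.put("preName", name); // value before the change
        return name.trim();
    }

    // Plays the role of a later component in the same flow.
    static boolean wasChanged(String currentName) {
        String before = (String) globalMap.get("preName");
        return !currentName.equals(before);
    }

    static List<String> run(List<String> names) {
        List<String> report = new ArrayList<>();
        for (String name : names) { // one iteration per row
            String cleaned = trimStep(name);
            report.add(cleaned + " changed=" + wasChanged(cleaned));
        }
        return report;
    }

    public static void main(String[] args) {
        for (String line : run(Arrays.asList(" Ann ", "Bob"))) {
            System.out.println(line);
        }
    }
}
```

The stored value is overwritten on every row, so the comparison is only meaningful while the same row is still being processed.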

Adding complex logic into a flow using tJavaRow

Function: tJavaRow allows you to enter custom code and integrate it into a Talend job. With tJavaRow, the Java code you enter is applied to each row of the flow.

Purpose: tJavaRow allows you to broaden the functionality of Talend Jobs using the Java language.

The tJavaRow component allows Java logic to be performed for every record within a flow.

Getting ready

Open the job jo_cook_ch05_0020_tJavaRow.

How to achieve it…

  • Add the tJavaRow and tLogRow components.
  • Link the flows as shown in the following screenshot:

tJavaRow

  • Open the schema and you will see that there are no fields in the output. Highlight name, dateOfBirth, and age, and click on the single arrow.
  • Use the + button to add new columns cleanedName (String) and rowCount (Integer), so that the schema looks like the following:

tJavaRow_1

  • Close the schema by pressing OK and then press the Generate code button in the main tJavaRow screen. The generated code will be as follows:
//Code generated according to input schema and output schema
output_row.name = input_row.name;
output_row.dateOfBirth = input_row.dateOfBirth; 
output_row.age = input_row.timestamp; 
output_row.cleanedName = input_row.age;
output_row.rowCount = input_row.age;
  • Change the line output_row.age = input_row.timestamp; to read output_row.age = input_row.age;
  • Remove the lines for output_row.cleanedName and output_row.rowCount, and replace them with the following code:
if (input_row.name.startsWith("J ")) {
    output_row.cleanedName = StringHandling.EREPLACE(input_row.name, "J ", "James ");
}
if (input_row.name.startsWith("Jo ")) {
    output_row.cleanedName = StringHandling.EREPLACE(input_row.name, "Jo ", "Joanne ");
}
output_row.rowCount = Numeric.sequence("s1", 1, 1);
  • Run the job. You will see that “J ” and “Jo ” have been replaced, and each row now has a rowCount value.

How it works…

The tJavaRow component is much like a tMap with one input and one output, in that input columns can be ignored and new columns can be added to the output.

Once the output fields have been defined, the Generate code button will create a Java mapping for every output field. If the names are the same, then it will map correctly. If input fields are not found or are named differently, then it will automatically map the field in the same position in the input or the last known input field, so be careful when using this option if you have removed fields. In some cases, it is best to propagate all fields, generate the mappings and then remove unwanted fields and mappings.

Tip

Also, be aware that the Generate Code option will remove all code in the window. If you have code that you wish to keep, then ensure that you copy it into a text editor before regenerating the code.

As you can also see from the code that was added, it is possible to use Talend’s own functions (StringHandling.EREPLACE, Numeric.sequence) in the Java components along with any other normal Java syntax, like the if statement and startsWith String method.
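
For readers who want to try the logic outside Talend, here is a minimal plain-Java sketch of the same idea. It is not the recipe's code: TJavaRowSketch, cleanse, and process are hypothetical names, the prefix is swapped with substring rather than StringHandling.EREPLACE, and a local counter stands in for Numeric.sequence("s1", 1, 1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java version of the tJavaRow logic in this recipe.
class TJavaRowSketch {

    // Expands the "J " and "Jo " prefixes. Unlike the recipe, names that
    // match neither prefix are passed through unchanged instead of being
    // left unset.
    static String cleanse(String name) {
        if (name.startsWith("J ")) {
            return "James " + name.substring(2);
        }
        if (name.startsWith("Jo ")) {
            return "Joanne " + name.substring(3);
        }
        return name;
    }

    // Applies cleanse to every row and numbers the rows from 1.
    static List<String> process(List<String> names) {
        List<String> out = new ArrayList<>();
        int rowCount = 0; // stand-in for a sequence starting at 1, step 1
        for (String name : names) {
            rowCount += 1;
            out.add(rowCount + ": " + cleanse(name));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : process(Arrays.asList("J Smith", "Jo Brown", "Mary Jones"))) {
            System.out.println(line);
        }
    }
}
```

Returning the original name as a fallback is a design choice made for the sketch; whether unmatched rows should carry the original name or an empty value depends on what the downstream components expect.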

Importing JAR files to allow the use of external Java classes

Talend has a rich set of functions and libraries available within its suite. However, you may want to use libraries provided by a data vendor or an API vendor. For example, if you want to fetch data from Google Adwords, you may want to import the library/JAR files provided by Google into Talend.

Occasionally, during development, it is necessary (or simpler) to make use of Java classes that aren’t already included within Talend. These may be pre-existing Java code, such as financial calculations, or open-source libraries, such as those provided by The Apache Software Foundation.

In this example, we will make use of a simple Java class ExternalValidations and its ExternalValidateCustomerName method. This class performs the following simple validation:

if (customerName.startsWith("J ")) {
    return customerName.replace("J ", "James ");
} else {
    if (customerName.startsWith("Jo ")) {
        return customerName.replace("Jo ", "Joanne ");
    } else {
        return customerName;
    }
}
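
For context, the fragment above could sit inside a class along the following lines. This is a hypothetical reconstruction, not the contents of the supplied JAR: only the class name, method name, and validation logic come from the recipe, and the package declaration (talendExternalClass, judging by the import statement used in this recipe) is omitted so that the sketch compiles on its own.

```java
// Hypothetical reconstruction of the external class used in this recipe.
// Assumption: the real class is declared in the talendExternalClass package.
class ExternalValidations {

    // Expands the "J " and "Jo " prefixes of a customer name.
    public static String ExternalValidateCustomerName(String customerName) {
        if (customerName.startsWith("J ")) {
            return customerName.replace("J ", "James ");
        } else if (customerName.startsWith("Jo ")) {
            return customerName.replace("Jo ", "Joanne ");
        } else {
            return customerName;
        }
    }

    public static void main(String[] args) {
        System.out.println(ExternalValidateCustomerName("J Smith"));
        System.out.println(ExternalValidateCustomerName("Jo Brown"));
    }
}
```

Note that String.replace substitutes every occurrence of the target text, not just the leading one, so a name containing "J " more than once would be expanded in each place.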

Getting ready

Open job jo_cook_ch05_0050_externalClasses.

How to do it…

  • Create a code routine called external validations.
  • Right-click and select the option Edit routine Libraries.

Routine Libraries

  • In the next dialogue, click on New.
  • Select the option Browse a library file, and browse to the cookbookData folder, which contains a sub-folder named externalJar. Select the JAR file, then click OK to confirm. The import dialogue should now look like the following:

Import external library

  • Return to the job and open tJavaRow, and click on the Advanced settings tab.
  • Add the following code:
import talendExternalClass.ExternalValidations;
  • Return to the Basic tab and add the following code:
output_row.validatedName = ExternalValidations.ExternalValidateCustomerName(input_row.name);
  • Run the job. You will see that the validations have taken place, and the customer names have been changed.

Note

If you get an error when running this job, it is possibly because the new class has not been set up as a dependency automatically.

How it works…

The code routine external validations is a dummy routine used to attach the external jar file and make it available for all jobs in the project.

In order to use the classes in the JAR file, it is necessary to add an import statement within the tJavaRow so that the code knows where to find the methods.

There’s more…

An alternative method of achieving this for just a single job is to use the tLibraryLoad component at the start of the job to define the location of the external libraries and the JAR files required.

Last updated: 03 Apr 2023
About Author

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
