Java is a hugely popular and incredibly rich programming language. Talend is a Java code generator that makes use of many open-source Java libraries, so Talend functionality can easily be extended by integrating Java code into Talend jobs.
The Java representation allows you to transform Java object instances. You can import Java classes individually, from a folder, or in a JAR file, and the Java importer will create structure definitions from each class. At runtime, you can provide Java object(s) as the source(s) of a transformation or accept them as the result(s).
This section contains recipes that show some of the techniques for making use of Java within Talend jobs.
For many data integration requirements, the standard Talend components provide the means to process the data from start to end without needing to use Java code apart from in tMap.
For more complex requirements, it is often necessary to add Java logic to a job. In other cases, custom Java code may simply provide a simpler, more elegant, or more efficient solution than the standard components.
Function: tJava lets you enter custom Java code and integrate it into a Talend job. The code is executed only once.
Purpose: tJava makes it possible to extend the functionality of a Talend Job by using Java commands.
The tJava component allows one-off logic to be added to a job. Common uses of tJava include setting global or context variables prior to the main data processing stages and printing logging messages.
Open the job jo_cook_ch05_0000_tJava.
System.out.println("Executing job "+jobName+" at "+TalendDate.getDate("CCYY-MM-dd HH:mm:ss"));
If you examine the generated code, you will see that the Java code is simply added to it as is. This is why you must remember to add a semicolon (;) to the end of each statement to avoid compilation errors.
Although this recipe is centered on the use of tJava, it also acts as a convenient means of illustrating how the context and globalMap variables can be directly referenced from within the majority of Talend components.
Getting ready
Open jo_cook_ch05_0010_tJavaContextGlobalMap, then open the context panel, and you should see a variable named testValue.
How to achieve it…
System.out.println("tJava_1");
context.testValue = "testValue is now initialized";
globalMap.put("gmTestValue", "gmTestValue is now initialized");
System.out.println("tJava_2");
System.out.println("context.testValue is: "+context.testValue);
System.out.println("gmTestValue is: "+(String) globalMap.get("gmTestValue"));
How it works…
The context and globalMap variables are stored in globally available Java maps, meaning that each value is looked up by its key. This enables these values to be referenced within any of the other components, such as tMap, tFixedFlowInput, and tFileInputDelimited.
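The keyed lookup described here can be sketched outside Talend in plain Java. In this sketch the `globalMap` field is a hand-declared `HashMap<String, Object>` standing in for the one a real job supplies, and the class name `GlobalMapSketch` is hypothetical. It is only meant to show why the `(String)` cast is needed when a value is read back.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Talend job; in a real job, globalMap is
// supplied by the generated code and is not declared by hand.
class GlobalMapSketch {
    // Values are stored as Object, so readers must cast to the real type.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Mirrors the first tJava: store a value under a key.
    static void initialise() {
        globalMap.put("gmTestValue", "gmTestValue is now initialized");
    }

    // Mirrors the second tJava: fetch the value by key and cast it to String.
    static String read() {
        return (String) globalMap.get("gmTestValue");
    }

    public static void main(String[] args) {
        initialise();
        System.out.println("gmTestValue is: " + read());
    }
}
```

Because the map holds `Object` values, a value stored as one type and cast to another on retrieval would fail at runtime with a `ClassCastException`, so it helps to keep key names and types consistent across components.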
This recipe shows variables being set in a one-off fashion using tJava. It is worth noting that the same principles apply to tJavaRow. Because tJavaRow is called for every row processed, it is possible to create a global variable for a row that can be referenced by all components in a flow. This can be useful when the values of a field before and after a change are required for comparison later in the flow. Storing the values in globalMap avoids the need to create additional schema columns.
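The before-and-after comparison mentioned above might look like the following plain-Java sketch. It assumes that each row passes through the steps one at a time, and every name in it (`PrePostSketch`, `capture`, `clean`, `wasChanged`, the `originalName` key) is hypothetical; the hand-declared `globalMap` stands in for the one a real job provides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of a flow in which an early per-row step stores
// the original field value so that a later step can compare against it.
class PrePostSketch {
    static final Map<String, Object> globalMap = new HashMap<>();

    // Early tJavaRow-like step: remember the incoming value for this row.
    static String capture(String name) {
        globalMap.put("originalName", name);
        return name;
    }

    // Intermediate step that modifies the field (illustrative cleaning only).
    static String clean(String name) {
        return name.trim().toUpperCase();
    }

    // Later step: compare the current value with the stored original.
    static boolean wasChanged(String currentName) {
        String original = (String) globalMap.get("originalName");
        return !currentName.equals(original);
    }

    // Runs one row through the three steps in order.
    static boolean processRow(String name) {
        String captured = capture(name);
        String cleaned = clean(captured);
        return wasChanged(cleaned);
    }

    public static void main(String[] args) {
        for (String name : new String[] {" jo ", "JO"}) {
            System.out.println("[" + name + "] changed: " + processRow(name));
        }
    }
}
```

No extra schema column is needed to carry the original value, which is the benefit the text describes; the trade-off is that the stored value is overwritten as soon as the next row is captured.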
Function: tJavaRow lets you enter custom Java code that is integrated into a Talend job and applied to each row of the flow.
Purpose: tJavaRow allows you to broaden the functionality of Talend Jobs using the Java language.
The tJavaRow component allows Java logic to be performed for every record within a flow.
Getting ready
Open the job jo_cook_ch05_0020_tJavaRow.
How to achieve it…
//Code generated according to input schema and output schema
output_row.name = input_row.name;
output_row.dateOfBirth = input_row.dateOfBirth;
output_row.age = input_row.timestamp;
output_row.cleanedName = input_row.age;
output_row.rowCount = input_row.age;
if (input_row.name.startsWith("J ")) {
output_row.cleanedName =
StringHandling.EREPLACE(input_row.name, "J ", "James ");
}
if (input_row.name.startsWith("Jo ")) {
output_row.cleanedName =
StringHandling.EREPLACE(input_row.name, "Jo ", "Joanne ");
}
output_row.rowCount=Numeric.sequence("s1",1,1);
How it works…
The tJavaRow component is much like a tMap with one input and one output, in that input columns can be ignored and new columns can be added to the output.
Once the output fields have been defined, the Generate code button will create a Java mapping for every output field. If the names are the same, then it will map correctly. If input fields are not found or are named differently, then it will automatically map the field in the same position in the input or the last known input field, so be careful when using this option if you have removed fields. In some cases, it is best to propagate all fields, generate the mappings and then remove unwanted fields and mappings.
Tip
Also, be aware that the Generate Code option will remove all code in the window. If you have code that you wish to keep, then ensure that you copy it into a text editor before regenerating the code.
As you can also see from the code that was added, it is possible to use Talend’s own functions (StringHandling.EREPLACE, Numeric.sequence) in the Java components along with any other normal Java syntax, like the if statement and startsWith String method.
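To make the row counter easier to reason about, here is a plain-Java sketch of a named sequence. It assumes that `Numeric.sequence` returns the start value on the first call for a given name and adds the step on each later call; the class `SequenceSketch` and its `sequence` method are hypothetical stand-ins, not Talend's own implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical re-creation of a named counter, written only to illustrate
// the assumed behaviour of a call such as Numeric.sequence("s1", 1, 1).
class SequenceSketch {
    // One running value per sequence name.
    private static final Map<String, Integer> sequences = new HashMap<>();

    // First call for a name returns start; each later call adds step.
    static int sequence(String name, int start, int step) {
        Integer current = sequences.get(name);
        int next = (current == null) ? start : current + step;
        sequences.put(name, next);
        return next;
    }

    public static void main(String[] args) {
        // Simulates three rows passing through a tJavaRow-like step.
        for (String row : new String[] {"J Smith", "Jo Brown", "Mary Jones"}) {
            System.out.println(sequence("s1", 1, 1) + ": " + row);
        }
    }
}
```

Because the counter is keyed by name, two components that use the same sequence name would share one running value under this assumption, so distinct names are the safer choice when independent counts are wanted.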
Talend has a rich set of functions and libraries available within its suite. However, you may want to use libraries provided by a data vendor or API vendor. For example, if you want to fetch data from Google AdWords, you may want to import the library or JAR files provided by Google into Talend.
Occasionally, during development, it is necessary (or simpler) to make use of Java classes that aren’t already included within Talend. These may be pre-existing Java code, such as financial calculations, or open-source libraries, such as those provided by The Apache Software Foundation.
In this example, we will make use of a simple Java class ExternalValidations and its ExternalValidateCustomerName method. This class performs the following simple validation:
if (customerName.startsWith("J ")) {
    return customerName.replace("J ", "James ");
} else if (customerName.startsWith("Jo ")) {
    return customerName.replace("Jo ", "Joanne ");
} else {
    return customerName;
}
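For context, the fragment above could sit inside a class along the following lines. This is a sketch: the class and method names come from the text, but the `static` modifier and the `String` parameter and return types are assumptions inferred from how the method is called, and the package declaration is left as a comment so the file compiles on its own.

```java
// package talendExternalClass; // assumed from the import statement used in the job

// Sketch of the external class; the signature is inferred, not confirmed.
class ExternalValidations {

    // Expands a leading "J " or "Jo " in the customer name.
    // Note that String.replace substitutes every occurrence of the literal,
    // not just the leading one.
    static String ExternalValidateCustomerName(String customerName) {
        if (customerName.startsWith("J ")) {
            return customerName.replace("J ", "James ");
        } else if (customerName.startsWith("Jo ")) {
            return customerName.replace("Jo ", "Joanne ");
        } else {
            return customerName;
        }
    }

    public static void main(String[] args) {
        System.out.println(ExternalValidateCustomerName("J Smith"));
        System.out.println(ExternalValidateCustomerName("Jo Brown"));
        System.out.println(ExternalValidateCustomerName("Mary Jones"));
    }
}
```

A class like this would be compiled and packaged into a JAR before being attached to the project, and a class intended for use from another package would also need to be declared `public`.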
Getting ready
Open job jo_cook_ch05_0050_externalClasses.
How to do it…
import talendExternalClass.ExternalValidations;
output_row.validatedName = ExternalValidations.ExternalValidateCustomerName(input_row.name);
Note
If you get an error when running this job, it is possibly because the new class has not been set up as a dependency automatically.
How it works…
The external validations code routine is a dummy routine whose only purpose is to attach the external JAR file and make it available to all jobs in the project.
In order to use the classes in the JAR file, it is necessary to add an import statement within the tJavaRow so that the code knows where to find the methods.
There’s more…
An alternative method of achieving this for just a single job is to use the tLibraryLoad component at the start of the job to define the location of the external libraries and JAR files required.
Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.