Home  >  Blog  >   Talend

Checking a Column against a list and lookup in Talend

Rating: 5
  
 
5934
  1. Share:

Talend Open Studio for Data Quality enables you to analyze the content of one or multiple columns and execute the created analyses using the Java or the SQL engine. This type of analysis provides statistics about the values within each column.

If you would like to Enrich your career with a Talend certified professional, then visit Mindmajix - A Global online training platform: “Talend Online Course” . This course will help you to achieve excellence in this domain.

Checking a Column Against a List of Allowed Values

Often it is necessary to ensure that a column contains only values as defined in a list. This recipe shows how this can be achieved using a tMap expression.

Getting ready

Open the job jo_cook_ch03_0040 _tMapValuesInList. You will notice that the job is very similar to  REJECTING ROWS USING TMAP.

How to accomplish it…

1. Open tMap and click the expression builder button (…), and add the test criteria, as shown in the following screenshot:

How to accomplish

2. Run the job and you should see that one of the rows is rejected.

How it works…

The tMap conditions are the same as Java conditions, so the symbol || (pipe pipe) is a logical OR. Thus, the condition checks for the value being UK or USA or France.

There’s more…

This method is fine if the list is quite small and isn’t liable to change. If the list is too large or subjected to frequent changes, then the code will be hard to maintain and/or will need to be changed often, which requires re-testing of the code.

Frequently Asked TALEND Interview Questions & Answers

Checking a Column Against a Lookup

Another method for VALIDATING A COLUMN is to refer to a lookup containing a list of allowed values that can be stored in any format (file, table, XML for example).

Getting ready

Open the job jo_cook_ch03_0050_tMapValuesInLookup. You will see that there are two inputs to the tMap: customer and country.

How to do it…

  • Open tMap, and drag the field countryOfBirth from the customer input to the countryName field in the country input. This will create a key link, as shown in the following screenshot:

Customer

  • Click the button tMap settings and set the value for Join Model to Inner Join.

tMap Settings

  • In the reject output, click on the button for tMap settings, and set the value for Catch lookup inner join reject to true.

Catch lookup inner join reject

  • Run the job, and you will see that three of the records have been rejected.

  • Re-open the tMap and change the key on the country to StringHandling.UPCASE(customer.countryOfBirth)

  • Re-run the job, and you will see that now only one record has been rejected.

Checkout Talend Tutorial

How it works…

The tMap is performing an inner join between the customer data and the country data using the country name as the key, so any rows that do not join have an invalid countryOfBirth.
When a match is found, the record is passed to the valid rows output.

If no match is found, then the customer record is passed to the invalid output, which is set up to catch any row from the main flow that does not fulfill the inner join criteria.

MindMajix Youtube Channel

On the first execution of the job, the values being checked were not in upper case, so only ‘USA’ matched. On the second execution, after the customer countries had been converted to upper case, three of the records matched.

Explore TALEND Sample Resumes! Download & Edit, Get Noticed by Top Employers!Download Now!
Join our newsletter
inbox

Stay updated with our newsletter, packed with Tutorials, Interview Questions, How-to's, Tips & Tricks, Latest Trends & Updates, and more ➤ Straight to your inbox!

Course Schedule
NameDates
Talend TrainingMar 23 to Apr 07View Details
Talend TrainingMar 26 to Apr 10View Details
Talend TrainingMar 30 to Apr 14View Details
Talend TrainingApr 02 to Apr 17View Details
Last updated: 03 Apr 2023
About Author

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.

read more