Processing Multiple Files and Control/Validation Files – Talend

Processing multiple files at once

Batch processes often need the same job to process multiple files in a single run. This example shows how to achieve this by merging a group of input files into a single output.

Getting ready

Open the jo_cook_ch08_0120_multipleFiles job. You will notice that it currently reads a single file into a temporary file, and then copies the temporary file to a permanent output.

How to do it…

The steps for processing multiple files at once are as follows:

  • Add a tFileList component, open it, and set the directory to
context.cookbookData+"/chapter8".
  • Click on the + button under the Filemask box, and add the filemask
"chapter08_jo_0120_customerData_*.txt".
  • Your tFileList should look like the one shown, as follows:

[Image: Filemask]

  • Move the OnSubjobOk from the tFileInputDelimited to the tFileList.
  • Add a tJava component.
  • Right-click on tFileList and select Row, then Iterate, and link to the tJava.
  • Right-click on the tJava and select Trigger, then OnComponentOk, and link it to the tFileInputDelimited (customer).
  • Open the tFileInputDelimited component, and change the file name to
((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
  • Your job should look like the one shown as follows:


[Image: FileList component]

  • Run the job, and you will see that the output file contains information from the three input files.
  • To make the job output more useful, open tJava and insert the following code:
System.out.println("Processing file: " + ((String)globalMap.get("tFileList_1_CURRENT_FILE")));
  • Run the job again, and you will see that the console now logs the individual files as they are found.
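The globalMap lookups used in the steps above can be tried outside Talend with a plain Java map. The following is a minimal sketch, not Talend's own code: the HashMap stands in for globalMap, and the file name and path placed in it are hypothetical sample values. It shows why the (String) cast is needed, since the map stores its values as Object.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalMapSketch {
    // Builds the log line that the tJava step prints.
    static String describeCurrentFile(Map<String, Object> globalMap) {
        // Values come back as Object, hence the (String) cast used in the recipe.
        String fileName = (String) globalMap.get("tFileList_1_CURRENT_FILE");
        return "Processing file: " + fileName;
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap; both entries are hypothetical sample values.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILE", "chapter08_jo_0120_customerData_1.txt");
        globalMap.put("tFileList_1_CURRENT_FILEPATH",
                "/data/chapter8/chapter08_jo_0120_customerData_1.txt");
        System.out.println(describeCurrentFile(globalMap));
    }
}
```

CURRENT_FILE holds the file name only, which suits a log message, while CURRENT_FILEPATH holds the full path, which is what tFileInputDelimited needs in order to open the file.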

How it works…

This job merges all files in a directory that fit the filemask into a temporary file, ready for processing as a single entity; in this case, the processing simply copies the temporary file to a permanent output file.

The tFileList component is an iterator that is triggered by each file found that fits the specified mask.
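To get a feel for which file names a mask selects, the mask can be tested with Java's built-in glob matcher. This is only a sketch: it uses java.nio's glob rules as a stand-in, which may differ in detail from how tFileList evaluates its filemask, and the file names in main are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class FilemaskSketch {
    // Returns true when fileName fits the glob-style mask.
    static boolean matchesMask(String mask, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        String mask = "chapter08_jo_0120_customerData_*.txt";
        // Hypothetical file names, used only to exercise the mask.
        System.out.println(matchesMask(mask, "chapter08_jo_0120_customerData_1.txt"));  // true
        System.out.println(matchesMask(mask, "chapter08_jo_0120_customerData_1.ctrl")); // false
    }
}
```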

As each file is found, its details are stored in globalMap, and all linked components and sub jobs are processed. This repeats until no more files are found.

As you can see from the job, the tFileInputDelimited component reads from the file specified in globalMap by tFileList, and tFileOutputDelimited writes to the globalMap variable specified by tCreateTemporaryFile.

Once all files have been read and processed, tFileList is complete and the OnSubjobOk link is triggered, which copies the temporary file into a final, permanent merged file.
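The overall flow of the job can be approximated in plain Java. This is a rough sketch of the same idea rather than the code Talend generates: the class and method names are hypothetical, the inputs are treated as plain text lines, and the files are visited in sorted order, which is an assumption made here so that the result is predictable.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MergeFilesSketch {
    // Merges every file in dir that fits the mask into output; returns the number of files merged.
    static int mergeMatching(Path dir, String mask, Path output) throws IOException {
        // Plays the role of tFileList: collect the files that fit the mask.
        List<Path> inputs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, mask)) {
            for (Path p : stream) {
                inputs.add(p);
            }
        }
        Collections.sort(inputs);

        // Plays the role of tCreateTemporaryFile.
        Path temp = Files.createTempFile("merge", ".tmp");
        try {
            // One pass per file, like the Iterate link.
            for (Path input : inputs) {
                System.out.println("Processing file: " + input.getFileName());
                Files.write(temp, Files.readAllLines(input), StandardOpenOption.APPEND);
            }
            // Runs once after the loop, like the OnSubjobOk link.
            Files.copy(temp, output, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(temp);
        }
        return inputs.size();
    }
}
```

The output path should not itself fit the mask, otherwise a later run would pick up the merged file as one of its inputs.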


There’s more…

In this job, we have only one sub job that is executed as part of the Iterate, but it is probably more common to have many. In a traditional programming language, this would mean that all the processing linked to the Iterate would be in a programming loop.

It is also possible to have further iterations below the first one, for instance, if you are navigating your way down a set of directories to find input files for processing.
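A nested iteration of this kind can be pictured as two loops, one inside the other. The sketch below is plain Java, not Talend code, and its names are hypothetical: the outer loop visits each sub-directory of a root folder and the inner loop collects the files in it that fit a mask, much as one tFileList iterating into another would.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NestedIterationSketch {
    // Returns the files that fit the mask in each immediate sub-directory of root.
    static List<Path> findInputs(Path root, String mask) throws IOException {
        List<Path> found = new ArrayList<>();
        // Outer iteration: one pass per sub-directory.
        try (DirectoryStream<Path> dirs =
                 Files.newDirectoryStream(root, p -> Files.isDirectory(p))) {
            for (Path sub : dirs) {
                // Inner iteration: one pass per matching file in that sub-directory.
                try (DirectoryStream<Path> files = Files.newDirectoryStream(sub, mask)) {
                    for (Path file : files) {
                        found.add(file);
                    }
                }
            }
        }
        Collections.sort(found);
        return found;
    }
}
```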

Tip

The tJava component named dummy is just that: it performs no processing logic and is present only to make the job more readable. It allows the processing for each iteration to sit in individual sub jobs, as if they were part of a normal, atomic job that processes just one file.


Processing control/validation files

Some organizations prefer to use a companion (control/validation) file containing file information instead of storing the information in the file header or trailer. This means that the detail file is much simpler to process because it is a normal flat file.

In this recipe, the control file has the same name as the detail file; however, it is suffixed with .ctrl rather than .txt. This recipe shows how the control file is processed.


Getting ready

Open the jo_cook_ch08_0130_controlFile job. You will see that tFileList_1 is looking for files with the mask chapter08_jo_0130_customerData*.txt. There are two of these in the directory.


How to do it…

The steps for processing control/validation files are as follows:

  • Copy the first sub job.
  • Change the new tFileList mask to StringHandling.EREPLACE(((String)globalMap.get(“tFileList_1_CURREN
  • Open tJava_2 and change the command to
System.out.println("Found control file: "+ ((String)globalMap.get("tFileList_2_CURRENT_FILE")));
  • Connect the first and second sub jobs using OnComponentOk.
  • Repeat the same for the second and third sub jobs.
  • Your job should now look like this:

[Image: second and third sub jobs]

  • Run the job, and you will see that the main process is called once per file/control combination.

How it works…

The first tFileList looks for files that fit the mask “chapter08_jo_0130_customerData*.txt”, of which there are three.

For each .txt file that fits the mask, the job then runs a second tFileList. This time, however, the mask is the actual file name, but with .txt replaced with .ctrl. This has the effect of searching for a control file that has exactly the same name as the text file.
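The name substitution can be illustrated in plain Java. This sketch deliberately uses standard String methods rather than the Talend routine from the recipe, and the class, method, and sample file name are hypothetical.

```java
class ControlFileNameSketch {
    // Derives the companion control file name by swapping a trailing .txt for .ctrl.
    static String controlFileNameFor(String detailFileName) {
        String suffix = ".txt";
        if (!detailFileName.endsWith(suffix)) {
            throw new IllegalArgumentException("Not a .txt file: " + detailFileName);
        }
        // Only the final suffix is replaced, so a ".txt" elsewhere in the name is untouched.
        return detailFileName.substring(0, detailFileName.length() - suffix.length()) + ".ctrl";
    }

    public static void main(String[] args) {
        // Hypothetical detail file name.
        System.out.println(controlFileNameFor("chapter08_jo_0130_customerData_1.txt"));
    }
}
```

Because the resulting mask contains no wildcard, the second file list yields at most one file per detail file, so its iteration runs either once or not at all.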

Once a match is found, both file names are in globalMap together, and the file details can be validated and processed by whatever means are appropriate within the main processing section.
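As one possible validation, the sketch below compares the number of lines in the detail file with a count read from the control file. The recipe does not say what the control file contains, so the layout assumed here (a single expected record count on the first line) is purely an assumption for illustration, as are the class and method names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class ControlFileValidationSketch {
    // Returns true when the detail file has as many lines as the control file declares.
    // Assumption: the first line of the control file holds the expected record count.
    static boolean recordCountMatches(Path detailFile, Path controlFile) throws IOException {
        long expected = Long.parseLong(Files.readAllLines(controlFile).get(0).trim());
        long actual;
        try (Stream<String> lines = Files.lines(detailFile)) {
            actual = lines.count();
        }
        return expected == actual;
    }
}
```

In a real job the two paths would come from the tFileList_1 and tFileList_2 entries in globalMap, and a failed check would typically stop the file from being processed.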

Last updated: 03 Apr 2023
About Author

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
