Batch processes often require that multiple files are processed by the same job in a single tranche. This example shows how to achieve this by merging a group of input files into a single output.
Open the jo_cook_ch08_0120_multipleFiles job. You will notice that it currently reads a single file into a temporary file, and then copies the temporary file to a permanent output.
How to do it…
The steps for processing multiple files at once are as follows:
System.out.println("Processing file: " + ((String)globalMap.get("tFileList_1_CURRENT_FILE")));
How it works…
This job merges all files in a directory into a temporary file ready for processing as a single entity; in this case, renaming the temporary file to a permanent output file name.
The tFileList component is an iterator that is triggered by each file found that fits the specified mask.
So as each file is found, the file details are stored in globalMap, and then all linked components and sub jobs will be processed until no more files are found.
As you can see from the job, the tFileInputDelimited component reads from the file specified in globalMap by tFileList, and tFileOutputDelimited writes to the globalMap variable specified by tCreateTemporaryFile.
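As a rough illustration of the hand-off described above, the following plain Java sketch mimics an iterator that publishes the current file name under a shared key before the linked processing reads it back. The `HashMap` is only a stand-in for Talend's globalMap, and the class and method names are hypothetical; only the key `tFileList_1_CURRENT_FILE` comes from the recipe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalMapSketch {

    // Simulates an iterator that stores the current file name under a shared
    // key, then lets the "linked processing" read it back from the same map.
    public static List<String> iterate(List<String> fileNames) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in, not Talend's own object
        List<String> seen = new ArrayList<>();
        for (String name : fileNames) {
            // The iterator publishes the file it has just found.
            globalMap.put("tFileList_1_CURRENT_FILE", name);

            // Downstream components look the value up by key and cast it.
            String current = (String) globalMap.get("tFileList_1_CURRENT_FILE");
            System.out.println("Processing file: " + current);
            seen.add(current);
        }
        return seen;
    }

    public static void main(String[] args) {
        iterate(List.of("first.txt", "second.txt"));
    }
}
```

The cast to `String` is needed because the map holds values of type `Object`, which is why the same cast appears in the `System.out.println` statement in the recipe.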
Once all files have been read and processed, tFileList is complete and the OnSubjobOk link is triggered, copying the temporary file into a final, permanent merged file.
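The overall flow can be approximated outside Talend with a short Java sketch: loop over the files that match a mask, append each one to a temporary file, and copy the temporary file to its final name only after the loop has finished. This is a minimal sketch under stated assumptions, not the code Talend generates; the class name, method name, file names, and the choice to sort the inputs are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MergeFilesSketch {

    // Appends every file in inputDir that matches the glob mask to a temporary
    // file, then copies the temporary file to finalOutput.
    // Returns the number of input files that were merged.
    public static int mergeFiles(Path inputDir, String mask, Path finalOutput) throws IOException {
        List<Path> inputs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir, mask)) {
            for (Path p : stream) {
                inputs.add(p);
            }
        }
        Collections.sort(inputs); // fixed order is an assumption of this sketch

        Path temp = Files.createTempFile("merge_", ".tmp"); // plays the role of the temporary file
        try {
            for (Path current : inputs) { // one pass of the loop per file found
                System.out.println("Processing file: " + current.getFileName());
                List<String> lines = Files.readAllLines(current);
                Files.write(temp, lines, StandardOpenOption.APPEND);
            }
            // Runs once, after the loop has ended: the final copy step.
            Files.copy(temp, finalOutput, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(temp);
        }
        return inputs.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("merge_demo");
        Files.write(dir.resolve("a.txt"), List.of("1,alice"));
        Files.write(dir.resolve("b.txt"), List.of("2,bob"));
        Path out = dir.resolve("merged.out");
        int count = mergeFiles(dir, "*.txt", out);
        System.out.println(count + " files merged into " + out);
    }
}
```

Writing to a temporary file first means the permanent output only appears once every input has been read, so a failure part-way through does not leave a half-written merged file behind.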
In this job, we have only one sub job that is executed as part of the Iterate, but it is probably more common to have many. In a traditional programming language, this would mean that all the processing linked to the Iterate would be in a programming loop.
It is also possible to have further iterations below the first one, for instance, if you are navigating your way down a set of directories to find input files for processing.
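In plain Java terms, a second iteration below the first is simply a nested loop. The sketch below, which assumes a hypothetical layout of one level of subdirectories under a root folder, uses an outer loop over directories and an inner loop over the files in each directory that match a mask. The class and method names are illustrative only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NestedIterationSketch {

    // Outer loop: each immediate subdirectory of root.
    // Inner loop: each file in that subdirectory matching the glob mask.
    public static List<Path> findInputFiles(Path root, String mask) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(root, p -> Files.isDirectory(p))) {
            for (Path dir : dirs) {
                try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, mask)) {
                    for (Path file : files) {
                        found.add(file);
                    }
                }
            }
        }
        Collections.sort(found); // directory listing order is not guaranteed
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("nested_demo");
        Path north = Files.createDirectory(root.resolve("north"));
        Path south = Files.createDirectory(root.resolve("south"));
        Files.write(north.resolve("sales.txt"), List.of("n"));
        Files.write(south.resolve("sales.txt"), List.of("s"));
        for (Path p : findInputFiles(root, "*.txt")) {
            System.out.println("Found input file: " + p);
        }
    }
}
```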
The tJava component named dummy is just that. It performs no logic and is present in the code just to make it more readable. This is because it allows the processing for each iteration to sit in individual sub jobs as if they are within a normal, atomic job that processes just one file.
Some organizations prefer to use a companion (control/validation) file containing file information instead of storing the information in the file header or trailer. This means that the detail file is much simpler to process because it is a normal flat file.
In this recipe, the control file has the same name as the detail file; however, it is suffixed with .ctrl rather than .txt. This recipe shows how the control file is processed.
Open the jo_cook_ch08_0130_controlFile job. You will see that tFileList_1 is looking for files with the mask of chapter08_jo_0130_customerData*.txt. There are two of these in the directory.
How to do it…
The steps for processing control/validation files are as follows:
System.out.println("Found control file: " + ((String)globalMap.get("tFileList_2_CURRENT_FILE")));
How it works…
The first tFileList looks for files that fit the mask “chapter08_jo_0130_customerData*.txt”, of which there are three.
For each .txt file that fits the mask, a second tFileList is then performed. This time, however, the mask is the actual file name, but with .txt replaced with .ctrl. This has the effect of searching for a control file that has exactly the same name as the text file.
Once a match is found, then we have both file names in globalMap together, and the file details can be validated and processed by whatever means within the main processing section.
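The pairing logic can be sketched in plain Java as follows: for every detail file that fits the mask, derive the control file name by swapping the .txt suffix for .ctrl, and keep the pair only if that control file exists. This is an illustration under assumptions rather than the job's own code; the class name, method names, and sample file names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ControlFileSketch {

    // Derives the companion control file name by swapping a trailing .txt for .ctrl.
    public static String controlNameFor(String detailName) {
        if (!detailName.endsWith(".txt")) {
            throw new IllegalArgumentException("Not a .txt file: " + detailName);
        }
        return detailName.substring(0, detailName.length() - ".txt".length()) + ".ctrl";
    }

    // Returns detail file -> control file for every detail file that has a
    // control file with exactly the same base name in the same directory.
    public static Map<Path, Path> pairFiles(Path dir, String mask) throws IOException {
        Map<Path, Path> pairs = new TreeMap<>();
        try (DirectoryStream<Path> details = Files.newDirectoryStream(dir, mask)) {
            for (Path detail : details) { // first iteration: detail files
                Path control = dir.resolve(controlNameFor(detail.getFileName().toString()));
                if (Files.exists(control)) { // second lookup: the exact control name
                    System.out.println("Found control file: " + control.getFileName());
                    pairs.put(detail, control);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ctrl_demo");
        Files.write(dir.resolve("customerData1.txt"), List.of("detail"));
        Files.write(dir.resolve("customerData1.ctrl"), List.of("control"));
        Files.write(dir.resolve("customerData2.txt"), List.of("detail without control"));
        System.out.println(pairFiles(dir, "customerData*.txt").size() + " matched pair(s)");
    }
}
```

A detail file with no matching control file is simply skipped here; a real job might instead want to report it as an error before any processing begins.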
Ravindra Savaram is a Content Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
Copyright © 2013 - 2022 MindMajix Technologies