Processing multiple files at once
Batch processes often need to process multiple files in a single run of the same job. This recipe shows how to do so by merging a group of input files into a single output.
Open the jo_cook_ch08_0120_multipleFiles job. You will notice that it currently reads a single file into a temporary file, and then copies the temporary file to a permanent output.
How to do it…
The steps for processing multiple files at once are as follows:
System.out.println("Processing file: "+ ((String)globalMap.get("tFileList_1_CURRENT_FILE")));
How it works…
This job merges all files in a directory into a temporary file, ready for processing as a single entity; in this case, the processing simply copies the temporary file to a permanent output file.
The tFileList component is an iterator that is triggered by each file found that fits the specified mask.
As each file is found, its details are stored in globalMap, and all linked components and sub jobs are processed. This repeats until no more files are found.
As you can see from the job, the tFileInputDelimited component reads from the file that tFileList stored in globalMap, and tFileOutputDelimited writes to the file whose name tCreateTemporaryFile stored in globalMap.
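The hand-off through globalMap can be pictured in plain Java. The following is only a minimal sketch outside Talend: the class name, the describeCurrentFile helper, and the sample file names are hypothetical, and an ordinary HashMap stands in for the job's globalMap. Only the key tFileList_1_CURRENT_FILE comes from the recipe.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class GlobalMapSketch {
    // Stand-in for the job-wide globalMap (an assumption: a plain HashMap).
    static final Map<String, Object> globalMap = new HashMap<>();

    // Hypothetical helper: what a downstream component does when it reads
    // the current file name back out of the map.
    static String describeCurrentFile() {
        return "Processing file: " + ((String) globalMap.get("tFileList_1_CURRENT_FILE"));
    }

    public static void main(String[] args) {
        // Illustrative file names; each pass overwrites the same key,
        // so downstream code always sees the file of the current iteration.
        for (String name : Arrays.asList("first.txt", "second.txt")) {
            globalMap.put("tFileList_1_CURRENT_FILE", name);
            System.out.println(describeCurrentFile());
        }
    }
}
```

Because the same key is overwritten on every pass, anything that reads it must run inside the iteration, which is why the components are linked to the iterator rather than placed after it.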
Once all files have been read and processed, tFileList completes and the OnSubjobOk link is triggered, copying the temporary file to the final, permanent merged file.
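For readers who think in code, the overall flow of the job can be approximated in plain Java. This is a sketch under stated assumptions, not what Talend generates: the class and method names are hypothetical, files are treated as UTF-8 text, and they are appended in sorted order purely to make the result predictable.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MergeFilesSketch {

    // Appends every file in inputDir that matches the mask to a temporary
    // file, then copies the temporary file to the permanent output.
    static void mergeFiles(Path inputDir, String mask, Path output) throws IOException {
        Path temp = Files.createTempFile("merge", ".tmp"); // role of tCreateTemporaryFile
        try {
            List<Path> inputs = new ArrayList<>();
            try (DirectoryStream<Path> found = Files.newDirectoryStream(inputDir, mask)) {
                for (Path p : found) {
                    inputs.add(p);
                }
            }
            Collections.sort(inputs); // assumption: sorted order, for a predictable result

            // The iteration: one pass per matching file.
            for (Path input : inputs) {
                System.out.println("Processing file: " + input.getFileName());
                Files.write(temp, Files.readAllLines(input), StandardOpenOption.APPEND);
            }

            // Runs once, after the loop has finished.
            Files.copy(temp, output, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(temp);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MergeFilesSketch <inputDir> <mask> <outputFile>
        mergeFiles(Paths.get(args[0]), args[1], Paths.get(args[2]));
    }
}
```

The output file should not itself match the mask inside the input directory, otherwise a second run would merge the previous result back in.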
In this job, we have only one sub job that is executed as part of the Iterate, but it is probably more common to have many. In a traditional programming language, this would mean that all the processing linked to the Iterate would be in a programming loop.
It is also possible to have further iterations below the first one, for instance, if you are navigating your way down a set of directories to find input files for processing.
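A nested iteration of this kind corresponds to one loop inside another. The sketch below is a plain-Java analogy only, with hypothetical class and method names: the outer loop walks the subdirectories of a root folder and the inner loop picks up the files in each one that match the mask.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NestedIterateSketch {

    // Returns the files matching the mask in each direct subdirectory of root.
    static List<Path> findInputs(Path root, String mask) throws IOException {
        List<Path> found = new ArrayList<>();
        // Outer iteration: one pass per subdirectory.
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(root, p -> Files.isDirectory(p))) {
            for (Path dir : dirs) {
                // Inner iteration: one pass per matching file in that subdirectory.
                try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, mask)) {
                    for (Path file : files) {
                        found.add(file);
                    }
                }
            }
        }
        Collections.sort(found); // assumption: sorted only to make the order predictable
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NestedIterateSketch <rootDir> <mask>
        for (Path p : findInputs(Paths.get(args[0]), args[1])) {
            System.out.println("Processing file: " + p);
        }
    }
}
```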
The tJava component named dummy is just that. It performs no logic and is present in the code just to make it more readable. This is because it allows the processing for each iteration to sit in individual sub jobs as if they are within a normal, atomic job that processes just one file.
Processing control/validation files
Some organizations prefer to use a companion (control/validation) file containing file information instead of storing the information in the file header or trailer. This means that the detail file is much simpler to process, because it is a normal flat file.
In this recipe, the control file has the same name as the detail file; however, it is suffixed with .ctrl rather than .txt. This recipe shows how the control file is processed.
Open the jo_cook_ch08_0130_controlFile job. You will see that tFileList_1 is looking for files with the mask of chapter08_jo_0130_customerData*.txt. There are three of these in the directory.
How to do it…
The steps for processing control/validation files are as follows:
System.out.println("Found control file: "+ ((String)globalMap.get("tFileList_2_CURRENT_FILE")));
How it works…
The first tFileList looks for files that fit the mask “chapter08_jo_0130_customerData*.txt”, of which there are three.
For each .txt file that fits the mask, the job then runs a second tFileList. This time, however, the mask is the actual file name, but with .txt replaced with .ctrl. This has the effect of searching for a control file that has exactly the same name as the text file.
Once a match is found, both file names are available in globalMap at the same time, and the file details can be validated and processed by whatever means within the main processing section.
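The pairing logic can be sketched in plain Java as follows. This is an illustration under assumptions rather than the job's generated code: the class and method names are hypothetical, and the second lookup is reduced to a simple existence check on the derived .ctrl name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class ControlFileSketch {

    // Derives the control file name: same name, with .txt replaced by .ctrl.
    static String controlFileName(String detailFileName) {
        if (!detailFileName.endsWith(".txt")) {
            throw new IllegalArgumentException("Not a .txt file: " + detailFileName);
        }
        return detailFileName.substring(0, detailFileName.length() - ".txt".length()) + ".ctrl";
    }

    // Maps each detail file matching the mask to its control file,
    // skipping detail files that have no companion.
    static Map<Path, Path> pairFiles(Path dir, String mask) throws IOException {
        Map<Path, Path> pairs = new TreeMap<>();
        try (DirectoryStream<Path> details = Files.newDirectoryStream(dir, mask)) {
            for (Path detail : details) {                       // first iteration
                Path control = dir.resolve(controlFileName(detail.getFileName().toString()));
                if (Files.isRegularFile(control)) {             // stands in for the second lookup
                    System.out.println("Found control file: " + control.getFileName());
                    pairs.put(detail, control);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ControlFileSketch <dir> <mask>
        pairFiles(Paths.get(args[0]), args[1])
                .forEach((detail, control) -> System.out.println(detail + " -> " + control));
    }
}
```

With the mask used in this recipe, the call would look like pairFiles(dir, "chapter08_jo_0130_customerData*.txt"). A detail file without a control file is simply left out of the result, so the main processing only ever sees complete pairs.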