Talend – Administering Files

Moving, copying, renaming, and deleting files and folders

As well as reading from and writing to files, Talend has a set of components that allow developers to perform file functions without the need to call native operating system commands. This recipe shows the basic file management components.

Getting ready

Open the job jo_cook_ch08_0100_basicFileCommands.

How to achieve it…

In the following recipes, it is worth noting that Talend uses the Linux style forward slash (/) in the file paths, as opposed to the Windows backslash (\).

Copying a file to another directory

  • Drag tFileCopy to the job.
  • Set the file name to be
  • Set the output directory to be
  • Run the job, and you will see that the new file has been created. This is a simple copy.

Copying files to a different name

  • Open tFileCopy, tick the Rename box, and then add a Destination filename of
  • Run the job, and you will see that there is now a renamed copy of the file.

Renaming a file

  • Open tFileCopy, and change the Input filename to be
  • You will see that the input file is in the same directory as the Destination directory.
  • Change the Destination filename to txt.
  • Click the box Remove source file.
  • Run the job, and you will see that the original file has been renamed.

Moving a file

  • This is the same as copying a file to another directory, but with the Remove source file box ticked.

Deleting a file

  • To delete a file, simply add the file path to the tFileDelete component.

How it works…

As you can see, tFileCopy is used to copy, move, and rename files, depending upon the options selected.

The tFileDelete component is simply used to delete files.
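For readers who prefer to see the operations as code, the following is a rough plain-Java equivalent of the copy, rename, move, and delete steps. It is only a sketch built on the standard java.nio.file API, not the code that Talend generates; the class name FileOpsSketch and the file names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class FileOpsSketch {

    // Copy a file into targetDir. Passing a non-null newName mimics ticking the Rename box.
    public static Path copy(Path source, Path targetDir, String newName) throws IOException {
        Files.createDirectories(targetDir);
        String name = (newName != null) ? newName : source.getFileName().toString();
        return Files.copy(source, targetDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
    }

    // Move or rename: a copy followed by deletion of the source,
    // which is what ticking Remove source file amounts to.
    public static Path move(Path source, Path targetDir, String newName) throws IOException {
        Path copied = copy(source, targetDir, newName);
        Files.delete(source);
        return copied;
    }

    // Delete a single file. Returns false if the file did not exist.
    public static boolean delete(Path file) throws IOException {
        return Files.deleteIfExists(file);
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("fileOpsSketch");
        Path original = Files.write(work.resolve("original.txt"), "hello".getBytes());

        Path copied = copy(original, work.resolve("out"), null);        // simple copy
        Path renamed = move(original, work, "renamed.txt");             // rename in place
        Path moved = move(renamed, work.resolve("archive"), null);      // move elsewhere

        System.out.println("copied to  " + copied);
        System.out.println("moved to   " + moved);
        System.out.println("deleted:   " + delete(moved));
    }
}
```

Treating a rename and a move as the same operation with a different destination mirrors the way the recipe reuses tFileCopy for both.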

There’s more…

You will have noticed that the tFileDelete and tFileCopy components have tick boxes that allow directories to be copied and deleted as well. It goes without saying that great care must be taken when deleting files, and even more so when deleting directories, using Talend.
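To illustrate why directory deletion deserves this care, here is a small plain-Java sketch of a recursive delete. It is not what tFileDelete does internally; the class DirectoryDeleteSketch and the allowedRoot safety check are assumptions added for this example, showing one way to refuse a delete that strays outside an expected working area.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirectoryDeleteSketch {

    // Deletes dir and everything beneath it, but only if dir lies strictly inside allowedRoot.
    // Returns the number of files and directories removed.
    public static int deleteRecursively(Path dir, Path allowedRoot) throws IOException {
        Path target = dir.toRealPath();
        Path root = allowedRoot.toRealPath();
        if (target.equals(root) || !target.startsWith(root)) {
            throw new IllegalArgumentException("Refusing to delete outside " + root + ": " + target);
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(target)) {
            // Reverse order lists children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return paths.size();
    }
}
```

A guard of this kind is cheap, and it turns a mistyped path into an error rather than a lost directory tree.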

Capturing file information

Another useful Talend feature is the ability to capture information about a file for use within downstream processing, most probably to perform validation prior to processing.

Getting ready

Open the jo_cook_ch08_0110_fileInformation job.

How to accomplish it…

The steps for capturing file information are as follows:

  • Drag a tFileProperties component from the right-hand panel.
  • Open tFileProperties, and set the file name to
  • Drag tFlowToIterate to the canvas, and link the row from tFileProperties to it. Name the flow properties; this name becomes the prefix of the globalMap keys (for example, properties.size).
  • Drag tFileRowCount to the canvas and set the filename to match the tFileProperties component.
  • Add onSubjobOk from tFileProperties to tFileRowCount, and then to tFixedFlowInput, so that your job looks like the one shown as follows:


  • Open tFixedFlowInput.
  • Add ((Long)globalMap.get("properties.size")) to the field fileSize.
  • Add ((Integer)globalMap.get("tFileRowCount_1_COUNT")) to the field numberOfRows.
  • Your tFixedFlowInput should look like the following:

[Screenshot: tFixedFlowInput configuration]

  • Run the job, and you will see the file information in the console.

How it works…

The tFileProperties component captures file information and passes the data in a row to the next component. The tFlowToIterate component is used as a shorthand method for adding the file information to globalMap.

The tFileRowCount component counts the number of rows in a file, and presents the count as a globalMap variable.

The final sub job shows the data held in globalMap being used in a process flow.
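As a rough illustration of the data flow, the sketch below collects the same two values in plain Java and stores them in a map under the keys used in this recipe. It is an approximation only: the class FileInfoSketch is hypothetical, an ordinary HashMap stands in for Talend's globalMap, and the row count assumes a UTF-8 readable text file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class FileInfoSketch {

    // Captures file size and row count into a map that plays the role of globalMap.
    public static Map<String, Object> capture(Path file) throws IOException {
        Map<String, Object> globalMap = new HashMap<>();

        // Stand-in for tFileProperties + tFlowToIterate: size in bytes as a Long.
        globalMap.put("properties.size", Long.valueOf(Files.size(file)));

        // Stand-in for tFileRowCount: number of lines as an Integer.
        try (Stream<String> lines = Files.lines(file)) {
            globalMap.put("tFileRowCount_1_COUNT", Integer.valueOf((int) lines.count()));
        }
        return globalMap;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("fileInfoSketch", ".txt");
        Files.write(file, "row1\nrow2\nrow3\n".getBytes());

        Map<String, Object> globalMap = capture(file);

        // The same casts as in the tFixedFlowInput fields.
        Long fileSize = (Long) globalMap.get("properties.size");
        Integer numberOfRows = (Integer) globalMap.get("tFileRowCount_1_COUNT");
        System.out.println("fileSize=" + fileSize + ", numberOfRows=" + numberOfRows);
    }
}
```

The casts matter because the map stores plain objects: the size is held as a Long and the row count as an Integer, which is why the two expressions in the recipe differ.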

There’s more…

The final sub job simply prints out some of the information. A good real-life use, however, is to check the file size from the properties against the file size written in a file trailer record or a validation file. This would ensure that a file transmitted from a third-party application, for example, has been received in its entirety before it is processed by the receiving application.
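A check of this kind might be sketched in plain Java as follows. The layout of the validation file is an assumption made purely for this example (a small text file containing nothing but the expected byte count); real trailer records and control files vary, and the class name FileSizeCheckSketch is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileSizeCheckSketch {

    // Returns true when the data file's size matches the byte count held in the
    // validation file (assumed here to contain only that number as text).
    public static boolean isComplete(Path dataFile, Path validationFile) throws IOException {
        String text = new String(Files.readAllBytes(validationFile), StandardCharsets.UTF_8).trim();
        long expectedSize = Long.parseLong(text);
        return Files.size(dataFile) == expectedSize;
    }
}
```

In a Talend job, the same comparison could be made against the properties.size value held in globalMap before the main processing sub job is allowed to run.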


One field in tFileProperties can be difficult to use: mtime_string, the file's last-modified datetime, which is held as a complex date string. If you need to read this into a date column, then use the following expression:

TalendDate.parseDateLocale("EEE MMM dd HH:mm:ss z yyyy",input_row.mtime_string,"EN") 

where EN is the locale that you may need to change.
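Outside of Talend, the same pattern can be tried out with the standard SimpleDateFormat class. The sketch below assumes that TalendDate.parseDateLocale behaves like SimpleDateFormat with the given pattern and locale; the class MtimeParseSketch and the sample string are made up for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class MtimeParseSketch {

    // Parses a date string such as "Thu Jan 01 00:00:10 GMT 1970".
    // The locale controls how day and month names are interpreted.
    public static Date parseMtime(String mtimeString) throws ParseException {
        SimpleDateFormat format =
                new SimpleDateFormat("EEE MMM dd HH:mm:ss z yyyy", Locale.ENGLISH);
        return format.parse(mtimeString);
    }

    public static void main(String[] args) throws ParseException {
        Date parsed = parseMtime("Thu Jan 01 00:00:10 GMT 1970");
        System.out.println("epoch millis: " + parsed.getTime());
    }
}
```

If the day and month names in your files are not in English, the locale is the part to change, just as with the EN argument above.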

Creating and writing files depending on the input data

Sometimes it is required that multiple files are written from a single data source where the file name is dependent upon the data held within the row. This recipe shows how this can be achieved.

Getting ready

Open the jo_cook_ch08_0140_filesFromInputData job.

How to accomplish it…

The steps for creating and writing files depending on the input data are as follows:

  • Run the job, and you will see that the file dummy.txt has been created and populated with six rows.
  • Open the tJavaRow component, and you will see that the move of data from input to output has already been performed.
  • Add in the following code after the generated code:
// test for change of input_row.key
if (Numeric.sequence(input_row.key, 1, 1) == 1) {
    // if this is the first record then do not flush and close - do not want to create dummy.txt
    // otherwise if sequence > 1 then we will close the previous file
    if (Numeric.sequence("all", 1, 1) != 1) {
        outtFileOutputDelimited_1.flush();
        outtFileOutputDelimited_1.close();
    }
    // build the new file name: output directory + key + ".txt"
    // (outputDirectory is a placeholder; substitute your own output path)
    fileName_tFileOutputDelimited_1 = outputDirectory + "/" + input_row.key + ".txt";
    // create new writer for the new filename. Talend uses this for writing the record
    outtFileOutputDelimited_1 = new java.io.BufferedWriter(
        new java.io.OutputStreamWriter(
            new java.io.FileOutputStream(fileName_tFileOutputDelimited_1, false), "ISO-8859-15"));
}
  • Run the job. You will see that besides the dummy file, there are three additional files: a.txt containing the records with key a, b.txt containing the records with key b, and c.txt containing the records with key c.

How it works…

The code in tJavaRow makes use of the fact that Talend code is a series of loops within loops. Because the tJavaRow loop is within the tFileOutputDelimited loop in the generated Java code, we can change variables within the inner loop, which will affect the processing within the outer loop.

The variable that we will change is the writer that Talend uses for the tFileOutputDelimited component.

tJavaRow code explained

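The Numeric.sequence command uses input_row.key as the sequence name, so a separate sequence is started the first time each key value is seen. When the call returns 1, we therefore know that the key has changed. Note that this relies on the input rows being grouped by key; otherwise, rows for a key seen earlier would be written to whichever file is currently open.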

Once we know that the key changed, we can then close the previous file.

Then we create a new file name consisting of the output directory plus the input_row.key suffixed with .txt. For example, when the key changes to a, we create a file named a.txt.

The next statement then creates a new writer for the tFileOutputDelimited component and Talend will use this writer when writing to the output.
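The same technique can be tried outside of Talend with the standalone sketch below. It is an approximation under stated assumptions: the class SplitByKeySketch is hypothetical, its sequence method is a simplified stand-in for Numeric.sequence (one counter per name), and the rows are assumed to arrive grouped by key. Because no writer is opened before the first row, a null check replaces the "all" sequence test, and no dummy file is produced.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class SplitByKeySketch {

    // Simplified stand-in for Numeric.sequence: returns start on the first call
    // for a given name, then increases by step on each later call.
    private final Map<String, Integer> sequences = new HashMap<>();

    public int sequence(String name, int start, int step) {
        Integer current = sequences.get(name);
        int next = (current == null) ? start : current + step;
        sequences.put(name, next);
        return next;
    }

    // Writes each {key, value} row to <outputDir>/<key>.txt, switching the
    // writer whenever a key is seen for the first time.
    public static void split(String[][] rows, Path outputDir) throws IOException {
        SplitByKeySketch seq = new SplitByKeySketch();
        BufferedWriter out = null;
        try {
            for (String[] row : rows) {
                String key = row[0];
                if (seq.sequence(key, 1, 1) == 1) {
                    // Key changed: close the previous file, if there is one.
                    if (out != null) {
                        out.flush();
                        out.close();
                    }
                    // Build the new file name and open a writer on it.
                    String fileName = outputDir.resolve(key + ".txt").toString();
                    out = new BufferedWriter(new OutputStreamWriter(
                            new FileOutputStream(fileName, false), "ISO-8859-15"));
                }
                // Rows are written through whichever writer is current.
                out.write(key + ";" + row[1]);
                out.newLine();
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
    }
}
```

Calling split with rows keyed a, a, b, c, c, c and a temporary output directory should leave three files, a.txt, b.txt, and c.txt, each holding only the rows for its key.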


