Adding a Header and Trailer to a file
Let us look at how to add a custom header and footer to a file. This is often required for file processing and validation.
Requirement: Create a delimited file with the below structure.
Header: File Name (name of the actual file), PID (process ID), and the header columns.
Footer: File Created Date (date and time) and Number of Records (record count).
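To make the target concrete, here is a hypothetical example of the finished file (the sample rows, date, and PID value are invented for illustration; the header columns come from the job configuration shown later):

```
File Name: CustomerDetails.csv
PID :12345
Id,Name,City,State,Street
1,John,Dallas,TX,Elm Street
2,Jane,Austin,TX,Oak Avenue
File Created Date: 2024-01-01 10:30:00
Number of Records: 2
```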
This is our final job design.
Final Job Design
Step 1: We'll create the header part first.
Step 2: Add tFixedFlowInput component and configure as below.
"File Name: CustomerDetails.csv" "PID :"+pid "Id,Name,City,State,Street"
Step 3: Add a tFileOutputDelimited component and connect it with the tFixedFlowInput component using a Main flow, then configure it as follows.
Step 4: Add tRowGenerator
Step 5: Copy and paste the tFileOutputDelimited_1 component created in Step 3, then connect tFileOutputDelimited_2 with tRowGenerator using a main flow. Configure as follows.
Step 6: Create a context variable named “NumberOfRows”, then copy and paste tFixedFlowInput_1.
Step 7: Configure newly pasted tFixedFlowInput_2 component.
Step 8: Copy and paste tFileOutputDelimited_2 and connect it with tFixedFlowInput_2 using a main flow. No additional configuration is needed.
Step 9: Add a tFlowMeterCatcher component and a tJavaRow component, and connect them using a main flow; don't click the “Sync columns” button on tJavaRow. Write the below code in the tJavaRow component.
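The exact code appears in the original screenshot, which is not reproduced here. A minimal sketch of what this tJavaRow step typically does is below; it assumes the `count` column of the tFlowMeterCatcher schema carries the row count and that `NumberOfRows` is an Integer context variable. Inside Talend the body of `captureCount` would simply be `context.NumberOfRows = input_row.count;`.

```java
// Standalone sketch of the tJavaRow footer logic (an assumption, not the
// author's exact code). In the generated Talend job, input_row comes from
// tFlowMeterCatcher and context.NumberOfRows is the variable from Step 6.
public class FooterSketch {
    // Stand-in for context.NumberOfRows
    static Integer numberOfRows;

    // Equivalent of writing "context.NumberOfRows = input_row.count;" in tJavaRow
    static void captureCount(int count) {
        numberOfRows = count;
    }

    public static void main(String[] args) {
        captureCount(5); // pretend tFlowMeterCatcher reported 5 rows
        System.out.println("Number of Records: " + numberOfRows);
    }
}
```

The captured value is then available to tFixedFlowInput_2, which writes the footer row.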
Step 10: Run the job; it will create the file with a header part, a data part, and a footer part. Below is our final output.
Header and Footer Output
Reading headers and trailers using tMap
This recipe shows how to parse a file that has header and trailer records, with a record type identifier at the start of each line.
Open the jo_cook_ch08_0060_headTrailtMap job.
How to accomplish it…
The steps for reading headers and trailers using tMap are as follows:
The job uses the following generic schemas:
sc_cook_ch8_0060_genericCustomerHeader
sc_cook_ch8_0060_genericCustomerDetail
sc_cook_ch8_0060_genericCustomerTrailer
How it works…
tFileInputFullRow allows us to read a row of any format into tMap. This is important, because we do not want records to be rejected due to schema errors at this stage.
The start of each row is then tested for the record type: 00, 01, or 02 for the header, detail, or trailer records respectively.
The different rows are then passed to a tExtractDelimitedFields component for breaking down into the individual schema columns.
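The record-type test can be sketched outside Talend as a plain function. The tMap filter expressions themselves are not reproduced in this excerpt, so the `startsWith` form used here is an assumption:

```java
// Sketch of the tMap record-type routing (assumed filter logic, not the
// book's exact expressions). Each full row read by tFileInputFullRow is
// routed to an output flow based on its first two characters.
public class RecordTypeSketch {
    static String classify(String line) {
        if (line.startsWith("00")) return "header";
        if (line.startsWith("01")) return "detail";
        if (line.startsWith("02")) return "trailer";
        return "unknown"; // would normally be rejected
    }

    public static void main(String[] args) {
        System.out.println(classify("00CUSTOMER FILE")); // header
        System.out.println(classify("01123John"));       // detail
        System.out.println(classify("02000002"));        // trailer
    }
}
```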
This isn’t the only method of reading files with headers and trailers; in fact, for this example the most natural Talend method would be to use the tFileInputMSDelimited component.
This method, however, is much more flexible, in that the condition for routing data to each of the output flows does not depend upon a fixed field being present.
Reading headers and trailers with no identifiers
This recipe shows how to parse a file that has header and trailer records but no associated record type. Instead, the header is the first record in the file, and the trailer is the last record in the file.
Open the jo_cook_ch08_0070_headTrailtMapNoType job. You will see that it is a slightly changed version of the completed job from the previous recipe; the output schemas have changed.
How to achieve it…
The steps for reading headers and trailers with no identifiers are as follows:
Add a tFileRowCount component and set its File Name to the same file as our input file.
Connect an onSubJobOk trigger from the tFileRowCount component to the tFileInputDelimited.
Open the tMap, and add a new variable rowNumber. Set its expression to Numeric.sequence("rowNumber",1,1).
Change the Filter expressions for header, detail, and trailer to those shown as follows:
Header: Var.rowNumber == 1
Detail: Var.rowNumber != ((Integer)globalMap.get("tFileRowCount_1_COUNT"))
Trailer: Var.rowNumber == ((Integer)globalMap.get("tFileRowCount_1_COUNT"))
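Talend's Numeric.sequence routine returns the next value of a named counter each time it is called, so the rowNumber variable above counts rows as they pass through the tMap. A minimal stand-in for the routine, useful for reasoning about the filters (the behaviour described is an assumption based on the expression shown above), might look like:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for Talend's Numeric.sequence routine (assumed behaviour:
// the first call for a given name returns the start value, and each
// subsequent call advances the counter by the step).
public class SequenceSketch {
    private static final Map<String, Integer> sequences = new HashMap<>();

    static int sequence(String name, int start, int step) {
        Integer current = sequences.get(name);
        int next = (current == null) ? start : current + step;
        sequences.put(name, next);
        return next;
    }

    public static void main(String[] args) {
        // Called once per row in the tMap, this numbers the rows 1, 2, 3, ...
        System.out.println(sequence("rowNumber", 1, 1));
        System.out.println(sequence("rowNumber", 1, 1));
        System.out.println(sequence("rowNumber", 1, 1));
    }
}
```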
How it works…
The tFileRowCount component tells us how many rows are present in the file.
In the tMap, we use a sequence to calculate the current line number. If the line number is 1, then we have a header row. If it is equal to the row count (held in globalMap), then we have a trailer row, and all other rows are detail rows.
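The routing decision described above can be sketched as a plain function, where the total row count stands in for the tFileRowCount_1_COUNT value held in globalMap:

```java
// Sketch of the header/detail/trailer decision made by the tMap filters:
// row 1 is the header, the last row is the trailer, everything else is detail.
public class RowPositionSketch {
    static String classify(int rowNumber, int totalRows) {
        if (rowNumber == 1) return "header";
        if (rowNumber == totalRows) return "trailer";
        return "detail";
    }

    public static void main(String[] args) {
        int total = 5; // as reported by tFileRowCount
        for (int row = 1; row <= total; row++) {
            System.out.println(row + ": " + classify(row, total));
        }
    }
}
```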
We then use the tExtractDelimitedFields component to extract the individual delimited fields into a different schema for each of the row types.