Function: tFilterRow filters input rows by setting conditions on the selected columns.
Purpose: tFilterRow helps parameterize filters on the source data.
Filtering and searching a list of names
The following scenario is a Java Job that uses a simple condition and a regular expression to filter a list of records. This scenario will output two tables: the first will list all Italian records where first names are shorter than six characters; the second will list all rejected records. An error message for each rejected record will be displayed in the same table to explain why such a record has been rejected.
In the Value column, you must type in your values between double quotes for all data types, except for the Integer type, which does not need quotes.
Thus, the first table lists records that have Italian names made up of fewer than six characters, and the second table lists all rejected records, that is, the records that do not match the filter condition. Each rejected record has a corresponding error message that explains the reason for rejection.
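The accept/reject behavior described in this scenario can be approximated in plain Java. This is only a minimal sketch, not the code Talend generates: the row layout ({firstName, country}), the regular expression, and the error message wording are all assumptions made for illustration, since the scenario's actual schema and pattern are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FilterRowSketch {
    // Hypothetical pattern: the scenario's actual regular expression is not given in the text.
    static final Pattern ITALIAN = Pattern.compile("(?i)ital(y|ian)");

    // Rows are {firstName, country} (assumed layout).
    // Returns two lists: accepted rows first, rejected rows second.
    // Each rejected row carries an extra errorMessage column.
    static List<List<String[]>> filter(List<String[]> rows) {
        List<String[]> accepted = new ArrayList<>();
        List<String[]> rejected = new ArrayList<>();
        for (String[] row : rows) {
            List<String> errors = new ArrayList<>();
            // Simple condition: first name must be shorter than six characters.
            if (row[0].length() >= 6) {
                errors.add("firstName has six or more characters");
            }
            // Regular expression condition on the country column.
            if (!ITALIAN.matcher(row[1]).matches()) {
                errors.add("country does not match the pattern");
            }
            if (errors.isEmpty()) {
                accepted.add(row);
            } else {
                rejected.add(new String[] {row[0], row[1], String.join("; ", errors)});
            }
        }
        return List.of(accepted, rejected);
    }

    public static void main(String[] args) {
        List<List<String[]>> out = filter(List.of(
            new String[] {"Luca", "Italy"},
            new String[] {"Giovanni", "Italy"},
            new String[] {"Anna", "France"}));
        for (String[] row : out.get(0)) {
            System.out.println("ACCEPT " + String.join("|", row));
        }
        for (String[] row : out.get(1)) {
            System.out.println("REJECT " + String.join("|", row));
        }
    }
}
```

Collecting every failed condition, rather than stopping at the first, keeps the reject table self-explanatory when a record fails more than one test.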
Splitting an input row into multiple outputs based on input conditions
It is often necessary to split the input data into multiple outputs according to given criteria, for instance, splitting customer data by region, as in this example, or by team. Another very common example is to split the input data into validated records and records that have been rejected because they failed a quality check (see Checking a column against a list of allowed values, in VALIDATING DATA for examples of using tMap to filter invalid rows).
This recipe shows how the tMap output Expression filters are used to perform the kind of filtering described above.
Open the job jo_cook_ch04_0060_multipleOutputs.
How to achieve it…
How it works…
tMap evaluates its outputs from the top of the output table list downwards and passes an input row to each output according to that output's settings.
tMap will only pass data to an output if:
It is sometimes easiest to think of this list as a set of if-then-else criteria.
It is recommended that lists of outputs be ordered like an if-then-else chain to make them easier to understand. It is also recommended that multiple tMaps be used where many outputs are created depending upon complex conditions. It is not that tMap cannot handle a high level of complexity; rather, the impact of changes may be difficult to assess if there are many inputs, outputs, joins, and conditions.
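The if-then-else view of the output list can be sketched in plain Java. This is a simplified model under stated assumptions, not tMap's generated code: the output names (emea, apac, rejects), the filters, and the row layout ({customer, region}) are hypothetical, and the final output stands in for one that only receives rows that no earlier output accepted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class MultiOutputSketch {
    // Rows are {customer, region}. Output names and filters are hypothetical.
    static Map<String, List<String[]>> route(List<String[]> rows) {
        // LinkedHashMap keeps the outputs in their top-to-bottom order.
        Map<String, Predicate<String[]>> filters = new LinkedHashMap<>();
        filters.put("emea", row -> "EMEA".equals(row[1]));
        filters.put("apac", row -> "APAC".equals(row[1]));

        Map<String, List<String[]>> outputs = new LinkedHashMap<>();
        for (String name : filters.keySet()) {
            outputs.put(name, new ArrayList<>());
        }
        // Stand-in for an output that catches rows no earlier output accepted.
        outputs.put("rejects", new ArrayList<>());

        for (String[] row : rows) {
            boolean accepted = false;
            // Each filter is checked in order; a row goes to every output whose filter passes.
            for (Map.Entry<String, Predicate<String[]>> output : filters.entrySet()) {
                if (output.getValue().test(row)) {
                    outputs.get(output.getKey()).add(row);
                    accepted = true;
                }
            }
            if (!accepted) {
                outputs.get("rejects").add(row);
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> out = route(List.of(
            new String[] {"Acme", "EMEA"},
            new String[] {"Bolt", "APAC"},
            new String[] {"Coda", "LATAM"}));
        for (Map.Entry<String, List<String[]>> output : out.entrySet()) {
            System.out.println(output.getKey() + ": " + output.getValue().size() + " row(s)");
        }
    }
}
```

Because the filters in this sketch are mutually exclusive, each row lands in exactly one output, which is what makes the if-then-else reading apply. With overlapping filters, the same row would appear in more than one output.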
In this recipe, we have multiple copies of the input being created using input criteria. It is worth noting that the outputs do not need to be copies of each other.
It is also worth noting that if no criteria are specified for any output, then tMap will copy every input row to every output. What's more, each of the outputs can have a different format and apply different rules to the same input row. In this instance, tMap becomes a means of creating multiple different views of the same input data.
It is also possible to enable the catch output reject option on more than one output. This means that multiple views of rejected data can also be created.
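As a rough illustration of unfiltered outputs acting as different views, the following sketch sends every input row to two outputs, each with its own layout. The row layout ({firstName, lastName, region}) and both output formats are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class MultiViewSketch {
    // Rows are {firstName, lastName, region} (assumed layout).
    // Neither output has a filter, so every row reaches both,
    // but each output builds its own representation of the row.
    static List<List<String>> views(List<String[]> rows) {
        List<String> fullNames = new ArrayList<>();
        List<String> regionTags = new ArrayList<>();
        for (String[] row : rows) {
            // View 1: "firstName lastName"
            fullNames.add(row[0] + " " + row[1]);
            // View 2: "REGION:lastName"
            regionTags.add(row[2].toUpperCase(Locale.ROOT) + ":" + row[1]);
        }
        return List.of(fullNames, regionTags);
    }

    public static void main(String[] args) {
        List<List<String>> out = views(List.of(
            new String[] {"Ada", "Rossi", "emea"},
            new String[] {"Ken", "Sato", "apac"}));
        System.out.println(out.get(0));
        System.out.println(out.get(1));
    }
}
```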