Joining data using tMap
tMap is more powerful in terms of FUNCTIONALITY.
1. tMap can have many outputs links.
2. With tMap we can use the expression on the columns while providing the joining condition.
3. In tMap we have option to store the intermediate data in the disc.
4. In tMap, we can enable the option to reload the look-up for every record.
5. tMap supports more types of join model, includes unique join, first join and all join.
6. tMap allows you to link multiple look-up flows into it, and supports to load multiple look-up flows parallel.
7. tMap supports ‘die on error’ option.
Hence tMap is quite powerful component. Being a powerful component, it generates more code and it may take more space and time to load in the memory.
Open the job jo_cook_ch04_0070_tMapJoin.
How to do it…
Close tMap and run the job.
How it works…
tMap allows for different join types to be defined using expressions as keys. In this example, we used a variable from the main flow, plus a constant (“Card”) as our join keys.
The first execution of the job performed a left outer join, so all input records are output and non-matched fields are set to null (or default value if they are Java primitives). In addition, the first execution also specified to use only a unique match, thus printing out only one row per customer.
The second execution, however, specified that we wanted to do an inner join with all matches, so the output contained all orders where the customer paid with a credit card.
In the second execution, we also defined a second output that caught all the rows from the main flow that did not have any matches to the lookup.
This recipe illustrates the main features of joining using tMap, but only joins one table to another. It is also possible to join the same table with many other different keys from many lookups in a single tMap.
The next two recipes will show some examples of this.
The lookups are processed slightly earlier than the main flow. Due to the small volumes of data in this recipe, it isn’t apparent, but if you replace the file for tFileInputDelimited_2 with chapter04_jo_0080_orderData_large.csv, then this will become very apparent.
You will see that tMap loads the lookup data into memory tables at the start of the job before it begins processing the main data flow.
For batch data integration jobs, this is an efficient method, since it reduces the lookup time per transaction on the main flow, however, in the recipe USING RELOAD AT EACH ROW to process real-time/near real-time data, we will see how this method is not appropriate for small volume, real-time or near real-time data.
Also, be aware that in order to process large lookups, you will need to ensure that you have enough memory available and allocated to hold all the lookup data. If not, then the process will return out of memory errors.
Hierarchical joins using tMap
tMap has another level of joining capability, in that it can join together data in a hierarchical fashion. This simple example shows how easily this can be achieved using tMap.
Open the job jo_cook_ch04_0080_hierarchicaltMapJoin.
How to accomplish it…
How it works…
This job works on the hierarchy that exists between customer, order, and order item. A customer has many orders and an order has many order items.
The key for orders is the customer, and the key for order items is order. Thus, to get all the order items for a customer, it is necessary to first find the keys for all the orders, and then find all the order items that match the order keys.
As you can see, tMap allows this relationship to be defined easily, simply by dragging the relevant parent key to the child structure.
Get Updates on Tech posts, Interview & Certification questions and training schedules