For enterprises in today's Big Data and cloud-centric environment, harnessing corporate information is critical. Talend, an open source software platform, makes it easier to transform this data into valuable business insights. This article describes how to create random test data in Talend, from simple generated rows to more complex, referentially consistent data sets.
This recipe shows how tRowGenerator allows dummy data to be created for test purposes.
Getting ready
How to do it…
The steps for creating simple test data using tRowGenerator are as follows:
Exit the tRowGenerator component, and run the job.
How it works…
Talend provides a set of random generators for different field types, which makes test data very easy to create. In this job, we use a sequence to create a sequential customer key, random first names and last names, and a random date of birth.
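To show the same logic outside the Studio, here is a minimal plain-Java sketch of what this tRowGenerator setup produces. It is not Talend's generated code: the class name, the helper names (`nextSequence`, `randomDate`), the name lists, and the 1940 to 2000 date range are hypothetical stand-ins for Talend's sequence, name, and random date routines.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class SimpleTestData {
    // Hypothetical name lists; Talend's generator routines use their own lists.
    static final String[] FIRST_NAMES = {"Alice", "Bob", "Carol", "David", "Emma"};
    static final String[] LAST_NAMES = {"Smith", "Jones", "Taylor", "Brown", "Wilson"};

    // Named counters: the first call for a name returns start, later calls add step.
    private static final Map<String, Integer> SEQUENCES = new HashMap<>();

    static int nextSequence(String name, int start, int step) {
        Integer current = SEQUENCES.get(name);
        int next = (current == null) ? start : current + step;
        SEQUENCES.put(name, next);
        return next;
    }

    // Uniformly random date between min and max, both inclusive.
    static LocalDate randomDate(Random rnd, LocalDate min, LocalDate max) {
        long days = max.toEpochDay() - min.toEpochDay();
        return min.plusDays(rnd.nextInt((int) (days + 1)));
    }

    // Builds rows of "customerKey;firstName;lastName;dateOfBirth".
    static List<String> generate(int rows, Random rnd) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            int customerKey = nextSequence("customer", 1, 1);
            String first = FIRST_NAMES[rnd.nextInt(FIRST_NAMES.length)];
            String last = LAST_NAMES[rnd.nextInt(LAST_NAMES.length)];
            LocalDate dob = randomDate(rnd, LocalDate.of(1940, 1, 1), LocalDate.of(2000, 12, 31));
            out.add(customerKey + ";" + first + ";" + last + ";" + dob);
        }
        return out;
    }

    public static void main(String[] args) {
        generate(5, new Random()).forEach(System.out::println);
    }
}
```

The key comes from a counter, so it is unique and sequential, while every other column is drawn at random on each row.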
There’s more…
If you have created any custom code routines, they also appear in the list, alongside the Talend-provided data generation routines. It is therefore worth creating your own routines to generate data, such as UK postcodes, in the same way as Talend does, and making them available to tRowGenerator.
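A custom routine is a Java class with static methods. As a hypothetical example, a routine for random UK-style postcodes might look like the sketch below. It only produces the "AA9 9AA" layout, one of several valid UK formats, and does not check the values against real postcodes. The class and method names are illustrative.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical user routine. In Talend Studio it would be a public class in the
// routines package; it is package-private here so the file compiles on its own.
class TestDataRoutines {
    private static final String LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private static char letter() {
        return LETTERS.charAt(ThreadLocalRandom.current().nextInt(LETTERS.length()));
    }

    private static int digit() {
        return ThreadLocalRandom.current().nextInt(10);
    }

    /** Returns a random postcode in the simplified "AA9 9AA" layout (not validated). */
    public static String randomUkPostcode() {
        // Outward code: two letters and a non-zero digit
        String outward = "" + letter() + letter() + (1 + ThreadLocalRandom.current().nextInt(9));
        // Inward code: a digit followed by two letters
        String inward = "" + digit() + letter() + letter();
        return outward + " " + inward;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(randomUkPostcode());
        }
    }
}
```

Once saved as a routine, a tRowGenerator column could call it with an expression such as `TestDataRoutines.randomUkPostcode()`.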
This recipe shows how a more complex set of test data can be created. In this example, we will build a set of CSV data ready to be loaded into a database that has the following structure:
Getting ready
How to do it…
The steps for creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences are as follows:
context.cookbookData+"outputData/chapter10/chapter10_jo_0110_o
and tick the Append option.
How it works…
The tFlowToIterate components and the Numeric.sequence commands are the keys to this method. The tFlowToIterate component allows us to cascade the key information down from the highest level (customer) to the lowest level (order item), and the uniquely named sequences enable us to generate unique keys for each type (customer, order, order item).
The Numeric.random calls are also useful because they make the data more varied: they allow us to generate a random number of orders per customer and a random number of items per order.
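The cascade can be pictured as nested loops. Each level takes its parent's key (which, in the real job, tFlowToIterate makes available to the next level as a global variable), draws a random number of children, and gives each child a key from its own named sequence. This plain-Java sketch is an analogy rather than the job's generated code; the class and method names, the 1 to 3 and 1 to 4 ranges, and the "key;parentKey" row layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class OrderTestData {
    // One independent counter per name, so customer, order and item keys never clash.
    private final Map<String, Integer> sequences = new HashMap<>();
    private final Random rnd;

    OrderTestData(Random rnd) {
        this.rnd = rnd;
    }

    int sequence(String name) {
        int next = sequences.getOrDefault(name, 0) + 1;
        sequences.put(name, next);
        return next;
    }

    // Random integer with both ends inclusive, as described for Numeric.random(min, max).
    int random(int min, int max) {
        return min + rnd.nextInt(max - min + 1);
    }

    /** Fills the lists with customer keys and "key;parentKey" rows for orders and items. */
    void generate(int customers, List<String> customerRows, List<String> orderRows, List<String> itemRows) {
        for (int c = 0; c < customers; c++) {
            int customerId = sequence("customer");
            customerRows.add(String.valueOf(customerId));
            int orders = random(1, 3); // random number of orders for this customer
            for (int o = 0; o < orders; o++) {
                int orderId = sequence("order");
                orderRows.add(orderId + ";" + customerId); // parent key cascaded down
                int items = random(1, 4); // random number of items for this order
                for (int i = 0; i < items; i++) {
                    int itemId = sequence("orderItem");
                    itemRows.add(itemId + ";" + orderId);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> customers = new ArrayList<>();
        List<String> orders = new ArrayList<>();
        List<String> items = new ArrayList<>();
        new OrderTestData(new Random()).generate(3, customers, orders, items);
        System.out.println(customers.size() + " customers, " + orders.size() + " orders, " + items.size() + " items");
    }
}
```

Because every child row carries a key that already exists at the level above, the three outputs stay referentially consistent however many rows the random counts produce.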
Note also the use of the lookup, together with the random function, to assign products to each item randomly. Also note the deletes at the beginning: they are set to delete the created files prior to execution, and you may also notice that they are set never to fail.
The deletes are needed because we append to the order and order item files; without them, the files would keep growing with every run. Setting them never to fail ensures that the first run of the job does not fail when the files are not found in the directory, which they will not be.
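The delete-then-append pattern can be sketched in plain Java: `Files.deleteIfExists` plays the role of a delete that never fails, and writing with `StandardOpenOption.APPEND` plays the role of the Append option. The class name, file name, and rows below are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class AppendRun {
    /** One "job run": clear the output file, then append each batch (one per iteration). */
    static void run(Path out, List<List<String>> batches) throws IOException {
        // Like a delete set never to fail: no error when the file does not exist yet.
        Files.deleteIfExists(out);
        for (List<String> batch : batches) {
            // Like an output with Append ticked: create the file if needed, otherwise append.
            Files.write(out, batch, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("testdata").resolve("orders.csv");
        List<List<String>> batches = List.of(List.of("1;1", "2;1"), List.of("3;2"));
        run(out, batches);
        run(out, batches); // the second run starts from an empty file again
        System.out.println(Files.readAllLines(out).size()); // prints 3
    }
}
```

Without the initial delete, the second call would leave six lines in the file instead of three.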
There’s more…
This method creates varied but referentially consistent data, which provides a platform for testing Talend jobs.
The generated data can also serve as a basis for Excel files that are then extended by hand with additional data to make the tests even more realistic.
Tip
Warning: This method uses random values to create data, so it will almost certainly never create the same data twice. Once you are happy with a test data set, copy it to another directory to avoid it being overwritten. If you want repeatable tests, use fixed values rather than random numbers.
Lookups allow a main flow to be enriched with data from another source: in tMap, each incoming row is matched against the lookup rows on a join key, and the matched values can be added to the output.
This simple technique shows how we can randomly assign values using lookups.
Getting ready
Open the jo_cook_ch10_0120_randomTestDataLookups job.
How to do it…
The steps for creating random test data using lookups are as follows:
How it works…
As you will see from the output, the job will add a random product ID and product description to each order item row.
The match model of First Match ensures that only one match is returned for each order item line.
The Numeric.random(1,15) function returns a value from 1 through to 15, which is the number of products in the products list CSV file.
Thus, the job generates a random number for each order line and then uses it as a key to look up the product list and assign a product to the order line.
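In plain Java, the same idea amounts to a map from product ID to description, probed with a random key for each order line. Keeping only the first row per ID mirrors the First Match model. The class and method names and the product data are hypothetical; `randomBetween` stands in for `Numeric.random` and is inclusive at both ends, as described above. The sketch also assumes product IDs run from 1 to the number of products.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class RandomProductLookup {
    // Random integer from min to max, both inclusive.
    static int randomBetween(Random rnd, int min, int max) {
        return min + rnd.nextInt(max - min + 1);
    }

    /** Builds productId -> description, keeping only the first row for each ID (First Match). */
    static Map<Integer, String> buildLookup(List<String[]> productRows) {
        Map<Integer, String> lookup = new LinkedHashMap<>();
        for (String[] row : productRows) {
            lookup.putIfAbsent(Integer.parseInt(row[0]), row[1]);
        }
        return lookup;
    }

    /** Appends a random product ID and its description to each order item row. */
    static List<String> assignProducts(List<String> orderItems, Map<Integer, String> lookup, Random rnd) {
        List<String> out = new ArrayList<>();
        for (String item : orderItems) {
            int productId = randomBetween(rnd, 1, lookup.size()); // assumes IDs are 1..N
            out.add(item + ";" + productId + ";" + lookup.get(productId));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> products = new ArrayList<>();
        for (int id = 1; id <= 15; id++) {
            products.add(new String[] {String.valueOf(id), "Product " + id});
        }
        Map<Integer, String> lookup = buildLookup(products);
        assignProducts(List.of("1;1", "2;1", "3;2"), lookup, new Random()).forEach(System.out::println);
    }
}
```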
Sometimes, you will need to extract data from specific Excel cells rather than all of the data in the file.
Procedure
Create an example Job
Create a Job called ExtractSpecificCellDemo. The detailed component settings are as follows:
The tJavaRow code:
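```java
// seq counts the rows as they arrive: 1 for the first row read, 2 for the second, ...
// Schema columns c1, c2, c3, c4 correspond to sheet columns A, B, C, D.
int seq = Numeric.sequence("s1", 1, 1);
// Row 4, column D
if (seq == 4) {
    context.d4 = input_row.c4;
}
// Row 7, column B
if (seq == 7) {
    context.b7 = input_row.c2;
}
```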
In this Job, two context variables are defined to store the data extracted from the specific cells.
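The row-counting technique does not depend on Talend. This sketch applies it to a sheet held in memory as a list of rows: a counter plays the part of `Numeric.sequence("s1",1,1)`, and a column letter is converted to a 0-based index, so D maps to index 3 (the job's `c4`). The class name and sample sheet are hypothetical, only single-letter columns are handled, and reading a real Excel file would need a library that is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

class CellExtractor {
    /**
     * Returns the value of a cell reference such as "D4" from rows of strings,
     * counting rows one at a time as a streaming component would. Returns null
     * if the row or column is not present.
     */
    static String cell(List<List<String>> sheet, String ref) {
        int col = ref.charAt(0) - 'A';                // 'A' -> 0, 'D' -> 3
        int row = Integer.parseInt(ref.substring(1)); // 1-based row number
        int seq = 0;
        for (List<String> r : sheet) {
            seq++; // same role as the sequence in the tJavaRow code
            if (seq == row) {
                return col < r.size() ? r.get(col) : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Sample 7 x 4 sheet whose cells contain their own references ("A1", "B1", ...).
        List<List<String>> sheet = new ArrayList<>();
        for (int r = 1; r <= 7; r++) {
            List<String> row = new ArrayList<>();
            for (char c = 'A'; c <= 'D'; c++) {
                row.add(c + String.valueOf(r));
            }
            sheet.add(row);
        }
        System.out.println(cell(sheet, "D4") + " " + cell(sheet, "B7")); // prints "D4 B7"
    }
}
```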
Execute the Job
Execute the Job. The following text is output to the console:
```text
Starting Job ExtractSpecificCellDemo at 15:47 02/02/2012.
[statistics] connecting to socket on port 3733
[statistics] connected
.--+--.
|tLogRow_1|
|=-+-=|
|d4|b7|
|=-+-=|
|D4|B7|
'--+--'
[statistics] disconnected
Job ExtractSpecificCellDemo ended at 15:47 02/02/2012. [exit code=0]
```
Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.