Creating Simple, Complex and Random Test Data in Talend
Creating simple test data using tRowGenerator
This recipe shows how tRowGenerator allows dummy data to be created for test purposes.
- Open the jo_cook_ch10_0100_tRowGenerator job.
- Open the tRowGenerator component.
How to do it…
The steps for creating simple test data using tRowGenerator are as follows:
- Click on the Functions cell for customerId and select sequence.
- Click on the Functions cell for firstName and select TalendDataGenerator.getFirstName.
- Click on the Functions cell for lastName and select TalendDataGenerator.getLastName.
- Click on the Functions cell for DOB and select getRandomDate. Your tRowGenerator should be as shown in the following screenshot:
- Exit the tRowGenerator component, and run the job.
How it works…
Talend provides a set of random generators for different field types, which makes it easy to create test data. In this recipe, we use a sequence to create a sequential customer key, random first and last names, and a random date of birth.
Any custom code routines that you have created also appear in the list, alongside the Talend-provided data generation routines. It is therefore worth creating your own routines to generate data such as UK postcodes, in the same way as Talend does, and making them available to tRowGenerator.
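As a rough illustration of such a custom routine, the sketch below generates postcode-like strings. The class name TestDataRoutines and the method getRandomPostcode are hypothetical, and the pattern (two letters, a digit, a space, a digit, two letters) is a simplification that does not guarantee real or valid UK postcodes. In Talend Studio a routine would normally be a public class in the routines package; here the class is left package-private so that it compiles as a standalone file.

```java
import java.util.Random;

// Hypothetical custom routine; the class and method names are illustrative only.
class TestDataRoutines {
    private static final Random RANDOM = new Random();
    private static final String LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Returns one random upper-case letter.
    private static char randomLetter() {
        return LETTERS.charAt(RANDOM.nextInt(LETTERS.length()));
    }

    // Returns a string shaped like "AB1 2CD".
    // Assumption: a simplified pattern, not a validated UK postcode.
    public static String getRandomPostcode() {
        StringBuilder sb = new StringBuilder();
        sb.append(randomLetter()).append(randomLetter());
        sb.append(RANDOM.nextInt(9) + 1); // digit 1-9
        sb.append(' ');
        sb.append(RANDOM.nextInt(10));    // digit 0-9
        sb.append(randomLetter()).append(randomLetter());
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print a few sample values.
        for (int i = 0; i < 3; i++) {
            System.out.println(getRandomPostcode());
        }
    }
}
```

Once a routine like this exists in the project, it should be selectable in the Functions cell in the same way as the Talend-provided generators.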
Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences
This recipe shows how a more complex set of test data can be created. In this example, we will build a set of CSV data ready to be loaded into a database which has the following structure:
- Customer has 1 or more orders
- Order has 1 or more order items
- Open the jo_cook_ch10_0110_complexTestData job.
- You will see a section of code that has been deactivated. Do not activate this code until later.
- Run the job, and you will see that the customer file is created.
How to do it…
The steps for creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences are as follows:
- Activate components tFixedFlowInput_2, tMap_2, and tFileOutputDelimited_2. These are exact copies of the components that create the customer file.
- Change these newly activated components as follows:
- Open tFixedFlowInput_2 and change Number of rows to Numeric.random(1,5).
- Open tMap_2. Change the name of the variable to orderId, and the name of the sequence to order.
- Add a new column to the output named orderId, and copy the variable var.orderId to it.
- Delete the customerName output column.
- In the customer expression field, press CTRL + SPACE, and select the tFlowToIterate value for customerId. It will populate as ((Integer)globalMap.get("row3.customerId")).
- Change the name of the file in tFileOutputDelimited to
and tick the Append option.
- Run the job. You will see that the order file has been created with between 1 and 5 orders for each customer.
- Activate the rest of the components and run the job. You will see that the order item file has been created with between 1 and 5 items per order.
How it works…
The tFlowToIterate components and the Numeric.sequence commands are the key to this method. The tFlowToIterate component allows us to cascade the key information down from the highest level (customer) to the lowest level (order item), and the uniquely named sequences enable us to generate unique keys for each type (customer, order, order item).
The Numeric.random commands are also useful in that they make the data “interesting”: they allow us to generate a random number of orders per customer and a random number of items per order.
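The cascade described above can be sketched in plain Java outside Talend. This is only an approximation under stated assumptions: the sequence and random methods below are stand-ins written for this example that mimic the behaviour of Numeric.sequence and Numeric.random, the nested loops play the role of the tFlowToIterate components, and the row layout is invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class ComplexTestDataSketch {
    private static final Map<String, Integer> SEQUENCES = new HashMap<>();
    private static final Random RANDOM = new Random();

    // Stand-in for a named sequence: one independent counter per name.
    static int sequence(String name, int start, int step) {
        Integer current = SEQUENCES.get(name);
        int next = (current == null) ? start : current + step;
        SEQUENCES.put(name, next);
        return next;
    }

    // Stand-in for a random integer with both bounds inclusive.
    static int random(int min, int max) {
        return min + RANDOM.nextInt(max - min + 1);
    }

    // Builds rows for customers, orders and order items, passing each
    // parent key down to its children.
    static List<String> generate(int customerCount) {
        List<String> rows = new ArrayList<>();
        for (int c = 0; c < customerCount; c++) {
            int customerId = sequence("customer", 1, 1);
            rows.add("customer;" + customerId);
            int orders = random(1, 5); // 1 to 5 orders per customer
            for (int o = 0; o < orders; o++) {
                int orderId = sequence("order", 1, 1);
                rows.add("order;" + orderId + ";" + customerId);
                int items = random(1, 5); // 1 to 5 items per order
                for (int i = 0; i < items; i++) {
                    int itemId = sequence("orderItem", 1, 1);
                    rows.add("orderItem;" + itemId + ";" + orderId);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : generate(2)) {
            System.out.println(row);
        }
    }
}
```

Because each sequence has its own name, the customer, order and order item keys are counted independently, while every child row still carries the key of its parent.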
Note also the use of the lookup, together with the random function again, to assign products to each item randomly.
Also note the delete components at the beginning of the job. They are set to delete the created files prior to execution, and you may also notice that they are set to never fail. The deletion is needed because we are appending to the order and order item files; without it, the files would keep growing with every run. The never-fail setting ensures that the first run of the job does not fail because the files do not yet exist in the directory.
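The same delete-then-append pattern can be sketched in plain Java. This is not how the Talend components are implemented; it only illustrates the idea, using the standard java.nio.file API. The class name, method names and the file name orders.csv are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

class AppendFileSketch {
    // Deletes the file if it exists; a missing file is not an error,
    // which mirrors a delete step that is set to never fail.
    static void resetFile(Path file) throws IOException {
        Files.deleteIfExists(file);
    }

    // Appends one line, creating the file on first use.
    static void appendLine(Path file, String line) throws IOException {
        Files.write(file, Collections.singletonList(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Simulates several job runs (at least one) and returns the line count
    // left in the file after the last run.
    static int linesAfterRuns(int runs) {
        try {
            Path dir = Files.createTempDirectory("testdata");
            Path orders = dir.resolve("orders.csv"); // hypothetical file name
            for (int run = 0; run < runs; run++) {
                resetFile(orders);           // without this, lines accumulate
                appendLine(orders, "1;100");
                appendLine(orders, "2;100");
            }
            return Files.readAllLines(orders).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(linesAfterRuns(3)); // prints 2
    }
}
```

If the call to resetFile is removed, the count grows by two on each run, which is the continually growing file that the delete components are there to prevent.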
It is possible to create varied, but referentially accurate data, which will provide a platform for testing of Talend jobs.
It is also possible to use these as a basis for generating Excel files that can then be hand-cranked with additional data to make the tests even more realistic.
Warning: This method uses random values to create data, so it will probably never create the same data twice. Once you are happy with a test data set, copy it to another directory to avoid it being overwritten. If you want repeatable tests, use actual numbers rather than random numbers.
Creating random test data using lookups
This simple technique shows how we can randomly assign values using lookups.
Open the jo_cook_ch10_0120_randomTestDataLookups job.
How to do it…
The steps for creating random test data using lookups are as follows:
- Open tMap.
- Open the tMap settings for the productData input flow.
- Change the Match Model to First Match.
- For the key for productData, add the code: Numeric.random(1,15)
- Drag all columns from both inputs to the output.
- Your tMap should now look like this:
- Exit tMap and run the job.
How it works…
As you will see from the output, the job will add a random product ID and product description to each order item row.
The match model of First Match ensures that only one match is returned for each order item line.
The Numeric.random(1,15) function returns a value from 1 through to 15, which is the number of products in the products list CSV file.
Thus the process will generate a random number for each order line and then use this random number as a key to look up against the product list and assign a product to the order line.
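The lookup logic can be approximated in plain Java as follows. This is a minimal sketch, not the code that tMap generates: the product list is replaced by an in-memory map with placeholder descriptions, the random method is a stand-in for Numeric.random, and all class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

class RandomLookupSketch {
    static final int PRODUCT_COUNT = 15; // number of products in the list
    private static final Random RANDOM = new Random();

    // Stand-in for the product list file: ids 1 to 15 with placeholder text.
    static Map<Integer, String> loadProducts() {
        Map<Integer, String> products = new LinkedHashMap<>();
        for (int id = 1; id <= PRODUCT_COUNT; id++) {
            products.put(id, "Product " + id);
        }
        return products;
    }

    // Stand-in for a random integer with both bounds inclusive.
    static int random(int min, int max) {
        return min + RANDOM.nextInt(max - min + 1);
    }

    // Uses a random number as the lookup key, so each order item is
    // joined to exactly one randomly chosen product.
    static String assignProduct(int orderItemId, Map<Integer, String> products) {
        int productId = random(1, PRODUCT_COUNT);
        return orderItemId + ";" + productId + ";" + products.get(productId);
    }

    public static void main(String[] args) {
        Map<Integer, String> products = loadProducts();
        for (int item = 1; item <= 5; item++) {
            System.out.println(assignProduct(item, products));
        }
    }
}
```

The upper bound of the random range has to match the number of products; a key outside that range would find no match in the lookup.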
Extracting data from specific Excel cells
Sometimes, you will need to extract data from specific Excel cells rather than all of the data in the file.
- Use an Excel file, as follows
- This example will extract data from cells D4 and B7.
Create an example Job
Create a Job called ExtractSpecificCellDemo. The detailed component settings are as follows:
The tJavaRow code:
In this Job, two context variables are defined to store the data extracted from the specific cells.
Execute the Job
Execute the Job. The following text is output to the console: