Creating Simple, Complex, and Random Test Data in Talend

For enterprises in today's Big Data and cloud-centric environment, harnessing corporate information is critical. Talend, an open source software platform, makes it easier to transform this data into valuable business insights. This article describes in detail how to create simple, complex, and random test data in Talend.

Creating simple test data using tRowGenerator

This recipe shows how tRowGenerator allows dummy data to be created for test purposes.

Getting ready

  • Open the jo_cook_ch10_0100_tRowGenerator job.
  • Open the tRowGenerator component.

How to do it…

The steps for creating simple test data using tRowGenerator are as follows:

  • Click on the Functions cell for customerId and select sequence.
  • Click on the Functions cell for firstName and select TalendDataGenerator.getFirstName.
  • Click on the Functions cell for lastName and select TalendDataGenerator.getLastName.
  • Click on the Functions cell for DOB and select getRandomDate. Your tRowGenerator should be as shown in the following screenshot:

Exit the tRowGenerator component, and run the job.


How it works…

Talend provides a set of random generators for different field types, which makes test data very easy to create. As you can see, we are using a sequence to create a sequential customer key, random first and last names, and a random date of birth.
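To make the behaviour of these generators concrete outside Talend Studio, here is a minimal plain-Java sketch of what the tRowGenerator above produces. It does not call Talend's routines: the class SimpleTestData, its sequence helper (a stand-in for Numeric.sequence), the name pools and the date range are all hypothetical, chosen only for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical plain-Java illustration of the tRowGenerator setup; not Talend code.
class SimpleTestData {
    // One counter per sequence name, standing in for Numeric.sequence(name, start, step).
    private static final Map<String, Integer> SEQUENCES = new HashMap<>();

    static int sequence(String name, int start, int step) {
        // The first call for a name returns start; each later call adds step.
        return SEQUENCES.merge(name, start, (current, ignored) -> current + step);
    }

    // Hypothetical name pools; Talend's generators use their own lists.
    private static final String[] FIRST_NAMES = {"Anna", "Ben", "Chloe", "David", "Emma"};
    private static final String[] LAST_NAMES = {"Smith", "Jones", "Taylor", "Brown", "Wilson"};

    static String randomFrom(String[] values) {
        return values[ThreadLocalRandom.current().nextInt(values.length)];
    }

    // Random date between min and max, both inclusive.
    static LocalDate randomDate(LocalDate min, LocalDate max) {
        long day = ThreadLocalRandom.current().nextLong(min.toEpochDay(), max.toEpochDay() + 1);
        return LocalDate.ofEpochDay(day);
    }

    // Builds rows of customerId;firstName;lastName;DOB (DOB in yyyy-MM-dd form).
    static List<String> generateRows(int count) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int customerId = sequence("customer", 1, 1);
            String firstName = randomFrom(FIRST_NAMES);
            String lastName = randomFrom(LAST_NAMES);
            LocalDate dob = randomDate(LocalDate.of(1940, 1, 1), LocalDate.of(2005, 12, 31));
            rows.add(customerId + ";" + firstName + ";" + lastName + ";" + dob);
        }
        return rows;
    }

    public static void main(String[] args) {
        generateRows(5).forEach(System.out::println);
    }
}
```

Each run prints different names and dates, but the customer keys always increase by one, which is the property the sequence gives the real job.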

There’s more…

If you have created any custom code routines, you will see that they also appear in the list, along with the Talend-provided data generation routines. It is therefore possible to create your own data generation routines, such as one for UK postcodes, in the same way as Talend does, and make them available to tRowGenerator.
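A custom routine is a Java class with static methods. The following is a minimal sketch of such a routine, assuming a hypothetical class name TestDataRoutines and method getRandomUkPostcode. It only produces postcode-shaped strings (one or two letters and a digit, a space, then a digit and two letters); it does not check that the postcode area or district actually exists.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical custom routine. In Talend the routine class would be public;
// it is package-private here so the sketch compiles in any file.
class TestDataRoutines {
    private static final String AREA_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    // The two final letters of a UK postcode never use C, I, K, M, O or V.
    private static final String UNIT_LETTERS = "ABDEFGHJLNPQRSTUWXYZ";

    private static char pick(String chars) {
        return chars.charAt(ThreadLocalRandom.current().nextInt(chars.length()));
    }

    private static int digit() {
        return ThreadLocalRandom.current().nextInt(10);
    }

    /** Returns a postcode-shaped string such as "AB1 2CD" or "M4 5EF"; it may not exist. */
    public static String getRandomUkPostcode() {
        StringBuilder sb = new StringBuilder();
        sb.append(pick(AREA_LETTERS));
        if (ThreadLocalRandom.current().nextBoolean()) {
            sb.append(pick(AREA_LETTERS)); // optional second area letter
        }
        sb.append(digit());                // district digit
        sb.append(' ');
        sb.append(digit());                // sector digit
        sb.append(pick(UNIT_LETTERS));     // unit letters
        sb.append(pick(UNIT_LETTERS));
        return sb.toString();
    }
}
```

Once saved as a routine, a method like this should appear in the tRowGenerator Functions list alongside the Talend-provided generators.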

Creating Complex Test Data using tRowGenerator, tFlowToIterate, tMap, and sequences

This recipe shows how a more complex set of test data can be created. In this example, we will build a set of CSV data, ready to be loaded into a database, with the following structure:

  • Customer has 1 or more orders
  • Order has 1 or more order items

Getting ready

  • Open the jo_cook_ch10_0110_complexTestData job.
  • You will see a section of the job that has been deactivated. Do not activate it until later.
  • Run the job, and you will see that the customer file is created.

How to do it…

The steps for creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences are as follows:

  • Activate components tFixedFlowInput_2, tMap_2, and tFileOutputDelimited_2. These are exact copies of the components that create the customer file.
  • Change the newly activated components as follows:
  • Open tFixedFlowInput_2 and change Number of rows to Numeric.random(1,5).
  • Open tMap_2. Change the name of the variable to orderId, and the name of the sequence to order.
  • Add a new column to the output named orderId, and copy the variable var.orderId to it.
  • Delete the customerName output column.
  • In the customer expression field, press CTRL + SPACE, and select the tFlowToIterate value for customerId. It will populate as ((Integer)globalMap.get("row3.customerId")).
  • Change the name of the file in tFileOutputDelimited_2 to

context.cookbookData+"outputData/chapter10/chapter10_jo_0110_o

and tick the Append option.

  • Run the job. You will see that the order file has been created with between 1 and 5 orders for each customer.
  • Activate the rest of the components and run the job. You will see that the order item file has been created with between 1 and 5 items per order.

How it works…

The tFlowToIterate components and the Numeric.sequence commands are the keys to this method. The tFlowToIterate component allows us to cascade the key information down from the highest level (customer) to the lowest level (order item), and the uniquely named sequences enable us to generate unique keys for each type (customer, order, order item).
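The cascade can be pictured as three nested loops. The sketch below is a plain-Java illustration, not the job itself: the class ComplexTestData and its sequence and random helpers are hypothetical stand-ins for Numeric.sequence and Numeric.random (assumed inclusive of both bounds, as with Numeric.random(1,15) returning 1 through 15). Passing customerId and orderId into the inner loops plays the role of tFlowToIterate and globalMap, and separately named sequences keep each key type unique.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical plain-Java illustration of the customer -> order -> order item cascade.
class ComplexTestData {
    private static final Map<String, Integer> SEQUENCES = new HashMap<>();

    // Stand-in for Numeric.sequence(name, start, step): one counter per name.
    static int sequence(String name, int start, int step) {
        return SEQUENCES.merge(name, start, (current, ignored) -> current + step);
    }

    // Stand-in for Numeric.random(min, max), inclusive on both ends.
    static int random(int min, int max) {
        return ThreadLocalRandom.current().nextInt(min, max + 1);
    }

    final List<String> customers = new ArrayList<>();   // customerId;name
    final List<String> orders = new ArrayList<>();      // orderId;customerId
    final List<String> orderItems = new ArrayList<>();  // orderItemId;orderId

    void generate(int customerCount) {
        for (int c = 0; c < customerCount; c++) {
            int customerId = sequence("customer", 1, 1);
            customers.add(customerId + ";customer" + customerId);
            // The parent key is handed down to every child row, as tFlowToIterate does.
            int orderCount = random(1, 5);
            for (int o = 0; o < orderCount; o++) {
                int orderId = sequence("order", 1, 1);
                orders.add(orderId + ";" + customerId);
                int itemCount = random(1, 5);
                for (int i = 0; i < itemCount; i++) {
                    int itemId = sequence("orderItem", 1, 1);
                    orderItems.add(itemId + ";" + orderId);
                }
            }
        }
    }
}
```

Because every child row is created inside its parent's loop, each foreign key is guaranteed to point at an existing parent, which is what makes the generated files referentially accurate.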

The Numeric.random calls are also useful, in that they make the data “interesting”: they allow us to generate a random number of orders per customer and items per order. Note also the use of the lookup, together with the random function, to assign products to each item randomly.

Also, note the deletes at the beginning of the job. They have been set to delete the created files prior to execution, and you may also notice that they are set to never fail.

The deletes are needed because we are appending to the order and order item files; without them, the files would keep growing with every run. Setting them to never fail ensures that the first run of the job does not fail when the files are not found in the directory, which they will not be.
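The delete-then-append pattern can be sketched with java.nio.file. This is an illustration of the idea, not what the Talend components run internally; the class AppendOutput and its method names are hypothetical. Files.deleteIfExists returns false rather than throwing when the file is missing, which mirrors the "never fail" setting on the first run.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper illustrating "delete at start, then append per iteration".
class AppendOutput {
    // Run once at the start of the job; returns false (no exception) if the file is absent.
    static boolean resetOutput(Path file) {
        try {
            return Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Run once per parent row; creates the file on the first call, then appends.
    static void appendLines(Path file, List<String> lines) {
        try {
            Files.write(file, lines, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the reset step, the second run would append to the rows from the first run, producing exactly the continually growing files described above.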

There’s more…

It is possible to create varied but referentially accurate data, which provides a sound platform for testing Talend jobs.
It is also possible to use this data as a basis for generating Excel files that can then be edited by hand to add further data, making the tests even more realistic.

Tip

Warning: This method uses random values to create data, so it will almost certainly never create the same data twice. Once you are happy with a test data set, copy it to another directory to avoid it being overwritten. If you want repeatable tests, use fixed values rather than random numbers.
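If you generate data in your own routine instead of calling Numeric.random, one alternative to hard-coded values (an assumption on our part, not part of this recipe) is java.util.Random with a fixed seed: the same seed yields the same sequence of numbers on every run. The class RepeatableRandom below is hypothetical.

```java
import java.util.Random;

// Hypothetical illustration: a fixed seed makes "random" order counts repeatable.
class RepeatableRandom {
    static int[] orderCounts(long seed, int customers) {
        Random random = new Random(seed); // same seed -> same sequence every run
        int[] counts = new int[customers];
        for (int i = 0; i < customers; i++) {
            counts[i] = 1 + random.nextInt(5); // 1..5 inclusive, like Numeric.random(1,5)
        }
        return counts;
    }
}
```

Changing the seed gives a different but equally repeatable data set.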

Creating Random Test Data using Lookups

In Talend, a lookup is a secondary input to the tMap component: each row of the main flow is joined to the lookup flow on a key expression, and the matching lookup columns can then be added to the output.

This simple technique shows how we can randomly assign values using lookups.

Getting ready

Open the jo_cook_ch10_0120_randomTestDataLookups job.

How to do it…

The steps for creating random test data using lookups are as follows:

  • Open tMap.
  • Open the tMap settings for the productData input flow.
  • Change the Match Model to First Match.
  • For the key for productData, add the code: Numeric.random(1,15)
  • Drag all columns from both inputs to the output.
  • Your tMap should now look like this:
  • Exit tMap and run the job.

How it works…

As you will see from the output, the job will add a random product ID and product description to each order item row.
The match model of First Match ensures that only one match is returned for each order item line.
The Numeric.random(1,15) function returns a value from 1 through to 15, which is the number of products in the products list CSV file.
Thus the process will generate a random number for each order line and then use this random number as a key to lookup against the product list and assign a product to the order line.
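The same idea in plain Java: draw a random key per order line and use it as the join key against the product list. This sketch assumes the products are keyed 1 to N with no gaps, as the 1 to 15 product IDs above appear to be; the class RandomLookup is hypothetical. A map holds a single value per key, which corresponds to the First Match model returning only one product per line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration of assigning products through a random lookup key.
class RandomLookup {
    // Assumes products are keyed 1..products.size() with no gaps.
    static List<String> assignProducts(List<String> orderItems, Map<Integer, String> products) {
        List<String> out = new ArrayList<>();
        for (String item : orderItems) {
            // Like Numeric.random(1, N): a fresh key for every order line.
            int key = ThreadLocalRandom.current().nextInt(1, products.size() + 1);
            String product = products.get(key); // one value per key, like First Match
            out.add(item + ";" + key + ";" + product);
        }
        return out;
    }
}
```

If the product IDs had gaps, some random keys would find no match, so the range passed to the random function has to line up with the keys that actually exist in the lookup.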

Extracting Data from Specific Excel cells

Sometimes, you will need to extract data from specific Excel cells rather than all of the data in the file.

Procedure

  • Use an Excel file, as follows
  • This example will extract data from cells D4 and B7.

Extracting data from specific Excel cells

Create an example Job

Create a Job called ExtractSpecificCellDemo. The detailed component settings are as follows:

ExtractSpecificCellDemo

Schema of file Input Excel

Details

Fixed Flow Input

The tJavaRow code:

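// Numeric.sequence("s1", 1, 1) returns 1 for the first row, 2 for the second, and so on,
// so seq is the sheet row number (assuming the file is read from row 1 with no header).
int seq = Numeric.sequence("s1", 1, 1);
if (seq == 4) {
    // Row 4, column D: the fourth schema column, c4
    context.d4 = input_row.c4;
}
if (seq == 7) {
    // Row 7, column B: the second schema column, c2
    context.b7 = input_row.c2;
}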

In this Job, two context variables are defined to store the data extracted from the specific cells.
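The tJavaRow logic maps a cell reference to a row number (from the sequence) and a column position (from the schema). The sketch below generalises that mapping in plain Java, assuming the sheet has already been read into a list of rows, with column A at index 0 and row 1 at index 0. The class CellExtractor and its methods are hypothetical.

```java
import java.util.List;

// Hypothetical helper: read a cell such as "D4" from rows already loaded in memory.
class CellExtractor {
    // Converts an A1-style reference into zero-based {rowIndex, columnIndex}.
    static int[] toIndices(String cellRef) {
        int i = 0;
        int col = 0;
        while (i < cellRef.length() && Character.isLetter(cellRef.charAt(i))) {
            // Column letters are base 26 with A = 1 (so "AA" = 27).
            col = col * 26 + (Character.toUpperCase(cellRef.charAt(i)) - 'A' + 1);
            i++;
        }
        int row = Integer.parseInt(cellRef.substring(i));
        return new int[] {row - 1, col - 1};
    }

    static String getCell(List<String[]> rows, String cellRef) {
        int[] idx = toIndices(cellRef);
        return rows.get(idx[0])[idx[1]];
    }
}
```

For D4 this yields row index 3 and column index 3 (the fourth column, c4); for B7 it yields row index 6 and column index 1 (c2), matching the tJavaRow conditions.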

Execute the Job

Execute the Job. The following text is output to the console:

Starting Job ExtractSpecificCellDemo at 15:47 02/02/2012.
  
[statistics] connecting to socket on port 3733
[statistics] connected
.--+--.
|tLogRow_1|
|=-+-=|
|d4|b7|
|=-+-=|
|D4|B7|
'--+--'
  
[statistics] disconnected
Job ExtractSpecificCellDemo ended at 15:47 02/02/2012. [exit code=0]
Last updated: 03 Apr 2023
About Author

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
