Managing Talend Context Variables
When developing jobs in Talend, it’s sometimes necessary to run them in different environments. In other cases, you need to pass values between multiple sub-jobs in a project. To solve these kinds of problems, Talend introduced the notion of “contexts”.
This segment contains exercises that illustrate some of the methods for managing context variables within projects and jobs.
The use of context variables is a fundamental requirement for creating production quality Talend applications.
To start using contexts in Talend, you have two options:
1) you can create a new context group and its corresponding context variables manually, or
2) you can export an existing connection as a context.
For code to be of production quality, it must be transportable between environments. This means that, when we move code from the development environment to the test environment, it should execute properly even if we are using different file paths, file names, database names, database user IDs, and so on.
Context variables are parameters that can be set to different values in different environments.
Provided that the Talend code has been built to use these parameters and they have been set correctly for each environment, a job will run with one set of resources in development and a completely different set of resources in test.
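As a rough illustration of the idea in plain Java (this is an analogy, not the code Talend generates), the sketch below keeps one set of values per context and runs the same logic against either set. The variable names db_host and inbox_dir, and all of the values, are made up for the example.

```java
import java.util.Map;

// Plain-Java analogy for a context group: one set of values per environment.
// All names and values are hypothetical, not taken from a real Talend project.
class ContextGroupSketch {

    static final Map<String, Map<String, String>> CONTEXTS = Map.of(
        "Development", Map.of("db_host", "dev-db.example.com", "inbox_dir", "/data/dev/inbox"),
        "Production", Map.of("db_host", "prod-db.example.com", "inbox_dir", "/data/prod/inbox"));

    // Look up a variable in the chosen context; fail loudly if either is unknown.
    static String valueOf(String contextName, String variable) {
        Map<String, String> values = CONTEXTS.get(contextName);
        if (values == null) {
            throw new IllegalArgumentException("Unknown context: " + contextName);
        }
        String value = values.get(variable);
        if (value == null) {
            throw new IllegalArgumentException("Unknown variable: " + variable);
        }
        return value;
    }

    // The "job" logic never mentions an environment-specific value directly.
    static String describeJob(String contextName) {
        return "Reading from " + valueOf(contextName, "db_host")
            + ", files in " + valueOf(contextName, "inbox_dir");
    }

    public static void main(String[] args) {
        System.out.println(describeJob("Development"));
        System.out.println(describeJob("Production"));
    }
}
```

Switching environments then means choosing a different context name, not editing the job logic.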
Common values in contexts
Another use of context variables is to define values that are commonly used within a project, such as data inbox directory, staging area directory, and constants.
Passing command line parameters
Context variables are also used to pass parameters to a job, either via the command line or from a calling (parent) job.
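Exported Talend jobs are typically launched with arguments such as --context=Production and --context_param name=value. The sketch below is a simplified stand-in parser written for illustration, not Talend's generated code; the class name and the inbox_dir variable are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for command-line handling of context arguments.
// Hypothetical class; a real job's generated code is more involved.
class ContextArgsSketch {

    String contextName = "Default";
    final Map<String, String> overrides = new LinkedHashMap<>();

    static ContextArgsSketch parse(String[] args) {
        ContextArgsSketch result = new ContextArgsSketch();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("--context=")) {
                // Selects which context (set of values) the job runs with.
                result.contextName = arg.substring("--context=".length());
            } else if (arg.equals("--context_param") && i + 1 < args.length) {
                // The next argument carries a single name=value override.
                String pair = args[++i];
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    result.overrides.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ContextArgsSketch parsed = parse(new String[] {
            "--context=Production", "--context_param", "inbox_dir=/data/inbox"});
        System.out.println(parsed.contextName + " " + parsed.overrides);
    }
}
```

A parent job passing parameters to a child job follows the same idea: the child starts from its selected context and individual values are overridden by name.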
Setting context variables in the code
The ability to manipulate context variables within the code is covered in the “Setting context variables and globalMap variables using tJava” recipe in “Using Java in Talend”.
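For orientation only, code in a tJava component usually refers to context variables as fields of a context object and to shared values through a map called globalMap. The sketch below imitates that shape with stand-in classes so that it runs on its own; the Context class, the inbox_dir variable, and the fileCount key are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins that imitate what a job exposes to tJava code.
// In a real job these objects are generated by Talend; here they are assumptions.
class TJavaSketch {

    static class Context {
        String inbox_dir = "/data/inbox"; // hypothetical context variable
    }

    static final Context context = new Context();
    static final Map<String, Object> globalMap = new HashMap<>();

    // The body of this method is the kind of code one might type into tJava.
    static void tJavaBody() {
        context.inbox_dir = "/data/archive"; // overwrite a context variable
        globalMap.put("fileCount", 3);       // share a value with later components
    }

    public static void main(String[] args) {
        tJavaBody();
        System.out.println(context.inbox_dir);
        System.out.println((Integer) globalMap.get("fileCount"));
    }
}
```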
Usage of Context
Talend provides a means of defining several sets of values for the parameters in a group, one for each environment (or context).
Open the jo_cook_ch06_0010_addContextGroup job, and open the context group cookbookGeneral.
How to achieve it…
The steps for adding contexts to a context group are as follows:
In this example, we’ll go over exporting an existing Oracle connection as a context.
Double-click an existing database connection to edit it and click Next. Click Export as context.
NOTE There are some connections that don’t allow you to export them as a context. In that case you’ll have to create the context group and its variables manually, add the group/variables to your job, and use the variables in the properties of the components of your job.
After you’ve clicked the Export as context button you’ll see the Create/Edit context group screen. Enter a name, purpose and description and click Next.
Now you’ll see all the context variables that belong to this context group. Notice that Talend has already created all the context variables that are needed for the HR connection. If you want to change their names you can simply click them and they become editable.
Click the Values as table tab.
In the Values as table tab you can edit the values of the context variables by simply clicking the value and changing it. To add a new context, click the context symbol in the upper right corner.
The window that pops up is used to manage contexts. To create a new context, click New, enter the name of the context, in our example Production, and click Ok. To rename the Default context, select it, click Edit, enter Development and click Ok. When you’re done editing, click Ok.
After the window closes, you’ll see that an extra column appeared. Enter the connection data of the production environment in the Production column and click Finish.
In the connection window it’s possible to check the connection again, but this time you’ll be prompted to choose which connection you want to check.
Verify that both the connections work and click Finish.
Now that we’ve exported the connection as a context, it’s possible to use it in a job. Create a new job, add the connection that has been exported as a context, and connect it to a tLogRow component.
When using a connection that has been exported as a context in a job, you have to include the context variables in order for your job to be able to run. Go to the context tab and click the context button in the bottom left.
NOTE Newer versions of Talend offer to add missing context variables whenever you try to run a job, so you don’t need to add them manually as described in this example.
Select the context group that contains the context variables, in our case the HR context group.
Select the contexts you want to include and click OK.
NOTE A context group can also be added to a job by simply selecting the context from the repository, dragging it towards the context tab of the job, and dropping it there.
Once you’ve added the context group to the job, it’s possible to run the job for both the development and production environment by selecting the context in the dropdown menu of the Run tab.
Turning implicit context loading on and off in a job
Job parameterization based on context variables enables you to orchestrate and execute your Jobs in different contexts or environments. You can define the values of your context variables when creating them, or load your context parameters dynamically, either explicitly or implicitly, when your Jobs are executed.
Open the jo_cook_ch06_0040_turnOffImplicit job.
How to do it…
The steps for turning implicit context load on and off in a job are as follows:
- Open the job.
- In the Job view, select the Extra tab.
- Uncheck Use Project Settings.
- Uncheck Implicit tContextLoad.
- Run the job, and you will see that the initial context load is no longer performed.
How it works…
Talend allows the implicit tContextLoad to be turned off for individual jobs within a project.
Setting the context file location in the operating system
Both the initial (implicit) context load and the tContextLoad component have one drawback: they rely on a predetermined file location.
Note that this exercise is demonstrated on Microsoft Windows 7; however, it is possible to set and use global environment variables in Talend in any version of Windows, Linux, or Mac OS.
Copy the systemValueContext.txt file from the cookbook directory/chapter6 to C:\TalendContextDirectory.
How to achieve it…
The steps for setting the context file location in the operating system are as follows:
- Run the job, and you will notice that the value of the context variable is set to “In the job”.
- Go to Start | Control Panel | System and Security.
- Select System, then click Advanced system settings.
- Click the Environment Variables button.
- Under the System variables, click on New.
- Enter Variable name as TALEND_CONTEXT_DIRECTORY.
- Enter value as C:\TalendContextDirectory\.
- Click on OK to save.
- Restart Talend Open Studio.
- Open the jo_cook_ch06_0050_systemVariableContext job.
- Open tContextLoad and change the file location to System.getenv("TALEND_CONTEXT_DIRECTORY") + "systemValueContext.txt".
- Run the job, and you will see that the context variable now displays “context file in Talend directory”.
How it works…
As with most fields in Talend, the text can be replaced with a variable or a snippet of Java code, and tContextLoad is no different.
By replacing the cookbook context variable with a command to read the system variable, we are able to redirect the component to pick up the data from a directory that can be altered at runtime.
Variable not present
Note that we had to stop and start Talend Studio to enable the new variable to be picked up. This is because a running process only sees the environment variables that existed when it was started, so the Studio did not recognize the new variable until it was restarted.
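One consequence is worth guarding against: System.getenv returns null when the variable is not defined, and concatenating null with a file name produces the string "nullsystemValueContext.txt" instead of an error. The following is a minimal sketch of a more defensive lookup; the ContextFileLocator class and the fallback directory are assumptions made for this example.

```java
import java.io.File;

// Hypothetical helper that builds the context file path defensively.
class ContextFileLocator {

    // Use the given directory, or the fallback when it is null or empty.
    // java.io.File inserts the separator, so a trailing slash is optional.
    static String resolve(String directory, String fallbackDirectory, String fileName) {
        String dir = (directory == null || directory.isEmpty()) ? fallbackDirectory : directory;
        return new File(dir, fileName).getPath();
    }

    public static void main(String[] args) {
        // null here means the variable was not visible when this process started
        String dir = System.getenv("TALEND_CONTEXT_DIRECTORY");
        System.out.println(resolve(dir, ".", "systemValueContext.txt"));
    }
}
```

Whether a fallback or an immediate failure is preferable depends on the job; failing fast is often safer in production because a missing variable usually indicates a misconfigured server.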
Implicit context load
You can configure the Implicit Context Load feature either in Project Settings so that it can be used across Jobs within the project, or in the Job view for a particular Job.
Explicit control of context variables is good, but it is still invasive. It can be made less invasive by using the Implicit Context Load feature, which automatically loads the context from a file or a database. In this example it is loaded from a database based on a query condition. Note that the query condition itself uses a context variable. Since the job has not yet started when the query runs, this context ID must be passed by traditional means, either in a properties file or through the TAC. It need contain only a single variable, however; the rest of the variables are looked up in the database.
In the example above, we are using a user-defined table called talend_context. The only requirement on the table is that it has two fields called key and value. These will be returned in the result set and loaded into the job Context. Other fields can also be part of the table and in this case we have used the job_instance field to scope the query.
CREATE TABLE `talend_context` (
  `idtalend_context` int(11) NOT NULL AUTO_INCREMENT,
  `project` varchar(45) DEFAULT NULL,
  `job` varchar(45) DEFAULT NULL,
  `job_instance` varchar(45) DEFAULT NULL,
  `customer` varchar(45) DEFAULT NULL,
  `key` varchar(45) DEFAULT NULL,
  `value` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`idtalend_context`),
  UNIQUE KEY `talend_context_key` (`project`,`job`,`job_instance`,`key`),
  KEY `talend_context_customer` (`customer`)
) ENGINE=InnoDB AUTO_INCREMENT=22 DEFAULT CHARSET=utf8;
Note that in our example context.context_id is itself a context variable and comes from the default context properties. If the job itself is triggered through the TAC API (see below) then the context_id can be supplied dynamically by the TAC API client.
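To make the lookup concrete, here is a plain-Java sketch of what the implicit load amounts to under the setup described above: rows are filtered on job_instance using the context ID, and each key/value pair is copied into the job's context. The rows are held in memory instead of being fetched from the database, and the row data and class names are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of loading key/value rows into a context.
class ImplicitLoadSketch {

    // One row of talend_context, reduced to the three columns used here.
    static final class Row {
        final String jobInstance;
        final String key;
        final String value;

        Row(String jobInstance, String key, String value) {
            this.jobInstance = jobInstance;
            this.key = key;
            this.value = value;
        }
    }

    // Keep only the rows scoped to contextId and turn them into context entries.
    static Map<String, String> load(List<Row> rows, String contextId) {
        Map<String, String> context = new LinkedHashMap<>();
        for (Row row : rows) {
            if (contextId.equals(row.jobInstance)) {
                context.put(row.key, row.value);
            }
        }
        return context;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("nightly", "inbox_dir", "/data/nightly/inbox"),
            new Row("nightly", "batch_size", "500"),
            new Row("adhoc", "inbox_dir", "/data/adhoc/inbox"));
        System.out.println(load(rows, "nightly"));
    }
}
```

Only the context ID has to be supplied from outside; every other variable comes from the matching rows.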