Introduction to Talend
Talend is open source data integration software that is quickly becoming an alternative to expensive ETL tools in the BI space.
Talend is present in four areas in particular: Data Quality, Data Integration, Master Data Management, and Application Integration.
Who can use Talend?
There are no prerequisites for learning Talend for Big Data. A professional from any background can use it.
Before you begin
Before you study Talend itself, it is worth becoming familiar with some key concepts and best practices. This section covers them.
Keep code changes small and test often
When developing using Talend, as with any other development tool, it is recommended to code in short bursts and test (run) frequently.
By keeping each change small, it is much easier to find where and what has caused problems during compilation and execution.
Contexts and globalMap
context and globalMap are global areas used to store data that can be used by all components within a Talend job.
context variables are predefined prior to job execution in a context group, whereas globalMap variables are created on the fly at any point within a job.
Context variables are user-defined parameters, provided by Talend Open Studio, that are passed to your Job at runtime. Because their values can be supplied at runtime, the same job can be executed in different ways with different parameters.
Context variables can also be set to different values in different environments, so their values may change as you promote your Job from Development, through Test, to Production. Values may also change as your environment changes; for example, passwords may change from time to time.
Context variables can also be used to:
- Pass information into a job from the command line and/or a parent job.
- Manage the values of parameters between environments.
- Store values within a job or set of jobs.
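The following is a minimal sketch of how context values might be resolved, not Talend's generated code. It assumes a hypothetical context with two illustrative variables (cookbookData and dbHost) and hypothetical host names, and it overrides the defaults with arguments of the form --context_param name=value, which mirrors the way exported Talend job scripts are commonly invoked.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ContextSketch {

    // Hypothetical defaults for one context group; names and values are illustrative only.
    static Map<String, String> defaultsFor(String environment) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("cookbookData", "C:/cookbookData");
        values.put("dbHost", "Production".equals(environment) ? "prod-db" : "localhost");
        return values;
    }

    // Applies "--context_param name=value" argument pairs on top of the defaults.
    static Map<String, String> resolve(String environment, String[] args) {
        Map<String, String> context = defaultsFor(environment);
        for (int i = 0; i + 1 < args.length; i++) {
            if ("--context_param".equals(args[i])) {
                String[] pair = args[i + 1].split("=", 2);
                if (pair.length == 2) {
                    context.put(pair[0], pair[1]);
                }
                i++; // skip the name=value token that was just consumed
            }
        }
        return context;
    }

    public static void main(String[] args) {
        // Print the effective context for the hypothetical "Default" environment.
        Map<String, String> context = resolve("Default", args);
        context.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running the class with the arguments --context_param dbHost=test-db would replace only dbHost and leave cookbookData at its default, which is the behavior that makes one job reusable across environments.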
Talend provides the globalMap object, where both Talend and you can store and retrieve data. It is a convenient place to create your own global variables and to retrieve important information about your executing Job.
globalMap is a very important construct within Talend, in that:
- Almost every component will write information to globalMap once it completes execution (for example NB_LINE is the number of rows processed in a component).
- Certain components, such as tFlowToIterate or tFileList, will store data in globalMap variables for use by downstream components.
- Developers can read and write to globalMap to create global variables in an ad hoc fashion. The use of global variables can often be the best way to ensure code is simple and efficient.
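The points above can be illustrated with a small stand-alone sketch. It uses an ordinary java.util.Map as a stand-in for globalMap; the component name tFileInputDelimited_1, the runLabel key, and the helper methods are hypothetical and exist only for this example. Inside a real job, the map is provided for you and values are read back with a cast in the same way.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapSketch {

    // Stand-in for the job-wide globalMap; keys follow the <component>_<VARIABLE> pattern.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Simulates a component recording its processed row count when it completes.
    static void finishComponent(String componentName, int rowsProcessed) {
        globalMap.put(componentName + "_NB_LINE", rowsProcessed);
    }

    // Reads the row count back; values are stored as Object, so a cast is needed.
    static int rowsProcessedBy(String componentName) {
        Object value = globalMap.get(componentName + "_NB_LINE");
        return value == null ? 0 : (Integer) value;
    }

    public static void main(String[] args) {
        finishComponent("tFileInputDelimited_1", 250);

        // An ad hoc global variable created by the developer at any point in the job.
        globalMap.put("runLabel", "nightly-load");
        String label = (String) globalMap.get("runLabel");

        System.out.println(label + ": " + rowsProcessedBy("tFileInputDelimited_1") + " rows");
    }
}
```

Unlike context variables, nothing here is declared in advance: a key exists only once something has written it, so reads should allow for a missing value.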
Java is a hugely popular and incredibly rich programming language. Talend is a Java code generator that makes use of many open source Java libraries, so Talend functionality can easily be extended by integrating Java code into Talend jobs.
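As a rough illustration of the kind of Java that can be integrated, the sketch below shows a small helper written as a public static method, the shape typically used for reusable code routines. The class name TextRoutines and the method normalizeCode are hypothetical and are not part of Talend.

```java
import java.util.Locale;

public class TextRoutines {

    // Trims surrounding whitespace and upper-cases a code value; null becomes an empty string.
    public static String normalizeCode(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Example call, as it might appear in a component expression.
        System.out.println(TextRoutines.normalizeCode("  ab-123 "));
    }
}
```

Keeping such helpers as small, side-effect-free static methods makes them easy to test outside a job before they are called from it.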
Other background knowledge
As a data integrator, you will be expected to understand many technologies and how to interface with them.
Installing the software
The instructions for installing the code and scripts are detailed in the following section:
How to achieve it…
1. All templates, completed code, and data are in the cookbook.zip file.
2. Unzip cookbook.zip into a folder on your machine.
3. Copy the directory cookbookData to a directory on your machine (we recommend C:\cookbookData or the Linux/MacOS equivalent).
4. Download and install the latest version of Talend Open Studio for Enterprise Service Bus (ESB) from www.talend.com.
5. Open Talend Open Studio, and you will be prompted to create a new project.
6. Name the new project cookbook.
7. Open the project.
8. Right-click on the Job Designs folder in the Repository panel, and select the option Import Items.
9. This opens the import wizard. Click the Select archive file option, and then navigate to your unzipped cookbook directory and select the zip file named cookbookTalendJobs.zip.
10. Click on Finish to import all the Talend artifacts.
11. If you copied your data to C:\cookbookData, you can skip the remaining steps; the installation of the cookbook software is complete.
12. Open the cookbook context and click Next at the first window.
13. Open the Values as a table panel and change the value of cookbookData to your chosen directory.
14. Click Finish to complete the installation process.
Enabling tHashInput and tHashOutput
tHashInput: This component is used along with tHashOutput. It reads the data that tHashOutput has loaded into cache memory.
tHashOutput: This component writes data to cache memory and is closely related to tHashInput. Together, these twin components offer high-speed data access to facilitate transactions involving a massive amount of data.
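The write-then-read pairing can be pictured with the conceptual sketch below. It is an analogy only and does not reflect how Talend implements these components: a named in-memory list stands in for the cache, and the class, method, and cache names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HashCacheSketch {

    // One in-memory row list per cache name, shared by the writer and the reader.
    private static final Map<String, List<String[]>> CACHE = new ConcurrentHashMap<>();

    // Plays the role of the output side: append a row to the named cache.
    static void write(String cacheName, String[] row) {
        CACHE.computeIfAbsent(cacheName, key -> new ArrayList<>()).add(row);
    }

    // Plays the role of the input side: return the rows written so far, or an empty list.
    static List<String[]> read(String cacheName) {
        return CACHE.getOrDefault(cacheName, Collections.emptyList());
    }

    public static void main(String[] args) {
        write("customers", new String[] {"1", "Alice"});
        write("customers", new String[] {"2", "Bob"});

        // A later step reads the rows back from memory without touching a file or database.
        for (String[] row : read("customers")) {
            System.out.println(String.join(",", row));
        }
    }
}
```

The speed benefit comes from keeping the rows in memory between steps, which also means the data must fit in the memory available to the job.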
Many of the exercises rely on the tHashInput and tHashOutput components. Talend 5.2.3 does not automatically enable these components for use in jobs. To enable them, follow the instructions in the following section:
How to do it…
1. On the main menu bar, navigate to File | Edit Project properties to open the properties dialogue.
2. Select Designer then Palette Settings.
3. Click on the Technical folder and then click on the button shown in the following screenshot to add this folder to the Show panel.
4. Click on OK to exit the project settings.