This segment contains recipes that show some of the techniques used to read and write data to databases. It also contains recipes that show techniques to manage tables within a database.
Many applications and organizations rely on DATABASES to store their corporate data. These can be application databases to support operational systems, data warehouses, data marts, or ODS (Operational Data Store), and any data integration developer must understand how to manipulate the database objects and the data held within the database.
Talend usually connects to a database using JDBC, so it can connect to any data source for which there is a JDBC driver, which means that Talend can connect to all of the most popular databases and a host of fewer well-known ones too.
In order to perform the recipes, you need to ensure that you have a database installed.
You will also need a general working knowledge of databases and have a reasonable understanding of SQL.
It’s worth noting that there are many flavors of databases and while all act generally in the same manner, there are nuances and extensions that may exist in say, Oracle, but not in MySQL.
If you would like to Enrich your career with a Talend certified professional, then visit Mindmajix - A Global online training platform: “Talend Certification Training”. This course will help you to achieve excellence in this domain. |
If you often need to connect to database tables of any kind, then you may want to centralize the connection information details in the Metadata folder in the Repository tree view.
This recipe shows how a database connection can be easily created using the Talend supplied wizard. Note that this recipe and the next use the connection cookbookDB _myCopy. This version of the database metadata is used for demonstration only.
Getting ready
Ensure that MySQL is running and you have executed the command.
How to achieve it…
The steps to be performed are as follows:
Note
Now that you have the required parameters for connection, you will need to update the context information in the databases cookbookDB and cookbookDBCompleted to ensure that the recipes in this section will complete successfully. You can do this by amending the details in the contexts for both connections.
Once you have changed the context variables. Go back to the metadata and check connections for both.
How it works…
This wizard allows you to easily build up the JDBC connection string for your MySQL database and check that you can connect to it.
The database information is then exported to a context of the same name.
There’s more…
Database connections are very important within Talend and can be a great time-saver.
Related Article: Talend Interview Questions and Answers for Experienced |
If you drag the connection to an open job, you will be presented with a list of options to create components that are pre-populated with the job information.
If, however, you decide to use a component from the palette, then the database information can be populated in the component by dragging the connection and dropping it on the component. This should work for most of the components, but for those where it doesn’t, for example, tCreateTable, you need to open the component, change the Property Type to the repository, and then select the connection.
While you can work with all the database components without using a metadata connection, it is highly recommended that you set one up for every database you use. The shortest time taken to do this is easily recovered later on.
As with all Talend artifacts, careful naming can make life easier in the long term, so do check to see if you have naming standards regarding connections already defined.
If not, then it is recommended that the database connections be named exactly the same as the database/schema (including case) for easy identification. If you are using a schema within the database then include both, separated by an underscore, for example, FINANCE_SALES.
When developing jobs in Talend, it’s sometimes necessary to run them in different environments. For other business cases, you need to pass values between multiple sub-jobs in a project. To solve this kind of issue, Talend introduced the notion of “contexts”.
In the above recipe, you will notice that immediately after checking the connection, we exported the settings to a context. This is a good practice and worth doing immediately after the connection has been tested and is working since it will allow the database settings to be defined as context parameters; a must when the code is promoted between environments.
Once we have the connection set up, it is very easy to grab table metadata from the database for use in our jobs. The following example shows how simple this is.
Getting ready
If you have set up the database correctly, then you should be able to see three tables in the database; customer, order, and order_item.
How to accomplish it…
The next step is to import the database table schema for the desired table.
Select “Retrieve Schema” from the context menu of your database connection. You can skip any filter restrictions and proceed with “Next” to the selection of the table schema of the “” database table.
After selecting the schema for the table it can be imported with the “Finish” button and it will be available as a data input source for the job in the “Table schemas” of your “Db Connections”.
You should check and maybe adapt the column data types in the schema definition (especially for the data types of kind date or datetime) by means of editing the schema again (“Edit Schema” from the context menu).
Verify if the date parser format for your values from the database are correctly or change the „Date Pattern“ appropriately:
In our example, we have to change the „Date Pattern“ from the default value of “dd-MM-yyyy” to the data representation in our database table (as seen above): “yyyy-MM-dd”. The time part for DateTime values will be passed directly and recognized without any special definition.
This screenshot shows the correctly adapted first Date Pattern for the column created on. The second column has not been changed yet but also needs to be changed to the same value: “yyyy-MM-dd”.
Finally, save the changed Schema settings.
How it works…
Talend uses the connection information to gather schema and table information from the database for presenting to the wizard. When we select the tables, Talend uses the information to create Talend schemas for use in our jobs.
There’s more…
As with database connections, schemas, when dragged onto the canvas, will provide us a host of options as to how we wish to use the schema; for example, as an input table or output table.
Tip
It is quicker and easier to work with schemas via the metadata panel than to try to hand-crank them from the palette, so do get into the habit of using the metadata panel for data sources and targets.
Related Article: TALEND Tutorial |
It is possible to use parameters passed or created in a job to drive the results of a SQL query. This short recipe shows how.
Getting ready
Open the job jo_cook_ch07_0020_contextInQuery, which is a copy of the job from the previous recipe.
How to achieve it…
The steps to be performed are as follows:
HAVING SUM(o.`orderTotal`) > "+context.minOrderValue.
How it works…
The SQL statement used by tMysqlInput is held in a string, so it can be manipulated just as any other string in Java. Thus, we can alter the value of the string using normal Java conventions; in this case, a concatenate (+).
The statement is thus translated at runtime and the value of the context is substituted into the SQL query, which returns the customized result set.
There’s more…
Following are some additional points to be noted regarding the use of parameterized queries.
The globalMap variables
globalMap variables can be used in exactly the same way as the context variable.
Related Article: Context Variables in Talend |
Developing the query
In most cases, it is better to test a query using fixed values in the database client and then substitute these for variables once the query has been proven.
Following are some other notes on developing queries with parameters:
Explore TALEND Sample Resumes! Download & Edit, Get Noticed by Top Employers! |
Reloading at each row
This technique is absolutely essential for the reload at each row processing used in real-time scenarios that are described in detail in MAPPING DATA.
Name | Dates | |
---|---|---|
Talend Training | Sep 17 to Oct 02 | View Details |
Talend Training | Sep 21 to Oct 06 | View Details |
Talend Training | Sep 24 to Oct 09 | View Details |
Talend Training | Sep 28 to Oct 13 | View Details |
Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.