Teradata is one of the most popular Relational Database Management Systems, which is built on massively parallel services to develop large-scale data warehouse applications.
It supports multiple server platforms like Unix/Linux/Windows and is capable of providing support to a number of client environments at a time.
It is basically developed by Teradata Corporation, which is a provider of database and analytics related products and services.
When your company is running high processing, large volume data warehouse, Teradata is the best option. It is designed to handle massive amounts of data efficiently by processing them parallelly to the tiniest transactions within.
It performs powerful OLAP(Online Analytical Programming) functions to perform complex analytical functions on data, along with a limitless report review and data discovery.
Incorporate Teradata into your systems, if you are looking for high-performance and linear scalability of the database, which is not very likely with Oracle or other DBMS data structures.
There are a number of distinguishing features of Teradata that add value to the Teradata RDM system. These features are listed below:
1) Linear Scalability
Teradata offers high, linear scalability which allows a large volume of data to be handled efficiently at a time, by adding nodes for increased data chunks. It can be scaled up to as high as 2048 nodes, which enhances the performance of the system greatly.
2) Unlimited Parallelism
The architecture of Teradata is based on massively parallel processors(MPP), which divides large volumes of data into smaller processes to process them parallelly. This ensures the speedy execution of complex tasks.
3) Mature Optimizer
Teradata Optimizer is one of the most advanced optimizers available, which can handle up to 64 joins in a single query.
4) Shared Nothing Architecture of Teradata
The Teradata nodes, Access Module Processors (AMPs) and disks work independently. These resources are not shared, hence providing the best value for a given task. The 100% availability of every resource enhances efficiency by decreasing the process execution time.
The parallel system of Teradata can connect to channel-attached systems like network-attached systems or mainframes.
6) Low TCO
The Teradata system offers a low total cost of ownership, as it is easy to set up maintain and administer the arrangement.
Teradata systems allow implementation of SQL to interact with the data in the database easily. It also provides its own extension.
8) Load & Unload utilities
Teradata provides load & unload utilities to move data in and out of the Teradata system
9) Automatic Distribution
Teradata can easily distribute the data to its disks without any external intervention.
10) Robust Utilities
There are many robust utilities provided by Teradata to handle the data in and out of the Teradata systems. FastLoad, MultiLoad, FastExport, and TPT are few of them which helps in import and export of data to and from the systems.
Teradata is a very efficient, inexpensive and high-quality Relational Database management System which handles large volumes of data with ease.
Teradata is based on Massively Parallel Processing(MPP) architecture. It is made of Parsing Engine(PE), BYNET, Access Module Processors(AMPs) and other components like nodes.
Below are the components of the Teradata Architecture.
The architecture of Teradata is essentially Massively Parallel. As we have discussed above, the Teradata system has 4 components. They are being the Parse Engine (PE), BYNET, AMPs and the Disks.
Based on the principal function of the Teradata system, the architecture can be recognized in two forms. They are:
In both forms of architecture, the components mentioned above remain the same.
Teradata allows SQL queries to perform required functions on the Teradata database. The basic structure of SQL queries along with some functions and other details are discussed below.
SQL divides the tables into two types. One is being SET and other being MULTISET. SET tables do not store duplicate records, while MULTISET allows duplicate records.
Below are few commands that are used on tables in Teradata
The section below includes the details required to manipulate data stored in the database.
The manipulation of data comprises of inserting new records, updating and deleting the existing ones.
Details of each of them are given below.
1) Insert Records
There are two situations of "insert record". They are:
In this scenario, you directly introduce a record with fresh values into a table. The command used to insert records into a table is,
INSERT INTO <tablename> (column1, column2, column3,…) VALUES (value1, value2, value3 …);
Let us see an example to see how to insert records into a table called "Employee"
Command INSERT INTO Employee ( EmployeeNo, FirstName, LastName, BirthDate, JoinedDate, DepartmentNo ) VALUES ( 101, 'Mike', 'James', '1980-01-05', '2005-03-27', 01 );
This inserts one record into Employee with the values for each column mentioned.
You can use the SELECT statement to view the inserted record, details of which are seen in the next section.
Sometimes you may want to insert data from another table into a new table. The command used to insert a record from another table is,
INSERT INTO <tablename> (column1, column2, column3,…) SELECT column1, column2, column3… FROM <source table>;
Let us see an example where you insert a record into a table called Employee_Bkup from Employee table.
INSERT INTO Employee_Bkup ( EmployeeNo, FirstName, LastName, BirthDate, JoinedDate, DepartmentNo ) SELECT EmployeeNo, FirstName, LastName, BirthDate, JoinedDate, DepartmentNo FROM Employee;
This command inserts all the records from the Employee table into the new table.
There are certain logical constraints that are to be taken care of while executing INSERT command. They are:
2) Update Records
Update Record is used to update the existing values of the columns in the tables.
UPDATE <tablename> SET <columnnamme> = <new value> [WHERE condition];
Let us see an example of UPDATE command. Below example shows updating a record of the table Employee.
UPDATE Employee SET DepartmentNo = 03 WHERE EmployeeNo = 101;
The record with EmployeeNo 101 will be updated with the value for column DepartmentNo .
Note: You can update the values of a record with values from another Table.
If the WHERE clause is not specified, all the records will be updated with the new column value.
3) Delete Records
This command is used to delete one or more records from the table.
DELETE FROM <tablename> [WHERE condition];
The below example explains the syntax of Delete command with an example. The WHERE clause specifies the details of the column to be deleted.
DELETE FROM Employee WHERE EmployeeNo = 101;
Note: If the WHERE clause is not mentioned, all records of the table will be deleted.
The SELECT statement is used to view records of a table.
SELECT column 1, column 2, ..... FROM tablename;
The below example explains to you how the SELECT statement works.
Use the command like the one below, to get specific details of a table.
SELECT EmployeeNo,FirstName,LastName FROM Employee
This gives an output of all the values of the columns EmployeeNo, FirstName, LastName from the table Employee.
If you want to get all the records from a table, use the command below.
SELECT * FROM Employee;
The WHERE clause is a filter statement. It is used to fetch filtered records that satisfy some specific condition. This condition is associated with the WHERE keyword in its syntax.
It is generally used with the SELECT statement.
SELECT * FROM tablename WHERE[condition];
The example below uses the WHERE clause to get the record associated with the employee with EmployeeNo 101
SELECT * FROM Employee WHERE EmployeeNo = 101;
This fetches the records whose EmployeeNo = 101.
This command is used to order the data in tables in a specific manner. Whenever you retrieve records from the SELECT statement, the records are not fetched in a particular order.
You can then employ this command on the results of SELECT to arrange the retrieved records in ASC/DESC order.
SELECT * FROM tablename ORDER BY column 1, column 2..;
The example below shows the ORDER BY command used on the column Firstname to arrange the records according to it.
SELECT * FROM Employee ORDER BY FirstName;
This gives an output where the alphabetical order of Firstname is considered for sorting.
As the name suggests, this is used to group similar records into one group.
It is often used with SELECT statement.
SELECT column 1, column2 …. FROM tablename GROUP BY column 1, column 2..;
This command is used in many functional operations while executing queries, to have an organized set of data.
Set operators combine results of multiple SELECT statements. Unlike the Join functions, Set operators mainly join multiple rows, while Join functions join columns from multiple tables.
There are 4 set operators in Teradata. They are:
UNION command combines rows from multiple select statements, ignoring the duplicates.
SELECT col1, col2, col3… FROM <table 1> [WHERE condition] UNION SELECT col1, col2, col3… FROM <table 2> [WHERE condition];
Here the contents of col1 from table1 will be combined with contents of col2 from table2 and similarly, corresponding column contents will be combined, ignoring the duplicates.
UNION ALL combines the contents from multiple tables just like UNION but including the duplicates.
SELECT col1, col2, col3… FROM <table 1> [WHERE condition] UNION ALL SELECT col1, col2, col3… FROM <table 2> [WHERE condition];
Here the contents of col1 from table1 will be combined with contents of col2 from table2 and similarly, corresponding column contents will be combined, including the duplicates.
INTERSECT function is similar to the function of Intersects in Sets of general mathematics. It outputs the intersection of the results of 2 SELECT statements where the rows with common data (or the rows that exist in both the results in corresponding columns) are displayed.
SELECT col1, col2, col3… FROM <table 1> [WHERE condition] INTERSECT SELECT col1, col2, col3… FROM <table 2> [WHERE condition];
Here the output will be the common rows of col1, common rows of col2 and common rows of col3 intersecting from both the tables.
MINUS/EXCEPT command executes to give the rows that are present in the first SELECT but not present in the second SELECT.
Both of the MINUS and EXCEPT operators work the same way.
Syntax remains the same to both the functions, which is shown below.
SELECT col1, col2, col3… FROM <table 1> [WHERE condition] MINUS SELECT col1, col2, col3… FROM <table 2> [WHERE condition];
Here the output consists of the rows of col1 from table1 which are absent in col1 from table2 and rows of col2 from table1 which are absent in col2 from table2 and so on.
DATE and TIME functions
This section is dedicated to the date and time functions offered by Teradata.
In Teradata, dates are stored as integers, in the form,
((YEAR - 1900) * 10000) + (MONTH * 100) + DAY
You can view how the Date is stored in Teradata using:
SELECT CAST(CURRENT_DATE AS INTEGER);
Since the Date is internally stored as integers, arithmetic functions can be employed on it.
There are general functions on Date and Time which are regularly used in Teradata. These are discussed below.
EXTRACT function, as the name suggests is used to extract the required portions from Date-Time-stamp. For example, it can be used to fetch the day, month and year separately from the Datestamp while it can also be used to fetch the time in the hour, mins, and seconds from the Timestamp.
The below example explains how to use the EXTRACT function to obtain date and time from the Datestamp.
SELECT EXTRACT(YEAR FROM CURRENT_DATE); EXTRACT(YEAR FROM Date) ----------------------- 2016 SELECT EXTRACT(MONTH FROM CURRENT_DATE); EXTRACT(MONTH FROM Date) ------------------------ 1 SELECT EXTRACT(DAY FROM CURRENT_DATE); EXTRACT(DAY FROM Date) ------------------------ 1 SELECT EXTRACT(HOUR FROM CURRENT_TIMESTAMP); EXTRACT(HOUR FROM Current TimeStamp(6)) --------------------------------------- 4 SELECT EXTRACT(MINUTE FROM CURRENT_TIMESTAMP); EXTRACT(MINUTE FROM Current TimeStamp(6)) ----------------------------------------- 54 SELECT EXTRACT(SECOND FROM CURRENT_TIMESTAMP); EXTRACT(SECOND FROM Current TimeStamp(6)) ----------------------------------------- 27.140000
Teradata provides INTERVAL function to use arithmetic operations on Date and Time. There are two types of Interval functions available.
a) Year-Month Interval YEAR YEAR TO MONTH MONTH b) Day-Time Interval DAY DAY TO HOUR DAY TO MINUTE DAY TO SECOND HOUR HOUR TO MINUTE HOUR TO SECOND MINUTE MINUTE TO SECOND SECOND
Let us see an example to see how we can use "+" operator on year to get an incremented year.
SELECT CURRENT_DATE, CURRENT_DATE + INTERVAL '03' YEAR; Date (Date+ 3) -------- --------- 19/01/01 22/01/01
Teradata allows the basic aggregate functions on the tables. They are:
These functions can be used on the SELECT statement.
SELECT fun(columnName) from tableName;
Here fun refers to the aggregate function name, columnName specifies the name of the target column and the tableName specifies the target table.
There are built-in functions in Teradata, which are basically the extensions of the general SQL functions. They are
Joins are used to combine various records from different tables. There are a number of joins available in Teradata. Let us discuss few important joins which are quite commonly used.
Inner Join combines records from various tables to return values that are common to both the tables. Or in other words, it returns the values that exist in both the tables.
SELECT col1, col2, col3…. FROM Table-1 INNER JOIN Table-2 ON (col1 = col2) <WHERE condition>;
This function returns the values based on the type of outer join specified. There are 3 variations on outer join available. They are:
The following Syntax is for the left outer join.
SELECT col1, col2, col3…. FROM Table-1 LEFT OUTER JOIN Table-2ON (col1 = col2) <WHERE condition>;
You can replace LEFT OUTER JOIN by RIGHT OUTER JOIN or FULL OUTER JOIN for the respective output.
Cross Join combines every row from the left table with every row from the right table.
SELECT A.col1, B.col2 FROM Employee A CROSS JOIN Salary B
Here the col1 of table A is combined by every row with the col2 of table B.
A subquery returns records from one table based on the results from another table. It is basically a SELECT query within another query. There will be 2 SELECT queries; an inner query and an outer query. The results from the inner query are used to execute the outer query.
Here are some of the salient features of subqueries:
SELECT col1, col2, col3,… FROM O0uter Table WHERE col1 OPERATOR ( Inner SELECT Query);
Here the WHERE clause will have the condition that needs to be satisfied by the records, which are to be returned from the outer table.
In this section, we are going to learn about:
A primary index in a table is used to define where the data resides. Every table in Teradata is required to have a primary index. Generally, the user can assign the primary index while creating a table, however, if it is not assigned, Teradata automatically assigns one.
Primary Index helps in fast access of data from the table. It can have a maximum of 64 columns.
There are 2 types of Primary Index. They are:
Unique Primary Index(UPI)
For a table defined to be having UPI, the column deemed as UPI should not have any duplicate values. If you try to insert one, it will be rejected.
CREATE SET TABLE Salary ( EmployeeNo INTEGER, Gross INTEGER, Deduction INTEGER, NetPay INTEGER) UNIQUE PRIMARY INDEX(EmployeeNo);
Non-Unique Primary Index(NUPI)
For a table defined to be having NUPI, the column deemed as NUPI can accept duplicate values.
CREATE SET TABLE Employee _Accounts ( EmployeeNo INTEGER, employee_bank_account_type BYTEINT. employee_bank_account_number INTEGER, employee_bank_name VARCHAR(30), employee_bank_city VARCHAR(30) ) PRIMARY INDEX(EmployeeNo);
A table can have only one Primary Index defined for data access. But in many cases of the real-time environment, more than one column will be frequently accessed for data. This multiple access can be reduced by using the concept of Secondary Index.
Secondary Index provides an alternate way to access data.
Here are a few distinguishing features of Secondary Index, that differentiates it from Primary Index.
There are two types of Secondary Indexes, just like Primary Index.
Unique Secondary Index (USI)
A Unique Secondary Index will not allow for duplicate values. To access the row in USI, you need to follow a two-AMP process.
To create USI
CREATE UNIQUE INDEX(EmployeeNo) on Employee;
Where Employee is the table name and EmployeeNo is the column name.
Non-Unique Secondary Index (NUSI)
A Non-Unique Secondary Index (NUSI) allows for duplicate entries. To access data from a NUSI, you need to follow an all-AMP process.
To create NUSI
CREATE INDEX(FirstName) on Employee;
Where Employee is the table name and FirstName is the column name.
Join Index helps in providing better performance by giving an alternate access path for data.
Below are some of the features of Join Index
Below are a few disadvantages of Join Index, owing to its features.
There are 3 types of Join Indexes. They are:
Single Table Join Index (STJI)
Single Table Join Index allows partitioning a big table into smaller ones based on the different primary index columns, than the ones in the base table.
To create STJI
CREATE JOIN INDEX <index name> AS <SELECT Query> <Index Definition>;
Multi-Table Join Index (MTJI)
As the name says, MTJI is formed by joining multiple tables which are frequently joined for obtaining the required result set. It basically improves performance when the tables are joined frequently.
Aggregated Join Index (AJI)
Aggregated Join Index is best suited on tables where a set of columns are frequently aggregated. It improves the performance of aggregation tables. However, it supports only the SUM and the COUNT operation.
Macro is a facility of Teradata that stores the frequently used SQL statements, hence eliminating the need to re-write SQL queries every time.
Macro executes every time with the same SQL statements, maybe with different parameters. This will be because the probable execution plan might be stored in Request-To-Step' cache(Plan library). This cache reduces the parsing time and improves the overall performance.
Characteristics of Macros
To create Macro
To create Macros use the CREATE MACRO statement
CREATE MACRO <macroname> [(parameter1, parameter2,...)] (<sql statements>); To execute Macros
To execute Macros, use EXEC command
OLAP functions are similar to the aggregate functions in operation, however, they return individual rows along with the aggregates, unlike aggregate functions which only return one value.
<aggregate function> OVER ([PARTITION BY] [ORDER BY columnname][ROWS BETWEEN UNBOUDED PRECEDING AND UNBOUNDED FOLLOWING)
Here aggregate function can be SUM, COUNT, MAX,MIN, AVG.
The RANK function is used to order the records based on the column provided. The RANK function can also be used to filter the records based on the rank.
RANK() OVER ([PARTITION BY columnnlist] [ORDER BY columnlist][DESC|ASC])
When the data instruction is passed from the user to the Teradata architecture, the Parsing Engine (PE) parses the instruction to the message passing layer. This layer is responsible for assigning rows to the AMPs for further processing. The process of assigning rows to each AMP involves a technique called Hashing.
In general, a row is assigned to a particular AMP according to the primary index value. The hashing algorithm is employed here to decide which AMP gets the row.
Steps involved in data Insertion
Statistics by far is the most important mechanism to understand and utilize an environment to its best capacity. This is commonly used in all areas to predict the best solution for a given condition.
In Teradata, the optimizer uses the statistics collected from the tables within an AMP, to understand and formulate the execution strategy.
There are many commands used to obtain the statistics of tables.
In order to come up with an optimal execution strategy, environment information and data demographics are essentially required.
There are three ways possible, to collect data from tables in an AMP.
To collect statistics
To collect statistics, COLLECT STATISTICS command is used.
COLLECT [SUMMARY] STATISTICS INDEX (indexname) COLUMN (columnname) ON <tablename>;
To view the collected statistics on a table, use the following syntax
HELP STATISTICS <tablename>;
There are many differences between MPP and SMP type of processors architecture. Some of the differences are listed below.
Teradata has a range of products covered to meet the demands of Data warehousing and ETL needs of any organization. Teradata products which are important to be noted are mentioned below:
Product suite name
Used in Teradata analytics platform
Analytics Platform Analytics on Hadoop (Aster)Analytics Portfolio (Aster)
High-impact hybrid cloud solutions for any business enhancement
Used to simplify big data streaming
Data management tools for data protection
System analysis tools for the big data to monitor DBA
Eco system Management
The tools for ecosystem help to monitor and manage Teradata systems
The tools for workload management help you to organize and manage the ever growing needs of the business.
Active System Manager (TASM)Workload Management
SQL Query Engine
Very powerful SQL Query engines for Hadoop and others.
Presto (Free Download)
Load & Unload Utilities
The tools that assist the easy loading and unloading of the data which help in parallel extract and activities.
Parallel Transporter (TPT)
UDA enabling software
This allows processing across all workload engines.
Teradata is one of the best RDBMS products available, which runs on the concepts of Multiple Parallel Processing and Share Nothing Objective.
With an increase in the market for the big data analysis and cloud management, a lot of effort and focus has been spread on the ways to manage a big data warehouse in a company or industry. Many of these giants are moving towards cloud solutions to enjoy reduced operational costs and avoid vendor-lock. However, a common challenge that is incurred while migrating to cloud is the rewriting of existing SQL scripts written in the Teradata's own version on ANSI-SQL. These scripts are heavy with business logic and typically take a longer time to be re-written.
A lot of SQL Analyzers are available in the market which does this job for you.
Teradata SQL Analyzer is one such utility framework which helps in analyzing the SQL scripts and inter-table dependencies to finally convert them to ANSI-SQL. You can install it into your system by downloading the jar file from https://s3.amazonaws.com/mingpublic/SQLP.jar.
There are various applications of Teradata in the real-time. Below are the most important applications described.
There are a lot of jobs related to Teradata for experienced or learned individuals. Some of the related job roles are:
The average salary of a Teradata Developer in the US is $108633 per annum, as of 2019
Big Data and BI is the assured future of the IT industry. The problems of analyzing and handling the large volumes of data are one of the real world problems faced by companies. There are different tools and solutions that separately address the structured and unstructured data.
Teradata is potentially equipped to approach all these problems as a unanimous platform. It is a holistic program to handle both structured and unstructured data.
Teradata is also one of the largest scalable platforms that is available in the market. Hence, many companies are adopting Teradata over other platforms.
The expanse of Teradata in the future seems large and promising.
As a part of the ever-expanding world, data is huge and ever increasing. Every bit of data is valued whether in technical or non-technical industry. Many companies are involved in collecting and analyzing data to watch and predict future trends in every field. Teradata is one such robust platform developed, which encourages large volumes of data and provides the important tools and functions to operate on the same.
Free Demo for Corporate & Online Trainings.