Teradata Tutorial

In this Teradata tutorial, we will start from the basics and work through all the major concepts of Teradata.

Teradata Tutorial For Beginners

Teradata Tutorial - Table of Contents
  1. What is Teradata
  2. Why use Teradata
  3. Features of Teradata
  4. Components of Teradata
  5. Teradata Architecture
  6. What is Teradata SQL
  7. MPP VS SMP
  8. Teradata warehouse product suite
  9. Applications of Teradata
  10. Job Roles and Salaries
  11. Future Scope

What is Teradata

Teradata is one of the most popular Relational Database Management Systems. It is built on a massively parallel processing (MPP) architecture and is used to develop large-scale data warehousing applications.

It supports multiple server platforms such as Unix, Linux, and Windows, and can serve many client environments at a time.

It is developed by Teradata Corporation, a provider of database and analytics-related products and services.

Why go for Teradata

When your company runs a high-throughput, large-volume data warehouse, Teradata is a strong option. It is designed to handle massive amounts of data efficiently by processing workloads in parallel, down to the smallest transactions.

It provides powerful OLAP (Online Analytical Processing) functions to perform complex analytical operations on data, along with extensive report review and data discovery.

Incorporate Teradata into your systems if you are looking for high performance and linear scalability of the database, which is difficult to achieve with Oracle or other DBMS products.

Features of Teradata

There are a number of distinguishing features that add value to the Teradata RDBMS. These features are listed below:

1) Linear Scalability

Teradata offers high, linear scalability, which allows large volumes of data to be handled efficiently by adding nodes as the data grows. It can be scaled up to 2048 nodes, which greatly enhances system performance.

2) Unlimited Parallelism

The architecture of Teradata is based on massively parallel processing (MPP), which divides large workloads into smaller tasks and processes them in parallel. This ensures the speedy execution of complex tasks.

3) Mature Optimizer

Teradata Optimizer is one of the most advanced optimizers available, which can handle up to 64 joins in a single query.

4) Shared Nothing Architecture of Teradata

The Teradata nodes, Access Module Processors (AMPs), and disks work independently. These resources are not shared, so each one is fully available for the task assigned to it. This 100% availability of every resource enhances efficiency by decreasing process execution time.

5) Connectivity

The parallel system of Teradata can connect to channel-attached systems such as mainframes, as well as network-attached systems.

6) Low TCO

The Teradata system offers a low total cost of ownership, as it is easy to set up, maintain, and administer.

7) SQL

Teradata systems allow the use of SQL to interact with the data in the database easily. Teradata also provides its own extensions to SQL.

8) Load & Unload utilities

Teradata provides load and unload utilities to move data into and out of the Teradata system.

9) Automatic Distribution

Teradata can easily distribute the data to its disks without any external intervention.

10) Robust Utilities

There are many robust utilities provided by Teradata to handle the movement of data into and out of Teradata systems. FastLoad, MultiLoad, FastExport, and TPT are a few of them that help in the import and export of data.

Components of Teradata

Teradata is a very efficient, inexpensive, and high-quality Relational Database management System that handles large volumes of data with ease.

Teradata is based on Massively Parallel Processing(MPP) architecture. It is made of Parsing Engine(PE), BYNET, Access Module Processors(AMPs), and other components like nodes.

Below are the components of the Teradata Architecture.

  • Parsing Engine: Parsing Engine is the base component that receives the queries from the clients and prepares the execution plan. Below are the responsibilities of PE:
    • It receives the SQL query from the client.
    • Checks for the syntax errors by parsing the SQL query.
    • Checks if the objects used in the query are relevant & existing.
    • Checks if the user is authenticated against the objects used in the query.
    • Prepares an efficient execution plan, considering the query, and sends it to BYNET.
    • Gets the result from the AMPs and sends it back to the client.
  • BYNET: This is the message-passing layer, or simply the networking layer, in Teradata. It receives the execution plan from the Parsing Engine and passes it to the AMPs and the nodes. In turn, it receives the processed output from the AMPs and sends it back to the Parsing Engine.
    In order to maintain adequate availability, there are 2 BYNETs available. BYNET 0 and BYNET 1. This ensures that a secondary BYNET is available in case of the failure of the primary BYNET.
  • Access Module Processors (AMPs): These are the virtual processors of Teradata. They receive the execution plan and the data from the Parsing Engine. The data then undergoes any required conversion, filtering, aggregation, sorting, etc., and is sent on to the corresponding disks for storage. Table records are evenly distributed across the AMPs for storage. Each AMP has a set of disks assigned to it, and only that AMP has permission to read/write data on those disks.
  • Nodes: The basic unit of a Teradata system is called a node. Each node has its own operating system, CPU, memory, a copy of the RDBMS software, and some disk space. A single cabinet can have one or more nodes in it.

Teradata Architecture

The architecture of Teradata is essentially massively parallel. As discussed above, the Teradata system has four main components: the Parsing Engine (PE), BYNET, the AMPs, and the disks.

Based on the principal function being performed, the architecture can be viewed in two forms. They are:

  • Storage Architecture
  • Retrieval Architecture

In both forms of architecture, the components mentioned above remain the same.

  • Storage Architecture: When the client requests data insertion, the Parsing Engine sends the records to BYNET. BYNET receives the records and forwards each row to the corresponding AMP for insertion. The AMP takes these rows and stores them on its disks accordingly.
  • Retrieval Architecture: When the client sends a retrieval request to the Teradata system, the Parsing Engine sends the query to BYNET. BYNET forwards the request to the appropriate AMPs. The AMPs carry out the search in parallel on their disks. Once the requested records are found and retrieved, they are sent back through BYNET to the Parsing Engine, and from there to the client.

Teradata SQL

Teradata allows SQL queries to perform required functions on the Teradata database. The basic structure of SQL queries along with some functions and other details are discussed below.

Basic Definitions

  • Database - It is a collection of logically related data stored in tables for easy analysis and access.
  • Tables - These are the basic units of any DBMS set where data is stored. A table is made of rows and columns.
  • Primary Key - It is the unique key that holds a distinct value to identify a specific row. No duplicates and no null values are accepted for this key. It is a mandatory field for any table in RDBMS.
  • Foreign Key - Foreign key in a child table is the primary key in the parent table, which is used to build relations. It accepts both null values and duplicate values. It is not compulsory for a table.

SQL Tables

SQL divides tables into two types: SET and MULTISET. SET tables do not store duplicate records, while MULTISET tables allow duplicate records.
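
As an illustrative sketch (the table and column names below are hypothetical), the table kind is chosen at creation time:

CREATE SET TABLE Employee_Set ( 
   EmployeeNo INTEGER, 
   FirstName VARCHAR(30) ) 
PRIMARY INDEX(EmployeeNo);

CREATE MULTISET TABLE Employee_Log ( 
   EmployeeNo INTEGER, 
   LoginDate DATE ) 
PRIMARY INDEX(EmployeeNo);

Rows that exactly duplicate an existing row are rejected in Employee_Set but accepted in Employee_Log.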

Table Commands

Below are a few commands that are used on tables in Teradata; a combined sketch follows the list.

  • Create Table: It is used to create new tables in Teradata.
  • Alter Table: It is used to add or remove columns from a table.
  • Drop Table: It is used to drop/remove a table.
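
The sketch below combines the three commands on a hypothetical Department table; the column names are illustrative.

CREATE TABLE Department ( 
   DepartmentNo INTEGER, 
   DepartmentName VARCHAR(30) ) 
PRIMARY INDEX(DepartmentNo);

ALTER TABLE Department ADD DepartmentLocation VARCHAR(30);

DROP TABLE Department;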

Manipulating Data in Tables

The section below includes the details required to manipulate data stored in the database.

The manipulation of data comprises inserting new records, updating, and deleting the existing ones.
Details of each of them are given below.

1) Insert Records

There are two ways to insert records. They are:

  • Inserting record directly

In this scenario, you directly insert a record with new values into a table. The command used to insert records into a table is,

Command

INSERT INTO

Syntax

INSERT INTO <tablename> 
(column1, column2, column3,…) 
VALUES 
(value1, value2, value3 …);

Example

Let us see an example of how to insert records into a table called "Employee".

Command

INSERT INTO Employee (
   EmployeeNo, 
   FirstName, 
   LastName, 
   BirthDate, 
   JoinedDate, 
   DepartmentNo 
)
VALUES ( 
   101, 
   'Mike', 
   'James', 
   '1980-01-05', 
   '2005-03-27', 
   01
);

This inserts one record into Employee with the values for each column mentioned.

You can use the SELECT statement to view the inserted record, details of which are seen in the next section.

  • Insert from another table

Sometimes you may want to insert data from another table into a new table. The command used to insert a record from another table is,

Command

INSERT SELECT

Syntax

INSERT INTO <tablename> 
(column1, column2, column3,…) 
SELECT 
column1, column2, column3… 
FROM 
<source table>;

Example

Let us see an example where you insert a record into a table called Employee_Bkup from the Employee table.

Command

INSERT INTO Employee_Bkup ( 
   EmployeeNo, 
   FirstName, 
   LastName, 
   BirthDate, 
   JoinedDate, 
   DepartmentNo 
) 
SELECT 
   EmployeeNo, 
   FirstName, 
   LastName, 
   BirthDate, 
   JoinedDate,
   DepartmentNo 
FROM  
   Employee;

This command inserts all the records from the Employee table into the new table.

There are certain logical constraints that are to be taken care of while executing the INSERT command. They are:

  • The number of values in the VALUES clause should match the columns in the INSERT INTO clause.
  • The data types of the values must be compatible with the data types of the columns specified in the table.
  • The columns that don't accept NULL values must be inserted with some value.
  • The columns that don't have values specified will be inserted with NULL values.

2) Update Records

Update Record is used to update the existing values of the columns in the tables.

Command

UPDATE

Syntax

UPDATE <tablename> 
SET <columnnamme> = <new value> 
[WHERE condition];

Example

Let us see an example of the UPDATE command. The example below updates a record of the Employee table.

Command

UPDATE Employee 
SET DepartmentNo = 03 
WHERE EmployeeNo = 101;

The record with EmployeeNo 101 will have its DepartmentNo column updated to 3.

Note: You can update the values of a record with values from another table.

If the WHERE clause is not specified, all the records will be updated with the new column value.
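
To illustrate the note above about updating from another table, here is a sketch using Teradata's joined UPDATE form with the Employee_Bkup table seen earlier; the join condition is an assumption for the example.

UPDATE Emp 
FROM Employee Emp, Employee_Bkup Bkup 
SET DepartmentNo = Bkup.DepartmentNo 
WHERE Emp.EmployeeNo = Bkup.EmployeeNo;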

3) Delete Records

This command is used to delete one or more records from the table.

Command

DELETE FROM

Syntax

DELETE FROM <tablename> 
[WHERE condition];

Example

The example below shows the DELETE command. The WHERE clause specifies which rows are to be deleted.

Command

DELETE FROM Employee 
WHERE EmployeeNo = 101;

Note: If the WHERE clause is not mentioned, all records of the table will be deleted.

SELECT Statement

The SELECT statement is used to view records of a table.

Command

SELECT

Syntax

SELECT 
column 1, column 2, ..... 
FROM tablename;

Example

The examples below show how the SELECT statement works.

  • To get specific columns of a table

Use a command like the one below to get specific columns of a table.

Command

SELECT EmployeeNo, FirstName, LastName 
FROM Employee;

This gives an output of all the values of the columns EmployeeNo, FirstName, and LastName from the table Employee.

  • To get all the records of a table

If you want to get all the records from a table, use the command below.

Command

SELECT * FROM Employee;

WHERE CLAUSE

The WHERE clause is a filter statement. It is used to fetch filtered records that satisfy some specific conditions. This condition is associated with the WHERE keyword in its syntax.

It is generally used with the SELECT statement.

Syntax

SELECT * FROM tablename 
WHERE [condition];

Example

The example below uses the WHERE clause to get the record of the employee whose EmployeeNo is 101.

SELECT * FROM Employee 
WHERE EmployeeNo = 101;

This fetches the records whose EmployeeNo = 101.

ORDER BY

This command is used to order the data in tables in a specific manner. Whenever you retrieve records with the SELECT statement, they are not fetched in any particular order.

You can employ this command on the results of SELECT to arrange the retrieved records in ascending (ASC) or descending (DESC) order.

Syntax

SELECT * FROM tablename 
ORDER BY column 1, column 2..;

Example

The example below shows the ORDER BY command used on the column FirstName to arrange the records by it.

Command

SELECT * FROM Employee 
ORDER BY FirstName;

This gives an output sorted alphabetically by FirstName.

GROUP BY

As the name suggests, this is used to group similar records into one group.

It is often used with the SELECT statement.

Command

GROUP BY

Syntax

SELECT column 1, column 2, ... FROM tablename 
GROUP BY column 1, column 2, ...;

This command is used in many functional operations while executing queries, to have an organized set of data.
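
As a quick sketch using the Employee table from the earlier examples, the query below counts employees per department:

SELECT DepartmentNo, COUNT(*) 
FROM Employee 
GROUP BY DepartmentNo;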

SET Operators

Set operators combine the results of multiple SELECT statements. Unlike joins, which combine columns from multiple tables, set operators combine rows from multiple result sets.

Prerequisites:

  • The number of columns in each SELECT statement must be the same.
  • The data types of the columns must be compatible.
  • ORDER BY, if used, must appear only in the final SELECT statement.

There are 4 set operators in Teradata. They are:

  • UNION
  • UNION ALL
  • INTERSECT
  • MINUS/EXCEPT

1) UNION

UNION command combines rows from multiple select statements, ignoring the duplicates.

Syntax

SELECT col1, col2, col3… 
FROM 
<table 1> 
[WHERE condition] 
UNION
SELECT 
col1, col2, col3… FROM 
<table 2> 
[WHERE condition];

Here the rows returned by the first SELECT on table 1 are combined with the rows returned by the second SELECT on table 2, matching column by column (col1 with col1, col2 with col2, and so on), with duplicates removed.
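
A concrete sketch, reusing the Employee and Employee_Bkup tables from the earlier examples:

SELECT EmployeeNo, FirstName, LastName 
FROM Employee 
UNION 
SELECT EmployeeNo, FirstName, LastName 
FROM Employee_Bkup;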

2) UNION ALL

UNION ALL combines the contents from multiple tables just like UNION but including the duplicates.

Syntax

SELECT 
col1, col2, col3… 
FROM 
<table 1> 
[WHERE condition] 
UNION ALL
SELECT 
col1, col2, col3…
FROM 
<table 2> 
[WHERE condition];

Here the rows from table 1 are combined with the rows from table 2 in the same column-by-column manner, but duplicates are retained.

3) INTERSECT

The INTERSECT operator works like set intersection in mathematics. It outputs the intersection of the results of two SELECT statements: only the rows that appear in both results are returned.

Syntax

SELECT 
col1, col2, col3… 
FROM 
<table 1>
[WHERE condition] 
INTERSECT
SELECT 
col1, col2, col3… 
FROM 
<table 2> 
[WHERE condition];

Here the output consists of the rows whose values in col1, col2, and col3 appear in the results from both tables.

4) MINUS/EXCEPT

The MINUS/EXCEPT operator returns the rows that are present in the first SELECT but not in the second SELECT.

MINUS and EXCEPT work the same way.

Syntax

The syntax remains the same for both functions, which is shown below.

SELECT 
col1, col2, col3… 
FROM 
<table 1> 
[WHERE condition] 
MINUS
SELECT 
col1, col2, col3… 
FROM 
<table 2> 
[WHERE condition];

Here the output consists of the rows returned by the first SELECT on table 1 that do not appear in the result of the second SELECT on table 2.

DATE and TIME functions

This section is dedicated to the date and time functions offered by Teradata.

Data Storage

In Teradata, dates are stored as integers, in the form,

((YEAR - 1900) * 10000) + (MONTH * 100) + DAY

You can view how the Date is stored in Teradata using:

SELECT CAST(CURRENT_DATE AS INTEGER);

Since the Date is internally stored as integers, arithmetic functions can be employed on it.
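
For example, applying the formula to the date 2016-01-01 gives ((2016 - 1900) * 10000) + (1 * 100) + 1 = 1160000 + 100 + 1 = 1160101, which is the integer Teradata stores for that date.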

There are general functions on Date and Time which are regularly used in Teradata. These are discussed below.

Extract

The EXTRACT function, as the name suggests, is used to extract the required portions from DATE and TIMESTAMP values. For example, it can fetch the day, month, and year separately from a date, and the hours, minutes, and seconds from a timestamp.

Example

The examples below show how to use the EXTRACT function to obtain individual date and time parts.

SELECT EXTRACT(YEAR FROM CURRENT_DATE);  
EXTRACT(YEAR FROM Date) 
----------------------- 
        2016  
SELECT EXTRACT(MONTH FROM CURRENT_DATE);  
EXTRACT(MONTH FROM Date) 
------------------------ 
          1        
SELECT EXTRACT(DAY FROM CURRENT_DATE);  
EXTRACT(DAY FROM Date) 
------------------------ 
          1    
SELECT EXTRACT(HOUR FROM CURRENT_TIMESTAMP);  
EXTRACT(HOUR FROM Current TimeStamp(6)) 
--------------------------------------- 
                 4      
SELECT EXTRACT(MINUTE FROM CURRENT_TIMESTAMP);  
EXTRACT(MINUTE FROM Current TimeStamp(6)) 
----------------------------------------- 
                 54  
SELECT EXTRACT(SECOND FROM CURRENT_TIMESTAMP);  
EXTRACT(SECOND FROM Current TimeStamp(6)) 
----------------------------------------- 
              27.140000

Interval

Teradata provides INTERVAL types to perform arithmetic operations on Date and Time values. There are two categories of intervals available.

a) Year-Month Interval
    YEAR
    YEAR TO MONTH
    MONTH
b) Day-Time Interval
    DAY
    DAY TO HOUR
    DAY TO MINUTE
    DAY TO SECOND
    HOUR
    HOUR TO MINUTE
    HOUR TO SECOND
    MINUTE
    MINUTE TO SECOND
    SECOND

Example

Let us see an example of how to use the "+" operator with a year interval to get an incremented year.

SELECT CURRENT_DATE, CURRENT_DATE + INTERVAL '03' YEAR; 
  Date    (Date+ 3) 
--------  --------- 
19/01/01   22/01/01

Aggregate Functions

Teradata allows the basic aggregate functions on the tables. They are:

  • COUNT - It counts the number of rows in the specified column.
  • MAX - returns the max value from the specified column
  • MIN - returns the min value from the specified column
  • SUM - returns the sum total of all the rows from the specified column(s)
  • AVG - returns the average value from the specified column

These functions can be used on the SELECT statement.

Syntax

SELECT fun(columnName) from tableName;

Here fun refers to the aggregate function name, columnName specifies the name of the target column and the tableName specifies the target table.
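
As a brief sketch, assuming the Salary table defined later in the Indexes section (EmployeeNo, Gross, Deduction, NetPay):

SELECT COUNT(*), MAX(NetPay), MIN(NetPay), AVG(NetPay) 
FROM Salary;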

Built-in Functions

There are built-in functions in Teradata, which are essentially extensions of general SQL functions. They are:

  • SELECT DATE;
  • SELECT CURRENT_DATE;
  • SELECT TIME;
  • SELECT CURRENT_TIME;
  • SELECT CURRENT_TIMESTAMP;
  • SELECT DATABASE;

Joins

Joins are used to combine records from different tables. There are a number of joins available in Teradata. Let us discuss a few important joins that are quite commonly used.

  • Inner Join

Inner Join combines records from two tables and returns only the rows that have matching values in both tables. In other words, it returns the values that exist in both tables.

Syntax

SELECT 
col1, col2, col3…. 
FROM 
Table-1 
INNER JOIN 
Table-2 
ON (col1 = col2) 
<WHERE condition>;
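
Example

As a concrete sketch, assuming the Employee and Salary tables used elsewhere in this tutorial share the EmployeeNo column:

SELECT A.EmployeeNo, A.FirstName, B.NetPay 
FROM Employee A 
INNER JOIN Salary B 
ON (A.EmployeeNo = B.EmployeeNo);
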
  • Outer Join

This returns values based on the type of outer join specified. There are 3 variations of outer join available. They are:

  • Left Outer Join
    This returns all the values from the left table and only the matching values from the right table.
  • Right Outer Join
    This returns all the values from the right table and only the matching values from the left table.
  • Full Outer Join
    It returns all the values from both the tables, irrespective of matching or non-matching values.

Syntax

The following syntax is for the left outer join.

SELECT 
col1, col2, col3….
FROM  
Table-1
LEFT OUTER JOIN
Table-2
ON (col1 = col2)
<WHERE condition>;

You can replace LEFT OUTER JOIN with RIGHT OUTER JOIN or FULL OUTER JOIN for the respective output.

  • Cross Join

Cross Join combines every row from the left table with every row from the right table.

Syntax

SELECT A.col1, B.col2 
FROM 
Employee A 
CROSS JOIN 
Salary B

Here every row of table A (Employee) is combined with every row of table B (Salary), returning col1 from A and col2 from B for each combination.

Subqueries

A subquery returns records from one table based on the results from another table. It is basically a SELECT query within another query. There will be 2 SELECT queries; an inner query and an outer query. The results from the inner query are used to execute the outer query.

Features of Subqueries

Here are some of the salient features of subqueries:

  1. A query can have multiple subqueries. Similarly, a subquery can have another subquery.
  2. Subqueries do not return duplicate records.
  3. If a subquery returns only one value, a simple "=" operator can be used in the outer query's condition. If multiple values are returned, you can use the IN or NOT IN keywords instead.

Syntax

SELECT 
col1, col2, col3,… 
FROM 
<Outer Table> 
WHERE col1 OPERATOR ( Inner SELECT Query);

Here the WHERE clause will have the condition that needs to be satisfied by the records, which are to be returned from the outer table.
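
As an illustrative sketch, assuming the Employee and Salary tables used elsewhere in this tutorial (the 50000 threshold is arbitrary), the inner query returns the employee numbers with NetPay above 50000 and the outer query fetches those employees:

SELECT EmployeeNo, FirstName, LastName 
FROM Employee 
WHERE EmployeeNo IN ( 
   SELECT EmployeeNo 
   FROM Salary 
   WHERE NetPay > 50000 
);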

Indexes

In this section, we are going to learn about:

  1. Primary Index
  2. Secondary Index
  3. Join Index

Primary Index

A primary index in a table is used to define where the data resides. Every table in Teradata is required to have a primary index. Generally, the user assigns the primary index while creating a table; if it is not assigned, Teradata assigns one automatically.

Primary Index helps in fast access of data from the table. It can have a maximum of 64 columns.

There are 2 types of Primary Index. They are:

  • Unique Primary Index(UPI)
  • Non-Unique Primary Index(NUPI)

Unique Primary Index(UPI)

For a table defined with a UPI, the column chosen as the UPI cannot contain duplicate values. If you try to insert a duplicate, it will be rejected.

Syntax

CREATE SET TABLE Salary (  
  EmployeeNo INTEGER,  
  Gross INTEGER,    
  Deduction INTEGER,  
  NetPay INTEGER)
UNIQUE PRIMARY INDEX(EmployeeNo);

Non-Unique Primary Index(NUPI)

For a table defined with a NUPI, the column chosen as the NUPI can accept duplicate values.

Syntax

CREATE SET TABLE Employee_Accounts ( 
   EmployeeNo INTEGER, 
   employee_bank_account_type BYTEINT, 
   employee_bank_account_number INTEGER, 
   employee_bank_name VARCHAR(30), 
   employee_bank_city VARCHAR(30) ) 
PRIMARY INDEX(EmployeeNo);

Secondary Index

A table can have only one Primary Index defined for data access. But in many real-world environments, other columns are also frequently used to access data, and those accesses cannot use the Primary Index. This overhead can be reduced by using a Secondary Index.

Secondary Index provides an alternate way to access data.

Features

Here are a few distinguishing features of the Secondary Index that differentiate it from the Primary Index.

  • Secondary Index does not help in the data distribution.
  • The Secondary Index builds its own sub-table of data. These sub-tables are maintained on all AMPs.
  • Secondary indexes are not mandatory.
  • They can be either created during the table creation or later.
  • Since they maintain a sub-table, they demand extra storage. They are also heavy on maintenance, as the sub-table has to be updated for every new row.

There are two types of Secondary Indexes, just like Primary Index.

  • Unique Secondary Index (USI)
  • Non-Unique Secondary Index (NUSI)

Unique Secondary Index (USI)

A Unique Secondary Index will not allow for duplicate values. To access the row in USI, you need to follow a two-AMP process.

To create USI

Syntax

CREATE UNIQUE INDEX(EmployeeNo) on Employee;

Where Employee is the table name and EmployeeNo is the column name.

Non-Unique Secondary Index (NUSI)

A Non-Unique Secondary Index (NUSI) allows for duplicate entries. To access data from a NUSI, you need to follow an all-AMP process.

To create NUSI

Syntax

CREATE INDEX(FirstName) on Employee;

Where Employee is the table name and FirstName is the column name.

Join Index

Join Index helps in providing better performance by giving an alternate access path for data.

Features

Below are some of the features of the Join Index

  • It is a materialized view.
  • Creates its own data table for functioning.
  • You cannot directly query a Join Index. The optimizer decides whether the Join Index is needed and uses it accordingly at runtime.
  • The sub-table created by the Join Index is automatically updated by Teradata whenever there is any change in the base table.
  • The Join Index table can have a different primary index than the base table.
  • A column in the Join Index can be specified as value-ordered, so its rows are stored sorted on that column.
  • It can be used both in Left and Right outer joins. However, it is prohibited in the Full outer join.
  • The maximum number of indexes a table can have is 32 (including join/hash and secondary indexes).

Disadvantages

Below are a few disadvantages of the Join Index, owing to its features.

  • It demands more storage because the sub-table has to be maintained.
  • The automatic update of the sub-table of the Join Index causes an excess overhead in Teradata.
  • Utilities like MultiLoad, FastLoad, or Restore cannot be employed on a table with a Join Index.

Types

There are 3 types of Join Indexes. They are:

  • Single Table Join Index (STJI)
  • Multi-Table Join Index (MTJI)
  • Aggregate Join Index (AJI)

Single Table Join Index (STJI)

A Single Table Join Index allows a large table to be re-stored as a sub-table with different primary index columns than the base table, so data can be accessed efficiently through those columns.

To create STJI

Syntax

CREATE JOIN INDEX 
<index name>
AS
<SELECT Query>
<Index Definition>;
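
As a hedged sketch, the join index below (the name Employee_JI is illustrative) stores frequently accessed Employee columns with DepartmentNo as the primary index of the sub-table:

CREATE JOIN INDEX Employee_JI AS 
SELECT EmployeeNo, FirstName, LastName, DepartmentNo 
FROM Employee 
PRIMARY INDEX(DepartmentNo);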

Multi-Table Join Index (MTJI)

As the name says, MTJI is formed by joining multiple tables that are frequently joined for obtaining the required result set. It basically improves performance when the tables are joined frequently.

Aggregate Join Index (AJI)

Aggregate Join Index is best suited on tables where a set of columns are frequently aggregated. It improves the performance of aggregation tables. However, it supports only the SUM and the COUNT operation.

Macros

A macro is a Teradata facility that stores frequently used SQL statements, eliminating the need to rewrite the same SQL queries every time.

A macro executes the same SQL statements every time, possibly with different parameter values. Because the execution plan can be stored in the Request-To-Steps cache (plan library), parsing time is reduced and overall performance improves.

Characteristics of Macros

  • It can be shared across many users and can be secured by providing the execute access to the required users.
  • The Macros can be executed as parameterized or non-parameterized.
  • All the statements in a macro are processed as one single transaction. If one fails, the others are rolled back.
  • Nesting of macros is possible.
  • The normal programming features of conditions (if/else) and iterations (for, while and do-while) are not supported.

To create Macro

To create a macro, use the CREATE MACRO statement.

Syntax

CREATE MACRO <macroname> [(parameter1, parameter2, ...)] AS (<sql statements>);

To execute Macros

To execute Macros, use the EXEC command

Syntax

EXEC <macroname>;
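
A small parameterized sketch (the macro name Get_Emp and its parameter are illustrative):

CREATE MACRO Get_Emp (EmpNo INTEGER) AS ( 
   SELECT * FROM Employee WHERE EmployeeNo = :EmpNo; 
);

EXEC Get_Emp(101);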

OLAP Functions

OLAP functions are similar to aggregate functions in operation; however, they return the individual rows along with the aggregate, unlike aggregate functions, which return only one value per group.

Syntax

<aggregate function> OVER 
([PARTITION BY columnlist] [ORDER BY columnlist] 
[ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]);

Here the aggregate function can be SUM, COUNT, MAX, MIN, or AVG.
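
As a brief sketch using the Salary table, the query below returns every row along with the total NetPay computed over the whole table:

SELECT EmployeeNo, NetPay, 
   SUM(NetPay) OVER ( 
      ORDER BY EmployeeNo 
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
   ) AS TotalNetPay 
FROM Salary;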

RANK

The RANK function is used to order the records based on the column provided. The RANK function can also be used to filter the records based on rank.

Syntax

RANK() OVER 
([PARTITION BY columnlist] [ORDER BY columnlist [ASC|DESC]]);
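
For instance, here is a sketch using the Salary table: the first query ranks employees by NetPay, and the second uses QUALIFY (Teradata's filter clause for ordered analytical functions) to keep only the top three ranks.

SELECT EmployeeNo, NetPay, 
   RANK() OVER (ORDER BY NetPay DESC) AS PayRank 
FROM Salary;

SELECT EmployeeNo, NetPay 
FROM Salary 
QUALIFY RANK() OVER (ORDER BY NetPay DESC) <= 3;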

Hashing

When a data instruction is passed from the user to the Teradata system, the Parsing Engine (PE) parses the instruction and passes it to the message-passing layer (BYNET). This layer is responsible for delivering rows to the AMPs for further processing. The process of assigning rows to AMPs uses a technique called hashing.

In general, a row is assigned to a particular AMP according to the primary index value. The hashing algorithm is employed here to decide which AMP gets the row.

Steps involved in data Insertion

  • The user submits a query.
  • The PE receives the query and passes the PI value of the record to the hashing algorithm.
  • The hashing algorithm performs its operation on the primary index value to generate a 32-bit number called a Row Hash.
  • The high-order bits of the Row Hash (the first 16 bits) are used to look up the hash map. The matching hash map entry identifies one AMP #.
  • BYNET sends the data to the required AMP.
  • AMP uses the 32-bit Row hash to find the row within the disk.
  • If the disk already has a row with the same row hash, then the unique ID associated with the row is incremented by one (it is also a 32-bit number).
    For a new row hash, the unique ID starts at 1 and is incremented whenever another row with the same row hash is inserted.
  • The row ID is the combination of the unique ID and the Row hash.
  • Each record in a disk is associated with a Row ID.
  • A table is by default sorted logically according to the Row ID.

Statistics

Statistics are by far the most important mechanism for understanding and utilizing an environment to its best capacity. They are commonly used in all areas to predict the best solution for a given condition.

In Teradata, the optimizer uses the statistics collected on the tables within the AMPs to understand the data and formulate the execution strategy.

There are many commands used to obtain the statistics of tables.

In order to come up with an optimal execution strategy, environment information and data demographics are essential.

Environment Information

This includes,

  • Number of Nodes, AMPs, and CPUs
  • Amount of memory

Data Demographics

This includes,

  • Number of rows
  • Row size
  • The range of values in the table
  • Number of rows per value
  • Number of Nulls

There are three possible ways to collect statistics on tables:

  • Random AMP Sampling
  • Full statistics collection
  • Using the SAMPLE option

To collect statistics

To collect statistics, the COLLECT STATISTICS command is used.

Syntax

COLLECT [SUMMARY] STATISTICS   
INDEX (indexname) COLUMN (columnname) 
ON <tablename>;
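
For instance, following the syntax above on the Employee table used throughout this tutorial:

COLLECT STATISTICS COLUMN (EmployeeNo) ON Employee;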

Viewing Statistics

To view the collected statistics on a table, use the following syntax

Syntax

HELP STATISTICS <tablename>; 

Miscellaneous Information

MPP vs SMP

There are many differences between MPP and SMP types of processor architecture. Some of the differences are listed below.

MPP

  • It is a Massively Parallel Processing architecture. It is a system where a number of processors or independent arithmetic units run in parallel.
  • Databases can be expanded when needed, by adding new CPUs or nodes.
  • The MPP architecture implements a sharing nothing policy, where none of the resources are shared amongst the tasks/processes.
  • Performance in MPP scales linearly and increases as more nodes are added.

SMP

  • Symmetric Multiprocessing (SMP) is an architecture in which multiple processors share the same memory and operating system to run multiple processes, typically on a single server.
  • This increases CPU overhead and can slow down processing when memory is occupied by a parallel task.
  • SMP databases rely on the CPUs for database searches.
  • In SMP, resources are shared for parallel processing: CPU time and memory are shared across the processes.

Teradata Warehouse Product Suite

Teradata has a range of products that cover the data warehousing and ETL needs of any organization. The important Teradata products are listed below:

Product suite name, usage, and tools:

  • Analytics: Used in the Teradata Analytics Platform. Tools: Analytics Platform, Analytics on Hadoop (Aster), Analytics Portfolio (Aster)
  • Cloud: High-impact hybrid cloud solutions for any business enhancement. Tools: Hybrid Cloud, Managed Cloud (Hadoop), IntelliCloud
  • Data Ingestion: Used to simplify big data streaming. Tools: Listener, Data Mover
  • Data Management: Data management tools for data protection. Tools: Backup and Restore, Data Mover
  • Database: System analysis tools for big data and DBA monitoring. Tools: Database (Teradata)
  • Ecosystem Management: Tools that help monitor and manage Teradata systems. Tools: Ecosystem Manager, Unity
  • Workload Management: Tools that help you organize and manage the ever-growing needs of the business. Tools: Active System Manager (TASM), Workload Management
  • SQL Query Engine: Very powerful SQL query engines for Hadoop and others. Tools: Presto (free download)
  • Load & Unload Utilities: Tools that assist the easy loading and unloading of data and help in parallel extract and load activities. Tools: Parallel Transporter (TPT)
  • UDA enabling software: Allows processing across all workload engines. Tools: QueryGrid, Listener, Teradata AppCenter

Teradata SQL Analyzer - Teradata Query Analysis tool

Teradata is one of the best RDBMS products available, built on the concepts of Massively Parallel Processing and a shared-nothing architecture.

With the growth of big data analysis and cloud management, a lot of effort and focus has gone into ways to manage a big data warehouse in a company or industry. Many of these giants are moving towards cloud solutions to enjoy reduced operational costs and avoid vendor lock-in. However, a common challenge encountered while migrating to the cloud is rewriting existing SQL scripts written in Teradata's own extension of ANSI SQL. These scripts are heavy with business logic and typically take a long time to rewrite.

A lot of SQL analyzers are available in the market that do this job for you.

Teradata SQL Analyzer is one such utility framework that helps in analyzing the SQL scripts and inter-table dependencies to finally convert them to ANSI-SQL. You can install it into your system by downloading the jar file from https://s3.amazonaws.com/mingpublic/SQLP.jar.

Applications of Teradata

There are various applications of Teradata in real-time. Below are the most important applications described.

  • Customer Data Management: Teradata's efficient data management techniques help provide customers with a lasting experience, which in turn strengthens the relationship between the customers and the business.
  • Master data environment: It helps in developing and maintaining a secure and robust environment to store, synchronize and use the master data.
  • Finance and Performance management: Teradata implementation directly affects the company's finance management by improving the speed and quality of financial reporting. It also reduces the cost of the company's financial infrastructure and proactively manages enterprise performance.
  • Supply Chain Management: It efficiently improves supply chain management within the company and hence enhances customer service with reduced cycle times and lower inventories.
  • Demand Chain Management: It helps in improving customer service and satisfaction and also helps in predicting the demand for a particular product or service in the near future.

Jobs and Salaries for Teradata enthusiasts

There are a lot of jobs related to Teradata for experienced or learned individuals. Some of the related job roles are:

  • Data Engineer/SQL developer
  • Senior SQL Developer
  • Lead SQL Developer
  • ETL Test Analyst
  • Senior ETL/BI Tester
  • Business Analyst
  • Business System Analyst
  • Data warehouse Business Analyst
  • Business Intelligence Analyst
  • CRA Analyst, etc.

The average salary of a Teradata Developer in the US is $108,633 per annum, as of 2021.

Future Scope

Big Data and BI are the assured future of the IT industry. The problem of analyzing and handling large volumes of data is one of the real-world problems faced by companies. There are different tools and solutions that separately address structured and unstructured data.

Teradata is well equipped to approach all these problems as a unified platform. It is a holistic system that handles both structured and unstructured data.

Teradata is also one of the most scalable platforms available in the market. Hence, many companies are adopting Teradata over other platforms.

The expanse of Teradata in the future seems large and promising.

Conclusion

As part of the ever-expanding digital world, data is huge and ever-increasing. Every bit of data is valued, whether in technical or non-technical industries. Many companies collect and analyze data to watch and predict future trends in every field. Teradata is one such robust platform, built to accommodate large volumes of data and to provide the important tools and functions needed to operate on it.
