If you're looking for Netezza Interview Questions for Experienced or Freshers, you are in the right place. There are a lot of opportunities from many reputed companies in the world. According to research, Netezza has a market share of about 3.7%. So, you still have the opportunity to move ahead in your career in Netezza Analytics. Mindmajix offers Advanced Netezza Interview Questions 2021 that help you crack your interview and acquire a dream career as a Netezza Analyst.
Below mentioned are the Top Frequently asked Netezza Interview Questions and Answers that will help you to prepare for the Netezza interview. Let's have a look at them.
The environment variables required are: NZ_HOST, NZ_DATABASE, NZ_USER, NZ_PASSWORD
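As a minimal sketch of how a wrapper script might check these variables before calling any Netezza tool, the Python below reads the four names listed above and fails early if one is missing. The helper name `read_nz_settings` is hypothetical and not part of any Netezza client.

```python
import os

# Variable names come from the text; the helper itself is an illustration only.
NZ_VARS = ("NZ_HOST", "NZ_DATABASE", "NZ_USER", "NZ_PASSWORD")

def read_nz_settings(env=None):
    """Return the four NZ_* settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in NZ_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in NZ_VARS}
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real environment.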
The only constraint Netezza enforces is NOT NULL. Primary key and foreign key constraints can be declared, but Netezza does not enforce them.
Yes. As there are no primary key constraints in Netezza you can insert duplicate rows.
Specifying Not Null on each column in the table results in better performance. Netezza tracks the NULL values at row header level. Having NULL values results in storing references to NULL values in the header. If all columns are NOT NULL, then there is no record header.
While reading data from the disk, the Field Programmable Gate Array (FPGA) on each SPU filters out unwanted data. This process of data elimination removes IO bottlenecks and frees up downstream components such as the CPU, memory, and network from processing extra data.
A snippet is a small unit of work that is carried out in SPU.
An extent is the smallest unit of disk allocation on an SPU. Zone maps are internal mapping structures, kept per extent, that take advantage of the internal ordering of data to eliminate extents that do not need to be scanned. Zone maps transparently avoid scanning of unreferenced rows. Zone maps are created for every column in the table and contain the minimum and maximum values for every extent.
Zone maps are created and refreshed for every SPU when you run GENERATE STATISTICS, an nzload operation, INSERT or UPDATE operations, or an nzreclaim operation.
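The extent-skipping idea can be sketched in a few lines of Python. This is a toy model, not Netezza's internal structure: the `Extent` class, the three-row extents, and the `scan` function are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Extent:
    rows: list  # column values stored in this extent

    @property
    def zone(self):
        # Zone-map entry: (min, max) of the column within this extent.
        return (min(self.rows), max(self.rows))

def scan(extents, low, high):
    """Return matching values and the number of extents actually read."""
    hits, extents_read = [], 0
    for ext in extents:
        lo, hi = ext.zone
        if hi < low or lo > high:
            continue  # ranges cannot overlap, so the extent is skipped
        extents_read += 1
        hits.extend(v for v in ext.rows if low <= v <= high)
    return hits, extents_read

extents = [Extent([1, 4, 9]), Extent([10, 15, 19]), Extent([20, 22, 30])]
hits, extents_read = scan(extents, 12, 18)
print(hits, extents_read)
```

Here only the middle extent overlaps the requested range, so the other two are never read.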
A materialized view reduces the width (number of columns) of data being scanned in the base table by creating a thin version (fewer columns) of the base table that contains a small subset of frequently queried columns.
A materialized view has the same distribution key as the base table.
There are two partitioning methods available in Netezza: hash and random (round-robin).
You can specify up to four columns in the distribution clause.
Netezza distributes the data on the first column and it uses Hash partitioning.
No, the column that is used in the distribution clause cannot be used for updates.
Use Create Table As (CTAS) to redistribute the data in a table. While creating the new table, specify the DISTRIBUTE ON clause to distribute the data on the new columns.
If no distribution is specified, the table created by CTAS gets its distribution from the original table.
To check the distribution of rows, run the following query:
SELECT datasliceid, COUNT(*) FROM <table_name> GROUP BY datasliceid;
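Once per-slice row counts are available, a simple skew figure can be derived from them. The Python sketch below assumes the slice id of every row has already been fetched into a list; the function names and the sample data are made up for illustration.

```python
from collections import Counter

def rows_per_slice(slice_ids):
    """Count rows per data slice, like a GROUP BY on the slice id."""
    return dict(Counter(slice_ids))

def skew_ratio(counts):
    """Largest slice divided by the average slice size; 1.0 means perfectly even."""
    avg = sum(counts.values()) / len(counts)
    return max(counts.values()) / avg

# Made-up sample: slice 1 holds far more rows than the others.
slice_ids = [1, 1, 1, 1, 1, 1, 2, 3, 4, 4]
counts = rows_per_slice(slice_ids)
print(counts, skew_ratio(counts))
```

A ratio well above 1.0 suggests that the distribution key is a poor choice, because one SPU ends up doing most of the work.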
When you join tables that are distributed on the same key and use these key columns in the join condition, each SPU in Netezza works 100% independently of the others, as the required data is available locally. This type of join is called a collocated join.
Whenever it is not possible to do a collocated join, Netezza either redistributes the tables or broadcasts the table. When the table is a small one, then Netezza broadcasts the table. Otherwise, Netezza redistributes the table.
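Why a collocated join needs no data movement can be shown with a toy simulation. The sketch below assumes four SPUs and uses Python's built-in `hash` as a stand-in for the real hash function; the table contents are invented.

```python
from collections import defaultdict

N_SPUS = 4  # assumed number of SPUs in this toy model

def distribute(rows, key_index):
    """Place each row on an SPU chosen by hashing its distribution key."""
    spus = defaultdict(list)
    for row in rows:
        spus[hash(row[key_index]) % N_SPUS].append(row)
    return spus

def local_join(left, right):
    """Join two row lists on their first field."""
    return [(l, r) for l in left for r in right if l[0] == r[0]]

customers = [(1, "ann"), (2, "bob"), (3, "cy"), (7, "dee")]
orders = [(1, "book"), (3, "pen"), (3, "ink"), (7, "cup")]

cust_by_spu = distribute(customers, 0)
ord_by_spu = distribute(orders, 0)

# Collocated join: every SPU joins only the rows it already holds.
per_spu = [pair for spu in range(N_SPUS)
           for pair in local_join(cust_by_spu[spu], ord_by_spu[spu])]
print(len(per_spu))
```

Because both tables are hashed on the join key, matching rows always land on the same SPU, so the per-SPU results add up to the full join. If the tables were distributed on different keys, the local joins would miss matches, which is why a redistribution or broadcast is needed in that case.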
Whenever you delete a row in a table, it is not physically deleted. It is logically deleted by flagging the deleted field in the table. NZRECLAIM utility is used to remove the logically deleted records.
Nzload utility is used to load data from a file into a table. It is used to load bulk data quickly and simultaneously rejects erroneous content.
Netezza logically deletes the original row by flagging the deleted column with the current transaction id and inserts a new row with the updated values.
FPGA: Field Programmable Gate Array (FPGA) is located on each SPU. Netezza is different from other architectures. Netezza can do a “hardware upgrade” through software by using FPGA. Hardware is reconfigured during install.
While reading data from disk, the FPGA on each SPU also helps by filtering out unnecessary data before it is loaded into memory on that SPU. This way, the SPU is not overwhelmed by all the data coming from the disk.
The zone map in Netezza is similar (concept-wise) to partitions in Oracle. Netezza maintains a map of the data so that it can rely on the zone map to pull only the range it is interested in.
For example, if we need to pull out data from Jan 2009 till June 2009 from a table that is distributed on the date column, the zone map helps us to achieve this. The zone map is maintained by Netezza automatically; no user intervention is needed. Zone mapping is done at a block (extent) level. Netezza has zone maps for all columns (not just distribution columns), and they include information such as the minimum, the maximum, and the total number of records.
Sort data first, based on historical data (for example, date), and load this in using nzload.
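The benefit of sorting before loading can be illustrated with a small simulation. This is a rough sketch under invented assumptions: extents of 100 rows, an integer column standing in for a date, and a fixed random seed for the unsorted case.

```python
import random

EXTENT_SIZE = 100  # rows per extent in this toy model

def build_zone_map(values):
    """Split values into extents and record (min, max) for each one."""
    extents = [values[i:i + EXTENT_SIZE] for i in range(0, len(values), EXTENT_SIZE)]
    return [(min(e), max(e)) for e in extents]

def extents_to_scan(zone_map, low, high):
    """Count extents whose [min, max] range overlaps [low, high]."""
    return sum(1 for lo, hi in zone_map if hi >= low and lo <= high)

rng = random.Random(0)
days = list(range(1000))  # stand-in for a date column, already sorted
shuffled = days[:]
rng.shuffle(shuffled)     # the same values loaded in arbitrary order

sorted_scan = extents_to_scan(build_zone_map(days), 200, 299)
unsorted_scan = extents_to_scan(build_zone_map(shuffled), 200, 299)
print(sorted_scan, unsorted_scan)
```

With sorted input each extent covers a narrow range, so a range query touches very few extents. With shuffled input almost every extent spans nearly the whole value range, and the minimum and maximum values eliminate little or nothing.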
Typically only schema and other database objects are cached in appliances. Data is not cached, in general. In most cases, data is not saved anywhere (in any cache or on the host computer) and is streamed directly from SPU to client software.
Obviously, it all depends. This is my (limited) view:
* Teradata: 72 nodes (two quad-core CPUs, 32GB RAM,104 / 300GB disks per node) and manages 2.4PB.
* Greenplum: Fox Interactive Media using a 40-node, Sun X4500 with two dual-core CPUs, 48 / 500GB disks, and 16 GB RAM (1PB total disk space)
Source: Vertica’s Michael Stonebraker!
Loads bypass a few steps that a query would typically go through (plan generation, optimization, and transaction management). Loads are done in terms of “sets”, and each set is based on the underlying table structure (thus loads for two different tables differ, as their sets are based on different table structures). Incoming data is checked for format, the distribution of each record is calculated very quickly (in one step), and the records are filled into the “set” structure and written to the storage structure. Storage also performs space-availability checks and other admin tasks. All these operations go quite quickly (think of them as UNIX named pipes that stream data, with the SPU storing the records).
Very rarely a driver may return aggregated results that are still getting processed back to the client. In this case, the client may assume that the calculation is complete, instead of updating with the latest or final results. Obviously, the driver has to wait for Netezza to complete operation on the host computer, before delivery.
Data is stored based on a selected field(s) that are used for distribution.
==Data (A)==> Hash Function (B) ==> Logical SPU identifier list (C) ==> Physical SPU list (D) ==> Storage (E)
When data arrives, it is hashed based on the field(s) and a hash function (B) is used for this purpose.
For example, for a hypothetical 32-node system, the logical SPU identifier list has 32 unique entries. If there are 1000 hashed data items from (B), there are 1000 entries in (C), all drawn from only 32 SPU identifiers (a number of data items go to the same SPU, thus multiple (B) entries map to the same (C)). For instance, (C) has values [3,19,30,7,20,25,11,3,22,19….]. This way, 1000 data entries are mapped. (D) holds the physical IP addresses of both the primary and the failover SPU. If there is a failover, this is the only place where Netezza needs to update its entries. The same goes for a system that has a new SPU added. The details are a little more complicated, but in principle this is the concept.
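The value of the extra level of indirection can be sketched in Python. Everything specific here is an assumption for illustration: CRC32 stands in for the unspecified hash function, the IP addresses are made up, and `locate` and `fail_over` are hypothetical names.

```python
import zlib

N_LOGICAL = 32  # the hypothetical 32-node system from the text

# (D): logical SPU id -> (primary, failover) addresses; values are made up.
physical = {i: (f"10.0.0.{i + 1}", f"10.0.1.{i + 1}") for i in range(N_LOGICAL)}
active = {i: 0 for i in range(N_LOGICAL)}  # 0 = primary, 1 = failover

def logical_spu(key):
    """(B) -> (C): hash the distribution key to a logical SPU id."""
    return zlib.crc32(str(key).encode()) % N_LOGICAL

def locate(key):
    """(C) -> (D): resolve the logical id to the currently active address."""
    spu = logical_spu(key)
    return physical[spu][active[spu]]

def fail_over(spu):
    """Only the physical lookup changes; hashed assignments stay the same."""
    active[spu] = 1

key = "customer-42"
spu = logical_spu(key)
before = locate(key)
fail_over(spu)
after = locate(key)
print(spu, before, after)
```

The key still hashes to the same logical SPU after the failover; only the address behind that logical id changes, which matches the point that (D) is the single place that needs updating.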
Environment variables: NZ_HOST, NZ_DATABASE, NZ_USER, and NZ_PASSWORD
In case of a conflict in which the same record is set for modification, Netezza rolls back the more recent transaction attempted on the same record (in fact, on the same table). This is generally acceptable in DW environments. Netezza supports serializable transactions and does not permit dirty reads.
Netezza does not update records in place; it marks records with a delete flag. Each record contains two slots, one for the create xid and another for the delete xid. The delete xid marks a record as deleted by the current transaction; up to 31 transactions are allowed in Netezza for all tables. As noted earlier, though, only one update at a time is allowed on the same table. Here, “update” refers to transactions that are not committed yet. The delete xid is also how Netezza maintains transaction rollback and recovery. All records contain 0 for the delete xid when they are loaded; once a record is modified, its delete xid changes from 0 to the id of the modifying transaction. Note that the FPGA uses its intelligence to scan the data before delivering it to the host or applications.
[ROW id][Create xid][Delete xid]
[R1][T1] // First time a record is loaded, record R1
// After some time, updating the same record
[R1][T1][T33] // Record R1 is updated; note T33
[R33][T33] // New update record R33; similar to a new record this has zero for Delete Xid
If the record is deleted, the delete xid simply contains the id of the deleting transaction.
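The record layout above can be mimicked with a small Python model. It is a sketch only: the `Record` and `Table` classes are invented for illustration, and `reclaim` is a toy stand-in for what a reclaim pass does, not the actual utility.

```python
from dataclasses import dataclass

@dataclass
class Record:
    row_id: str
    create_xid: int
    delete_xid: int = 0  # 0 means "not deleted"
    value: str = ""

class Table:
    def __init__(self):
        self.records = []

    def insert(self, row_id, value, xid):
        self.records.append(Record(row_id, xid, 0, value))

    def _live(self, row_id):
        return next(r for r in self.records
                    if r.row_id == row_id and r.delete_xid == 0)

    def delete(self, row_id, xid):
        self._live(row_id).delete_xid = xid  # logical delete only

    def update(self, row_id, new_row_id, value, xid):
        self.delete(row_id, xid)             # flag the old version
        self.insert(new_row_id, value, xid)  # append the new version

    def visible(self):
        return [r for r in self.records if r.delete_xid == 0]

    def reclaim(self):
        """Toy stand-in for a reclaim pass: drop the flagged records."""
        self.records = self.visible()

t = Table()
t.insert("R1", "old", xid=1)
t.update("R1", "R33", "new", xid=33)
print(t.records)
```

After the update, the table physically holds both versions: R1 with delete xid 33 and R33 with delete xid 0. Only after `reclaim` does the old version disappear.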
Deleted records are only logically deleted; the administrator can run nzreclaim to remove them physically. Alternatively, we can truncate the table.
In Netezza, a public group is created automatically and everyone is a member of this group by default. We can create as many groups as needed, and any user can be a member of any group(s). A group cannot be a member of another group. Group names, user names, and database names are unique. That is, we cannot have a database called sales and a group also called sales.
Log in to the system database and give that permission to the user by running “GRANT CREATE TABLE TO joe;”
List. Run “GRANT LIST, SELECT ON TABLE TO PUBLIC;” (if logged into the sales database, this allows all users to query tables in the sales database).
No, the drop database will take care of it.
Not null and default. Netezza does not enforce PK and FK constraints.
Specifying not null results in better performance as NULL values are tracked at row header level. Having NULL values results in storing references to NULL values in the header. If all columns are NOT NULL, then there is no record header.
A table newly created by CTAS gets its distribution from the original table.
Just empties data from the table, keeping table structure, and permission intact.
The first column (same as in Teradata).
No, the column that is used in the distribution clause cannot be used for updates. Remember, up to four columns can be used for the distribution of data on SPUs. In practical terms, updating distribution columns would result in the redistribution of data, which is the single biggest performance hit when a large table is involved. This restriction makes sense.
Zone maps work best for integer data types.
Of course, a large list, especially when compared to Oracle. The lack of PK and FK enforcement is a big drawback, though this is typically enforced in the ETL or ELT process [ELT: Extract, Load, and Transform. Note that ‘Load’ and ‘Transform’ can happen within Netezza].
Ravindra Savaram is a Content Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.