Data Modeling Interview Questions

Data models let designers, programmers, and end-users communicate with each other. Data modeling is the process of managing and analyzing data across a wide range of companies. Database engineers, business analysts, and programmers are just a few of the professions that work with data models on a regular basis. These data modeling interview questions and example responses can help you feel more prepared for your next data modeling job application process.

Rating: 4.6

If you're looking for Data Modeling Interview Questions for Experienced or Freshers, you are in right place. There are a lot of opportunities from many reputed companies in the world. According to research Data Modeling has a market share of about 15.43%. So, You still have the opportunity to move ahead in your career in Data Modeling Architecture.

Mindmajix offers Advanced Data Modeling Interview Questions 2024 that helps you in cracking your interview & acquire a dream career as a Data Modeling Architect.

If you want to enrich your career and become a professional in Data Modeling, then visit Mindmajix - a global online training platform: "Data Modelling Training" This course will help you to achieve excellence in this domain.

Below mentioned are the Top Frequently asked Data Modeling Interview Questions and Answers that will help you to prepare for the Data Modeling interview. Let's have a look at them.

Frequently Asked Data Modeling Interview Questions

  1. What is Data Modeling?
  2. What does the Data Model contain?
  3. What are a logical data model and logical data modeling?
  4. What are a physical data model and physical data modeling?
  5. What is the difference between a logical and physical data model?
  6. What is a primary key constraint?
  7. What is a foreign key constraint?
  8. What are the important types of Relationships in a data model?

Top Data Modeling Interview Questions and Answers

Q1) What is data modeling?

A data model is a conceptual representation of business requirements (logical data model) or database objects (physical) required for a database and is very powerful in expressing and communicating the business requirements and database objects. The approach by which data models are created is called data modeling.

For More Info: What is data modeling?

Q2) What does the data model contain?

Logical Data Model: Entity, Attributes, Super Type, Sub Type, Primary Key, Alternate Key, Inversion Key Entry, Rule, Relationship, Definition, business rule, etc

Physical Data Model: Table, Column, Primary key Constraint, Unique Constraint or Unique Index, Non-Unique Index, Check Constraint, Default Value, Foreign Key, comment, etc.

Q3) What are a logical data model and logical data modeling?

A logical data model is the version of a data model that represents the business requirements (entire or part of an organization). This is the actual implementation and extension of a conceptual data model.

Logical Data Models contain Entity, Attributes, Super Type, Sub Type, Primary Key, Alternate Key, Inversion Key Entry, Rule, Relationship, Definition, etc. The approach by which logical data models are created is called logical data modeling.

Related Article: Big Data Modeling

Q4) What are a physical data model and physical data modeling?

The physical data model includes all required tables, columns, relationships, database properties for the physical implementation of databases. Database performance, indexing strategy, and physical storage are important parameters of a physical model.

The important or main object in a database is a table that consists of rows and columns. The approach by which physical data models are created is called physical data modeling.

Q5) What is the difference between a logical and physical data model?

When a data modeler works with the client, his title may be a logical data modeler or a physical data modeler, or a combination of both.

A logical data modeler designs the data model to suit business requirements, creates and maintains the lookup data, compare the versions of the data model, maintains a changelog, generate reports from the data model and whereas a physical data modeler has to know about the source and target databases properties.

A physical data modeler should know the technical know-how to create data models from existing databases and tune the data models with referential integrity, alternate keys, indexes, and how to match indexes to SQL code. It would be good if the physical data modeler knows about replication, clustering, and so on.

Related Article: Examples of Data Modelling

Q6) What is a table (entity)?

Data stored in form of rows and columns is called a table. Each column has a datatype and based on the situation, integrity constraints are enforced on columns.

Q7) What is a column (attribute)?

A column also known as a field is a vertical alignment of the data and contains related information to that column.

Q8) What is a row?

The row is also known as a tuple or record is the horizontal alignment of the data.

 Subscribe MindMajix YouTube Channel

Q9) What is the ER (entity relationship) diagram or ERD?

ER diagram is a visual representation of entities and the relationships between them. In a data model, entities (tables) look like square boxes or rectangular boxes, which contain attributes, and these entities, are connected by lines (relationship).

Q10) What is a primary key constraint?

The primary key constraint is imposed on the column data to avoid null values and duplicate values. Primary Key=Unique + Not Null. Example: social security number, bank account number, bank routing number

Q11) What is a composite primary key constraint?

When more than one column is a part of the primary key, it is called a composite primary key constraint.

Q12) What is a surrogate key?

In normal practice, a numerical attribute is enforced as a primary key which is called a surrogate key. A surrogate key is a substitute for natural keys.

Instead of having a primary key or composite primary keys, the data modelers create a surrogate key; this is very useful for creating SQL queries, uniquely identify a record, and good performance.

Q13) What is a foreign key constraint?

The parent table has a primary key and a foreign key constraint is imposed on a column in the child table. The foreign key column value in the child table will always refer to primary key values in the parent table.

Q14) What is a composite foreign key constraint?

When a group of columns is in a foreign key, it is called a composite foreign key constraint.

Related Article: Data Modelling Tools

Q15) What are the important types of Relationships in a data model?

Identifying, Non-Identifying Relationship, Self-Recursive relationship are the types of relationship.

Q16) What is an identifying relationship?

Usually, in a data model, parent tables and child tables are present. The parent table and child table are connected by a relationship line.

If the referenced column in the child table is a part of the primary key in the child table, the relationship is drawn by thick lines by connecting these two tables, which is called an identifying relationship.

Q17) What is the non-identifying relationship?

Usually, in a data model, parent tables and child tables are present. The parent table and child table are connected by a relationship line.

If the referenced column in the child table is not a part of the primary key and standalone column in the child table, the relationship is drawn by dotted lines by connecting these two tables, which is called a non-identifying relationship.

Q18) What is a self-recursive relationship?

A standalone column in a table will be connected to the primary key of the same table, which is called a recursive relationship.

Q19) What is cardinality?

One to One, One to many, and many to many are different types of cardinalities. In a database, high cardinality means more unique values are stored in a column and vice versa.

Q20) What are a conceptual data model and conceptual data modeling?

The conceptual data model includes all major entities and relationships and does not contain much detailed level of information about attributes and is often used in the initial planning phase. Data Modelers create a conceptual data model and forward that model to the functional team for their review.

The approach by which conceptual data models are created is called conceptual data modeling.

Q21) What is an enterprise data model?

The Enterprise data model comprises all entities required by an enterprise. The development of a common consistent view and understanding of data elements and their relationships across the enterprise is referred to as Enterprise Data Modeling. For better understanding purposes, these data models are split up into subject areas.

Q22) What is relational data modeling?

The visual representation of objects in a relational database (usually normalized) is called relational data modeling. The table contains rows and columns.

Q23) What is OLTP data modeling?

OLTP acronym stands for ONLINE TRANSACTIONAL PROCESSING. The approach by which data models are constructed for transactions is called OLTP data modeling. Example: all online transactions, bank transactions, trading transactions.

Related Article: What is OLTP?

Q24) What is a constraint? What are the different types of constraints?

The constraint is a rule imposed on the data.  The different types of constraints are primary key, unique, not null, foreign key, composite foreign key, check constraint, etc.

Q25). What is a unique constraint?

Unique constraint is imposed on the column data to avoid duplicate values, but it will contain NULL values.

Q26). How many null values can be inserted in a column that has a unique constraint?

Many null values can be inserted in a unique constraint column because one null value is not equal to another null value.

Q27) What is a check constraint?

Check constraint is used to check the range of values in a column.

Q28) What is an index?

The index is imposed on a column or set of columns for the fastest retrieval of data.

Q29) What is the sequence?

The sequence is a database object to generate a unique number.

Data Modeling Interview Questions for Experienced

Q30) What is normalization?

E.F. Codd gave some rules to design relational databases and the rules were focused on removing data redundancy which helps to overcome normal data modeling problems. The process of removing data redundancy is known as normalization.

Q31) What are the types of normalization?

First normal form, Second normal form, third normal forms are three types of normalization used in practice. Beyond these normal forms, Boyce-Codd fourth and fifth normal forms are also available.

Q32) What is De-Normalization?

De-Normalization is a process of adding redundancy to the data. This helps to quickly retrieve the information from the database.

Q33) What is the data model Metadata?

You can take a report of the entire data model, or subject, or part of the data model. The data about various objects in the data model is called data model Metadata. Data Modeling Tools have options to create reports by checking various options. Either you can create a logical data model Metadata of physical model Metadata.

Q34) What is a data model repository?

Data Model and its relevant data like entity definition, attribute definition, columns, data types, etc. are stored in a repository, which can be accessed by data modelers and the entire team.

Q35) What is forward engineering in a data model?

Forward Engineering is a process by which DDL scripts are generated from the data model. Data modeling tools have options to create DDL scripts by connecting with various databases. With these scripts, databases can be created.

Q36) What is reverse engineering in a data model?

Reverse Engineering is a process useful for creating data models from databases or scripts. Data modeling tools have options to connect to the database by which we can reverse engineer a database into a data model.

Q37) What are subtype and supertype entities?

An entity can be split into many entities (sub-entities) and grouped based on some characteristics and each sub-entity will have attributes relevant to that entity. These entities are called subtype entities. The attributes which are common to these entities are moved to a super (higher) level entity, which is called a supertype entity.

Q38) Give us a non-computer example of preemptive and non-preemptive scheduling

Consider any system where people use some kind of resources and compete for them. The non-computer examples for preemptive scheduling the traffic on the single-lane road if there is an emergency or there is an ambulance on the road the other vehicles give a path to the vehicles that are in need.

An example of preemptive scheduling is people standing in a queue for tickets.

Q39) What is the difference between star flake and snowflake schema?

Star Schema: Well in star schema you just enter your desired facts and all the primary keys of your dimensional tables in the Fact table. And fact tables primary is the union of its all dimension table key. In a star schema, dimensional tables are usually not in BCNF form.

Snow Flake: It's almost like star-schema but in this, our dimension tables are in 3rd NF, so more dimensions tables. And these dimension tables are linked by primary, foreign key relations.

Q40) What is data sparsity and how it affects aggregation?

Data sparsity is a term used for how much data we have for a particular dimension/entity of the model. It affects aggregation depending on how deep the combination of members of the sparse dimension makes up.

If the combination is a lot and those combinations do not have any factual data then creating space to store those aggregations will be a waste as a result, the database will become huge.

Q41) What is the difference between the hashed file stage and sequential file stage in relates to DataStage Server?

In Datastage server jobs, can we use a sequential file stage for a lookup instead of a hashed file stage? If yes, then what’s the advantage of a Hashed File stage over a sequential file stage.

Search is faster in hash files as you can directly get the address of record directly by the hash algorithm as records are stored like that but in the case of a sequential file, u must compare all the records.

Q42) When should you consider denormalization?

Denormalization is used when there are a lot of tables involved in retrieving data. Denormalization is done in dimensional data modeling used to construct a data warehouse. This is not usually done for databases of transactional systems.

Q43) What is ERD?

Data models are tools used in the analysis to describe the data requirements and assumptions in the system from a top-down perspective. They also set the stage for the design of databases later on in the SDLC.

There are three basic elements in ER models:

  1. Entities are the “things” about which we seek information.
  2. Attributes are the data we collect about the entities.
  3. Relationships provide the structure needed to draw information from multiple entities

Q44) What is the third normal form?

An entity is in the third normal form if it is in the second normal form and all of its attributes are not transitively dependent on the primary key. Transitive dependence means that descriptor key attributes depend not only on the whole primary key but also on other descriptor key attributes that, in turn, depend on the primary key.

In SQL terms, the third normal form means that no column within a table is dependent on a descriptor column that, in turn, depends on the primary key.

For 3NF, first, the table must be in 2NF, plus, we want to make sure that the non-key fields are dependent upon ONLY the PK, and not other non-key fields for its existence. This is very similar to 2NF, except that now you are comparing the non-key fields to OTHER non-key fields. After all, we know that the relationship to the PK is good because we established that in 2NF.

Q45) Why are recursive relationships are bad? How do you resolve them?

Recursive relationships are an interesting and more complex concept than the relationships you have seen in the previous chapters, such as one-to-one, one-to-many, and many-to-many. A recursive relationship occurs when there is a relationship between an entity and itself.

For example, a one-to-many recursive relationship occurs when an employee is the manager of another employee. The employee entity is related to itself, and there is a one-to-many relationship between one employee (the manager) and many other employees (the people who report to the manager).

Because of the more complex nature of these relationships, we will need slightly more complex methods of mapping them to a schema and displaying them in a style sheet.

Q46) Is this statement TRUE or FALSE? all databases must be in third normal form?

In general, all organization databases are normalized to 3nf in order to remove redundancy and efficient access. A database can also be created without normalization. Hence it is not mandatory that a database should be in 3nf.

Q47) What is an artificial (derived) primary key? When should it be used?

Using a name as the primary key violates the principle of stability. The social security number might be a valid choice, but a foreign employee might not have a social security number. This is a case where a derived, rather than a natural, primary key is appropriate.

A derived key is an artificial key that you create. A natural key is one that is already part of the database.

Q48) What is the second normal form?

An entity is in the second normal form if all of its attributes depend on the whole (primary) key. In relational terms, every column in a table must be functionally dependent on the whole primary key of that table. Functional dependency indicates that a link exists between the values in two different columns.

If the value of an attribute depends on a column, the value of the attribute must change if the value in the column changes. The attribute is a function of the column. The following explanations make this more specific:

If the table has a one-column primary key, the attribute must depend on that key.

If the table has a composite primary key, the attribute must depend on the values in all its columns taken as a whole, not on one or some of them.

If the attribute also depends on other columns, they must be columns of a candidate key; that is, columns that are unique in every row.

If you do not convert your model to the second normal form, you risk data redundancy and difficulty in changing data. To convert first-normal-form tables to second-normal-form tables, remove columns that are not dependent on the primary key.

Course Schedule
Data Modeling TrainingJun 01 to Jun 16View Details
Data Modeling TrainingJun 04 to Jun 19View Details
Data Modeling TrainingJun 08 to Jun 23View Details
Data Modeling TrainingJun 11 to Jun 26View Details
Last updated: 02 Jan 2024
About Author

Ravindra Savaram is a Technical Lead at His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.

read less