Ab Initio Interview Questions

The purpose of this blog is to prepare you for the Ab Initio interview. Our experts have come up with the top 50 Ab Initio interview questions that job applicants are likely to face, covering all of the important areas that recruiters look for.

If you're looking for Ab Initio interview questions for experienced candidates or freshers, you are in the right place. Many reputed companies around the world offer opportunities in this area. According to research, Ab Initio has a market share of about 2.2%.

So you still have the opportunity to move ahead in your career in Ab Initio development. Mindmajix offers Advanced Ab Initio Interview Questions 2024 that help you crack your interview and land your dream career as an Ab Initio developer.

We have categorized the Ab Initio interview questions into 3 levels:

  • For Freshers
  • Scenario-Based
  • For Experienced

Top 10 Frequently Asked Ab Initio Interview Questions

  1. What is the relation between EME, GDE, and the Co>Operating System?
  2. What are the benefits of data processing according to you?
  3. How is data processed and what are the fundamentals of this approach?
  4. What would be the next step after collecting the data?
  5. What is a data processing cycle and what is its significance?
  6. What are the factors on which storage of data depends?
  7. What do you mean by data sorting?
  8. What are the benefits of data analysis?
  9. What do you mean by the overflow errors?
  10. What is common between data validity and Data Integrity?

Ab Initio Interview Questions and Answers For Freshers

1. Informatica vs Ab Initio

Feature                  | Ab Initio                                                | Informatica
About Tool               | Code-based ETL                                           | Engine-based ETL
Parallelism              | Supports three types of parallelism                      | Supports one type of parallelism
Scheduler                | No scheduler                                             | Schedule through script available
Error Handling           | Can attach error and reject files                        | One file for all
Robustness               | Robustness by function comparison                        | Basic in terms of robustness
Feedback                 | Provides performance metrics for each component executed | Debug mode, but slow implementation
Delimiters while reading | Supports multiple delimiters                             | Only dedicated delimiter

2. What is the relation between EME, GDE, and the Co>Operating System?

EME stands for Enterprise Meta>Environment, GDE stands for Graphical Development Environment, and the Co>Operating System is the Ab Initio server. The Co>Operating System is installed on a particular operating system platform, which is called the native OS. The EME is like a repository in Informatica: it holds the metadata, transformations, dbconfig files, and source and target information.

The GDE is the end-user environment where graphs are developed (a graph is like a mapping in Informatica). The designer uses the GDE to design graphs and saves them to the EME or to a sandbox. The GDE sits on the user side, whereas the EME sits on the server side.

3. What are the benefits of data processing according to you?

Data processing offers many benefits. Users can separate out the factors that matter to them. It also lets them organize data from a completely unstructured format into different structures.

In addition, processing helps eliminate errors in the data that would otherwise cause problems at a later stage. This is why data processing is applied in such a wide range of tasks.

4. What exactly do you understand by the term data processing, and can businesses trust this approach?

Processing is a procedure that converts data from a raw, unusable form into a useful one. The effort involved varies with factors such as the size of the data and its format. A sequence of operations is generally carried out to perform this task and, depending on the type of data, this sequence can be automatic or manual.

Because most of the devices that perform this task today are computers, the automatic approach is more popular than ever before. Users can obtain the output in forms such as tables, vectors, images, graphs, and charts, which is what makes the approach useful to business owners.

5. How is data processed and what are the fundamentals of this approach?

Processing begins with the collection of data, and in many cases the result depends largely on it. Data needs to be stored and analyzed before it is actually processed. The task depends on these major factors:

  • Collection of Data
  • Presentation 
  • Final Outcomes
  • Analysis
  • Sorting.

These are also regarded as the fundamentals of data processing.

6. What would be the next step after collecting the data?

Once the data is collected, the next important task is to enter it into the relevant machine or system. Gone are the days when storage depended on paper. Today the volume of data is very large, and data entry needs to be performed in a reliable manner.

A digital approach is a good option because it lets users perform this task easily and without compromising anything. A large set of operations then needs to be performed for meaningful analysis. In many cases conversion also matters, and users are free to choose the output that best meets their expectations.

7. What is a data processing cycle and what is its significance?

Data often needs to be processed continuously while it is being used. This is known as the data processing cycle. It can provide results quickly or take extra time, depending on the type, size, and nature of the data.

These factors add to the complexity of processing, so there is a need for methods that are more reliable and advanced than existing approaches. The data processing cycle keeps that complexity to a minimum.

8. What are the factors on which the storage of data depends?

Basically, it depends on sorting and filtering. In addition to this, it largely depends on the software one uses. 

9. Do you think effective communication is necessary for data processing? What is your strength in terms of the same?

The biggest strength one can have in this domain is the ability to rely on the data or the information. Communication matters a lot in accomplishing several important tasks, such as the presentation of information. An organization has many departments, and communication keeps information consistent and reliable for everyone.

10. Suppose we assign you a new project. What would be your initial point and the key steps that you follow?

The first thing that matters is defining the objective of the task and then engaging the team in it. This provides a solid direction for accomplishing the task, which is especially important when working on a set of data that is completely new. After this, the next thing that needs attention is effective data modeling, which includes finding missing values and validating the data. The last step is to track the results.

11. Suppose you find the term Validation mentioned with a set of data. What does that represent?

It means that the data concerned is clean and correct, and can therefore be used reliably. Data validation is widely regarded as a key point in the processing system.
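As a rough illustration of what a validation step might check, here is a minimal Python sketch. The field names `id` and `amount` and the two rules (no missing values, no negative amounts) are assumptions made up for this example, not rules from any particular tool.

```python
def validate(record):
    """Return a list of problems found in one record (an empty list means valid)."""
    problems = []
    # Rule 1 (assumed): required fields must not be missing or empty.
    for field in ("id", "amount"):
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Rule 2 (assumed): a numeric amount must not be negative.
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and amount < 0:
        problems.append("negative amount")
    return problems


# Hypothetical input records.
records = [
    {"id": 1, "amount": 25.0},
    {"id": 2, "amount": None},
    {"id": None, "amount": -5},
]

valid = [r for r in records if not validate(r)]
rejected = [r for r in records if validate(r)]
print(len(valid), "valid,", len(rejected), "rejected")
```

Only records that pass every rule reach `valid`; the rest can be routed to a reject list for inspection.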

12. What do you mean by data sorting?

Data does not always arrive in a well-defined sequence; it is often a random collection of items. Sorting is arranging the data items in a desired order.

13. Name the technique that you can use for combining multiple data sets.
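To make the idea concrete, here is a small Python sketch that sorts an unordered collection of records on a chosen key. The records and the field names `name` and `amount` are made up for illustration.

```python
# Hypothetical records in arbitrary order; the field names are illustrative only.
records = [
    {"name": "carol", "amount": 30},
    {"name": "alice", "amount": 10},
    {"name": "bob", "amount": 20},
]

# Arrange the records in ascending order of the chosen key.
by_amount = sorted(records, key=lambda r: r["amount"])

# The same records in descending order of name.
by_name_desc = sorted(records, key=lambda r: r["name"], reverse=True)

print([r["name"] for r in by_amount])
print([r["name"] for r in by_name_desc])
```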

It is known as Aggregation.

14. How scientific data processing is different from commercial data processing?

Scientific data processing involves a great amount of computation, i.e. arithmetic operations. A limited amount of data is provided as input, and a large amount of data is produced as output. Commercial data processing is different: the output is limited compared to the input data, and the computational operations are limited as well.

Ab Initio Scenario-Based Interview Questions and Answers

15. What are the benefits of data analysis?

It makes sure of the following:

  • It explains developments related to the core tasks
  • It tests hypotheses with an integrated approach
  • It detects patterns in a reliable manner.

16. What are the key elements of a data processing system?

These are the Converter, Aggregator, Validator, Analyzer, Summarizer, and Sorter.

17. Name any two stages of the data processing cycle and compare them.

The first is the collection of data and the second is the preparation of data. Collection is the first stage and preparation is the second in a data processing cycle. The first stage provides a baseline for the second, and the success and simplicity of the second depends on how accurately the first has been accomplished. Preparation is mainly the manipulation of important data. Collection breaks data sets apart, while preparation joins them together.

18. What do you mean by the overflow errors?

While processing data, bulky calculations are common, and the results do not always fit in the memory allocated for them. For example, if a value that needs more than 8 bits is stored in an 8-bit location, an overflow error results.
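The 8-bit case can be reproduced with Python's standard library. This is only a sketch of the general idea, not Ab Initio behavior: `ctypes.c_uint8` does no overflow checking and wraps the value silently, while `struct.pack` with the one-byte format `"B"` refuses it.

```python
import ctypes
import struct

# An unsigned 8-bit slot can hold 0..255. ctypes does no overflow checking,
# so a larger value silently wraps around (300 mod 256 = 44).
wrapped = ctypes.c_uint8(300).value

# struct is stricter: packing 300 into one unsigned byte ("B") raises an error.
try:
    struct.pack("B", 300)
    overflow_detected = False
except struct.error:
    overflow_detected = True

print(wrapped, overflow_detected)
```

The silent wrap is the more dangerous of the two outcomes, because the wrong value keeps flowing through later steps without any error being raised.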

19. What are the facts that can compromise data integrity?

There are several errors that can cause this issue and lead to many other problems. These are:

  • Bugs and malware
  • Human error
  • Hardware error
  • Transfer errors, which generally include data compression beyond a limit.

20. What is data encoding?

Data needs to be kept confidential in many cases, and encoding is one way to do it. It ensures that the information remains in a form that no one other than the sender and the receiver can understand.

21. What does EDP stand for?

It stands for Electronic Data Processing.

22. Name one method which is generally considered by remote workstations when it comes to processing

Distributed processing

23. What do you mean by a transaction file and how is it different from a sort file?

A transaction file generally holds input data for the period when a transaction is in process. All the master files can be updated with it. Sorting, on the other hand, is done to assign a fixed location to the data files.

24. What is the use of Aggregate when we have Rollup? The Rollup component in Ab Initio is used to summarize groups of data records, so where would we use Aggregate?

Aggregate and Rollup can both summarize data, but Rollup is much more convenient to use. Rollup is also much more explanatory than Aggregate when it comes to understanding how a particular summarization is done. Rollup offers additional functionality such as input and output filtering of records. Both perform the same action, but Rollup keeps the intermediate result in main memory, whereas Aggregate does not support intermediate results.

25. What kinds of layouts does Ab Initio support?

Ab Initio supports serial and parallel layouts. A graph can have both at the same time. The parallel layout depends on the degree of data parallelism. If the multifile system is 4-way parallel, then a component in a graph can run 4-way parallel, provided its layout is defined to match that degree of parallelism.

26. How do you add default rules in the transformer?

Double-click the transform parameter on the Parameters tab of the component properties; this opens the Transform Editor. In the Transform Editor, click the Edit menu and select Add Default Rules from the dropdown. It shows two options: 1) Match Names 2) Wildcard.

27. Do you know what a local lookup is?

If your lookup file is a multifile and is partitioned/sorted on a particular key, then the local lookup function can be used ahead of the lookup function call. The lookup is local to a particular partition, depending on the key. A lookup file consists of data records that can be held in main memory.

This lets the transform function retrieve records much faster than it could from disk, and allows the transform component to process data records from multiple files quickly.

28. What is the difference between a lookup file and a lookup, with a relevant example?

Generally, a lookup file represents one or more serial files (flat files). The amount of data is small enough to be held in memory, which allows transform functions to retrieve records much more quickly than they could from disk.
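The in-memory idea can be sketched in plain Python. This is not Ab Initio DML; the file contents, the field names, and the "UNKNOWN" default are all assumptions made up for the example. The flat file is read once into a dictionary, and every later retrieval is a memory access rather than a disk read.

```python
import csv
import io

# Stand-in for a small serial (flat) lookup file; the contents are hypothetical.
lookup_file = io.StringIO(
    "country_code,country_name\n"
    "IN,India\n"
    "US,United States\n"
)

# Load the whole file into memory once, keyed on the lookup key.
lookup = {row["country_code"]: row["country_name"] for row in csv.DictReader(lookup_file)}

# Hypothetical input records that need to be enriched.
orders = [
    {"id": 1, "country_code": "US"},
    {"id": 2, "country_code": "IN"},
    {"id": 3, "country_code": "FR"},
]

# Each retrieval is an in-memory dictionary access; unmatched keys get a default.
enriched = [dict(o, country_name=lookup.get(o["country_code"], "UNKNOWN")) for o in orders]
print(enriched)
```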

29. How many components are in your most complicated graph?

It depends on the type of components you use. Usually, avoid using many complicated transform functions in a graph.

30. Have you worked with packages?

Multistage transform components use packages by default. However, users can create their own set of functions in a transform function and include it in other transform functions.

31. Can sorting and storing be done through a single piece of software, or do you need different software for each?

It depends on the type and nature of the data. Although it is possible to accomplish both tasks with the same software, many programs have their own specialization, and using a specialized one generally gives better results.

There are also pre-defined sets of modules and operations that matter here. If the conditions they impose are met, users can perform multiple tasks with the same software. The output file can be provided in various formats.

Ab Initio Interview Questions For Experienced

32. What are the different forms of output that can be obtained after processing data?

These are

  • Tables
  • Plain Text files
  • Image files
  • Maps
  • Charts
  • Vectors
  • Raw files

Sometimes data must be produced in more than one format, so the software accomplishing this task must have features that support this.

33. Give one reason why you would need to consider multiple data processing.

When the files produced are not the complete required outcome and need further processing.

34. What are the types of data processing you are familiar with?

The first is manual data processing. Here the data is processed without depending on a machine, and it is therefore prone to errors. This technique is rarely followed today, or only limited data is processed with it.

The second type is mechanical data processing, in which mechanical devices play an important role. This approach is adopted when the data is a combination of different formats. The third is electronic data processing, which is regarded as the fastest and is widely adopted today. It offers the highest accuracy and reliability.

35. Name the different types of processing based on the steps that you know about?

They are:

  • Real-Time processing
  • Multiprocessing
  • Time-Sharing
  • Batch processing
  • Adequate Processing

36. Why do you think data processing is important?

Data is generally collected from different sources, so it can vary widely in many respects. It needs to pass through various analyses and other processes before it is stored, and this is not as easy as it seems in most cases.

This is why processing matters. Processing the data saves a lot of time in accomplishing the tasks that matter. It also reduces, to a good extent, the dependency on various factors for reliable operation.

37. What is common between data validity and Data Integrity?

Both deal with errors in the data and ensure a smooth flow of the operations that matter.

38. What do you mean by the term data warehousing? Is it different from Data Mining?

There is often a need to retrieve data, and warehousing makes this possible without affecting the efficiency of operational systems. It supports decision support and works alongside business applications, Customer Relationship Management, and the warehouse architecture. Data mining is closely related to this approach: it simplifies finding what is required from the warehouse.

39. What exactly do you know about typical data analysis?

It generally involves the collection and organization of important files. The main aim is to know the exact relationship between the full (industrial) data and the data that is analyzed. Some experts also call it one of the best available approaches for finding errors, because it makes it possible to spot problems and enables the operator to find the root causes of the errors.

40. Have you used the rollup component? Describe how.

If the user wants to group records on particular field values, then Rollup is the best way to do that. Rollup is a multistage transform function and it contains the following mandatory functions.

  1. Initialize
  2. Rollup
  3. Finalize

You also need to declare a temporary variable if you want to get counts for a particular group.

For each group, it first calls the initialize function once, then calls the rollup function for each record in the group, and finally calls the finalize function once after the last rollup call.
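The call order described above can be imitated in plain Python. This is a sketch of the pattern only, not Ab Initio DML; the helper `run_rollup` and the fields `dept` and `amount` are hypothetical names chosen for the example.

```python
from itertools import groupby
from operator import itemgetter


def initialize(first_record):
    # Called once per group: set up the temporary variables (count and total).
    return {"count": 0, "total": 0}


def rollup(temp, record):
    # Called once for every record in the group: update the temporary state.
    temp["count"] += 1
    temp["total"] += record["amount"]
    return temp


def finalize(temp, key):
    # Called once after the last record of the group: build the output record.
    return {"dept": key, "count": temp["count"], "total": temp["total"]}


def run_rollup(records, key_field):
    output = []
    # groupby only groups adjacent records, so sort on the key first.
    ordered = sorted(records, key=itemgetter(key_field))
    for key, group in groupby(ordered, key=itemgetter(key_field)):
        group = list(group)
        temp = initialize(group[0])
        for record in group:
            temp = rollup(temp, record)
        output.append(finalize(temp, key))
    return output


# Hypothetical input records.
records = [
    {"dept": "sales", "amount": 100},
    {"dept": "hr", "amount": 40},
    {"dept": "sales", "amount": 50},
]
summary = run_rollup(records, "dept")
print(summary)
```

The `count` entry in the temporary state plays the role of the temporary variable mentioned above for getting per-group counts.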

41. How do you add default rules in the transformer?

Add Default Rules opens the Add Default Rules dialog. Select one of the following:

  • Match Names: generates a set of rules that copies input fields to output fields with the same name.
  • Use Wildcard (.*) Rule: generates one rule that copies input fields to output fields with the same name.

To open the dialog:

  1. If it is not already displayed, display the Transform Editor Grid.
  2. Click the Business Rules tab if it is not already displayed.
  3. Select Edit > Add Default Rules.

In the case of Reformat, if the destination field names are the same as, or a subset of, the source fields, then there is no need to write anything in the reformat xfr, unless you want a real transform beyond reducing the set of fields or splitting the flow into a number of flows.

42. What is the difference between partitioning with key and round-robin?

Partition by key (hash partition): this partitioning technique is used to partition data when the keys are diverse. If a single key value is present in large volume, there can be large data skew. Even so, this method is used more often for parallel data processing.

Round-robin partition is another partitioning technique that distributes the data uniformly across the destination partitions. The skew is zero when the number of records is divisible by the number of partitions. A real-life example is how a pack of 52 cards is dealt among 4 players in a round-robin manner.
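Both techniques can be sketched in a few lines of Python using the card example. The function names are hypothetical, and `zlib.crc32` is only a stand-in hash chosen because it gives the same result on every run; the text does not say which hash function Ab Initio uses.

```python
import zlib


def partition_by_key(records, key_field, n_partitions):
    # Records with the same key value always land in the same partition.
    partitions = [[] for _ in range(n_partitions)]
    for record in records:
        # crc32 is an assumed, stable stand-in for the real hash function.
        index = zlib.crc32(str(record[key_field]).encode()) % n_partitions
        partitions[index].append(record)
    return partitions


def partition_round_robin(records, n_partitions):
    # Record i goes to partition i mod n, like dealing cards in turn.
    partitions = [[] for _ in range(n_partitions)]
    for i, record in enumerate(records):
        partitions[i % n_partitions].append(record)
    return partitions


# A pack of 52 cards dealt among 4 players.
suits = ("clubs", "diamonds", "hearts", "spades")
cards = [{"suit": s, "rank": r} for s in suits for r in range(1, 14)]

hands = partition_round_robin(cards, 4)
hand_sizes = [len(h) for h in hands]

by_suit = partition_by_key(cards, "suit", 4)
suit_sizes = [len(p) for p in by_suit]

print("round robin:", hand_sizes)
print("by key:     ", suit_sizes)
```

Round robin gives every player 13 cards, so the skew is zero. With the key-based version, all cards of one suit stay together, but nothing stops two suits from hashing to the same partition, which is where skew comes from.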

43. How do you improve the performance of a graph?

There are many ways the performance of the graph can be improved.

  1. Use a limited number of components in a particular phase
  2. Use optimum max-core values for sort and join components
  3. Minimize the number of sort components
  4. Minimize sorted join components and, if possible, replace them with in-memory join/hash join
  5. Use only the required fields in the sort, reformat, and join components
  6. Use phasing/flow buffers in the case of merge and sorted joins
  7. If the two inputs are huge then use sorted join, otherwise use hash join with the proper driving port
  8. For large datasets don’t use broadcast as a partitioner
  9. Minimize the use of regular expression functions like re_index in the transform functions
  10. Avoid repartitioning of data unnecessarily.

Try to run the graph in MFS for as long as possible. To do this, the input files should be partitioned and, if possible, the output file should also be partitioned.

44. How do you truncate a table?

Use the Run SQL component with the DDL statement "truncate table", or use the Truncate Table component in Ab Initio.

45. Have you ever encountered an error called “depth not equal”?

When two components are linked together and their layouts do not match, this problem can occur during the compilation of the graph. A solution is to place a partitioning component between them wherever the layout changes.

46. What are primary keys and foreign keys?

In an RDBMS, the relationship between two tables is represented as a primary key and foreign key relationship. The primary key table is the parent table and the foreign key table is the child table. The requirement for both tables is that there should be a matching column.
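The parent/child relationship can be tried out with Python's built-in `sqlite3` module. The tables `department` and `employee` and their rows are made up for this sketch. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table: dept_id is the primary key.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
# Child table: dept_id is a foreign key pointing at the parent's matching column.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)

conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (100, 'Asha', 1)")  # parent row exists

# A child row that points at a missing parent is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (101, 'Ravi', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(orphan_rejected, employee_count)
```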

47. What is an outer join?

An outer join is used when one wants to select all the records from a port, whether or not they satisfy the join criteria.
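The same behavior can be seen in SQL, where a table plays the role of the port. Here is a small sketch using Python's built-in `sqlite3` module, with made-up `customers` and `orders` tables: every customer is returned, and the one without a matching order gets `None` in the order column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER)")

# Hypothetical data: Ravi has no order, so he fails the join criteria.
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1)])

# LEFT OUTER JOIN keeps every row from customers, matched or not.
rows = conn.execute(
    "SELECT c.name, o.order_id FROM customers c "
    "LEFT OUTER JOIN orders o ON o.cust_id = c.cust_id "
    "ORDER BY c.cust_id"
).fetchall()
print(rows)
```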

About Author

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
