Ques:- Difference between ETL and data mining, Types of Fact Table according to dimensional value, What is Factless Fact Table, Types of Dimension.
Right Answer:
**Difference between ETL and Data Mining:**
ETL (Extract, Transform, Load) is the process of collecting data from various sources, transforming it into a suitable format, and loading it into a data warehouse. Data mining, on the other hand, involves analyzing large datasets to discover patterns, trends, and insights.

**Types of Fact Table according to Dimensional Value:**
1. Transaction Fact Table
2. Periodic Snapshot Fact Table
3. Accumulating Snapshot Fact Table

**What is a Factless Fact Table:**
A Factless Fact Table is a fact table that does not contain any measurable facts or numeric data but captures events or conditions, often used to track many-to-many relationships or specific occurrences.

**Types of Dimension:**
1. Conformed Dimension
2. Slowly Changing Dimension (SCD)
3. Junk Dimension
4. Role-Playing Dimension
5. Degenerate Dimension
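
As a minimal illustration of a factless fact table, the sketch below uses SQLite with hypothetical student/course/date dimensions; the "class attended" event is recorded with foreign keys only, and analysis is done by counting rows.

```python
# Minimal sketch of a factless fact table using sqlite3; the dimensions and
# data are hypothetical. The fact table stores no measures, only foreign keys.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_course  (course_id  INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, cal_date TEXT);

-- Factless fact table: only foreign keys, no numeric measures.
CREATE TABLE fact_attendance (
    student_id INTEGER REFERENCES dim_student(student_id),
    course_id  INTEGER REFERENCES dim_course(course_id),
    date_id    INTEGER REFERENCES dim_date(date_id)
);
""")

conn.execute("INSERT INTO dim_student VALUES (1, 'Asha')")
conn.execute("INSERT INTO dim_course  VALUES (10, 'ETL Basics')")
conn.execute("INSERT INTO dim_date    VALUES (100, '2024-01-15')")
conn.execute("INSERT INTO fact_attendance VALUES (1, 10, 100)")

# Analysis is done by counting rows, e.g. attendances per course.
rows = conn.execute("""
    SELECT c.title, COUNT(*) AS attendances
    FROM fact_attendance f JOIN dim_course c USING (course_id)
    GROUP BY c.title
""").fetchall()
print(rows)  # [('ETL Basics', 1)]
```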
Ques:- Describe how you would ensure that database object definitions (tables, indices, constraints, triggers, users, logins, connection options, server options, etc.) are consistent and repeatable between multiple database instances (i.e., a test and a production instance).
Right Answer:
To ensure that database object definitions are consistent and repeatable between multiple database instances, I would implement the following practices:

1. **Version Control**: Use a version control system (e.g., Git) to manage database schema scripts and changes.
2. **Database Migration Tools**: Utilize migration tools (e.g., Flyway, Liquibase) to apply schema changes in a controlled manner across environments.
3. **Infrastructure as Code**: Define database configurations and objects in code using tools like Terraform or AWS CloudFormation.
4. **Automated Deployment**: Set up CI/CD pipelines to automate the deployment of database changes to different environments.
5. **Documentation**: Maintain clear documentation of the database schema and changes to ensure all team members are aligned.
6. **Environment Configuration**: Use configuration files to manage environment-specific settings, ensuring consistency across instances.
7. **Testing**: Implement automated tests to validate database changes before deploying to production.
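
One low-tech complement to these practices is to diff object definitions straight from the database catalog. The sketch below does this for SQLite via `sqlite_master`; the file names are hypothetical, and for other engines the same idea applies to their catalog or `information_schema` views.

```python
# Hedged sketch: compare object definitions (tables, indexes, triggers)
# between two SQLite instances. File names are hypothetical.
import sqlite3

def object_definitions(path):
    """Return {(object_type, name): create_sql} for one database instance."""
    conn = sqlite3.connect(path)
    rows = conn.execute(
        "SELECT type, name, sql FROM sqlite_master WHERE sql IS NOT NULL"
    ).fetchall()
    conn.close()
    return {(t, n): s for t, n, s in rows}

test = object_definitions("test.db")        # hypothetical test instance
prod = object_definitions("production.db")  # hypothetical production instance

for key in sorted(set(test) | set(prod)):
    if test.get(key) != prod.get(key):
        print("MISMATCH:", key)
```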
Ques:- In a room there is one bulb. Outside the room there are three switches, and only one of them is connected to the bulb. You may stand near the switches for 5 minutes and do whatever you like with them. After 5 minutes, enter the room and say which is the correct switch.
Right Answer:
Turn on the first switch and leave it on for 5 minutes. After 5 minutes, turn it off and turn on the second switch. Enter the room:

- If the bulb is on, the second switch is the correct one.
- If the bulb is off but warm, the first switch is the correct one.
- If the bulb is off and cold, the third switch is the correct one.
Ques:- What is a mapping, session, worklet, workflow, mapplet?
Right Answer:
- **Mapping**: A mapping defines the data flow from source to target, including transformations applied to the data.
- **Session**: A session is a set of instructions that the Informatica server uses to move data from source to target based on the mapping.
- **Worklet**: A worklet is a set of tasks that can be reused in workflows, allowing for modular design and execution of related tasks.
- **Workflow**: A workflow is a set of tasks that define the execution order of sessions and other operations in the ETL process.
- **Mapplet**: A mapplet is a reusable set of transformations that can be used in multiple mappings to simplify design and promote consistency.
Ques:- What is the difference between ETL tools and OLAP tools?
Right Answer:
ETL tools are used to extract, transform, and load data from various sources into a data warehouse, while OLAP tools are used for analyzing and querying that data to support decision-making and reporting.
Ques:- What are the various methods of getting incremental records or delta records from the source systems?
Right Answer:
1. **Timestamps**: Use a timestamp column to identify records that have been added or modified since the last extraction (a minimal sketch of this approach follows the list).

2. **Change Data Capture (CDC)**: Utilize database features or tools that track changes to data in real-time.

3. **Triggers**: Implement database triggers to log changes in a separate table for later extraction.

4. **Versioning**: Maintain a version number for records and extract records with a higher version than the last extraction.

5. **Batch Processing**: Extract data in batches based on a defined interval, comparing the current batch with the previous one.

6. **Log Files**: Analyze transaction logs to identify changes made to the data since the last load.
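
A minimal sketch of the timestamp/watermark approach (method 1 above), assuming a hypothetical `orders` table with an `updated_at` column:

```python
# Hedged sketch of incremental (delta) extraction using a watermark column;
# the source database, table, and columns are hypothetical.
import sqlite3

conn = sqlite3.connect("source.db")  # hypothetical source system

def extract_delta(conn, last_watermark):
    """Return rows added or modified since the previous extraction."""
    rows = conn.execute(
        "SELECT id, payload, updated_at FROM orders WHERE updated_at > ?",
        (last_watermark,),
    ).fetchall()
    # The new watermark is the highest timestamp seen in this batch.
    new_watermark = max((r[2] for r in rows), default=last_watermark)
    return rows, new_watermark

delta, watermark = extract_delta(conn, "2024-01-01 00:00:00")
# Persist `watermark` (e.g. in a control table) for the next run.
```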
Ques:- Where do we use connected and unconnected lookups?
Right Answer:
Connected lookups are used when you need to return multiple columns from the lookup table and want to pass the output directly to the data flow. Unconnected lookups are used when you only need a single value from the lookup table and want to call the lookup as a function, allowing for more flexibility in the data flow.
Ques:- I'm new to SAS. Can anybody explain the process of extracting data from source systems, storing it in an ODS, and how data modelling is done?
Right Answer:
To extract data from source systems in SAS, you typically use the SAS/ACCESS interface to connect to various data sources (like databases, flat files, etc.). You can write a DATA step or use PROC SQL to pull the data into SAS.

Once the data is extracted, it is stored in an Operational Data Store (ODS) using a SAS library. You can create a library pointing to a specific location where the ODS will reside.

For data modeling, you can use SAS tools like SAS Data Integration Studio or SAS Enterprise Guide to design your data model. This involves defining the structure of your data, including tables, relationships, and data types, and then creating ETL processes to transform and load the data into the ODS.
Ques:- What is the difference between Power Center & Power Mart?
Right Answer:
PowerCenter is a full-fledged ETL tool used for enterprise-level data integration, while PowerMart is a lighter version designed for smaller projects or departmental use. PowerCenter supports more complex workflows and larger data volumes, whereas PowerMart is more limited in features and scalability.
Ques:- Where do we use semi-additive and non-additive facts?
Right Answer:
Semi-additive facts are used in scenarios where some measures can be summed across certain dimensions but not all, such as in time-based analysis (e.g., account balances). Non-additive facts are used for measures that cannot be summed at all, such as ratios or percentages (e.g., profit margin).
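
A small pandas sketch (with made-up balances) shows why an account balance is semi-additive: it can be summed across accounts for a single date, but across time it must be averaged or taken at period end rather than summed.

```python
# Hedged illustration of a semi-additive fact; the data is made up.
import pandas as pd

balances = pd.DataFrame({
    "date":    ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "account": ["A", "B", "A", "B"],
    "balance": [100, 200, 110, 190],
})

# Valid: sum across the account dimension for each date.
print(balances.groupby("date")["balance"].sum())

# Summing across time is not meaningful; use an average (or period-end value).
print(balances.groupby("account")["balance"].mean())
```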
Ques:- What is the metadata extension?
Right Answer:
A metadata extension lets you associate additional, user-defined metadata with repository objects (such as mappings, sessions, workflows, and transformations) in an ETL tool like Informatica. Extensions can be vendor-defined or user-defined and are used to store extra descriptive information, for example a contact person or change notes, alongside the object itself.
Ques:- What are parameter files ? Where do we use them?
Right Answer:
Parameter files are configuration files used in ETL processes to define variables and settings that control the behavior of the ETL jobs. They allow for dynamic configuration, enabling the same ETL code to run in different environments or with different parameters without modifying the code itself.
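
As an illustration, the sketch below uses a plain INI-style parameter file read with Python's `configparser`; the layout and names are hypothetical and are not the Informatica parameter-file syntax, but they show how the same ETL code can be pointed at different environments.

```python
# Hedged sketch of a parameter file driving environment-specific settings.
import configparser

with open("etl_params.ini", "w") as f:   # hypothetical parameter file
    f.write("""
[dev]
source_dsn = dev_source.db
target_dsn = dev_warehouse.db
load_date = 2024-01-15

[prod]
source_dsn = prod_source.db
target_dsn = prod_warehouse.db
load_date = 2024-01-15
""")

params = configparser.ConfigParser()
params.read("etl_params.ini")
env = "dev"   # chosen per run, not by changing the ETL code
print(params[env]["source_dsn"], params[env]["target_dsn"])
```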
Ques:- What is partitioning? What are the types of partitioning?
Right Answer:
Partitioning is the process of dividing a large database table into smaller, more manageable pieces, called partitions, while still treating it as a single table. The main types of partitioning are:

1. **Range Partitioning**: Divides data based on a specified range of values.
2. **List Partitioning**: Divides data based on a list of values.
3. **Hash Partitioning**: Distributes data across partitions based on a hash function (see the sketch after this list).
4. **Composite Partitioning**: Combines two or more partitioning methods, such as range and hash.
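
To make hash partitioning concrete, here is a minimal sketch that routes rows to a fixed number of partitions by hashing a key; the data and partition count are made up.

```python
# Hedged sketch of hash partitioning: each key lands in exactly one partition.
import hashlib

NUM_PARTITIONS = 4

def partition_for(key):
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

partitions = {i: [] for i in range(NUM_PARTITIONS)}
for customer_id in [101, 102, 103, 104, 105]:
    partitions[partition_for(customer_id)].append(customer_id)

print(partitions)
```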
Ques:- What are the various transformations available?
Right Answer:
The various transformations available in ETL include:

1. **Filter** - Removes unwanted data.
2. **Join** - Combines data from multiple sources.
3. **Aggregate** - Summarizes data (e.g., sum, average).
4. **Sort** - Orders data based on specified criteria.
5. **Lookup** - Retrieves related data from another table.
6. **Union** - Combines data from multiple datasets into one.
7. **Pivot** - Transforms rows into columns.
8. **Denormalization** - Combines tables to reduce complexity.
9. **Data Type Conversion** - Changes data types for compatibility.
10. **Expression** - Applies calculations or transformations to data fields.
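
A hedged pandas sketch of a few of the transformations listed above (data type conversion, filter, join/lookup, aggregate), using made-up data:

```python
# Illustrative pandas versions of common ETL transformations.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "cust_id":  [10, 10, 20],
    "amount":   ["120.5", "80.0", "45.0"],   # arrives as text from the source
})
customers = pd.DataFrame({"cust_id": [10, 20], "region": ["East", "West"]})

orders["amount"] = orders["amount"].astype(float)     # data type conversion
orders = orders[orders["amount"] > 50]                # filter
enriched = orders.merge(customers, on="cust_id")      # join / lookup
summary = enriched.groupby("region")["amount"].sum()  # aggregate
print(summary)
```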
Ques:- Do we need an ETL tool? When do we go for the tools in the market?
Right Answer:
Yes, we need an ETL tool when dealing with large volumes of data, complex transformations, or when integrating data from multiple sources. We should consider using tools in the market when we require scalability, efficiency, ease of use, and support for data governance and compliance.
Ques:- Compare ETL & Manual development?
Right Answer:
ETL (Extract, Transform, Load) is an automated process that efficiently moves and transforms data from various sources into a data warehouse, ensuring data quality and consistency. Manual development involves writing custom code to handle data processing, which can be time-consuming, prone to errors, and less efficient. ETL is generally preferred for large-scale data integration due to its automation and reliability, while manual development may be used for smaller, specific tasks where customization is needed.
Ques:- Can you join a flat file and a database in DataStage? How?
Right Answer:
Yes, you can join a flat file and a database in DataStage by using a Join stage. You would read the flat file using a Sequential File stage and the database using a Database stage, then connect both to the Join stage where you can specify the join conditions.
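
DataStage handles this entirely through its stages, but as a rough analogue outside the tool, the sketch below joins a CSV flat file with a database table in Python; the file, table, and key names are hypothetical.

```python
# Hedged analogue of a flat-file-to-database join; names are hypothetical.
import csv
import sqlite3

with open("customers.csv", newline="") as f:
    flat_rows = {row["cust_id"]: row for row in csv.DictReader(f)}

conn = sqlite3.connect("warehouse.db")
joined = []
for cust_id, amount in conn.execute("SELECT cust_id, amount FROM orders"):
    if str(cust_id) in flat_rows:                 # join condition on cust_id
        joined.append({**flat_rows[str(cust_id)], "amount": amount})
```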
Ques:- How can I edit the XML target? Is there any way apart from editing the XSD file? Can I edit the XML directly in Informatica Designer?
Right Answer:
No, you cannot directly edit the XML in Informatica Designer. You need to modify the XML structure by editing the XSD file.
Ques:- What is a three tier data warehouse?
Right Answer:
A three-tier data warehouse architecture consists of three layers:

1. **Bottom Tier**: The data source layer where data is collected from various sources (like databases, flat files, etc.).
2. **Middle Tier**: The data integration layer where ETL (Extract, Transform, Load) processes are performed to clean, transform, and load data into the data warehouse.
3. **Top Tier**: The presentation layer where users access and analyze the data through reporting tools and dashboards.
Ques:- What are the various tools? – Name a few.
Right Answer:
Some common ETL tools include:

1. Apache NiFi
2. Talend
3. Informatica PowerCenter
4. Microsoft SQL Server Integration Services (SSIS)
5. Apache Airflow
6. AWS Glue
7. Pentaho Data Integration (Kettle)
8. IBM DataStage


ETL, an acronym for Extract, Transform, and Load, is a critical process in data warehousing and business intelligence. It represents a structured, three-phase approach to consolidating data from various disparate sources into a single, unified repository, such as a data warehouse or data lake. This process is the backbone of most data integration strategies, as it ensures that data is not only collected but also cleaned, standardized, and made reliable for reporting and analysis.

The three phases of the ETL process are distinct and sequential:

  1. Extract: This is the first phase, where raw data is retrieved from its original source systems. These sources can be incredibly varied, including relational databases, flat files (like CSV or text files), cloud applications, and web APIs. The extraction process is designed to be efficient and non-disruptive, ensuring that the source systems remain operational while data is being pulled.
  2. Transform: This is often the most crucial and complex phase of the process. Once the data is extracted, it undergoes a series of cleansing and manipulation operations. This includes applying business rules, filtering out irrelevant data, joining data from different sources, standardizing formats (e.g., converting dates or currencies), and validating data to ensure accuracy. The goal of the transformation phase is to prepare the data for its target destination and make it consistent and ready for analysis.
  3. Load: In the final phase, the transformed and cleansed data is written to the target system. This can be done through a full load, where all data is moved in a single operation, or more commonly, through an incremental load, where only new or changed data is loaded at regular intervals. This phase must be optimized for performance to ensure the target system remains accessible for users.

The importance of ETL lies in its ability to turn chaotic, siloed data into a valuable, organized asset. By providing a clean and centralized source of truth, ETL enables organizations to perform accurate business intelligence, generate insightful reports, and make informed, data-driven decisions that drive strategic growth and operational efficiency. The principles of ETL are so foundational that they have also influenced more modern data integration paradigms, such as ELT (Extract, Load, Transform).
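
To make the three phases concrete, here is a minimal, hedged end-to-end sketch in Python: extract from a CSV, transform in memory, and load into SQLite as a full load. Paths, columns, and business rules are purely illustrative.

```python
# Hedged end-to-end ETL sketch; source file, columns, and rules are made up.
import csv
import sqlite3

# Extract: read raw rows from a hypothetical flat-file source.
with open("sales_raw.csv", newline="") as f:
    raw = list(csv.DictReader(f))

# Transform: apply business rules - drop invalid rows, standardize types.
clean = [
    {"sale_date": r["sale_date"], "amount": float(r["amount"])}
    for r in raw
    if r.get("amount") not in (None, "", "N/A")
]

# Load: write the cleansed rows into the target warehouse table (full load).
conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS fact_sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales (sale_date, amount) VALUES (:sale_date, :amount)",
    clean,
)
conn.commit()
```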
