Ques:- What is 3D vision and how is it different from 2D computer vision?
Right Answer:
3D vision refers to the ability of a system to perceive depth and spatial relationships in a three-dimensional space, allowing it to understand the shape, size, and position of objects. In contrast, 2D computer vision processes images as flat, two-dimensional representations, lacking depth information.
Ques:- Explain the concept of depth perception in 3D vision. How is it achieved computationally?
Right Answer:
Depth perception in 3D vision refers to the ability to perceive the distance of objects in a three-dimensional space. It is achieved computationally through various techniques, including:

1. **Binocular Disparity**: Using two slightly offset views (from two eyes or cameras) to compute depth from the positional difference of corresponding points (see the formula sketch after this list).
2. **Monocular Cues**: Utilizing single-eye cues like size, texture gradient, overlap, and perspective to infer depth.
3. **Motion Parallax**: Observing how objects move relative to each other as the observer moves, providing depth information based on their relative motion.
4. **Depth Sensors**: Using devices like LiDAR or stereo cameras to measure distances directly.

These methods help create a perception of depth in 3D environments.
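As a minimal sketch of the binocular-disparity case (assuming a rectified camera pair with focal length $f$ in pixels and baseline $B$ between the two viewpoints), a point at depth $Z$ appears shifted between the views by the disparity

$$ d = x_L - x_R = \frac{fB}{Z}, \qquad \text{so} \qquad Z = \frac{fB}{d}. $$

Depth is therefore inversely proportional to disparity, which is why nearby objects shift noticeably between the two views while distant ones barely move.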
Ques:- What are stereo vision and structure from motion (SfM)? How do they differ?
Right Answer:
Stereo vision is a technique that uses two or more cameras to capture images from different viewpoints to perceive depth and create a 3D representation of a scene. Structure from Motion (SfM) is a process that reconstructs 3D structures from a series of 2D images taken from different angles, estimating camera positions and scene geometry simultaneously. The main difference is that stereo vision relies on simultaneous images from multiple cameras, while SfM uses sequential images from a single camera or multiple cameras over time.
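A minimal two-view SfM sketch using OpenCV: estimate the essential matrix from point matches and decompose it into relative camera motion. The point arrays and intrinsics `K` below are illustrative placeholders; real values would come from feature matching and calibration.

```python
import numpy as np
import cv2

# Matched pixel coordinates from two views; placeholders here, real values
# would come from feature detection + matching (e.g., SIFT with a ratio test).
pts1 = np.random.rand(50, 2).astype(np.float32) * 640.0
pts2 = pts1 + np.float32([8.0, 0.0])          # synthetic shift, demo only
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])               # illustrative intrinsics

# Essential matrix from correspondences; RANSAC rejects outlier matches.
E, mask = cv2.findEssentialMat(pts1, pts2, K, cv2.RANSAC, 0.999, 1.0)

# Decompose E into relative rotation R and translation t between the views
# (t is recovered only up to scale, a fundamental SfM ambiguity).
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
print("relative rotation:\n", R, "\ntranslation direction:", t.ravel())
```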
Ques:- Describe the role of intrinsic and extrinsic camera parameters in 3D vision.
Right Answer:
Intrinsic camera parameters define the internal characteristics of the camera, such as focal length and optical center, which affect how 3D points are projected onto the 2D image plane. Extrinsic camera parameters describe the camera's position and orientation in the 3D world, determining how the camera is situated relative to the scene being captured. Together, they are essential for accurately mapping 3D coordinates to 2D images in computer vision.
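A minimal numpy sketch of how the two parameter sets combine in the pinhole projection $x \sim K\,[R\,|\,t]\,X$; all numeric values below are illustrative:

```python
import numpy as np

# Intrinsics K: focal lengths (pixels) and principal point; values illustrative.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsics [R | t]: camera orientation and position relative to the world.
R = np.eye(3)                        # camera aligned with the world axes
t = np.array([[0.0], [0.0], [5.0]])  # world origin 5 units in front of camera

X_world = np.array([[1.0], [0.5], [0.0], [1.0]])  # homogeneous 3D point

P = K @ np.hstack([R, t])            # 3x4 projection matrix
x = P @ X_world                      # homogeneous image coordinates
u, v = (x[:2] / x[2]).ravel()        # perspective divide
print(f"pixel: ({u:.1f}, {v:.1f})")  # -> pixel: (480.0, 320.0)
```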
Ques:- What is a point cloud? How is it generated and used in 3D vision applications?
Right Answer:
A point cloud is a collection of data points in a three-dimensional coordinate system, representing the external surface of an object or environment. It is generated using 3D scanning technologies, such as LiDAR, photogrammetry, or depth sensors. In 3D vision applications, point clouds are used for object recognition, scene reconstruction, and analysis in fields like robotics, computer vision, and virtual reality.
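A minimal sketch using the Open3D library (assuming `pip install open3d`); the random points stand in for real scanner output, and the file name is a placeholder:

```python
import numpy as np
import open3d as o3d

# Synthetic XYZ coordinates standing in for LiDAR / depth-sensor output.
xyz = np.random.rand(1000, 3)

# Wrap the raw coordinates in a point-cloud object.
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(xyz)

# Typical preprocessing before registration or meshing:
down = pcd.voxel_down_sample(voxel_size=0.05)   # reduce point density
down.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

o3d.io.write_point_cloud("cloud.ply", down)     # save for later processing
```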
Ques:- How do stereo cameras work to extract depth information from images?
Right Answer:
Stereo cameras work by capturing two images simultaneously from slightly different angles, similar to how human eyes perceive depth. By comparing the two images, the system calculates the disparity between corresponding points, allowing it to determine the distance of objects in the scene and create a 3D representation.
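A minimal sketch with OpenCV's block matcher, assuming the input pair is already rectified and grayscale (the file names are placeholders):

```python
import cv2

# Load a rectified stereo pair (file names are placeholders).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching: for each block in the left image, search along the same
# row of the right image and take the horizontal offset as the disparity.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)          # fixed-point, scaled by 16

disparity = disparity.astype("float32") / 16.0   # convert to pixel units
```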
Ques:- What are common algorithms used for stereo matching?
Right Answer:
Common algorithms used for stereo matching include:

1. Block Matching
2. Semi-Global Matching (SGM), sketched after this list
3. Dynamic Programming
4. Graph Cuts
5. Belief Propagation
6. Deep Learning-based methods (e.g., Convolutional Neural Networks)
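As one hedged example from this list, Semi-Global Matching is available in OpenCV as `StereoSGBM`. The parameter values below are common starting points rather than tuned settings, and the input is assumed to be a rectified grayscale pair like the one in the previous sketch:

```python
import cv2

# Semi-Global Matching: block-matching costs aggregated along several image
# directions, trading speed for smoother, more reliable disparities.
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,        # must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,              # penalty for small disparity changes
    P2=32 * 5 * 5,             # larger penalty for big disparity jumps
    uniquenessRatio=10,
)
# disparity = sgbm.compute(left, right).astype("float32") / 16.0
```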
Ques:- Explain disparity and how it’s related to depth.
Right Answer:
Disparity is the difference in image position of the same scene point as seen by the left and right cameras (or eyes). It is inversely related to depth: larger disparity means the point is closer to the observer, while smaller disparity means it is farther away.
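Numerically, for a rectified pair the relation is Z = f·B/d. A tiny sketch, with illustrative focal length and baseline values:

```python
import numpy as np

focal_px = 700.0    # focal length in pixels (illustrative)
baseline_m = 0.12   # distance between the two cameras, in meters (illustrative)

disparity_px = np.array([70.0, 35.0, 7.0])   # larger disparity = closer object

depth_m = focal_px * baseline_m / disparity_px
print(depth_m)      # [ 1.2  2.4 12. ] -> depth grows as disparity shrinks
```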
Ques:- How does occlusion affect stereo matching, and how can it be handled?
Right Answer:
Occlusion affects stereo matching by hiding parts of the scene from one of the cameras, so those pixels have no valid correspondence and produce incorrect depth estimates. It can be handled with occlusion detection, for example a left-right consistency check that flags pixels whose two disparity estimates disagree (sketched below), after which occluded regions are either ignored or filled by propagating depth from nearby visible areas.
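A minimal numpy sketch of the left-right consistency check, assuming both disparity maps (left-to-right and right-to-left) have already been computed:

```python
import numpy as np

def lr_consistency_mask(disp_left, disp_right, tol=1.0):
    """Keep a left-image pixel only if the right image's disparity at its
    matched location agrees with it; False entries are likely occlusions."""
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    # Where does each left pixel land in the right image?
    x_right = np.clip((xs - disp_left).astype(int), 0, w - 1)
    disp_back = disp_right[np.arange(h)[:, None], x_right]
    return np.abs(disp_left - disp_back) <= tol
```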
Ques:- What is epipolar geometry? Why is it important in stereo vision?
Right Answer:
Epipolar geometry is the geometric relationship between two views of the same scene captured by two cameras. It defines the epipolar plane, epipoles, and epipolar lines, which help in constraining the search for corresponding points in stereo images. It is important in stereo vision because it reduces the 2D correspondence problem to a 1D search along epipolar lines, making it easier and more efficient to find matching points between the two images.
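A minimal OpenCV sketch: estimate the fundamental matrix from matches, then compute, for each left-image point, the right-image line its correspondence must lie on. The point arrays are random placeholders; real values would come from feature matching.

```python
import numpy as np
import cv2

# Matched points from two views (placeholders for real feature matches).
pts1 = (np.random.rand(30, 2) * 640.0).astype(np.float32)
pts2 = (np.random.rand(30, 2) * 640.0).astype(np.float32)

# The fundamental matrix F encodes the epipolar geometry of the pair.
F, inliers = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)

if F is not None:
    # For each point in image 1, the line a*x + b*y + c = 0 in image 2 on
    # which its correspondence must lie (the 1D search mentioned above).
    lines2 = cv2.computeCorrespondEpilines(pts1.reshape(-1, 1, 2), 1, F)
    print(lines2.reshape(-1, 3)[:3])   # first three lines as (a, b, c)
```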
Ques:- Describe the process of 3D reconstruction from multiple images.
Right Answer:
3D reconstruction from multiple images involves the following steps:

1. **Image Acquisition**: Capture multiple images of the same scene from different angles.
2. **Feature Detection**: Identify key features or points in each image using algorithms like SIFT or ORB.
3. **Feature Matching**: Match these features across the different images to find correspondences.
4. **Camera Calibration**: Determine the camera parameters (intrinsic and extrinsic) for each image to understand the perspective.
5. **Triangulation**: Use the matched features and camera parameters to calculate the 3D coordinates of the points in space (a minimal sketch follows this list).
6. **Point Cloud Generation**: Create a point cloud representing the 3D structure of the scene.
7. **Surface Reconstruction**: Convert the point cloud into a mesh or surface model using techniques like Delaunay triangulation or Poisson reconstruction.
8. **Texture Mapping**: Apply textures from the original images onto the 3D model to enhance realism.
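As a hedged sketch of step 5, OpenCV's `triangulatePoints` recovers 3D coordinates from matched pixels and the two projection matrices. All values below are illustrative: two cameras 0.1 units apart observing two points at depth 5.

```python
import numpy as np
import cv2

K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])

# Projection matrices for two calibrated views: P = K [R | t].
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])              # camera at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0], [0]])])  # shifted on x

# Matched pixel coordinates as 2xN arrays (row 0 = x, row 1 = y); synthetic.
pts1 = np.array([[320.0, 400.0], [240.0, 240.0]])
pts2 = np.array([[306.0, 386.0], [240.0, 240.0]])

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous 4xN result
X = (X_h[:3] / X_h[3]).T                          # Euclidean 3D points
print(X)   # both points recovered near depth Z = 5
```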
Ques:- What is bundle adjustment? When is it used in 3D reconstruction?
Right Answer:
Bundle adjustment is an optimization technique used in 3D reconstruction to refine the 3D coordinates of points and the camera parameters simultaneously. It minimizes the re-projection error between observed image points and projected 3D points, ensuring a more accurate and consistent 3D model. It is typically used after initial structure-from-motion processes to improve the quality of the reconstruction.
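A heavily simplified sketch of the idea using `scipy.optimize.least_squares`: two views of three points, jointly refining the second camera's translation and the point coordinates. Rotation is fixed to identity for brevity, and all observations and initial guesses are made up; a real bundle adjuster handles many cameras, full poses, and exploits the problem's sparsity.

```python
import numpy as np
from scipy.optimize import least_squares

K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])

def project(points, t):
    """Pinhole projection with identity rotation (kept simple for the sketch)."""
    uv = (K @ (points + t).T).T
    return uv[:, :2] / uv[:, 2:3]          # perspective divide

def residuals(params, obs1, obs2):
    """Jointly refine camera-2 translation and the 3D points by minimizing
    reprojection error in both views: the essence of bundle adjustment."""
    t2, pts = params[:3], params[3:].reshape(-1, 3)
    r1 = project(pts, np.zeros(3)) - obs1  # camera 1 fixed at the origin
    r2 = project(pts, t2) - obs2
    return np.concatenate([r1.ravel(), r2.ravel()])

# Observed pixels in the two views and a rough initial guess (all illustrative).
obs1 = np.array([[320.0, 240.0], [390.0, 240.0], [320.0, 310.0]])
obs2 = np.array([[306.0, 240.0], [376.0, 240.0], [306.0, 310.0]])
x0 = np.concatenate([[0.0, 0.0, 0.0],                              # translation
                     [0.0, 0.0, 4.0, 0.5, 0.0, 4.0, 0.0, 0.5, 4.0]])  # points

result = least_squares(residuals, x0, args=(obs1, obs2))
print("RMS reprojection error:", np.sqrt(np.mean(result.fun ** 2)))
```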
Ques:- How does SLAM (Simultaneous Localization and Mapping) work in 3D vision systems?
Right Answer:
SLAM (Simultaneous Localization and Mapping) in 3D vision systems works by using sensors to gather data about the environment while simultaneously tracking the position of the sensor itself. It creates a map of the surroundings by identifying features in the 3D space, such as points, lines, or surfaces, and updates the sensor's location based on the movement and changes in the environment. This process involves algorithms that fuse data from various sources, like cameras and LiDAR, to ensure accurate mapping and localization in real-time.
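A toy dead-reckoning sketch of the SLAM loop in 2D: compose each motion estimate onto the current pose (localization) while transforming local landmark observations into the world frame (mapping). All values are made up, and real systems add feature matching, loop closure, and probabilistic filtering or graph optimization on top of this skeleton.

```python
import numpy as np

def se2(x, y, theta):
    """2D rigid-body transform (pose) as a 3x3 homogeneous matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0, 0, 1]])

pose = se2(0, 0, 0)   # localization: current robot pose in the world frame
world_map = []        # mapping: landmark positions accumulated so far

# Each step: an odometry estimate of the robot's motion, plus landmark
# observations in the robot's local frame (both invented for the demo).
steps = [(se2(1.0, 0.0, 0.1), [(2.0, 1.0)]),
         (se2(1.0, 0.0, 0.1), [(1.5, -0.5)])]

for motion, observations in steps:
    pose = pose @ motion                     # update localization
    for lx, ly in observations:              # update the map:
        wx, wy, _ = pose @ np.array([lx, ly, 1.0])
        world_map.append((wx, wy))           # landmark in world coordinates

print("final pose:\n", pose, "\nmap:", world_map)
```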
Ques:- Compare LiDAR-based 3D mapping with vision-based 3D mapping.
Right Answer:
LiDAR-based 3D mapping uses laser pulses to measure distances and create precise 3D models, providing high accuracy and detail, especially in complex environments. Vision-based 3D mapping relies on cameras and computer vision techniques to interpret images, which can be less accurate in low light or featureless areas but is often more cost-effective and easier to deploy.