Due date: Tuesday, 04/10/2001
Yu-Sung Chang
Here are two rectified images.
Since the intrinsic/extrinsic parameters from the INRIA site follow a slightly different convention from the book's, we have to adapt the rectification routine.
For any point (x',y',z') in camera coordinates, the corresponding world coordinates are
(x,y,z)T = R (x',y',z')T + T.
If TL and TR are the translations for the left and right cameras, respectively, then the camera centers OL and OR (the origins of the camera frames) have exactly the world coordinates TL and TR. So, if our T = TR - TL,
TL' = inv(RL) * TL, TR' = inv(RR) * TR, where RL and RR are the rotations for left and right, respectively.
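The camera-to-world mapping above and its inverse can be sketched in a few lines of C++. This is only an illustration: the type aliases and the names camToWorld and worldToCam are my own and are not taken from rectify.m or 3drecon. It relies on the fact that the inverse of a rotation matrix is its transpose.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major 3x3 matrix

// World coordinates of a camera-frame point: p = R * pc + T.
Vec3 camToWorld(const Mat3& R, const Vec3& T, const Vec3& pc) {
    Vec3 p{};
    for (int i = 0; i < 3; ++i)
        p[i] = R[i][0] * pc[0] + R[i][1] * pc[1] + R[i][2] * pc[2] + T[i];
    return p;
}

// Inverse mapping: pc = R^T * (p - T). For a rotation matrix, inv(R) equals R^T.
Vec3 worldToCam(const Mat3& R, const Vec3& T, const Vec3& p) {
    const Vec3 d{p[0] - T[0], p[1] - T[1], p[2] - T[2]};
    Vec3 pc{};
    for (int i = 0; i < 3; ++i)
        pc[i] = R[0][i] * d[0] + R[1][i] * d[1] + R[2][i] * d[2];
    return pc;
}
```

Calling camToWorld with the camera-frame origin (0,0,0) returns T itself, which is why the camera centers coincide with TL and TR.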
Using TL' and TR', we obtain the rectifying matrices RrectL and RrectR by the method explained in the book.
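For reference, here is a sketch of one common textbook construction of a rectifying rotation from a baseline vector. Whether it matches the book's method step for step is an assumption on my part, and the name rectifyingRotation is made up for this example. The idea is to build an orthonormal basis whose first axis points along the baseline, so that after the rotation the baseline lies on the x axis and epipolar lines become horizontal.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major 3x3 matrix

// Rotation whose rows are e1, e2, e3, with e1 along the baseline t.
// Assumes t is expressed in the camera frame and is not parallel to the
// optical (z) axis; otherwise nxy below would be zero.
Mat3 rectifyingRotation(const Vec3& t) {
    const double n   = std::sqrt(t[0] * t[0] + t[1] * t[1] + t[2] * t[2]);
    const double nxy = std::sqrt(t[0] * t[0] + t[1] * t[1]);
    const Vec3 e1{t[0] / n, t[1] / n, t[2] / n};      // along the baseline
    const Vec3 e2{-t[1] / nxy, t[0] / nxy, 0.0};      // orthogonal to e1 and to z
    const Vec3 e3{e1[1] * e2[2] - e1[2] * e2[1],      // e3 = e1 x e2
                  e1[2] * e2[0] - e1[0] * e2[2],
                  e1[0] * e2[1] - e1[1] * e2[0]};
    return Mat3{{e1, e2, e3}};
}
```

Applying the returned matrix to t gives (|t|, 0, 0), which is a quick way to check the result.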
In practice, we have to sample using their inverses: each pixel of the rectified image is mapped back into the original image and interpolated there. "bilinear.m" is the bilinear sampling function called by "rectify.m".
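A minimal sketch of bilinear sampling, written in C++ rather than MATLAB. It is not a transcription of bilinear.m: the function name, the row-major grayscale layout, and the choice to return 0 outside the image are all assumptions made for the example.

```cpp
#include <cmath>
#include <vector>

// Bilinearly interpolate a row-major grayscale image at real-valued (x, y).
// Returns 0 outside the image, which leaves unmapped pixels black.
double bilinearSample(const std::vector<double>& img, int width, int height,
                      double x, double y) {
    if (x < 0.0 || y < 0.0 || x > width - 1 || y > height - 1) return 0.0;
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const int x1 = (x0 + 1 < width) ? x0 + 1 : x0;    // clamp at the right edge
    const int y1 = (y0 + 1 < height) ? y0 + 1 : y0;   // clamp at the bottom edge
    const double fx = x - x0, fy = y - y0;            // fractional offsets
    const double top = (1.0 - fx) * img[y0 * width + x0] + fx * img[y0 * width + x1];
    const double bot = (1.0 - fx) * img[y1 * width + x0] + fx * img[y1 * width + x1];
    return (1.0 - fy) * top + fy * bot;
}
```

For a color image, the same function would be called once per channel.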
The included MATLAB function "rectify.m" performs this process. Its inputs are:
function [OutL,OutR] = rectify(InL,InR,Tl,Tr,Rl,Rr,lu0,lv0,lau,lav,lf,ru0,rv0,rau,rav,rf,offx,offy)
% InL, InR: Left and right images (should be the same dimension, rgb color)
% Tl, Tr, Rl, Rr: Translation and rotation matrices, left and right, respectively
% lu0, lv0, lau, lav, lf: Principal point coord, scale factor (u and v) and focal length of left
% ru0, rv0, rau, rav, rf: Principal point coord, scale factor (u and v) and focal length of right
% offx, offy: x and y offsets
OutL and OutR are the rectified images. Pre-prepared MATLAB data sets are available (data1.zip) (data2.zip); load them with the "load -MAT" command.
hw3_1.m and hw3_2.m are simple call routines, one for each image pair, using the above data sets.
Because of the rotations, the scene can fall outside the camera's view after rectification, so we have to adjust the offsets (and sometimes the focal length as well).
The included C++ program "3drecon.exe" performs exactly the same rectification. Usage:
3drecon [desc.txt] [x offset] [y offset]
"desc.txt" is a description file for the stereo scene (two images) and all the parameters. "color.txt" and "batinria.txt" are included. For the "color*.tif" images, offsets of x=550, y=-40 work well. For "batinria*.tif", no offset is required (since the original camera positions are fairly parallel).
The program also has a lot of other functions, explained later.
Example: the images above were produced with the following parameters.
c:> 3drecon color.txt 550 -40
c:> 3drecon batinria.txt 0 0
The pictures are disparity maps with various parameters.
color0.tif and color1.tif, using W=4 (full sampling) and W=10 (sampling every 4 points), respectively.
batinria0.tif and batinria1.tif, using W=4, full sampling.
Brighter means closer. Black areas mean either no disparity (infinity) or no match (outside the thresholds).
All programs are written in C++; the source code and executables are in this ZIP file. The disparity is a simple subtraction between the two matched x values in the rectified images, so there are missing parts on the map. Of course, we have to apply thresholds to avoid extreme values. (The result still contains some errors.)
The effect of the window size will be explained in 2. c).
In fact, the program "3drecon" has more functions.
Clicking a point in "Left Rectified Image" with the left mouse button shows that point, its matching point in "Right Rectified Image", both points in the original images, and the 3D coordinates of the point in the World Coordinate System, computed by triangulation. The right mouse button clears the point queue. (Selected 3D reconstruction)
You can save the original pair, the rectified pair, the disparity map, and the 3D positions of the points you selected.
You can change the SSD window size and the scan thresholds. Since rectification and discretization can leave some error in the epipolar line, it is better to also search two or three lines above and below it. The scan thresholds control this range.
The sampling rate means that the program checks only every [sampling rate]-th point, for speed's sake.
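The SSD search with a window size and scan thresholds can be sketched as follows. This is not the code of 3drecon: the names ssd, bestMatch, and Match are made up, W is taken to be the half-width of a (2W+1) x (2W+1) window (how 3drecon defines W is an assumption), and the whole row is searched instead of a limited disparity range. The caller has to make sure the window around (xl, yl) lies inside the left image.

```cpp
#include <limits>
#include <vector>

struct Match { int x; int y; double ssd; };

// Sum of squared differences between two (2*W+1)^2 windows centred at
// (xl, yl) in the left image and (xr, yr) in the right image (row-major).
double ssd(const std::vector<double>& L, const std::vector<double>& R, int width,
           int xl, int yl, int xr, int yr, int W) {
    double sum = 0.0;
    for (int dy = -W; dy <= W; ++dy)
        for (int dx = -W; dx <= W; ++dx) {
            const double d = L[(yl + dy) * width + xl + dx] -
                             R[(yr + dy) * width + xr + dx];
            sum += d * d;
        }
    return sum;
}

// Best match for left pixel (xl, yl): try every column of the right image on
// rows yl + scanLo .. yl + scanHi (e.g. -3 .. +3), skipping windows that
// would leave the image. Returns x = -1 if no candidate was valid.
Match bestMatch(const std::vector<double>& L, const std::vector<double>& R,
                int width, int height, int xl, int yl, int W,
                int scanLo, int scanHi) {
    Match best{-1, -1, std::numeric_limits<double>::infinity()};
    for (int yr = yl + scanLo; yr <= yl + scanHi; ++yr) {
        if (yr - W < 0 || yr + W >= height) continue;
        for (int xr = W; xr + W < width; ++xr) {
            const double s = ssd(L, R, width, xl, yl, xr, yr, W);
            if (s < best.ssd) best = Match{xr, yr, s};
        }
    }
    return best;  // the disparity is then xl - best.x
}
```

A sampling rate of N would simply mean calling bestMatch only for every N-th pixel of the left image.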
Here are some examples of 3D data sets that use disparity as depth.


color_disp_3dpos.dat and color_disp_3dpos_w10.dat
(using pointViewer, W=4 and W=10, respectively).
Because of the camera positions, the disparity map does not necessarily reflect the actual geometry.
Still, we can see some features.

batinria_disp_3dpos.dat (using pointViewer).
Even though some distances are not accurate, you can see the shape of the house.
Remember that disparity is NOT actually depth, so the result can be wrong. In particular, for the "color*.tif" images the original cameras are not well aligned with each other, so a simple disparity map does not give a good 3D interpretation. The cameras for "batinria*.tif" are already parallel enough that the disparity map gives fair information about the actual depth.
The C++ program "disp2threed.exe" converts a (gray scale) disparity map into 3D point data, simply using image intensity as the height. Usage:
disp2threed [disparity TIFF] [Output 3D data] [X sampling] [Y sampling]
The sampling parameters are used to skip points. The output format is that of pointViewer.exe, made by Prof. Oliveira.
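The conversion itself is very short. The sketch below shows the idea only: the function name, the Point3 struct, and the 8-bit row-major input are assumptions, and it does not read TIFF files or write the pointViewer file format.

```cpp
#include <cstdint>
#include <vector>

struct Point3 { double x, y, z; };

// Turn a row-major 8-bit grayscale disparity map into 3D points, keeping every
// xStep-th column and yStep-th row and using the intensity as the height z.
// xStep and yStep must be at least 1.
std::vector<Point3> disparityToPoints(const std::vector<std::uint8_t>& disp,
                                      int width, int height,
                                      int xStep, int yStep) {
    std::vector<Point3> pts;
    for (int y = 0; y < height; y += yStep)
        for (int x = 0; x < width; x += xStep)
            pts.push_back(Point3{double(x), double(y), double(disp[y * width + x])});
    return pts;
}
```

With xStep = yStep = 1 every pixel becomes a point, so larger steps are useful to keep the point count manageable.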
3D triangulation is done as follows:
Suppose we know Pl (in the left image plane) and its corresponding Pr (in the right image plane) in the World Coordinate System (WCS). The camera centers are TL and TR in WCS, as explained in 1.
The formulas
a(Pl - TL) + TL ....... [L1]
b(Pr - TR) + TR ....... [L2]
describe the two lines, each going through a camera center and its image point. (a, b are real numbers)
Ideally, these two lines intersect, and the intersection point is the 3D coordinate of Pl in WCS. In real cases, however, they will not intersect exactly.
So we compute the midpoint of the shortest line segment between L1 and L2. This segment is perpendicular to both (Pl - TL) and (Pr - TR), so we have
(a*(Pl - TL) + TL) - (b*(Pr - TR) + TR) = c * cross(Pl - TL, Pr - TR)
Solving this linear system gives a, b, and c (three coordinates give three equations in three unknowns). Then our P is the midpoint of the two closest points:
P = (1/2) * ((a*(Pl - TL) + TL) + (b*(Pr - TR) + TR));
3drecon performs this process automatically whenever you press the left button on the left rectified image, and shows the actual 3D coordinates of the selected point.
Here is an example of real 3D reconstruction data for pointViewer.exe, and selected points on the image.

color_sel_3dpos.dat (using pointViewer). You can clearly see the reconstructed stapler, tape, and bottles.
The points used to do 3D reconstruction of the above scene. (color_sel_pos.jpg)
Since there are many outliers caused by numerical and matching errors, it is unreasonable to compute 3D points for the entire image. pointViewer.exe is unable to show the picture in a lot of cases, because it tries to center the view on whatever data it gets. Also, a full scene reconstruction produces too many points. Therefore, we had better stick to some feature points, along with clipping and rescaling.
2. a) The figures below are some examples using various W's.
W=4
W=10
W=20
In fact, the result is better when W is larger, but the running time gets worse, just as with a convolution. A large W helps when you want to find a match in a featureless area, like the folder in the image. As you can see, both W=4 and W=10 fail to find the correct match for the end of the bottle (they match the point with its shadow), but W=20 matches it successfully. Also, almost all points are in place, except some points in the large blue area, where virtually no distinguishing feature exists.
A big W is not always the right choice. Here is an extreme case:
batinria0.tif and batinria1.tif, using W=10, scan threshold -3,+3
As we can observe, if an area does not contain any features (sky, road, the middle of the roof), SSD easily makes mistakes even with a larger W. It is also confused by the windows, since they look exactly the same. In general, areas with no pattern or with a periodic pattern cause a lot of errors.
The program 3drecon is fully equipped to do this (real 3D reconstruction using triangulation). The problem is that the data set at the CMU site is not compatible with INRIA's, so I would have to rewrite the program (at least the image-to-world transformation part). Also, the extrinsic rotation data is a bit unclear to me, even though we can reuse the code from the site.
Since the image sets are fairly parallel, we can skip the rectification part and use SSD over any pair.
We can expect some errors, due to 1) discretization of the image coordinates, 2) numerical precision (floor, ceiling, variable conversion problems, etc.), 3) mismatches during SSD, and 4) other errors.
Since I have not actually finished it, I can only assume that the farther apart the two cameras are, the larger the disparities are, which could reduce some precision errors. But not necessarily.