           Perspective Texture Mapping Using Area Subdivision Method


CONTENTS
--------

1. Introduction
2. Features
3. How does it work
4. Program options
5. How fast is it
6. Future work
7. The author


1. Introduction
---------------

There are quite a number of methods for approximating perspective-correct 
texture mapping. They include affine mapping, scanline subdivision, area 
subdivision, parabolic mapping and constant-z mapping. For more information 
on these methods and on texture mapping in general, see the Win95GPE article 
at http://www.geocities.com/SiliconValley/2151/tmap.html.
Scanline subdivision is usually considered the best of these methods. My 
goal is to show that area subdivision can beat it. I have no performance 
measurements of the scanline subdivision method, so I cannot compare the two 
directly, but I consider my texture mapping implementation quite fast. The 
biggest advantage of the subdivision method is that it does not rely on 
parallel execution of FPU and integer instructions, so the inner loops can 
use MMX instructions. Unfortunately this is left for future work because I 
have no access to an MMX computer. 

I have written a small demo program showing a rotating textured cube which 
can be viewed from inside or outside. The inside view is quite useful for 
performance measurements because the textures cover the whole screen. For a 
discussion of how it works see section 3. For performance measurements see 
section 5. The newest version of the demo and this document can be found on 
my homepage.


2. Features
-----------
1.  High-speed, high-quality perspective texture mapping
2.  No FPU instructions in inner loops, ready for MMX
3.  Automatic quality control
4.  Filling in tiles reduces cache misses
5.  Subpixel and subtexel accuracy
6.  True color and hicolor support

3. How does it work
-------------------
I will only cover the main ideas of the algorithm. I assume you already 
know something about texture mapping.

I am given a convex polygon on the screen and the texture information. I use
the magic number (vector) approach, so texture coordinates can be expressed 
as a function of screen coordinates: u=u(x,y), v=v(x,y). Calculating this 
function for each pixel is slow because it involves divides. The area 
subdivision method subdivides the polygon into smaller pieces, calculates 
the exact texture coordinates at their corners and interpolates linearly 
inside. But which shapes should the pieces have? The first answer that comes 
to mind is triangles. Although they have some nice properties, they are 
quite slow to render. Squares are much faster to render, but an arbitrary 
polygon cannot be subdivided nicely into squares alone. 
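
To make the cost of the exact mapping concrete, here is a minimal C sketch. 
It assumes the usual rational form of the magic number approach, 
u = (ua*x + ub*y + uc) / (wa*x + wb*y + wc), and the same for v with a shared 
denominator. The struct Mapping, its field names and map_exact() are 
illustrative assumptions and are not taken from the demo source.

```c
/* Hypothetical container for the nine "magic numbers" of one polygon.
   Assumed form: u = (ua*x + ub*y + uc) / (wa*x + wb*y + wc),
                 v = (va*x + vb*y + vc) / (wa*x + wb*y + wc). */
typedef struct {
    double ua, ub, uc;   /* numerator of u */
    double va, vb, vc;   /* numerator of v */
    double wa, wb, wc;   /* common denominator */
} Mapping;

/* Exact texture coordinates at screen point (x, y).
   One divide (the reciprocal) is shared by u and v, but it has to be
   repeated for every point where exact coordinates are wanted. */
void map_exact(const Mapping *m, double x, double y, double *u, double *v)
{
    double w = 1.0 / (m->wa * x + m->wb * y + m->wc);
    *u = (m->ua * x + m->ub * y + m->uc) * w;
    *v = (m->va * x + m->vb * y + m->vc) * w;
}
```

Calling map_exact() for every pixel is the slow path. Subdivision methods 
call it only at a few points and interpolate linearly between them.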

My approach is to subdivide the polygon into aligned squares and monotone 
trapezoids. The side length of a square is a power of two, which makes the 
texture delta calculations fast. Trapezoids are guaranteed to be smaller 
than a given threshold. Here lies the biggest difference between the 
scanline and area subdivision approaches. Let the subdivision size be 16 
pixels. With scanline subdivision I must calculate the correct texture 
coordinates every 16 pixels, but with area subdivision I calculate them once 
per 16x16=256 pixels. So I can draw 256 pixels with only one divide! And the 
quality stays the same. 

So how is the subdivision done? First I divide the polygon into monotone 
trapezoids. For a convex polygon this is simple. For a general simple 
polygon it is a bit harder, but it can also be done in linear time. 

Let us assume that the subdivision size is 16 pixels. I divide each 
trapezoid into a number of smaller ones whose height does not exceed 16. 
Then I fill the inside of each of these trapezoids with aligned squares. 
This leaves two (or, in the degenerate case, one) small trapezoids at the 
sides. Then I draw these squares and trapezoids. 

But there is one small problem. If the slope of a polygon edge is small, the 
resulting trapezoid can be longer than 16 pixels and must somehow be 
subdivided further. That is not as easy as it seems, so I chose a quick and 
dirty approach: I divide the threshold by two and pass the trapezoid to the 
subdivider again. This approach works quite well in practice because the 
number of such trapezoids is small. 
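
To show why power-of-two squares are cheap, here is a hedged C sketch of 
filling one aligned square from the exact texture coordinates at its four 
corners. It assumes 16.16 fixed-point coordinates, an 8-bit 256x256 texture 
and an 8-bit frame buffer. The TexCoord type, the parameter names and the 
body of draw_square() are illustrative and not the code of the demo.

```c
#include <stdint.h>

#define TEX_SIZE 256                  /* assumed texture width and height */

typedef struct { int32_t u, v; } TexCoord;   /* 16.16 fixed point */

/* Fill a (1<<s) x (1<<s) square starting at dst; pitch is the frame
   buffer row length in bytes. tl, tr, bl, br are the exact coordinates
   at the top-left, top-right, bottom-left and bottom-right corners.
   Every delta is a shift by s instead of a divide. Note: >> on a
   negative value is implementation-defined in C; this sketch assumes
   the arithmetic shift that mainstream compilers provide. */
void draw_square(uint8_t *dst, int pitch, const uint8_t *tex, int s,
                 TexCoord tl, TexCoord tr, TexCoord bl, TexCoord br)
{
    int size = 1 << s;
    int32_t lu = tl.u, lv = tl.v;                 /* walks down left edge  */
    int32_t ru = tr.u, rv = tr.v;                 /* walks down right edge */
    int32_t dlu = (bl.u - tl.u) >> s, dlv = (bl.v - tl.v) >> s;
    int32_t dru = (br.u - tr.u) >> s, drv = (br.v - tr.v) >> s;

    for (int y = 0; y < size; y++) {
        int32_t u = lu, v = lv;
        int32_t du = (ru - lu) >> s, dv = (rv - lv) >> s;
        for (int x = 0; x < size; x++) {
            /* drop the fraction and wrap to the texture size */
            uint32_t tx = ((uint32_t)u >> 16) & (TEX_SIZE - 1);
            uint32_t ty = ((uint32_t)v >> 16) & (TEX_SIZE - 1);
            dst[x] = tex[ty * TEX_SIZE + tx];
            u += du;
            v += dv;
        }
        dst += pitch;
        lu += dlu; lv += dlv;
        ru += dru; rv += drv;
    }
}
```

The inner loop contains only integer adds, shifts and masks, which is what 
makes it a candidate for MMX.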

Here is the pseudocode for all this:

void drawpoly(polygon P){
	subdivide P into monotone trapezoids T1..Tn
	for each trapezoid T subdivide_Y(T)
}
void subdivide_Y(trapezoid T){
	subdivide T into trapezoids R1..Rn whose height<=threshold
	for each trapezoid R subdivide_X(R)
}
void subdivide_X(trapezoid T){
	calculate aligned squares S1..Sn which fit in T
	for each square S draw_square(S)
	if(there are no squares) trapezoid(T)
	else{
		Tl = trapezoid which forms from T on the left side of
		     the squares
		Tr = trapezoid which forms from T on the right side of
		     the squares
		trapezoid(Tl)
		trapezoid(Tr)
	}
}
void trapezoid(trapezoid T){
	if(size(T)<=threshold) draw_trapezoid(T)
	else{
		threshold = threshold/2
		subdivide_Y(T)
	}
}
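
The step "calculate aligned squares which fit in T" can be sketched in C as 
follows. This is only an illustration under some assumptions: the band is 
exactly one grid row of height 2^s, screen coordinates are non-negative, xl 
is the first column that lies inside the trapezoid on every row of the band 
and xr is one past the last such column. SquareRun and aligned_squares() are 
hypothetical names.

```c
typedef struct {
    int first_x;   /* x of the leftmost aligned square        */
    int count;     /* number of squares that fit, may be zero */
} SquareRun;

/* Find the aligned squares of side (1<<s) that fit into columns
   [xl, xr). Rounding to the grid is done with masks because the
   side length is a power of two. */
SquareRun aligned_squares(int xl, int xr, int s)
{
    int mask = (1 << s) - 1;
    SquareRun run;
    int last;

    run.first_x = (xl + mask) & ~mask;   /* round xl up to the grid   */
    last = xr & ~mask;                   /* round xr down to the grid */
    run.count = (last > run.first_x) ? (last - run.first_x) >> s : 0;
    return run;
}
```

For example, with s=4 the columns [5, 70) give three squares starting at 
x=16. What remains to the left of x=16 and to the right of x=64 are the two 
side trapezoids Tl and Tr.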

Something should also be said about automatic quality control. It simply 
chooses the largest threshold for the polygon that keeps the approximation 
error small. As I figured out, the error is proportional to the second 
derivative of the mapping function. I calculate the second derivative at 
each corner of the polygon and take the maximum. Then I compare this number 
with experimentally chosen constants to set the appropriate threshold value.
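
A minimal C sketch of this idea follows. It is not the demo's code: it 
assumes the rational mapping u = N/D with N and D linear in x and y, looks 
only at the second derivative in the x direction, and replaces the 
experimentally chosen constants with the classical bound for linear 
interpolation over a span of length L, error <= max|u''| * L*L / 8. The 
names Linear, second_derivative_x() and choose_shift(), as well as the 
tolerance parameter, are assumptions made for illustration.

```c
typedef struct { double a, b, c; } Linear;      /* a*x + b*y + c */

static double eval(Linear f, double x, double y)
{
    return f.a * x + f.b * y + f.c;
}

/* |d2u/dx2| for u = N/D with N = n(x,y) and D = d(x,y):
   d2u/dx2 = -2 * d.a * (n.a*D - N*d.a) / D^3 */
double second_derivative_x(Linear n, Linear d, double x, double y)
{
    double N = eval(n, x, y), D = eval(d, x, y);
    double r = 2.0 * d.a * (n.a * D - N * d.a) / (D * D * D);
    return r < 0.0 ? -r : r;
}

/* Pick the largest S in {2,3,4,5} whose interpolation error bound
   stays within tolerance (in texels); fall back to S=2. */
int choose_shift(double max_d2, double tolerance)
{
    for (int s = 5; s > 2; s--) {
        double len = (double)(1 << s);
        if (max_d2 * len * len / 8.0 <= tolerance)
            return s;
    }
    return 2;
}
```

A complete version would evaluate the derivative for both u and v, in both 
directions, at every corner of the polygon, and pass the maximum to 
choose_shift().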


4. Program options
------------------
There are several command line arguments:

   -in (default)
       Displays the cube from inside.
   -out
       Displays the cube from outside.
       Do not use this parameter with overwrite mode.

   -doublebuffer
       Use doublebuffering.
   -pageflip (default for VESA modes)
       Use page flipping. Do not use when the required mode has less than 3 
       pages.
   -overwrite (default for VGA mode 13h)
       Simply overwrite the old picture.

   -v
       Wait for vsync. Do not use this parameter with overwrite mode.

   -S where S={2,3,4,5}
       Sets the subdivision threshold to 2^S.
       By default the program uses automatic subdivision control.

   -modeXxY 
       Sets the VESA mode with horizontal resolution X and vertical resolution
       Y. A VBE 2.0 interface must be installed. If you don't have one, get
       UniVBE from SciTech: http://www.scitechsoft.com/
       If the specified display mode cannot be found, all available
       resolutions are printed out.
   -truecolor
       Sets 32-bit color depth. By default 320x200 VESA mode is chosen.
   -hicolor
       Sets 16-bit color depth. By default 320x200 VESA mode is chosen.
       
Examples:

   textur -mode640x480 -out
         Sets the VESA mode 640x480 and the cube is displayed from outside.
   textur -doublebuffer -3
         Uses double buffering and sets the subdivision size to 8 pixels.
   textur -mode320x240 -v
         Sets the VESA mode 320x240 and waits for vsync before displaying each
         frame. Page flipping is enabled by default. 
   textur -mode1x1
         Prints out all available VESA modes.
   textur -truecolor
         Displays the cube in true color.


5. How fast is it
-----------------
Here are some performance measurements. The tests were made on a Pentium 120 
with an S3 Trio64V+ video card. Textures cover the full screen. Numbers are 
given in frames per second. 
N = not tested.

           | program parameters
resolution | <none>| -doublebuffer | -3  | -4  | -5  | -truecolor | -hicolor |
-----------|-------|---------------|-----|-----|-----|------------|----------|
320x200    | 227   | 181           | 125 | 196 | 245 | 142        | 166      |
400x300    | 137   | 109           | 68  | 108 | 138 | N          | 98       |
640x480    | 61    | 47            | 29  | 47  | 61  | 39         | 43       |
800x600    | 42    | 29            | 19  | 32  | 42  | 19         | 28       |
1024x768   | 29    | N             | N   | N   | 29  | N          | N        |
1280x1024  | 18    | N             | N   | N   | N   | N          | N        |
1600x1200  | 12    | N             | N   | N   | N   | N          | N        |
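
Frame rates are easier to compare across resolutions when they are converted 
to a cost per pixel. The small C sketch below does that arithmetic. It 
assumes a 120 MHz clock for the Pentium 120 and that every screen pixel is 
drawn exactly once per frame, which is reasonable for the inside view. The 
function names are illustrative.

```c
/* Pixels drawn per second at a given resolution and frame rate. */
double pixels_per_second(int width, int height, double fps)
{
    return (double)width * (double)height * fps;
}

/* Average CPU clock cycles spent per pixel, with all overhead
   (polygon setup, subdivision, buffer handling) included. */
double cycles_per_pixel(double clock_hz, int width, int height, double fps)
{
    return clock_hz / pixels_per_second(width, height, fps);
}
```

With the numbers from the <none> column this gives roughly 8.3 cycles per 
pixel at 320x200 (227 fps) and roughly 6.4 at 640x480 (61 fps). These are 
totals per pixel, so they are only an upper bound on the cost of the inner 
loop itself.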


6. Future work
--------------
1.  MMX support
	This really should be done. It could give quite a large speedup.
2.  Optimisations
	I doubt that anyone could make it more than 20% faster for large
	polygons. The inner loop takes about 60% of the time for large
	polygons. For small polygons major optimisations are still possible.
3.  Lighting
	With the texture caching approach nothing more needs to be done. But
	I doubt that it is the best way to do lighting, although it gives
	some flexibility. It consumes a lot of memory that cannot be cached,
	and memory transfers can become a bottleneck. Gouraud shading is
	easy, but not satisfactory for most situations. If I could express
	the shading as a function of screen coordinates in the same way as
	the texture functions, then shading would be an easy job.
4.  Antialiasing
	Not very hard for true color. For 256-color modes it requires some
	quite large lookup tables.
5.  Filtering
	Something to think about. Bilinear, bicubic? How fast will it be?
	Real-time?


7. The author
-------------

Karlis Freivalds

University of Latvia
Institute of Mathematics and Computer Science
Raina blvd. 29
LV-1459, Riga
Latvia

E-mail:  karlisf@cclu.lv
WWW: http://www.geocities.com/SiliconValley/Horizon/2920/

Any comments, questions and suggestions are welcome.
