Video engraving for virtual environments

Geb Thomas (a), Ted Blackmon (b), Michael Sims (c), and Daryl Rassmussen (c)

(a) Department of Industrial Engineering, University of Iowa, Iowa City, IA

(b) Telerobotics & Neurology Units, U.C. Berkeley, Berkeley, CA

(c) Intelligent Mechanisms Group, NASA/Ames, Moffett Field, CA

ABSTRACT

Some applications require a user to consider both geometric and image information. Consider, for example, an interface that presents both a three-dimensional model of an object, built from a CAD model or laser-range data, and an image of the same object, gathered from a surveillance camera or a carefully calibrated photograph. The easiest way to provide these information sets to a user is in separate, side-by-side displays. A more effective alternative combines both types of information in a single, integrated display by projecting the image onto the model. A perspective transformation that assigns image coordinates to model vertices can visually engrave the image onto corresponding surfaces of the model. Combining the image and geometric information in this manner provides several advantages. It allows an operator to visually confirm the accuracy of the modeling geometry and also provides realistic textures for the geometric model. We will discuss several of our procedural methods to implement the integrated displays and discuss the benefits gained from applying these techniques to projects including robotic hazardous waste remediation, the virtual exploration of Mars, and remote mobile robot control.

2. INTRODUCTION

Various researchers associated with the IMG have pursued or are pursuing the idea of controlling a robot from a virtual environment [1,2]. The principal idea behind this line of research is that if the interface can provide a significant sense of presence for the robot operator, that is, a convincing illusion that he or she is physically present at the robot’s site, then operation of the robot will proceed faster and with fewer errors.

One of the challenges in this approach is gathering, transmitting and displaying enough information about the remote scene to provide a convincing illusion for the display. One approach to providing a sense of presence is to provide a pan and tilt stereoscopic camera on the robot that tracks the operator’s head movement and delivers images of the remote scene to a helmet-mounted display worn by the operator. Unfortunately, image resolution and time delays make this approach less compelling than might be imagined. The approach described in this paper attempts to circumvent the need for a continuous stream of images and control loops with long time delays by combining detailed geometric models and images.

A second benefit of Video Engraving is revealed in an application at Sandia National Laboratories. The technique was applied to a system developed to direct a robot to remediate nuclear waste [3,4]. One of the challenges facing the designers of this interface is the fact that the sculpted surface geometry of the sludge and the camera images come from separate sources. The original interface approach was to take these two disparate signals and present them to the operator in separate windows on the display, or even separate displays. The operator visually compared the two signals to ensure the accuracy of the surface contour map and the robot operations.

If the image showed a metal pipe at a particular location, the operator naturally searched the sculpted surface for evidence of the pipe. This process required the operator to build a mental picture of where the camera was, where the pipe must be in the tank, and where the pipe should appear in the sculpted surface. The task demanded significant mental effort and was prone to human error.

Figure 1: An image of the mock nuclear waste is projected from the direction indicated by the arrow onto the geometrically modeled surface generated by Sandia’s laser-mapping system. The image provides visual information not provided by the geometry alone.

Figure 1 illustrates an approach that overcomes these difficulties by combining the two sources of information in a single scene. In a demonstration presented last summer at the 1996 Sandia Forum, the image from the camera was projected onto the three-dimensional model of the tank with a technique called video engraving. The effect is as if, in the virtual model, a slide projector with the image replaced the camera. The slide projector shined the image onto the sculpted surface, clearly displaying which region of the sludge was being viewed, and where the camera was when it took the picture. This unified display allowed the operator to avoid building complex internal models for the sake of comparing the image information to the laser-range finder information.

The Video Engraving technique has simultaneously been developed for a number of other applications including: the visualization of Mars, manufacturing, and remote mobile robot control. This paper presents our techniques, including mathematical representation and specific computer techniques, lessons learned in each of the applications, and our plans for future enhancements.

3. THE VIDEO ENGRAVING PROCESS

The Video Engraving process integrates video information with geometric models in virtual environments. It maps patches of the video images as textures onto the surface of polygonal model objects. The video textures enhance the model’s realism and bring information from the remote environment to the operator’s attention. A major advantage of this approach is that it allows the operator to view the photo-realistic model from angles other than the camera angle without sacrificing video information, so long as the model itself does not occlude the photographic surface textures.

Video Engraving associates each 3D vertex of the model with a 2D coordinate of the video image through a perspective transformation. The effect is as if, in the modeled world, a slide projector replaces the camera. The projector acts like a light source that casts its image regardless of subsequent viewing angle. The projector shines the video image onto the model and deposits the pixels directly onto the objects in the model as textures. One might think of the image patches being engraved onto the surface of the modeled objects so operators can view the modeled objects from many new angles and still see the integrated video information. Figure 2 illustrates the basic Video Engraving process.

Figure 2. Engraving Video On a Model. The algorithm cuts a camera image of a real block into pieces according to the model projection into the image plane. Then it fixes the corners of the image patches to the edges of the model. Operators can view the resulting model from any direction.

The elemental function of the Video Engraving process is to associate model vertices, each described by a vector [x, y, z, 1], with texture coordinates [u, v]. A number of references describe the transformation based on tracing rays through the camera’s focal point [5]. Kim [6] provides algorithms to calibrate the transformations between a particular image and the three-dimensional objects it views. His procedure gives a perspective transformation in which f is the focal length, w is an arbitrary scale factor, and the c’s are coefficients determined in the calibration procedure.
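The vertex-to-texture association described above can be sketched in a few lines. The sketch assumes a common pinhole form, [w·u, w·v, w] = diag(f, f, 1) · C · [x, y, z, 1], where C is a 3×4 matrix holding the calibrated c coefficients; Kim's exact arrangement of the terms may differ, and the function name `engrave_uv` is hypothetical.

```python
from typing import Sequence, Tuple

Matrix3x4 = Sequence[Sequence[float]]


def engrave_uv(vertex: Sequence[float], c: Matrix3x4, f: float) -> Tuple[float, float]:
    """Map a model vertex (x, y, z) to image coordinates (u, v).

    Assumed form (not taken from the paper):
        [w*u, w*v, w] = diag(f, f, 1) * C * [x, y, z, 1]
    where C is a 3x4 calibration matrix and w is the homogeneous scale.
    """
    x, y, z = vertex
    p = (x, y, z, 1.0)
    # Multiply the 3x4 calibration matrix by the homogeneous vertex.
    rows = [sum(c[i][j] * p[j] for j in range(4)) for i in range(3)]
    w = rows[2]
    if w <= 0.0:
        raise ValueError("vertex lies on or behind the camera plane")
    # Dividing by w performs the perspective projection.
    return f * rows[0] / w, f * rows[1] / w
```

With C set to the 3×4 identity-like matrix (camera at the origin looking along +z), the result reduces to u = f·x/z and v = f·y/z. Converting these image coordinates to normalized texture coordinates would additionally require the image size and principal point, which are not modeled here.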

Depending on the geometric model and virtual viewing parameters, Video Engraving may introduce noticeable distortion effects into the rendered image. The success of the procedure depends in a complex way on the accuracy of the 3D model, the size of the facets of the model, the orientation of the facets with respect to the camera line of sight, the number of image pixels per facet, the number of facets in the model, and the position of the viewpoint when the operator views the video engraved model.

The algorithm used in most texture-mapping hardware linearly interpolates a texture image uniformly over each geometric facet. This efficient approach leads to distortion in the Video Engraving process if the facets of the model are oblique with respect to the line of sight of the video image, as Figure 3 illustrates.

Figure 3: A Projected Image Of An Evenly Striped Bar (Left) Mapped To a Surface Without Correcting for Perspective Distortion (Right).

The left side of Figure 3 shows an evenly marked bar tilting away from a camera image plane. The diverging lines indicate rays passing from the focal point, through the virtual image plane, to the bar. The resulting 2D image of the bar shown in the middle of the figure is not an even array of markings, but enlarges the markings close to the focal point and reduces the markings that are farther away. Linearly engraving this image onto a single polygonal facet creates a noticeable distortion, as illustrated on the right side of the figure. The Video Engraved bar does not have evenly spaced markings, as it should to match the bar in the real environment.
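The size of this effect can be checked numerically. The sketch below is a two-dimensional illustration under assumed conditions (a pinhole camera at the origin with image coordinate u = f·x/z, and a bar given by its two end points); it compares where a mark on the bar truly appears in the image with the image position that uniform linear interpolation between the end points would sample. The function names are hypothetical.

```python
def true_image_coord(p0, p1, t, f=1.0):
    """Image coordinate of the point a fraction t along the bar p0 -> p1.

    Points are (x, z) pairs in the camera frame; z is depth along the
    line of sight.
    """
    x = p0[0] + t * (p1[0] - p0[0])
    z = p0[1] + t * (p1[1] - p0[1])
    return f * x / z


def linear_image_coord(p0, p1, t, f=1.0):
    """Image coordinate sampled when texture coordinates assigned at the
    two end points are interpolated uniformly across the facet."""
    u0 = f * p0[0] / p0[1]
    u1 = f * p1[0] / p1[1]
    return u0 + t * (u1 - u0)


def engraving_error(p0, p1, t, f=1.0):
    """Distance in the image between the correct and the sampled pixel."""
    return abs(true_image_coord(p0, p1, t, f) - linear_image_coord(p0, p1, t, f))
```

For a bar running from (-1, 2) to (1, 6), the midpoint of the bar truly projects to u = 0, while linear interpolation samples the image at u = -1/6, so the mark painted at the middle of the facet is taken from the wrong place. For a bar parallel to the image plane the two values coincide.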

To date we have minimized such distortion effects by re-tessellating large, oblique polygons into a number of smaller polygons to adequately cover the projected 2D image plane. However, this introduces a tradeoff: inserting too many polygons in the model slows the rendering rate and can limit real-time interaction. As the size of the polygonal facets decreases and begins to approach the 2D image resolution, image-based rendering techniques [7] become a more suitable means of visualization. In our experience, however, far fewer polygons than the image resolution are needed to eliminate the perceptible distortion due to oblique polygons.
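One way to automate the re-tessellation is to split a triangle only while the interpolation error along its edges exceeds a tolerance. The following is a minimal sketch under assumptions not stated in the paper: a pinhole camera at the origin looking along +z, a four-way midpoint split, and an error measured at edge midpoints. The names `retessellate`, `tol`, and `max_depth` are hypothetical.

```python
import math


def project(p, f=1.0):
    """Pinhole projection of a camera-frame point (x, y, z)."""
    return (f * p[0] / p[2], f * p[1] / p[2])


def midpoint(a, b):
    return tuple((ai + bi) / 2.0 for ai, bi in zip(a, b))


def edge_error(a, b, f=1.0):
    """Image-plane distance between the projected 3D midpoint of an edge
    and the midpoint of its projected end points."""
    true_uv = project(midpoint(a, b), f)
    linear_uv = midpoint(project(a, f), project(b, f))
    return math.hypot(true_uv[0] - linear_uv[0], true_uv[1] - linear_uv[1])


def retessellate(tri, tol, f=1.0, max_depth=6):
    """Recursively split a triangle until every edge error is within tol."""
    a, b, c = tri
    worst = max(edge_error(a, b, f), edge_error(b, c, f), edge_error(c, a, f))
    if worst <= tol or max_depth == 0:
        return [tri]
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    result = []
    for child in ((a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)):
        result.extend(retessellate(child, tol, f, max_depth - 1))
    return result
```

A facet parallel to the image plane is returned unchanged, while an oblique facet is split more finely where the depth gradient is steep. Because neighboring triangles may be split to different depths, a production mesh would also need to handle the resulting T-junctions; the `max_depth` limit bounds the polygon count, reflecting the rendering-rate tradeoff noted above.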

4. APPLICATIONS

4.1 Hazardous waste remediation

To apply Video Engraving to Sandia’s hazardous waste remediation task, we modified their Rocinante software (http://www.sandia.gov/cc_at/Forum96/sancho.html). Rocinante is a passive viewer of the operations in the main robot control interface, except that Rocinante operators can control their viewpoint independently. The Rocinante viewer provided the best opportunity to apply the new technique because the main control software provided no opportunities to associate texture coordinates with individual objects, although this shortcoming is now being addressed.

For the demonstration, we decided to allow a single, predetermined image to be displayed as an engraved texture on the model. Because of challenges facing the calibration of all the interacting systems, we sought to make this engraving interactive so the camera location, pointing angle and projection assumptions could be adjusted with different keystrokes.

To gain this level of interactivity, we applied the opera-lighting technique available in Silicon Graphics OpenGL and Performer software packages. This procedure allows the programmer to assign a texture to a spotlight. Originally, this feature was probably intended to allow programmers to design their own intensity falloffs from the center of the spotlight so they could create their own light patterns. Because the procedure is so general, one can define an arbitrary image for the spotlight, creating the effect of a slide projector shining on a scene. We defined the image to be the image gathered from the camera, placed the spotlight in the correct position and wrote routines to adjust the viewing direction, the horizontal field of view and the aspect ratio.
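The geometry behind such a textured spotlight can be sketched independently of any graphics library. The code below is not the OpenGL or Performer interface; it is an illustrative calculation, with hypothetical names, of the texture coordinate (s, t) that a slide projector at a given position, viewing direction, horizontal field of view, and aspect ratio would cast onto a world-space point.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a):
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def projector_st(point, position, forward, up, hfov_deg, aspect):
    """Return the (s, t) in [0, 1] that the projector casts on `point`,
    or None if the point is behind the projector or outside its frustum.

    aspect is image width divided by image height.
    """
    fwd = _normalize(forward)
    right = _normalize(_cross(fwd, up))
    true_up = _cross(right, fwd)
    rel = _sub(point, position)
    depth = _dot(rel, fwd)
    if depth <= 0.0:
        return None
    half_w = math.tan(math.radians(hfov_deg) / 2.0)
    half_h = half_w / aspect
    # Normalized device coordinates in [-1, 1] across the projected image.
    x = _dot(rel, right) / (depth * half_w)
    y = _dot(rel, true_up) / (depth * half_h)
    s, t = 0.5 * (x + 1.0), 0.5 * (y + 1.0)
    if not (0.0 <= s <= 1.0 and 0.0 <= t <= 1.0):
        return None
    return s, t
```

Binding the position, direction, field of view, and aspect ratio to keystrokes, as described above, amounts to changing the arguments of this function and re-evaluating it for the visible surface points.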

The first opportunity to view the real scene and the final data set came the night before the demonstration. We simply loaded in an image of the scene and interactively adjusted the camera position until we empirically matched the projection assumptions. We could then interactively fly through the textured, three-dimensional image as if it were any three-dimensional scene.

4.2 Mobile robots

For remotely operated vehicles, a 3D model of the local terrain is often created from a stereo range map calculated from a set of stereo video images taken by video cameras on-board the robot. Through initial camera calibration and correlation of image features, a stereo disparity map is created and the stereo disparities are used to create an image range map. These range points are then used to generate a 3D polygonal mesh of the local terrain. The final 3D model includes the 3D terrain mesh with the original image. The 3D panoramic model can be incorporated into a virtual environment and viewed with a stereoscopic display.
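The pipeline from disparity map to terrain mesh can be sketched as follows. The sketch assumes rectified stereo images, for which depth follows the standard relation Z = f·B/d (focal length f in pixels, baseline B, disparity d in pixels), and it builds a dense grid mesh with two triangles per pixel cell. The function names and the parameters `cx` and `cy` (principal point) are hypothetical, and the actual Marsokhod pipeline is not described in enough detail here to claim it works this way.

```python
def disparity_to_points(disparity, f, baseline, cx, cy):
    """Convert a disparity map (list of rows) into camera-frame points.

    Cells with non-positive disparity have no valid range and become None.
    """
    points = []
    for r, row in enumerate(disparity):
        out = []
        for c, d in enumerate(row):
            if d <= 0.0:
                out.append(None)
            else:
                z = f * baseline / d
                out.append(((c - cx) * z / f, (r - cy) * z / f, z))
        points.append(out)
    return points


def grid_triangles(points):
    """Connect neighboring range points into two triangles per grid cell,
    skipping any cell that touches a missing range value."""
    triangles = []
    for r in range(len(points) - 1):
        for c in range(len(points[r]) - 1):
            p00, p01 = points[r][c], points[r][c + 1]
            p10, p11 = points[r + 1][c], points[r + 1][c + 1]
            if None in (p00, p01, p10, p11):
                continue
            triangles.append((p00, p10, p01))
            triangles.append((p01, p10, p11))
    return triangles
```

Because every mesh vertex originates from a known pixel (row r, column c), its texture coordinate for engraving the original image is available directly from the grid indices.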

Figure 4. A panoramic terrain model for the Marsokhod remotely operated vehicle. A pair of panoramic, stereo video images from on-board cameras were used to generate a sparse terrain mesh suitable for real-time rendering (left). A single image of the stereo pair was then engraved as a texture onto the resultant mesh to produce a photo-realistic 3D model (right) which could be interactively viewed in a virtual environment with a stereoscopic display.

In practice, the 3D mesh triangulated from the image range map tends to include so many vertices that real-time interaction is limited by slow refresh rates. To overcome this limitation, the second author has explored vertex reduction and non-uniform mesh reconstruction using a Delaunay triangulation [8]. Another challenge occurs when the image is large. Due to hardware constraints, texture image files are typically limited to a size of 512 × 512 pixels. Consequently, the original image is split into smaller portions, which must overlap by a single pixel to avoid undesired seams in the rendered image.
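The tile boundaries for such a split are simple to compute. The sketch below, with hypothetical function names, returns half-open pixel ranges along one axis so that each tile is at most 512 pixels wide and consecutive tiles share exactly one pixel.

```python
def tile_spans(length, tile=512, overlap=1):
    """Split [0, length) into spans of at most `tile` pixels where
    consecutive spans share `overlap` pixels. Spans are (start, end)
    with `end` exclusive."""
    if length <= tile:
        return [(0, length)]
    step = tile - overlap
    spans = []
    start = 0
    while True:
        end = min(start + tile, length)
        spans.append((start, end))
        if end == length:
            break
        start += step
    return spans


def image_tiles(width, height, tile=512, overlap=1):
    """All (column span, row span) pairs covering a width x height image."""
    return [(cs, rs)
            for rs in tile_spans(height, tile, overlap)
            for cs in tile_spans(width, tile, overlap)]
```

A 1000-pixel axis, for example, yields the spans (0, 512) and (511, 1000). Tiles at the right and bottom edges may be smaller than 512 pixels; hardware that requires fixed texture dimensions would need them padded, which this sketch does not do.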

Figure 4 shows a rendered portion of a panoramic terrain model generated during the ‘Desert 96’ field test of the Marsokhod remotely operated vehicle. The 3D rendered scene looks photo-realistic and the user can interactively pan and tilt the virtual viewpoint to gain a greater sense of telepresence. However, when the viewing position is translated significantly away from the original camera position, distortion effects begin to dominate the display. Consequently, methods of restricting the viewpoint to a region around the original imaging location are desirable.

Figure 5. Video Engraving process for 3D modeling of the Viking Lander spacecraft. The original triangle mesh composed from 5 separate laser scans (upper-left) was used as a skeleton model to reconstruct a suitable CAD model (upper-right) with fewer facets and free of sensor noise and occlusion holes. Individual sub-components of the CAD model were then manually aligned with uncalibrated photographs (lower-left) to engrave the portions of the high-resolution camera images onto the final model used in the simulation (lower-right).

 

4.3 3D modeling of the Viking Lander

Video Engraving was also utilized in creating a 3D computer model of the Viking Lander spacecraft as part of an interactive, virtual reality exhibit to promote unmanned space exploration under development for the National Air & Space Museum. A prototype of the exhibit was on display at the museum on July 19th & 20th, 1996 to help commemorate the 20th anniversary of the Viking Lander 1 touchdown on the surface of Mars [9]. Without access to CAD files (the Viking spacecraft were fabricated over 20 years ago), a 3D laser scanner (courtesy of Cyra Technologies) was taken to the California Museum of Science and Industry to digitize a full-scale model of the Viking Lander.

Laser scans and photographs (uncalibrated with the laser scans) were taken from 5 different angles of the spacecraft. A consensus geometry technique [10] combined the 5 separate laser scans into a single triangle mesh (Figure 5, upper-left). However, this data set included too many facets for real-time rendering, suffered from occlusion ‘holes’ in the model, and included distortions from averaging noise due to the consensus geometry algorithms. An interactive modeling toolkit was developed to manually reduce the data and reconstruct a suitable 3D model of the Viking Lander (Figure 5, upper-right).

Once the desired spacecraft sub-components were reconstructed, the operator adjusted the virtual viewpoint parameters to align individual sub-components of the Viking Lander model with each of the five corresponding photographs (Figure 5, lower-left). For each selected model sub-component (or set of sub-components), an algorithm considered the surface normal of the polygon and the viewing direction of each of the five photographs to select the "optimal" photograph for each polygon. The (u, v) texture coordinates for each vertex were then computed using the perspective transformation to Video Engrave the digital images onto the 3D model (Figure 5, lower-right).
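The selection criterion itself is not spelled out above. One plausible choice, assumed here purely for illustration, is to pick the photograph whose viewing direction is most nearly anti-parallel to the polygon's surface normal, so that the polygon is seen as close to head-on as possible. The function name `best_photo` is hypothetical.

```python
import math


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _normalize(a):
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def best_photo(normal, view_dirs):
    """Return the index of the photograph that views the polygon most
    directly, or None if the polygon faces away from every camera.

    normal    -- outward surface normal of the polygon
    view_dirs -- one vector per photograph, pointing from the camera
                 into the scene
    """
    n = _normalize(normal)
    best_index, best_score = None, 0.0
    for i, direction in enumerate(view_dirs):
        # The score is the cosine between the normal and the direction
        # back toward the camera: 1.0 head-on, 0.0 edge-on, negative
        # when the polygon faces away.
        score = -_dot(n, _normalize(direction))
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

This criterion favors photographs in which the polygon is least oblique, which is also the condition under which the interpolation distortion discussed in Section 3 is smallest. It ignores occlusion by other parts of the model.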

4.4 Modeling Mars from outer space

The virtual reality exhibit developed for the National Air & Space Museum also benefited from the Video Engraving process in generating a photo-realistic model of Mars utilized for an introductory approach to the Red Planet. For this part of the simulation, it was not necessary to incorporate the digital terrain elevation map for the surface of Mars because the virtual viewpoint was a global view of Mars from outer space where surface elevations could not be perceived. However, it was desired to have a photo-realistic model which would maintain visual quality while ‘flying’ the viewpoint towards the planet.

A high-resolution 2D image of Mars showing the Valles Marineris canyon system was downloaded from the World Wide Web (http://nssdc.gsfc.nasa.gov/photo_gallery/PhotoGallery-Mars.html). A simple polygonal hemisphere was created to represent the surface of Mars for this image. Utilizing the known camera position for the 2D image of Mars, this virtual viewpoint was reproduced in the modeling software and the corresponding image was projected onto the portion of the hemisphere subtending its projection into the 2D image plane. With great delight, it was found that a user could interactively `fly’ the virtual viewpoint away from the original camera position and significantly towards the planet surface, while the photo-realistic quality of the resulting computer image was sufficient to give the impression of a more sophisticated modeling process. Figure 6 illustrates the Video Engraving process utilized for the modeling of Mars from outer space.

 

Figure 6. Video Engraving of a 3D model of Mars from outer space. A high-resolution 2D image of Mars was downloaded from the World Wide Web and engraved onto a hemispherical mesh to create a photo-realistic 3D model (left). A user can interactively 'fly' the virtual viewpoint towards the Red Planet and the 3D model retains its photo-realistic quality (right).

5. ADVANTAGES OF VIDEO ENGRAVING OVER PRESENT TELEROBOTIC DISPLAYS

Most telerobot displays provide information from the remote environment in one of 3 ways: live video, separate video and graphical model displays, or a simulation model blended with the video view. The blended interfaces generally take the form of the phantom display approach pioneered at MIT [11] and further developed at the Jet Propulsion Laboratory [12,13]. A phantom display projects the model into the image plane and superimposes it on the live video view. This technique allows the operator to compare the model information with the live information in one scene, but sacrifices the three-dimensional aspect of the model in the projection transformation defined by the physical camera location. The other approaches also have advantages and limitations, which Table 1 outlines.

Table 1: Comparison of Different Telerobotic Display Approaches

| Technique | Advantages | Disadvantages |
|---|---|---|
| Video-only | Clear display of video information. | Viewpoint restricted to physical camera location; no model information or interaction. |
| Separate model and video displays | Allows model interaction to enable supervisory control; unlimited viewpoint with model. | Difficult to mentally correlate the model with the video information. |
| Model overlay on video | Combines model and video; allows model interaction to enable supervisory control. | Viewpoint restricted to physical camera location. |
| Video Engraving | Seamless, natural integration of video and model. | Computationally expensive, requiring texture mapping hardware; potentially introduces distortion of video. |

Video-only interfaces limit the operator’s viewpoint to the physical camera location. Mechanical mounts can, in turn, limit the camera’s viewpoint. Also, video-only displays do not include model information, which aids the operator’s construction of a mental map and is the basis for the task simulation and collision avoidance algorithms.

Separate model and video displays provide 3D model information and enable graphical simulations for supervisory control. However, it can be mentally difficult to correlate the 3D model with the 2D video, and this arrangement does not provide an easy visual mechanism to confirm the accuracy of the model. Moreover, the human operator is required to divide limited attention resources among multiple displays.

Phantom robot interfaces, which superimpose the model information onto the video image, offer the advantage of combining both the video and model information in a single display. Unfortunately, the phantom paradigm collapses to separate model and video displays if the operator wishes to view the model from a position unattainable by the physical camera.

The Video Engraving approach provides several advantages over other techniques:

  1. It allows a reduction in model complexity without sacrificing the operator’s ability to perceive detail.
  2. Discrepancies between modeled geometry and the image information are evident in the display: both in location and magnitude.
  3. The three-dimensional environment provides distinction between depth and scale change.
  4. The three-dimensional environment also provides a greater range of viewpoints for interactive exploration.

6. FUTURE DIRECTIONS

This work is at an early stage of development and faces at least three challenges that offer interesting directions for future research. The first two relate to distortion effects that can dominate the scene unless controlled. The third relates to hardware improvements that will significantly improve the power of this technology.

    6.1 Distortion analysis

    As described previously, distortion effects occur in the Video Engraving process when polygonal facets of the 3D model are oblique to the line of sight of the video camera (see Figure 3). Currently, this problem is handled by subdividing polygons where the distortion noticeably occurs so that more image points are explicitly assigned to model coordinates. We are currently investigating mathematical techniques to quantify these distortion effects and overcome this limitation by appropriately warping the 2D video image prior to application of the texture.

    Distortion effects are also present when the 3D model is not sufficiently accurate. Presently, this is used as visual feedback for the human operator to subjectively verify the accuracy of the 3D model. However, the authors hypothesize that comparing the resultant rendered image after Video Engraving with the original 2D video image is an ideal mechanism to automatically verify the accuracy of the model. Moreover, it may be possible to feed back the ‘distortion error’ through algorithmic analysis and quantification in order to continually drive an on-line modeling process for robotic vision.
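A simple starting point for such an automatic comparison is a pixel-wise error measure between a rendering of the engraved model and a reference video image. The text does not specify the measure; the root-mean-square intensity difference used below, the acceptance threshold, and the function names are all assumptions made for illustration.

```python
import math


def rms_difference(rendered, reference):
    """Root-mean-square intensity difference between two grayscale
    images given as equally sized lists of rows."""
    if len(rendered) != len(reference):
        raise ValueError("images have different heights")
    total = 0.0
    count = 0
    for row_a, row_b in zip(rendered, reference):
        if len(row_a) != len(row_b):
            raise ValueError("images have different widths")
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images are empty")
    return math.sqrt(total / count)


def model_is_consistent(rendered, reference, threshold):
    """True when the rendering matches the reference image to within
    the given RMS threshold."""
    return rms_difference(rendered, reference) <= threshold
```

The comparison is most informative when the rendering viewpoint differs from the one used to engrave the texture, for instance when a second camera image of the same scene is available as the reference, since geometric errors in the model then shift the engraved texture relative to the reference.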

    6.2 Re-sampling requirements

    A second issue associated with the video engraving technology, and texture maps in virtual environments generally, is defining the limits of useful viewing positions for a texture-mapped scene. The simplest instance of the phenomenon is that when the object is approached, individual pixels begin to dominate the display, destroying the photo-realistic effect. However, for the Video Engraving process, other distortions can also spoil the photo-realistic illusion when the virtual viewpoint is moved to areas where the 2D video information is inadequate or does not exist. In our continuing research we will seek to characterize this process, and quantify its effects in order to define the most effective range of viewpoints for any video-engraved model and determine when an update of the video texture is necessary.
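The simplest case, individual texture pixels becoming visible on approach, can be estimated from the display geometry. The sketch below assumes a surface viewed head-on, a known physical size for one texel on that surface, and a hypothetical limit (`max_ratio`) on how many screen pixels one texel may cover before the image is judged to look blocky. None of these quantities come from the paper.

```python
import math


def screen_pixels_per_texel(texel_size, distance, fov_deg, screen_pixels):
    """Number of screen pixels spanned by one texel on a frontal surface.

    texel_size    -- physical width of one texel on the surface
    distance      -- viewpoint distance to the surface (same units)
    fov_deg       -- field of view of the virtual camera along one axis
    screen_pixels -- display resolution along the same axis
    """
    # Physical width covered by one screen pixel at this distance.
    footprint = 2.0 * distance * math.tan(math.radians(fov_deg) / 2.0) / screen_pixels
    return texel_size / footprint


def closest_useful_distance(texel_size, fov_deg, screen_pixels, max_ratio=2.0):
    """Distance at which one texel grows to max_ratio screen pixels."""
    return texel_size * screen_pixels / (
        2.0 * max_ratio * math.tan(math.radians(fov_deg) / 2.0))
```

With 1 cm texels, a 90 degree field of view, and a 1000-pixel display, one texel covers about one screen pixel at a distance of 5 m and about two at 2.5 m. A viewpoint limit or a texture update could be triggered when the ratio exceeds the chosen bound. Oblique surfaces and regions with missing video information require a separate analysis.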

    6.3 Get video into texture memory

    It is desirable to manipulate the virtual environment so that new images are projected onto 3D models as fast as possible and without re-compilation of the computer program. For live video streams, this could possibly require updating the video image in texture memory as often as 30 times per second (or even faster for high speed video cameras). When the camera is not moving relative to the modeled environment, it is not necessary to update the texture image. The rate of update depends upon the speed of the physical video camera through the real environment. In order to automate the updating of the texture image and to maximize the speed of this process, the authors recommend research efforts towards connecting live video streams from frame-grabber hardware directly into texture memory hardware.

    7. CONCLUSIONS

    The Video Engraving approach provides a practical strategy for integrating three-dimensional and video information in a single interface environment. The combination of the two information streams provides many synergistic display opportunities that are not available in any other telerobotic interfaces. The advantages include:

    • continuous visual monitoring of model accuracy;
    • richer visual displays allowing distinction between viewpoint movement and scale change;
    • less complex geometric models; and
    • a design that is easily extendible with virtual reality technology.

    The insight into the underlying problems facing modern telerobotics and the creative solutions provided by the Video Engraving process represent a significant step forward for the telerobotics community.

    ACKNOWLEDGEMENTS

    Portions of this work were performed while the first author held a National Research Council-ARC Research Associateship. Application of the Video Engraving process to hazardous waste remediation was supported in part by a grant from Sandia National Laboratories. Acknowledgment is given to Phil Hontalis, Eric Zbinden and Lew Hitchner for software development of the stereo-video pipeline for the Marsokhod telerobot Desert 96 field test. Development of the virtual reality exhibit for the National Air & Space Museum was funded by an educational outreach grant from Steve Brody at NASA Headquarters. Thanks are given to Bryan Vandrovec at Cornell University for assistance with software development in modeling the Viking Lander spacecraft.

    REFERENCES

    1. Stoker, C. and Hine, B.P., "Telepresence control of mobile robots: Kilauea Marsokhod Experiment," AIAA Reno, NV, Jan, 1996.
    2. Fong, T., Pangels, H., et al. "Operator Interfaces and Network-Based Participation for Dante II," SAE 25th Int. Conf. on Environmental Systems, San Diego, CA, July 1995.
    3. McDonald, M.J., Small, D.E., Graves, C.C., and Cannon, D, "Virtual Collaborative Control to Improve Intelligent Robotic System Efficiency and Quality." To be presented at IEEE ICRA-97, Albuquerque, NM.
    4. Cannon, D, and G. Thomas, "Virtual tools for supervisory and collaborative control of robots," Presence (In Press).
    5. Fu, K.S., Gonzalez, R.C., and Lee, C.S.G. (1987), Robotics: Control, Sensing, Vision, and Intelligence. McGraw-Hill, New York.
    6. Kim, W. S., "Virtual Reality Calibration and Preview/Predictive Displays for Telerobotics," Presence: Teleoperators and Virtual Environments, 5(2): 173-190, 1996.
    7. Levoy, M. and Hanrahan, P. "Light field rendering," Computer Graphics Proceedings. Annual Conference Series. SIGGRAPH 96 New Orleans, LA, pp. 31-42.
    8. Shewchuk, J. R., "Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator," ACM First Workshop on Applied Computational Geometry, Philadelphia, PA, pp. 124-133, 1996.
    9. Soffen, G. A. and Snyder, C. W., "The first Viking mission to Mars," Science, 193(4255): 759-66, 1976.
    10. Turk, G. and Levoy, M., "Zippered polygon meshes from range images," Computer Graphics Proceedings. Annual Conference Series, SIGGRAPH 94 Conference Proceedings, Orlando, FL, pp. 311-18.
    11. Kim, W.S. and Bejczy, A.K., "Demonstration of a High-Fidelity Predictive/Preview Display Technique for Telerobotic Servicing in Space," IEEE Trans. on Robotics and Automation, October 698-702, 1993.
    12. Noyes, M.V. and Sheridan, T.B., "A Novel Predictor for Telemanipulation Through a Time Delay," Proceedings of the 20th Annual Conference on Manual Control, NASA Ames Research Center, Moffett Field, CA, 1984.
    13. Bejczy, A.K., Kim, W.S. and Venema, S.C., "The Phantom Robot: Predictive Displays for Teleoperation with Time Delay," Proceedings of the IEEE International Conference on Robotics and Automation, Cincinnati, May, 13-18, 1990.