Proposal for Voice Communications in a VR Environment
The University of Southern Mississippi
 

 

 

Natural Language Interaction with a
Construction Estimating
Virtual Reality Environment

By

Blake Howe

Submitted to:

Dr. Sulbaran
Table of Contents

0- Abstract

1- Problem Statement

2- Project Objective

3- Methodology
3.1 Approach
3.1.1 Architecture
3.1.2 Documentation
3.1.3 Testing Plan
3.1.4 Storage of data
3.1.5 Templates
3.1.6 ATN parser
3.1.7 Decision trees
3.1.8 Decision making process
3.1.9 Speech
3.1.10 Customizing for the web
3.1.11 Operating System
3.1.12 Language
3.1.13 Object-oriented
3.2 Milestones

4- Cost and Materials

5- Anticipated Results

6- Conclusion

7- References

0 - Abstract

The users of Virtual Reality (VR) environments interact with them through pointing and clicking. However, VR environments lack the capability to let users interact using natural language (conversation). The objective of this project is to create a piece of middleware that will allow users to communicate with VR environments through natural language. The software will have a degree of intelligence and will accept queries that are as close to natural language as possible. This will be achieved by applying decision tree logic and a series of natural language processing algorithms to an avatar within a VR environment. The software will be linked with a VR environment focusing on construction estimating. It is anticipated that construction management students will be able to communicate with the VR environment using natural language to gain a better understanding of quantity estimating. In turn, students and faculty will have a new tool that could enhance the educational experience.


1 - Problem Statement

Virtual Reality environments provide a high level of flexibility. However, most user interaction is done through the mouse and keyboard. The problem is that there is a need for an interface that allows the user to communicate in a natural manner (conversation). According to Warren C. Couvillion Jr., a senior research engineer in the Advanced Interactive Technologies Department of the Training Systems and Simulators Division, as VR environments become more immersive, realistic interfaces will become increasingly important (Couvillion 2001). The author of this proposal believes that speech is the most realistic interface possible, because conversation is an integral part of daily human life. Couvillion indicates that projects that make interfaces more realistic are worthwhile.

There are similar projects underway that are trying to solve educational issues by applying intelligence and VR. The CRAIG project and the ISIS-Tutor will be discussed further to provide the background for this work.

The CRAIG project, undertaken by a team of five Information Systems undergraduates at Youngstown State University, is a knowledge-based system that provides degree requirement information. The system attempts an intuitive interface but falls short because it does not include any type of environment that immerses the user or allows them to use the most natural form of communication. The project did accomplish some work on natural language processing based on template and frame grammar (Hughes 2002). The virtual tutor proposed here will distinguish itself by residing within a VR environment and allowing users to make their requests using speech.

Second, the ISIS-Tutor (Brusilovsky 1989) is an intelligent tutoring system that originated at the State University of Moscow. This system was so successful that it has become a generic term for intelligent tutoring systems. The tutor is broken into several modules: the domain and student modules, the intelligence model, and the hypermedia model. It incorporates several good ideas, such as modularity and a form of intelligence. Its intelligence scheme is similar to that of the Virtual Estimator in that it allows students both to explore existing areas of knowledge and to train the system. The problem with this application is that the interface is menu-driven, and the "hypermedia" seems to be more of a chat room with navigation than a VR environment. The application proposed here could be considered the next generation of ISIS. My research has led me to the opinion that a menu-driven interface limits the learning capability.

2 - Project Objective
The objective of this project is to develop a middleware application to enhance the educational experience of construction engineering students. The project will develop an avatar that will be embedded in a Virtual Reality environment for construction estimating. This avatar will be capable of understanding natural language and answering students' questions, which will focus on construction materials and methods. NILCE will provide real-time, interactive, 24-hour help to students and ease the burden on professors' time. Eventually, NILCE is expected to evolve into a useful resource.


3 - Methodology

The methodology for this project is presented in two main components. The first component is the approach to accomplishing the objective of the project. The second component is the set of milestones that will be achieved during the development of NILCE.

3.1 Approach
The approach consists of the considerations that will be taken into account while developing the avatar. This list of considerations is by no means exhaustive and is subject to change as the development of NILCE and further research proceed.

3.1.1 Architecture

This application will be developed using a client-server architecture. This type of architecture was chosen because of the low cost and flexibility it provides. With today's low-cost processing power and inexpensive network connections, a client-server application is by far the most cost-efficient and scalable option (Patton 1999).
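To make the client-server split concrete, here is a minimal Java sketch that assumes a plain TCP socket and a line-based protocol: the client sends one question per line and the server replies with one answer line. The class `NilceServer`, its `answer` placeholder, and the protocol itself are hypothetical illustrations rather than the project's design; in practice the answer would come from the parser and decision trees described later in this section.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical server half of the client-server split: one question in, one answer out.
class NilceServer implements Runnable {
    private final ServerSocket serverSocket;

    // Binds to the loopback address for local testing; port 0 lets the OS pick a free port.
    NilceServer(int port) throws IOException {
        serverSocket = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    // Placeholder for the real pipeline (templates, ATN parser, decision trees).
    static String answer(String question) {
        return "Received: " + question.trim();
    }

    // Serves a single client so the sketch terminates; a real server would loop on accept().
    @Override
    public void run() {
        try (ServerSocket listener = serverSocket;
             Socket client = listener.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String question = in.readLine();
            out.println(answer(question == null ? "" : question));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client half: connect, send one question line, and read one answer line.
    static String ask(int port, String question) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(question);
            return in.readLine();
        }
    }
}
```

In use, the server runs on a background thread and the client calls `ask(server.port(), question)`. Keeping all language processing on the server is what allows the clients to stay thin.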

3.1.2 Documentation

Meticulous documentation will be kept throughout the development of this project. An online project notebook will record all decisions and bugs, along with the reasons for them. These notes will be kept on my domain, www.codequest.net, accessible 24 hours a day. A checklist of milestones and the percentage of each completed will also be publicly accessible, along with any available source code. At the end of the project, Javadoc will be used to generate HTML pages with instructions on how to use the different packages in NILCE.

3.1.3 Testing Plan

Testing will begin as soon as the first module is completed. All preconditions and postconditions for each module will be recorded and incorporated into the battery of tests. Each component (module) of this project will be tested extensively for functionality and correctness with a series of black-box tests. The tests will use both valid data and malicious input to uncover flaws before NILCE enters beta testing. After each milestone is reached, a system test will be performed on all of the integrated components to ensure they work together properly. These system tests will form a predetermined, growing set that is run through all system components upon completion. They will check for common mistakes such as valid values of different types (i.e., correctly spelled words), boundary conditions, values outside the maximum, delimiter problems (different forms of delimiters included), mixed case (hello, Hello, HeLlo), input too long for a string, input containing white space or other delimiters, the first, middle, and last elements added or removed, and misspelled words.
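Several of the checks above (mixed case, delimiters, white space, over-long input) can be written as a table of black-box cases. A minimal sketch, assuming a hypothetical `InputNormalizer` that lower-cases the text, treats punctuation as delimiters, collapses white space, and rejects input over an assumed 200-character limit:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical input-cleaning component exercised by the black-box cases below.
class InputNormalizer {
    static final int MAX_LENGTH = 200; // assumed limit; the proposal does not fix one

    // Returns the cleaned question, or throws IllegalArgumentException for unusable input.
    static String normalize(String raw) {
        if (raw == null) throw new IllegalArgumentException("input is null");
        if (raw.length() > MAX_LENGTH) throw new IllegalArgumentException("input is too long");
        String cleaned = raw.toLowerCase(Locale.ROOT)
                .replaceAll("\\p{Punct}+", " ") // any punctuation counts as a delimiter
                .replaceAll("\\s+", " ")        // collapse tabs, newlines and repeated spaces
                .trim();
        if (cleaned.isEmpty()) throw new IllegalArgumentException("input is empty");
        return cleaned;
    }

    // Black-box cases: raw input mapped to the expected output, independent of the implementation.
    static Map<String, String> cases() {
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("hello", "hello");                        // correctly spelled value
        cases.put("HeLlo", "hello");                        // mixed case
        cases.put("  how   do\tI get\n", "how do i get");   // white space
        cases.put("length,of;a stud?", "length of a stud"); // different delimiters
        return cases;
    }

    // Runs every case plus the length boundary and returns the number of failures.
    static int runCases() {
        int failures = 0;
        for (Map.Entry<String, String> c : cases().entrySet()) {
            if (!normalize(c.getKey()).equals(c.getValue())) failures++;
        }
        String atLimit = "x".repeat(MAX_LENGTH);
        if (!normalize(atLimit).equals(atLimit)) failures++; // exactly at the limit: accepted
        try {
            normalize(atLimit + "x");                         // one over the limit: must be rejected
            failures++;
        } catch (IllegalArgumentException expected) {
            // rejection is the correct behaviour
        }
        return failures;
    }
}
```

Because the cases are plain data, new ones (for example, misspelled words once a spelling component exists) can be appended as the growing test set described above expands.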

Once the simple website for NILCE is completed, it will be opened to students for beta testing, which is expected to uncover flaws that other types of testing miss. While the logic of NILCE is being tested, work will continue on incorporating it into a VR environment and adding the speech aspects.


3.1.4 Storage of data

The data will be stored in a MySQL database within a series of nested tables. The decision trees will be serialized and stored in the database. Tree objects in Java can easily be stored in and retrieved from a relational database by using the BLOB type, which holds an array of bytes (Nerjova 2000).
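A minimal sketch of the serialization step, assuming a hypothetical serializable `TreeNode` class and an assumed MySQL table `trees(name VARCHAR(64) PRIMARY KEY, data BLOB)`. The tree is written to a byte array with the standard `ObjectOutputStream`, and the bytes are stored and read back through the standard JDBC calls `PreparedStatement.setBytes` and `ResultSet.getBytes`; `REPLACE INTO` is MySQL syntax. The JDBC methods need a MySQL driver and a live connection, so only the byte-array round trip can be exercised on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical tree node; implementing Serializable lets Java turn a whole tree into bytes.
class TreeNode implements Serializable {
    private static final long serialVersionUID = 1L;
    final String text; // a question (inner node) or an answer (leaf)
    TreeNode yes;      // child followed on "yes"; null for a leaf
    TreeNode no;       // child followed on "no"; null for a leaf

    TreeNode(String text) {
        this.text = text;
    }
}

class TreeStore {
    // Writes the root and every node reachable from it into one byte array.
    static byte[] toBytes(TreeNode root) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(root);
        }
        return buffer.toByteArray();
    }

    // Rebuilds a tree from bytes produced by toBytes.
    static TreeNode fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (TreeNode) in.readObject();
        }
    }

    // Assumed table: CREATE TABLE trees (name VARCHAR(64) PRIMARY KEY, data BLOB)
    static void store(Connection db, String name, TreeNode root) throws SQLException, IOException {
        try (PreparedStatement ps = db.prepareStatement("REPLACE INTO trees (name, data) VALUES (?, ?)")) {
            ps.setString(1, name);
            ps.setBytes(2, toBytes(root)); // the byte array goes into the BLOB column
            ps.executeUpdate();
        }
    }

    // Returns the stored tree, or null if no tree has that name.
    static TreeNode load(Connection db, String name)
            throws SQLException, IOException, ClassNotFoundException {
        try (PreparedStatement ps = db.prepareStatement("SELECT data FROM trees WHERE name = ?")) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? fromBytes(rs.getBytes(1)) : null;
            }
        }
    }
}
```

A plain MySQL BLOB column holds at most 65,535 bytes, so a MEDIUMBLOB column may be needed once the trees grow large.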

The tables will be set up in the following fashion:

[Figure: example table layout, with a HOW table containing a STUD table, which in turn holds the information trees]

The boxes in the example represent tables in the database. There will be a broad grouping of "HOW" tables. Within the HOW table there will be a table for each object that needs representation, such as walls, floor plans, and roofs (STUD in the example). Within the STUD table will be the various trees holding information on that subject, such as length, width, and type.

MySQL was chosen as the database because it is portable from Windows to Linux with little modification and a fully functional version is available at no cost (MySQL Database Server 2003).

3.1.5 Templates

Templates will be used to catch common phrases. They will be implemented in the following fashion.

(The X represents a variable that can be anything)

How do I get the X of a X?

This template could represent many questions.

How do I get the length of a stud?
How do I get the width of a wall?
How do I get the peak of the roof?

(Note that words such as this, a, of, and the will be ignored; the parser will concentrate on the question (HOW), the qualifier (LENGTH), and the noun (STUD). The technique used is described further below.)

These common templates will be loaded into memory in the form of a hash table. If a common question such as "How do I get the length of a stud?" is asked, the question will be the key, and it will point to the tree containing the information.
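A minimal sketch of the hash table of common questions, assuming the ignored words form a fixed list. The full ignore list (beyond "this, a, of, the") and the tree names are hypothetical. Each stored question is reduced to its meaningful words, so "How do I get the length of a stud?" becomes the key `how length stud`, and any phrasing that reduces to the same key finds the same tree:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.StringJoiner;

// Hash table of common questions, keyed by the words left after ignored words are dropped.
class TemplateTable {
    // Assumed ignore list; the proposal names "this, a, of, the" as examples.
    private static final Set<String> IGNORED = Set.of("this", "a", "an", "of", "the", "do", "i", "get");

    // Reduced question key -> name of the (hypothetical) tree that holds the answer.
    private final Map<String, String> table = new HashMap<>();

    // "How do I get the length of a stud?" -> "how length stud"
    static String key(String question) {
        StringJoiner kept = new StringJoiner(" ");
        for (String word : question.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty() && !IGNORED.contains(word)) {
                kept.add(word);
            }
        }
        return kept.toString();
    }

    void add(String commonQuestion, String treeName) {
        table.put(key(commonQuestion), treeName);
    }

    // Returns the tree for a known question, or empty so that a fuller parser can take over.
    Optional<String> lookup(String question) {
        return Optional.ofNullable(table.get(key(question)));
    }
}
```

A miss returns an empty `Optional`, which is the point where a fuller parser, such as the ATN parser discussed next, would take over.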

Templates were chosen as one method of processing because of their ease of implementation and their proven track record with the C.R.A.I.G. project at Youngstown State University (Hughes 2002).

3.1.6 ATN Parser

ATN (augmented transition network) parsers are a technique developed by W. A. Woods in the 1960s. They are used to break sentences into their individual parts: nouns, verbs, adjectives, and adverbs (Watson 2002). There are still some difficulties with recognizing the deep structure of the input, but these can be reduced by filtering the input text with some hard-coded rules.

The design of this parser will be based upon the WordNet lexical database developed at Princeton University (Miller 2002). This type of parser was chosen because of its ease of implementation and the amount of open source work that has already been done (Deitel & Deitel 1999). This project will use only a subset of the WordNet database, customized for engineering problems.
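To illustrate the idea, here is a minimal ATN-style sketch. Each network (question, noun phrase) is a method, each arc consumes one word of a required category, and registers record which words fill which roles. The lexicon is a tiny hypothetical stand-in for the WordNet subset; WordNet itself covers only nouns, verbs, adjectives, and adverbs, so function words such as "the" and "of" would need a separate list. A full ATN also has arc types and tests that this sketch omits.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Minimal ATN-style parser sketch: each network is a method, each arc consumes one word of a
// required category, and registers (the "augmentation") record which word fills which role.
class AtnParser {
    // Tiny hypothetical lexicon standing in for the WordNet subset. WordNet lists only nouns,
    // verbs, adjectives and adverbs, so function words such as "the" and "of" need their own list.
    private static final Map<String, String> LEXICON = Map.ofEntries(
            Map.entry("how", "QWORD"), Map.entry("do", "AUX"), Map.entry("i", "PRON"),
            Map.entry("get", "VERB"), Map.entry("find", "VERB"),
            Map.entry("the", "DET"), Map.entry("a", "DET"), Map.entry("an", "DET"),
            Map.entry("of", "PREP"), Map.entry("exterior", "ADJ"),
            Map.entry("length", "NOUN"), Map.entry("width", "NOUN"),
            Map.entry("stud", "NOUN"), Map.entry("wall", "NOUN"));

    private final List<String> words;
    private int pos;
    final Map<String, String> registers = new HashMap<>();

    AtnParser(String sentence) {
        String cleaned = sentence.toLowerCase(Locale.ROOT).replaceAll("[^a-z]+", " ").trim();
        words = Arrays.asList(cleaned.split(" "));
    }

    // Category of the next unread word, or null at the end of input or for unknown words.
    private String nextCategory() {
        return pos < words.size() ? LEXICON.get(words.get(pos)) : null;
    }

    // Follows one arc if the next word has the required category; optionally fills a register.
    private boolean arc(String category, String register) {
        if (!category.equals(nextCategory())) return false;
        if (register != null) registers.put(register, words.get(pos));
        pos++;
        return true;
    }

    // NP network: DET? ADJ* NOUN (PREP NP)?
    private boolean nounPhrase(String headRegister, String objectRegister) {
        int start = pos;
        arc("DET", null);
        while (arc("ADJ", null)) {
            // consume any number of adjectives
        }
        if (!arc("NOUN", headRegister)) {
            pos = start; // backtrack: this was not a noun phrase
            return false;
        }
        int beforePp = pos;
        if (arc("PREP", null) && !nounPhrase(objectRegister, null)) {
            pos = beforePp; // a preposition with no noun phrase after it is left unconsumed
        }
        return true;
    }

    // Question network: QWORD AUX PRON VERB NP, accepted only if every word is consumed.
    boolean parseQuestion() {
        pos = 0;
        registers.clear();
        boolean ok = arc("QWORD", "QUESTION") && arc("AUX", null) && arc("PRON", null)
                && arc("VERB", "VERB") && nounPhrase("QUALIFIER", "NOUN");
        return ok && pos == words.size();
    }
}
```

After a successful parse of "How do I get the length of a stud?", the registers hold QUESTION = how, QUALIFIER = length, and NOUN = stud, which are the same three roles the template section singles out.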

3.1.7 Decision trees

The Virtual Estimator will begin with a simple type of logic: decision trees. Decision trees were chosen because of the wealth of information available on the subject and their ease of implementation (Savitch 1990, p. 354; Tanimoto 1995, p. 387).
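A minimal sketch of the kind of binary decision tree intended, assuming each inner node asks a yes/no question and each leaf names an answer. The class `DecisionNode` and the sample questions are hypothetical:

```java
import java.util.Iterator;
import java.util.Objects;

// Hypothetical yes/no decision tree: inner nodes hold a question, leaves hold an answer.
final class DecisionNode {
    final String text;
    final DecisionNode yes; // followed when the user answers "yes"; null for a leaf
    final DecisionNode no;  // followed when the user answers "no"; null for a leaf

    private DecisionNode(String text, DecisionNode yes, DecisionNode no) {
        this.text = Objects.requireNonNull(text);
        this.yes = yes;
        this.no = no;
    }

    static DecisionNode leaf(String answer) {
        return new DecisionNode(answer, null, null);
    }

    // An inner node always has both branches, so a walk can never fall off the tree.
    static DecisionNode question(String question, DecisionNode yes, DecisionNode no) {
        return new DecisionNode(question, Objects.requireNonNull(yes), Objects.requireNonNull(no));
    }

    boolean isLeaf() {
        return yes == null;
    }

    // Follows one yes/no answer per question until a leaf is reached, then returns its text.
    String decide(Iterator<Boolean> answers) {
        DecisionNode node = this;
        while (!node.isLeaf()) {
            if (!answers.hasNext()) throw new IllegalStateException("no answer for: " + node.text);
            node = answers.next() ? node.yes : node.no;
        }
        return node.text;
    }

    // Counts the leaves, i.e. the number of distinct answers the tree can give.
    int leafCount() {
        return isLeaf() ? 1 : yes.leafCount() + no.leafCount();
    }
}
```

For example, a root question "Is the question about walls?" with a nested "Is it about studs?" node reaches the stud answer after two "yes" replies and the roof answer after a single "no".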

3.1.8 Decision Making Process
The decision-making process will be driven primarily by the user indicating whether or not the question was answered to their satisfaction. If further clarification is needed, the user will be prompted with a series of yes-or-no questions to clarify the question, which will then be logged for further review by the administrator.
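The clarification and logging step might look like the following minimal sketch. The class `ClarificationSession`, its clarifying questions, and the plain-text log format are hypothetical; a real implementation would presumably keep the review queue in the database rather than in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical clarification step: when a student is not satisfied, record the student's
// yes/no replies to a few clarifying questions and queue everything for the administrator.
class ClarificationSession {
    // Assumed clarifying questions; the proposal does not specify their wording.
    static final List<String> CLARIFIERS = List.of(
            "Is your question about a material?",
            "Is your question about a quantity?");

    private final List<String> reviewQueue = new ArrayList<>();

    /**
     * Records the outcome of one question.
     *
     * @param question  the student's original question
     * @param answer    the answer NILCE gave
     * @param satisfied the student's reply to "Did this answer your question?"
     * @param replies   yes/no replies to CLARIFIERS, in order (extra replies are ignored)
     * @return true if the exchange was queued for review
     */
    boolean record(String question, String answer, boolean satisfied, List<Boolean> replies) {
        if (satisfied) return false; // answered well enough; nothing to review
        StringBuilder entry = new StringBuilder(question).append(" | answered: ").append(answer);
        for (int i = 0; i < CLARIFIERS.size() && i < replies.size(); i++) {
            entry.append(" | ").append(CLARIFIERS.get(i)).append(replies.get(i) ? " yes" : " no");
        }
        reviewQueue.add(entry.toString());
        return true;
    }

    // The administrator's read-only view of everything that still needs attention.
    List<String> pendingReview() {
        return List.copyOf(reviewQueue);
    }
}
```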

3.1.9 Speech

This application will use IBM's Via Voice to handle speech recognition. Via Voice was chosen for several reasons. First and foremost is the immense amount of work that will be saved by letting third-party software handle the speech recognition aspects of NILCE. In addition, IBM provides an extensive API for interfacing with Java (IBM 2003), which will be an invaluable asset in getting the application to run over the web.

3.1.10 Customizing for the web

The application will use a VR avatar named NILCE (pronounced "Nil-cee") that will be accessible over the Internet. For testing purposes, a pre-made avatar will be used. Students will be able to log on to the website and ask questions about construction projects in real time, using voice communication through any browser with the correct plug-ins installed. Help files and information on installing the plug-ins will be provided on the site.

3.1.11 Operating system

This system should run on any platform with little modification, but it will be developed on the Linux platform. The main reasons for choosing Linux were its cost and the availability of free tools to aid development.

3.1.12 Language

This application will be developed in Java. Java was chosen because of its object-oriented nature and because it runs on any platform with little modification (Deitel & Deitel 1999, p. 18). This will be a Java application, not a Java applet, because applets are restricted in how they can access files on the machine they run on. There are ways around this, such as digitally signing the applet, but the cost is prohibitive (Code Signing Digital IDs 2003).

3.1.13 Object-Oriented

NILCE will be developed using an object-oriented methodology, which has several advantages over its alternatives. First and foremost is the promotion of code reuse. According to Chuck McManus, the power of object-oriented programming lies in the fact that code is designed for reuse (Patton 1999, p. 1). Any individual class of this system could be imported into other applications. Another reason for choosing an object-oriented design is the strong encapsulation of objects and information hiding, which means that components of the application can be interchanged with ease. This is supported by Wikipedia's definition of encapsulation: "In computer science and object-oriented programming, encapsulation or modularity refers to how objects contain data. Encapsulated code can generally be rewritten without any need to rewrite the encapsulating code" (Wikipedia 2003).
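The encapsulation argument can be illustrated with a minimal sketch in which the rest of the system depends only on a hypothetical `QuestionInterpreter` interface. Each implementation keeps its data private, so a keyword matcher could later be replaced by a template table or an ATN parser without rewriting the code that calls it. All class names here are illustrative assumptions:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Callers depend only on this interface, never on how a question is actually interpreted.
interface QuestionInterpreter {
    // Returns the name of the tree that answers the question, or empty if it is not understood.
    Optional<String> interpret(String question);
}

// Hypothetical keyword strategy; its table is private, so callers cannot depend on it.
class KeywordInterpreter implements QuestionInterpreter {
    private final Map<String, String> keywordToTree;

    KeywordInterpreter(Map<String, String> keywordToTree) {
        this.keywordToTree = new LinkedHashMap<>(keywordToTree); // defensive copy
    }

    @Override
    public Optional<String> interpret(String question) {
        String lower = question.toLowerCase(Locale.ROOT);
        return keywordToTree.entrySet().stream()
                .filter(entry -> lower.contains(entry.getKey()))
                .map(Map.Entry::getValue)
                .findFirst();
    }
}

// Tries interpreters in order; any QuestionInterpreter can be added, removed, or swapped.
class ChainInterpreter implements QuestionInterpreter {
    private final List<QuestionInterpreter> chain;

    ChainInterpreter(List<QuestionInterpreter> chain) {
        this.chain = List.copyOf(chain);
    }

    @Override
    public Optional<String> interpret(String question) {
        for (QuestionInterpreter interpreter : chain) {
            Optional<String> result = interpreter.interpret(question);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }
}
```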

3.2 Milestones


Phase 1 - Research
Determine architecture (March 1 - March 5)
Decision logic (March 5 - March 10)
Methods of NLP (March 10 - March 15)
Methods of implementing speech (March 15 - March 20)
Options for documentation (March 25 - March 30)
Phase 2 - Proposal
Abstract (March 31 - April 5)
Problem Statement (April 5 - April 10)
Objective (April 10 - April 15)
Methodology section (April 15 - April 20)
Milestones section (April 20 - April 25)
Cost and Materials (April 25 - April 30)
Conclusion (April 30 - May 5)
References (May 5 - June 1)
Phase 3 - Acceptance of proposal (June 1, 2003)
Phase 4 - Foundation classes
Parsing engine (June 1 - June 10)
Template grammar (June 10 - June 20)
ATN parser (June 20 - June 30)
Monitor class (July 1 - July 10)
Driver for NLP (July 10 - July 21)
Phase 5 - Installation of MySQL and driver (July 21 - July 25)
Phase 6 - Database interface
Create database (July 26)
Open connection (July 27)
Close connection (July 28)
Access node (July 29 - July 31)
Update node (August 1 - August 5)
Store node (August 5 - August 10)
Phase 7 - Decision trees
Creation of tree (August 10 - August 20)
Update tree (August 20 - August 25)
Add tree (August 25 - August 27)
Delete tree (August 27 - August 31)
Phase 8 - NILCE
Web page for NILCE (September 1 - September 10)
Integration of foundation and database classes (September 10 - September 20)
Testing of NILCE database and logic (September 20 - September 30)
Design of virtual world (October 1 - October 10)
Avatar added (October 10 - October 15)
Via Voice installation (October 15 - October 20)
Speech sent to server (October 20 - October 25)
Speech played back to user (October 25 - October 31)


4 - Cost and Materials

The cost of this project will be minimal. All of the source code will either be written from scratch or adapted from open source projects. The only initial costs will be acquiring a copy of Via Voice and setting up a test server. Depending on the type of server and how Via Voice is acquired, the budget for the project should fall under $100.


5 - Anticipated Results

By the end of this project, my contribution to the next generation of VR applications will exist. It will be part of a transformation from what is being done today with VR in education to what could possibly be done tomorrow. This middleware application will allow teachers and students to work together to enhance the educational experience. Teachers will have an alternative that gives students help outside the classroom and that can be closely monitored by an expert in the field (the teachers themselves), and students will have a new medium for computer-based help that does not require an intricate knowledge of computers.

6 - Conclusion

Virtual reality presents tremendous possibilities in education. The environment presented here offers a solution to the problems with traditional virtual environments by performing a dual role, as both tutor and student, and by allowing the user to communicate in a way that is familiar to them: conversation.

This project should be considered an open door into the many possibilities of Virtual Reality. Hopefully, solid development practices and meticulous documentation will provide a good foundation for continued development by any developers who wish to pursue this 21st-century application.

7 - References

Deitel & Deitel (1999) Java How to Program. Upper Saddle River, New Jersey.

Miller, George A. (2002) WordNet: A lexical database for the English language. Retrieved May 1, 2003 from the Cognitive Science Project, Princeton University, 221 Nassau Street, Princeton, New Jersey. http://www.cogsci.princeton.edu/~wn/

Watson, Mark (December 9, 2002) Practical Artificial Intelligence in Java. Retrieved April 1, 2003 from http://www.markwatson.com/opencontent/

Savitch, Walter & Main, Michael (1990) Data Structures and Other Objects Using C++. Addison Wesley Longman.

Hughes, Cameron A. (December 2002) Frame and Template Grammar. Retrieved April 1, 2003 from the C.R.A.I.G. Project website, Youngstown State University. http://www.cc.ysu.edu/~cahughes/craig_overview.html#frame_template_grammar

Bigus, Joseph P. (2003) Constructing Intelligent Agents with Java. Wiley Computer Publishing.

Tanimoto, Steven L. (1995) The Elements of Artificial Intelligence Using Common Lisp. W. H. Freeman and Company, 41 Madison Ave, New York.

Couvillion, Warren C., Jr. (2001) Navigating Virtual Worlds. Technology Today, published by Southwest Research Institute. http://www.swri.edu/3pubs/ttoday/fall01/navigate.htm

Patton, Peter C. (1999) Recombinations: Client/server computing in the 1990s. Retrieved May 1, 2003 from Penn Printout, University of Pennsylvania. http://www.upenn.edu/computing/printout/archive/v08/5/clinserv.html

Patton, Peter C. Code reuse and object-oriented systems. JavaWorld. http://www.javaworld.com/javaworld/jw-12-1996/jw-12-indepth.html

Nerjova (March 25, 2000) Serialization to a database. Message posted to the Java Forums, archived at http://forum.java.sun.com/thread.jsp?forum=62&thread=131609

MySQL Database Server (2003) Retrieved May 10, 2003 from the MySQL homepage. http://www.mysql.com/products/mysql/index.html

IBM (2003) Speech for Java API. Retrieved April 1, 2003 from the IBM homepage. http://www.alphaworks.ibm.com/tech/speech

Brusilovsky, Peter (1989) ISIS-Tutor: An Intelligent Learning Environment for CDS/ISIS Users. Dept. of Cognitive Psychology, University of Trier. http://cs.joensuu.fi/~mtuki/www_clce.270296/Brusilov.html

Code Signing Digital IDs (2003) Retrieved April 11, 2003 from the VeriSign Security Center website. http://verisign.netscape.com/developer/

Encapsulation (Object-Oriented Programming) (2003) Retrieved May 1, 2003 from Wikipedia. http://www.wikipedia.org/wiki/Encapsulation_in_object-oriented_programming

Middleware (definition) Retrieved April 2, 2003 from www.Whatis.com. http://searchwebservices.techtarget.com/sDefinition/0,,sid26_gci212571,00.html



       
Last modified: June 21, 2003 2:04 PM
The University of Southern Mississippi URL: http://www.usm.edu