
GLOBECOM'87
CH2520-5/87/0000-1181 ©1987 IEEE
TOKYO, JAPAN

TELESOPHY: A SYSTEM FOR MANIPULATING THE KNOWLEDGE OF A COMMUNITY

Bruce R. Schatz

Bell Communications Research
435 South St, Morristown, NJ 07960 USA
schatz@canis.uiuc.edu

ABSTRACT

A telesophy system provides a uniform set of commands to manipulate knowledge within a community. Telesophy literally means "wisdom at a distance". Such a system can effectively handle large amounts of information over a network by making transparent the data type and physical location. This is accomplished by supporting a uniform information space of units of information. Users can browse through this space, filtering and grouping together related information units, then share these with other members of the community.

A telesophy system has the potential to be a major use of wide-area broadband networks. This paper discusses the model, usage, architecture, and technology of such a system. Experience with a working prototype is described along with future plans.

1. INTRODUCTION

Supporting the manipulation of all knowledge of a community has long been a dream of designers of computing and communications systems. Recently, maturing technology in a variety of areas has made it feasible to come much closer than before to achieving this dream. This paper discusses a model of providing uniform manipulation of all a community's knowledge, sample usage patterns, the underlying architecture of such a system, the technology which makes its implementation possible, a working prototype, and the future of such community systems.

The ideal system would have a single uniform set of commands which worked as similarly as possible on all the different variations of knowledge. The user should be able to browse through the existing knowledge, filter out the pieces relevant to the particular problem, combine these along with new compositions into a new piece of knowledge, and share that knowledge with the community.

Serious consideration of providing such a facility for a large community requires finding a workable solution to handling the differing properties of their data: it comes in many types, it exists in many locations, and there is an enormous amount of it. The user should be concerned only with examining selected pieces of information and relating them together as knowledge. This means that the system must hide the heterogeneity of the data. It must attempt to provide the following support:

  • type transparency,
    ability to use a uniform set of commands
    on any type of data.
  • location transparency,
    ability to reference remote data
    as though it were local.
  • scale transparency,
    ability to quickly select desired information
    from a very large set of possibilities.

Achieving this requires a multimedia information system which can handle large amounts of physically distributed data.

A system which provides these levels of support could be termed a telesophy system. "Telesophy", a word coined by the author, literally means "wisdom at a distance". Compare the word to "telephony", "sound at a distance", and "philosophy", "love of wisdom". A telephony system permits the user to transparently send and receive sound without regard for the underlying computing and communications systems. Similarly, a telesophy system permits a user to transparently retrieve and store knowledge without regard for the underlying computing and communications systems.

2. MODEL

A telesophy system provides transparent manipulation by building an abstract model of the concrete data. External data is represented in some suitable electronic form and brought inside the system to be presented uniformly as information. Commands are provided to retrieve and store this information and relate pieces of it together to form knowledge.

Data comes in a variety of types. There are different media, such as the standard electronic types of text, graphics, and image. There are different formats, such as Nroff versus Scribe text documents or monochrome versus color images. There are even cases where the data is indirectly represented and its bits are instructions for controlling some physical device, such as frame numbers for a videodisc player.

The method for providing type transparency is to place a uniform package around every item of data to form an information unit or IU. Such an atomic IU provides a uniform format which enables the system to retrieve and store items of data without regard for their type. The enclosed data contains a type that is used to determine which set of programs is used to interpret the uniform set of commands.
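As an illustration only, the uniform package might be sketched as a fixed header followed by the type name and the enclosed bits. The header layout, the field widths, and the names `pack_iu` and `unpack_iu` are assumptions made for this sketch; the paper does not specify a wire format.

```python
import struct

# Assumed header layout (not from the paper): 8-byte IU number,
# 2-byte length of the type name, 4-byte length of the enclosed data.
HEADER = struct.Struct("!QHI")


def pack_iu(number: int, data_type: str, data: bytes) -> bytes:
    """Wrap an item of data in a uniform package, whatever its type."""
    type_bytes = data_type.encode("ascii")
    return HEADER.pack(number, len(type_bytes), len(data)) + type_bytes + data


def unpack_iu(package: bytes):
    """Recover (number, type, data) without interpreting the data itself."""
    number, type_len, data_len = HEADER.unpack_from(package)
    start = HEADER.size
    data_type = package[start:start + type_len].decode("ascii")
    data = package[start + type_len:start + type_len + data_len]
    return number, data_type, data


package = pack_iu(42, "text", b"fiber services")
print(unpack_iu(package))
```

Storage and retrieval code touches only the header; the type name is consulted later, when a command has to be interpreted for the enclosed data.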

Data is stored in many different places. These include: local disks, disks on the same physical network wire, disks within the same building, and disks across the country via a series of network gateways. The method for providing location transparency is to label every information unit with a number which is unique across the entire network. Given an atomic IU, its physical location can be inferred from its number, with the aid of a network map.
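One way such inference could work is sketched below, assuming (hypothetically) that the high-order bits of an IU number name the server that holds the unit. The bit widths, the host names, and the functions `make_number` and `locate` are illustrative; the paper does not describe how numbers are assigned.

```python
# Assumption for this sketch: the top 16 bits of a 64-bit IU number
# identify a server and the low 48 bits identify the unit on it.
LOCAL_BITS = 48
LOCAL_MASK = (1 << LOCAL_BITS) - 1

# Hypothetical network map from server identifier to host name.
NETWORK_MAP = {
    1: "fileserver-a.example",
    2: "fileserver-b.example",
}


def make_number(server_id: int, local_id: int) -> int:
    """Build a network-wide unique IU number."""
    return (server_id << LOCAL_BITS) | local_id


def locate(number: int):
    """Infer (host, local id) from an IU number using the network map."""
    server_id = number >> LOCAL_BITS
    return NETWORK_MAP[server_id], number & LOCAL_MASK


print(locate(make_number(2, 77)))
```

The user only ever handles the number; moving a store to another machine would mean updating the map, not the numbers.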

Data in a community exists in large amounts. Effective and uniform methods are needed for searching and filtering the data to discover relevant items. In a telesophy system, all data is manipulated as uniform information units. Three methods are provided for grouping related IUs together. Efficient implementation of these methods can then provide scale transparency.

The first two grouping methods are labels on the outside of an information package. Classification is a set of subfields giving strings describing the author, title, date, abstract, keywords, and so on. These labels can be used for the associative word-based search found in information retrieval systems. For example, all IUs labelled with the words "fiber" and "services" can be gathered with a single command. Connection is a set of numbers of other information units which are somehow related. These labels can be used to follow related links as in hypertext and semantic net retrieval systems. For example, having gathered the IUs labelled with "fiber", all the IUs related with "see also" links can be retrieved with a single command.
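The two label-based groupings can be sketched roughly as follows. The class `IU`, the functions `gather` and `follow`, and the sample units are assumptions for illustration; a real system would search an index rather than scan every unit.

```python
from dataclasses import dataclass, field


@dataclass
class IU:
    number: int
    classification: dict                             # subfield name -> string
    connections: dict = field(default_factory=dict)  # link label -> IU numbers


def words(iu: IU) -> set:
    """All words appearing in any classification subfield."""
    found = set()
    for value in iu.classification.values():
        found.update(value.lower().split())
    return found


def gather(space: dict, *query_words: str) -> list:
    """Classification: numbers of IUs labelled with every query word."""
    wanted = {w.lower() for w in query_words}
    return sorted(n for n, iu in space.items() if wanted <= words(iu))


def follow(space: dict, numbers: list, label: str) -> list:
    """Connection: numbers of IUs reached by links with the given label."""
    reached = set()
    for n in numbers:
        reached.update(space[n].connections.get(label, []))
    return sorted(reached)


space = {
    1: IU(1, {"title": "Fiber services for the home"}, {"see also": [3]}),
    2: IU(2, {"title": "Fiber splicing"}, {"see also": [3, 4]}),
    3: IU(3, {"title": "Optical transceivers"}),
    4: IU(4, {"title": "Switching topologies"}),
}
fiber = gather(space, "fiber")
print(fiber, follow(space, fiber, "see also"))
```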

The third grouping method is to recursively place, inside an information package, other information units. Contents is the item of data contained within an atomic IU. Every IU has a Contents. IUs can also be composite, i.e. contain a list of other atomic and composite IUs. In this case, the Contents are a set of numbers of other IUs which are conceptually contained inside the package. This is the mechanism provided for building new knowledge out of selections of old knowledge and sharing that new knowledge with members of the community. For example, after an extended browsing session, the user may have uncovered information units discussing "optical transceivers", "switching topologies", and "cache memories". These IUs can then be placed into the Contents of a new IU dealing with "fiber" and "services". Another user who later searches for "fiber" will be able to immediately find the IUs that the first user uncovered.
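A minimal sketch of recursive containment is given below. It assumes, for illustration, that the Contents of an atomic IU is a byte string and the Contents of a composite IU is a list of IU numbers. The function `atomic_members` and the guard against a unit that contains itself are additions of this sketch.

```python
def atomic_members(space: dict, number: int, seen=None) -> list:
    """Numbers of all atomic IUs conceptually contained in the given IU."""
    seen = set() if seen is None else seen
    if number in seen:          # guard: do not loop if a unit contains itself
        return []
    seen.add(number)
    contents = space[number]
    if isinstance(contents, list):       # composite: a list of IU numbers
        members = []
        for child in contents:
            members.extend(atomic_members(space, child, seen))
        return members
    return [number]                      # atomic: the data itself


space = {
    10: b"optical transceivers",
    11: b"switching topologies",
    12: b"cache memories",
    20: [10, 11],       # composite built during one browsing session
    30: [20, 12],       # composite that contains another composite
}
print(atomic_members(space, 30))
```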

A collection of information units along with the grouping relationships between them is an information space. (This concept originated in a more restricted sense with earlier technology in NLS [3].) Figure 1 summarizes the section of information space given as examples above.

3. USE

The style of using a telesophy system is patterned after browsing in a physical library. Consider the following analogy. You want to browse in a library to better understand some imprecise topic in order to write a book. You start with some references from colleagues or books. You identify sections of the library by looking up the references in the card catalog. You go to one section of the library and quickly scan the spines of the books, occasionally pulling one out if it seems of interest. You then flip through the pages for more detail. Often, you realize by examining a book that you actually want a different section of the library and you walk over there. If some information in an examined book seems relevant to your problem, you copy it down, along with your annotations. You then follow other references, continuing this process until you are satisfied. Finally, you write your book utilizing the gathered selections together with new information which you contribute. When completed and published, this new book itself is submitted to be placed in the library. A telesophy system supports this process with general definitions of "library" and "book", i.e. with distributed multimedia information.

Users of a telesophy system can browse by navigating through the information space for their community. They construct a region in this space by issuing queries and composing new information. A region of an information space is a temporary collection of IUs that a user is examining to uncover a relevant set for the current problem. It might be thought of as nascent knowledge that will change back into information when it is stored back into the space.

A typical session might be: create a new region, issue a query to gather a set of prospective information units from the space, manually delete the inappropriate units, explicitly add selected units from another region, compose a few new information units, add words to the region's classification for later associative retrieval, and store the region back into the information space as a new information unit. This new unit can then be further related by creating connection links between it and other units.
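The session steps above might be sketched as operations on a region object. The classes `Space` and `Region` and their method names are hypothetical, chosen to mirror the steps in the text rather than the prototype's actual commands.

```python
class Space:
    """Toy information space: IU number -> classification words and contents."""

    def __init__(self):
        self.units = {}
        self.next_number = 1

    def store(self, words, contents) -> int:
        number = self.next_number
        self.next_number += 1
        self.units[number] = {"words": set(words), "contents": contents}
        return number

    def query(self, *words) -> set:
        wanted = set(words)
        return {n for n, u in self.units.items() if wanted <= u["words"]}


class Region:
    """Temporary collection of IUs under examination."""

    def __init__(self, space: Space):
        self.space = space
        self.members = set()
        self.words = set()

    def gather(self, *words):            # issue a query
        self.members |= self.space.query(*words)

    def delete(self, number):            # manually drop an inappropriate unit
        self.members.discard(number)

    def add(self, number):               # explicitly add a selected unit
        self.members.add(number)

    def compose(self, words, text):      # write a new unit into the region
        self.members.add(self.space.store(words, text))

    def classify(self, *words):          # words for later associative retrieval
        self.words |= set(words)

    def store(self) -> int:              # region becomes a new composite IU
        return self.space.store(self.words, sorted(self.members))


space = Space()
a = space.store({"fiber"}, "optical transceivers")
b = space.store({"fiber"}, "an unrelated item")
c = space.store({"switching"}, "switching topologies")

region = Region(space)
region.gather("fiber")
region.delete(b)
region.add(c)
region.compose({"note"}, "my annotation")
region.classify("fiber", "services")
new_unit = region.store()
print(new_unit, space.units[new_unit]["contents"])
```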

Figure 2 contains a sample screen from the prototype telesophy system described later. A new region has just been constructed by gathering together many information units, analogous to an annotated bibliography. The information units forming this region are shown in the window in the upper left. One of the queries whose results were examined to generate this region is shown in the window in the lower left. Each line is a summary of a unit (here a magazine article) analogous to the book spines on a library shelf. While examining that query, one of the units was opened and the text flipped through; this appears in the lower right. A picture linked to the unit appears in the upper right.

FIGURE 2: Sample Telesophy Usage

4. ARCHITECTURE

A schematic of an architecture for a telesophy system is shown in Figure 3. A prototype implementing this architecture is described in more detail in section 6. Basically, there is a local object system connected to many copies of a remote search system by a series of network gateways. The local software handles the type transparency for the data. The network software handles the location transparency for the data. The remote software handles the scale transparency, i.e. it efficiently searches indexes to the data packaged inside information units.

The local software runs in the user terminal. It is an object system, such as that in Smalltalk [4]. For each type of data, there is a class definition which gives the specific set of operations required to properly implement the uniform user commands. When one of the user-visible uniform commands is issued on an information unit, the system uses the type of the contained data to pass the contained set of bits to the appropriate type-dependent class operation. For example, when the user selects an IU and "zooms" into it for display, one display operation is called for text and another called for images. The local software also supports the user interface and caches the currently used set of information units as they are received over the network.
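The dispatch from a uniform command to a type-dependent class operation can be illustrated in miniature. The class names, the `zoom` method, and the returned strings are assumptions of this sketch; the prototype itself is written in C, not Python.

```python
class TextClass:
    """Type-dependent operations for text data."""

    def zoom(self, bits: bytes) -> str:
        return bits.decode("utf-8")


class ImageClass:
    """Type-dependent operations for image data."""

    def zoom(self, bits: bytes) -> str:
        return f"bitmap of {len(bits)} bytes"


# One class definition per type of data (hypothetical registry).
CLASSES = {"text": TextClass(), "image": ImageClass()}


def zoom(data_type: str, bits: bytes) -> str:
    """Uniform user command: the data type selects the class operation."""
    return CLASSES[data_type].zoom(bits)


print(zoom("text", b"fiber services"))
print(zoom("image", bytes(2048)))
```

Supporting a new type of data then amounts to registering one more class; the user-visible command does not change.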

The network software runs in a gateway machine. Associative indexes to the information units exist on a machine in the network. The gateway routes queries to servers for appropriate indexes and routes the results back to the cache of the requesting user. It consists of a directory to locate the appropriate query servers and a protocol to transfer the bytes to the appropriate network address. The simplest directory just routes the query to all servers. If the community has a large amount of knowledge in its information space, it may be necessary to restrict the routing for economic or performance reasons. Possible mechanisms include searching the "closest" servers first and semantically classifying the indexes available. An underlying network protocol is used to reliably send the packets of data across a potentially multiple gateway network.
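A rough sketch of the gateway's directory is shown below: the simplest routing sends a query to all servers, and a restricted routing sends it only to servers whose indexes carry a given semantic classification. The classes, the topic labels, and the in-memory "servers" stand in for real network calls and are assumptions of this sketch.

```python
class QueryServer:
    """Stand-in for a remote query server and its index."""

    def __init__(self, name, topics, index):
        self.name = name
        self.topics = set(topics)   # assumed semantic classification of the index
        self.index = index          # word -> list of IU numbers

    def search(self, word):
        return self.index.get(word, [])


class Gateway:
    def __init__(self, servers):
        self.directory = list(servers)

    def route(self, word, topic=None):
        """Send the query to all servers, or only to those matching a topic."""
        targets = [s for s in self.directory
                   if topic is None or topic in s.topics]
        numbers = set()
        for server in targets:
            numbers.update(server.search(word))
        return sorted(numbers), [s.name for s in targets]


news = QueryServer("news", {"messages"}, {"fiber": [1, 2]})
mags = QueryServer("magazines", {"articles"}, {"fiber": [8, 9]})
gateway = Gateway([news, mags])

print(gateway.route("fiber"))               # simplest directory: all servers
print(gateway.route("fiber", "articles"))   # restricted routing
```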

The remote software runs in "centralized" file servers. It is a full-text retrieval system, such as that described in [6] or used in Nexis TM (a commercial system operated by Mead Data Central). The IUs are contained in information stores and there is a separate set of data indexes used for associative retrieval. Each query server consists of an index and a retrieval program. A typical index contains a set of words used to classify information units along with pointers for each word to all the units classified by that word. The retrieval program parses a query, locates all words which match the query, follows the pointers to the corresponding information units, and returns the numbers of those units. A storage program is also available, which adds new words to an index. When an information unit is indexed, the contents are scanned to attempt to automatically generate appropriate words, which works well for text but poorly for other types. Those words plus the words in the classification are then added to the index along with the pointers to where the unit is physically stored. This same mechanism is used to implement "sharing", i.e. to store user-defined regions back into the information space as new information units.
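The index, retrieval program, and storage program described above correspond closely to a conventional inverted index. The following is a minimal sketch under that reading. The class `IndexServer`, the tokenizing rule, and the treatment of a query as a conjunction of words are assumptions, not details of the prototype.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


class IndexServer:
    def __init__(self):
        self.index = defaultdict(set)   # word -> numbers of IUs classified by it

    def store(self, number: int, classification: str, text: str = ""):
        """Storage program: add classification words and scanned text words."""
        for word in WORD.findall((classification + " " + text).lower()):
            self.index[word].add(number)

    def retrieve(self, query: str) -> list:
        """Retrieval program: numbers of IUs matching every word in the query."""
        query_words = WORD.findall(query.lower())
        if not query_words:
            return []
        hits = set(self.index.get(query_words[0], set()))
        for word in query_words[1:]:
            hits &= self.index.get(word, set())
        return sorted(hits)


server = IndexServer()
server.store(1, "fiber services", "Broadband services over optical fiber.")
server.store(2, "fiber", "Splicing techniques.")
server.store(3, "picture caption: fiber cable")   # non-text: caption words only
print(server.retrieve("fiber services"))
print(server.retrieve("fiber"))
```

Storing a user-defined region as a new IU would go through the same `store` call, which is how the sketch reflects "sharing".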

5. TECHNOLOGY

It is currently possible to build an early instance of a telesophy system. But even with technology available in the foreseeable future, there are limitations on the degree of transparency that can be provided. Some general considerations are given below on what is possible now and what might be possible in 5-10 years. The base hardware assumption is a bitmapped workstation in the $10,000 range connected via a LAN (or LAN extension) to a file server.

For type transparency, there must be operations sufficient to support the uniform set of commands. Consider the commands for display and for editing. For the standard computer types of text and graphics, it is possible to fully provide both display and editing. There are, of course, many possible variations, from screen editors to interactive document processors and from line drawing packages to real-time three-dimensional graphics. For images (still digitized pictures), it is possible to display them on bitmapped workstations in either monochrome or color and edit them as a collection of bits with a paint program. The hardware support required for full editing such as real-time resizing will probably reach the mass personal computer market in the future. For video and audio, it is possible to display them but editing requires expensive special-purpose hardware. With current LAN technology, video must be carried by a separate analog network but this may change in the future. Physical objects can be handled to a lesser degree. Some may be connected to, such as placing a telephone call to a person; some may be controlled, such as directing a robot to fetch a videotape; and some may only be described, such as which book on which shelf.

For location transparency, there must be network speed sufficient to permit fast browsing. Supporting the complete point-to-point transmission is possible; it just requires a hierarchical network with central administration to log the indexes and the machines. For example, the telephone network performs world-wide voice transmission and the ARPA Internet [5] performs nation-wide data transmission. The speed problem revolves around downloading information quickly enough to rival physical browsing in a library. The key parameter is the effective rate of the end-to-end network (including peripherals). This is the rate at which bits from a remote source are transferred into the local memory of the user terminal. This is typically much lower than the raw network rate since it is affected by the hardware (disks, buses) and the software (protocols, caching). For example, current 10 Mb/s LANs connecting current engineering workstations have effective rates of around 3 Mb/s. This is sufficient for one second download of some 15 magazine articles, each with a few pages of text and still pictures. Handling books or video will require a higher effective rate. Providing this will require designing a new end-to-end system since merely increasing the network rate will not by itself increase the effective rate. [8] contains a detailed discussion of this issue.
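The download figure can be checked with simple arithmetic. The sketch below takes 1 Mb/s to mean one million bits per second and derives the implied article size of about 25,000 bytes from the stated numbers; that size is an inference of this sketch, not a figure given in the text.

```python
def download_seconds(total_bytes: int, bits_per_second: int) -> float:
    """Time to move the given number of bytes at the given rate."""
    return total_bytes * 8 / bits_per_second


EFFECTIVE_RATE = 3_000_000    # about 3 Mb/s effective, from the text
RAW_RATE = 10_000_000         # 10 Mb/s raw LAN rate, from the text
ARTICLES = 15

# Implied average article size if 15 articles arrive in one second
# (an inference from the figures above).
article_bytes = EFFECTIVE_RATE // 8 // ARTICLES

print(article_bytes)
print(download_seconds(ARTICLES * article_bytes, EFFECTIVE_RATE))
print(download_seconds(ARTICLES * article_bytes, RAW_RATE))
```

At the raw rate the same window would arrive in roughly a third of the time, which is the gap between raw and effective rates that the paragraph attributes to disks, buses, protocols, and caching.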

For scale transparency, the information must be fully indexed. For a community there will be many different indexes. It must be possible to partition these so that each index can be efficiently searched independently of the others. The most common current query style is "word matching", where a logical expression of words is given to be compared to an index. If the words classifying the information units are indexed as described previously, it is possible to handle extremely large information spaces (with hundreds of millions of units). Each individual index can efficiently handle a million items (as with the current commercial information retrieval systems) and the indexes can be partitioned independently if more speed is necessary. The administration required to maintain the indexes is another question.
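To illustrate word matching over partitioned indexes, the sketch below evaluates a logical expression of words against each partition independently and merges the results. The nested-tuple form of the expression and the functions `evaluate` and `search` are assumptions made for this example.

```python
def evaluate(expr, index: dict, universe: set) -> set:
    """Evaluate a logical expression of words against one index.

    expr is either a word or a tuple ("and" | "or" | "not", operand, ...).
    """
    if isinstance(expr, str):
        return set(index.get(expr, set()))
    op, *operands = expr
    results = [evaluate(operand, index, universe) for operand in operands]
    if op == "and":
        return set.intersection(*results)
    if op == "or":
        return set.union(*results)
    if op == "not":
        return universe - results[0]
    raise ValueError(f"unknown operator: {op}")


def search(partitions, expr) -> list:
    """Search every partition independently, then merge the IU numbers."""
    hits = set()
    for index, universe in partitions:
        hits |= evaluate(expr, index, universe)
    return sorted(hits)


# Two small partitions: (word -> IU numbers, all IU numbers in the partition).
partitions = [
    ({"fiber": {1, 2}, "services": {2}, "video": {1}}, {1, 2, 3}),
    ({"fiber": {101}, "services": {101, 102}}, {101, 102}),
]
print(search(partitions, ("and", "fiber", "services")))
print(search(partitions, ("and", "fiber", ("not", "video"))))
```

Because no partition needs to consult another, the partitions could be searched in parallel on separate servers, which is the property the paragraph relies on for scale.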

Text data can be automatically scanned and the words extracted for indexing. All other types of data, however, are only indexed by words manually entered into the classification fields. For example, the prospects for automatically recognizing the objects in a picture are not good in general. The system can only provide matching support on the attached caption. The same comment holds in the foreseeable future for automatic recognition of words from continuous speech.

6. PROTOTYPE

The Telesophy Project at Bell Communications Research is constructing a prototype to investigate the functionality and architecture of a telesophy system. The strategy taken was to build a system which could support browsing and sharing inside an information space of one million information units, to place a representative sample of data into the space, and to experiment with a small community consisting of the implementors and a few colleagues.

6.1 The System

A feasibility study was completed in 1984 by the present author [7]. In 1985, two additional researchers were hired and an initial prototype was built, basically a window manager as a front end to an existing information retrieval system. In 1986, a prototype with the architecture discussed above was completed. This year, the initial community is beginning to use the prototype on a daily basis.

The prototype is custom applications-level software running on commercial engineering workstation hardware. The front end is an object system with multimedia display and editing plus support for a window/menu-based interface [2]. It also provides a local caching capability to store the contents of IUs retrieved from queries. The back end is a full-text retrieval system with word indexing and proximity searching [1]. The software is written in C and runs under Berkeley-derived versions of the Unix TM operating system. The network connection uses sockets, transported via TCP/IP. The current hardware is a Sun Microsystems workstation network, with a 3/280S file server connected to local 3/75 and 3/160 workstations via an Ethernet.

Subject to the limitations described in the previous section, transparency is supported fairly well. For type transparency, a variety of types of data are available uniformly including: typed-in text (notes), previously-external text (magazine articles), graphics (line drawings), black/white images, color images (8 bit pseudo-color), and video/audio (played segments). For location transparency, the system routinely runs with a variety of query servers running on several file servers of different configurations. Within the same building, the TCP/IP software maintains roughly the same effective rate through several transparent gateways. For scale transparency, the indexing and retrieval provide quite acceptable response for a space of several hundred thousand IUs.

The performance of the system feels reasonable. For the available set of data, the speed is comparable to browsing in a physical library. Queries typically take a few seconds while displaying and scrolling appear instantaneous. The speed is limited primarily by the effective network rate, about 3 Mb/s for the software and hardware configuration. The downloading strategy which seems to produce about one second response is to download complete information units corresponding to a window full of one-line summary lines (about 15) when they contain text but to download pictures only when specifically requested by the user.
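The downloading strategy can be sketched as eager fetching of text units for the visible window and lazy fetching of pictures, both through a local cache. The `Cache` class, the `prefetch_window` function, and the `fetch` callback standing in for a network request are hypothetical.

```python
class Cache:
    """Local cache of IU contents, filled on first access."""

    def __init__(self, fetch):
        self.fetch = fetch       # hypothetical network call: IU number -> contents
        self.contents = {}

    def get(self, number):
        if number not in self.contents:
            self.contents[number] = self.fetch(number)
        return self.contents[number]


def prefetch_window(summaries, cache: Cache, window: int = 15):
    """Download complete text units for one window of summary lines.

    Pictures are skipped here and fetched only when the user asks for them.
    """
    for number, data_type in summaries[:window]:
        if data_type == "text":
            cache.get(number)


fetched = []


def fake_fetch(number):
    fetched.append(number)           # record each simulated network transfer
    return f"contents of {number}"


cache = Cache(fake_fetch)
summaries = [(1, "text"), (2, "image"), (3, "text")]
prefetch_window(summaries, cache)
cache.get(2)     # picture downloaded only on explicit request
cache.get(1)     # already cached: no further transfer
print(fetched)
```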

6.2 The Data

The data being placed into the information space is intended to be a sample of that useful to the current community. The data includes:

  • Messages. These are short, transient, and topical. They include: electronic mail, bulletin boards (Netnews), and wire service (Associated Press). The messages are gathered daily from external sources and automatically placed into the information space. Typed-in personal notes are also supported.
  • Citations. These are the author, title, and abstract of journal or magazine articles (without the body). They include: INSPEC (an IEE-maintained database of electronics and computing journals) and TELARIS TM (a Bellcore-maintained database of telecommunications clippings). These represent some 100,000 references.
  • Magazines. These are full-text articles from current issues (without pictures). They include popular magazines (Business Week, Scientific American, Technology Review, New Republic, National Review) and trade magazines (Electronics, Data Communications, Datamation, PC Week, Broadcasting). These represent some 5000 articles.
  • Company. These are useful internal listings. They include: the library catalog, the telephone directory, and the memoranda list.
  • Pictures. These are display-only files generated by some external process. They include: line drawing graphics, monochrome images, videodisc stills, magazine figures, and glossy photographs. A laboratory facility is available for digitizing pictures and then converting them into a displayable format (1 bit monochrome, 8 bit pseudocolor, or 24 bit full color). Only about 50 pictures are available since the classification must be manually generated.
  • Video. These are display-only moving and still pictures. A bank of videodiscs is available controlled by a switch which is connected via a separate analog network to a separate television monitor next to the workstations. Information units are defined which classify segments (ranges of frames) from specified videodiscs. A request to display these units plays the frames on the monitor.

7. FUTURE

The prototype demonstrates the feasibility of a system which can manipulate all the knowledge of a community. To the degree possible with existing technology, it really does provide transparent uniform manipulation. The type, location, and scale of the external data are completely hidden. Powerful facilities are provided for information manipulation, the retrieval, filtering, and storage of information units, and for knowledge manipulation, the relating of these units together in the information space in a variety of different styles.

In the existing prototype system, the concentration has been on browsing and sharing rather than on composition of new material. It was felt that current facilities for re-grouping existing knowledge are much poorer than facilities for composing new knowledge de novo. The current data generation strategy is to take external data, package it into information units, and place these units into the information space. The user can then combine selected existing units into new information units along with annotations consisting of short typed notes. The prototypical user might be thought of as a magazine editor or a decision analyst rather than as a researcher or an author.

To provide complete knowledge manipulation, enough programs must be available so that the user need use only the telesophy system. A complete set of facilities for data generation using the system should be provided, such as typing papers, drawing graphics, scanning images, and viewing video. This might extend to the authoring support found in hypertext and outlining systems.

In the existing prototype data, the primary deficiency is its lack of coverage. The architecture of a telesophy system is capable of handling anything anywhere. Although a major effort was expended in gathering data, it is more a sample of the different types of possible material than some significant fraction of the data actually necessary and useful to our community. For example, such essential material for the current community as the text of conference proceedings and journal articles proved not to be available in suitable electronic form.

The ideal future experiment to demonstrate a telesophy system would be to use it to permit a department-size community to go "cold turkey electronic". That is, take a relatively homogeneous community of 50 people, attempt to place all their knowledge into their information space, and try to support all of their knowledge manipulation using the system. This would require a major engineering effort to support all the necessary data manipulation, such as media editors and specialized functions. It would also require a major administrative effort to gather all the relevant data and keep it continuously updated.

Running this ideal experiment requires finding a community which is willing to participate in an experiment and which also has relatively self-contained data needs. If only a few major data sources suffice for a significant portion of their work, the system might become a useful standard tool. It will be interesting to see how this new tool changes everyday behavior.

8. REFERENCES

[1] S. Bulick, "A Network Information Retrieval System", Bellcore Database Symposium, Sep 1987, to appear.

[2] M. Caplinger, "An Information System using Distributed Objects", OOPSLA '87 Conf Proc, Oct 1987, to appear.

[3] D. Engelbart and W. English, "A Research Center for Augmenting the Human Intellect", Proc FJCC, vol 33, part 1, AFIPS Press, 1968, pp 395-410.

[4] A. Goldberg and D. Robson, Smalltalk-80: The Language and its Implementation, Addison-Wesley, 1983.

[5] L. Landweber, D. Jennings, and I. Fuchs, "Research Computer Networks and their interconnection", IEEE Communications Magazine, vol 24, Jun 1986, pp 5-17.

[6] G. Salton and M. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1983.

[7] B. Schatz, "Telesophy", Technical Memorandum TM-ARH-002487, Bell Communications Research, Aug 1985, 74pp.

[8] B. Schatz, "Browsing Services", to be submitted to IEEE Communications Magazine.


 

 

 

 

 

 

 
