1
The Evolution of the Net:
Predicting Global Infrastructure
2
Art of Physical Architecture
3
Art of Logical Architecture
4
The Evolution of the Net
  • Niels Bohr on Quantum Theory
  • “Prediction is very Difficult, especially about the Future”
5
THE THIRD WAVE OF NET EVOLUTION
6
Computer Science and Infrastructure
  • Transparent Federation across Sources
  • Generic Protocols for Global Infrastructure


  • Ultimate Goal is the cyberspace vision of “being one with all the world’s knowledge”
7
Computer Science and Infrastructure
  • 1985 Operating Systems – caching
  • 1995 Database Management – tagging
  • 2005 Information Retrieval – clustering
  • 2015 Artificial Intelligence – recognizing
8
Linguistics Levels and Universal Units
  • 1985 Syntax – Files (wholes)
  • 1995 Structure – Records (parts)
  • 2005 Semantics – Concepts (meaning)
  • 2015 Pragmatics – Features (reality)
9
 
10
 
11
1985 Syntax Federation
  • Same Query into Multiple Sources
  • Results return Uniform Packages
  • Packets are for Bits, but Objects need more
  • Information Units are for Database Items
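The federation idea on this slide (one query fanned out to several sources, with every hit coming back in the same package) can be sketched in Python. The `InfoUnit` record, the per-source adapter functions, and the toy sources are hypothetical stand-ins. They are not the interfaces of the original system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class InfoUnit:
    """Hypothetical uniform package returned for every hit, whatever its source."""
    source: str
    item_id: str
    title: str


# Assumption: a source is a function from query text to raw records in
# that source's own shape; an adapter pulls out (item_id, title).
Source = Callable[[str], List[dict]]
Adapter = Callable[[dict], Tuple[str, str]]


def federated_search(query: str, sources: Dict[str, Source],
                     adapters: Dict[str, Adapter]) -> List[InfoUnit]:
    """Send the same query to every source and wrap each hit as an InfoUnit."""
    results: List[InfoUnit] = []
    for name, search in sources.items():
        to_fields = adapters[name]
        for record in search(query):
            item_id, title = to_fields(record)
            results.append(InfoUnit(name, item_id, title))
    return results


# Toy usage: two sources with different record layouts.
catalog = [{"id": "c1", "name": "Packet switching overview"}]
journal = [{"doi": "j7", "headline": "Packet radio networks"}]
sources = {
    "catalog": lambda q: [r for r in catalog if q.lower() in r["name"].lower()],
    "journal": lambda q: [r for r in journal if q.lower() in r["headline"].lower()],
}
adapters = {
    "catalog": lambda r: (r["id"], r["name"]),
    "journal": lambda r: (r["doi"], r["headline"]),
}
hits = federated_search("packet", sources, adapters)
```

The sources are queried one after another for clarity. A networked implementation would issue the queries concurrently and merge results as they arrive.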


12
1985 Technology Environment
  • CMU Computer Science – Andrew
  • Apollo Domain – distributed file system
  • Xerox Star – multimedia document system


  • Bellcore Network Systems – Fibers
  • Telenet – International Packet Switches
  • Dialog – Bibliographic Text Searches


13
Telesophy Prototype
  • Distributed Documents
  • Distributed Collections
  • Multimedia Documents


  • Networked Hypertext
  • Document Browsing (links across sources)
  • Document Search (texts across sources)


14
Telesophy Session
15
Telesophy Implementation
  • Bitmapped Workstation with Custom Software
  • $30K Apollo with 10Mb/s WAN
  • Windows via Brown [hypertext]
  • Objects via Xerox [Smalltalk]
  • Information Units and Data Items
  • 300K Units across 20 sources
  • Bellcore R&D, $2.5M 1984-1988
16
Operating System Research
  • Browsing requires Caching across Internet
  • Raw bandwidth insufficient
  • 200ms Ping versus 250ms Saccade
  • Lookahead via Application-Specific Protocols
  • 1987 Internet Research Task Force
  • 1989 ARPANET 20th Anniversary
  • 1990 Dissertation on Interactive Retrieval
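The timing argument above is that a network round trip (the 200 ms ping) is comparable to the 250 ms saccade. Fetching on demand is therefore noticeable to a reader, while fetching ahead of the reader can hide the delay.

Below is a minimal Python sketch of an LRU cache with link-based lookahead. `fetch` and `links` are hypothetical hooks. The prefetch runs inline here for clarity, whereas a real client would prefetch in the background while the user reads.

```python
from collections import OrderedDict
from typing import Callable, List


class LookaheadCache:
    """LRU document cache that also prefetches the links of each page served.

    fetch(doc_id) stands in for one remote retrieval; links(doc_id) returns the
    ids a reader is likely to follow next. Both hooks are hypothetical.
    """

    def __init__(self, fetch: Callable[[str], str],
                 links: Callable[[str], List[str]], capacity: int = 64):
        self.fetch = fetch
        self.links = links
        self.capacity = capacity
        self.store: "OrderedDict[str, str]" = OrderedDict()
        self.remote_fetches = 0  # round trips actually paid

    def _load(self, doc_id: str) -> str:
        if doc_id in self.store:
            self.store.move_to_end(doc_id)  # mark as most recently used
            return self.store[doc_id]
        self.remote_fetches += 1
        body = self.fetch(doc_id)
        self.store[doc_id] = body
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        return body

    def get(self, doc_id: str) -> str:
        body = self._load(doc_id)
        # Lookahead: load linked documents before the reader asks for them.
        # Done inline here; a real client would do this in the background.
        for nxt in self.links(doc_id):
            self._load(nxt)
        return body


pages = {"a": "page A", "b": "page B", "c": "page C"}
outlinks = {"a": ["b", "c"], "b": ["a"], "c": []}
cache = LookaheadCache(lambda d: pages[d], lambda d: outlinks[d])
first = cache.get("a")   # pays for a, then prefetches b and c
second = cache.get("b")  # served from cache; no new round trips
```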
17
 
18
1995 Structure Federation
  • Search using Parts of Documents
  • Transparent merging of different Schemas
  • Results return Complete Displays
  • Displayers invoked for all types
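Searching by parts of documents means restricting a match to one structural element rather than the whole text. The small Python sketch below assumes each document is already split into tagged parts. The tag names and sample articles are illustrative and are not DeLIver's.

```python
from typing import Dict, List

# Each document is modeled as a mapping from structural tag to text.
Doc = Dict[str, str]


def search_part(docs: List[Doc], tag: str, term: str) -> List[Doc]:
    """Return documents whose given part (not the whole text) contains term."""
    needle = term.lower()
    return [d for d in docs if needle in d.get(tag, "").lower()]


articles = [
    {"title": "Superconductivity in thin films", "caption": "Resistance vs temperature"},
    {"title": "Resistance measurements", "caption": "Thin film sample geometry"},
]
in_titles = search_part(articles, "title", "thin film")
in_captions = search_part(articles, "caption", "thin film")
```

A whole-text search for "thin film" would match both articles. Restricting it to one part separates them.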


19
1995 Technology Environment
  • NCSA and the World-Wide Web
  • Mosaic – multimedia document browsing
  • HTTP – standard query protocol


  • University Library and Online Retrieval
  • Ovid – full-text journal searching
  • SGML – standard document protocol


20
DeLIver System
  • Full Distributed Documents
  • Full Displays with tables and equations
  • Distributed Collections from publishers


  • Single Federated Collection
  • Streamlined search using tag structure
  • Canonical tag schema with translation
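A canonical tag schema with translation can be sketched as a per-source rename table that is applied before records are merged. The publisher names and tag names below are invented for illustration. They are not the DTD used by DeLIver.

```python
from typing import Dict

# Hypothetical per-publisher tag names mapped onto one canonical schema.
TRANSLATIONS: Dict[str, Dict[str, str]] = {
    "publisher_a": {"au": "author", "ti": "title", "abs": "abstract"},
    "publisher_b": {"creator": "author", "heading": "title", "summary": "abstract"},
}


def to_canonical(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename a record's tags into the canonical schema; unmapped tags are dropped."""
    mapping = TRANSLATIONS[source]
    return {mapping[tag]: value for tag, value in record.items() if tag in mapping}


a = to_canonical("publisher_a", {"au": "Smith", "ti": "Quarks"})
b = to_canonical("publisher_b", {"creator": "Jones", "heading": "Knots", "pages": "12"})
```

Dropping unmapped tags keeps the merged collection uniform. If nothing may be lost, the alternative is to keep such tags under a source-specific prefix.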


21
DeLIver Session
22
DeLIver Implementation
  • Desktop PC plus Custom Software Integration
  • $5K IBM Personal Computer
  • Mosaic via NCSA [hypertext]
  • Displays via SoftQuad [viewers]
  • Custom DTD and SSL for tags and styles
  • 100K articles for 3000 users
  • NSF DLI, $5M 1994-1998
23
Database Management Research
  • Metadata Extraction for Structure Federation
  • Raw schema insufficient
  • Different names and different types
  • Author tags in physics vs mathematics
  • 1995 interactive databases using Mosaic
  • 1997 Beat Elsevier using canonical tags
  • 1999 production distributed XML federation
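Beyond different names, the same field can arrive with different types. The minimal Python sketch below assumes one source stores authors as a single semicolon-delimited string and another stores them as a list. Both conventions are hypothetical and are not claims about physics or mathematics publishers. The sketch normalizes either form to a list of names.

```python
from typing import List, Union

AuthorField = Union[str, List[str]]


def normalize_authors(value: AuthorField, sep: str = ";") -> List[str]:
    """Coerce an author field of either type into a list of trimmed names."""
    parts = value.split(sep) if isinstance(value, str) else value
    return [p.strip() for p in parts if p.strip()]


source_a = normalize_authors("Bohr, N.; Heisenberg, W.")
source_b = normalize_authors(["Noether, E.", " Hilbert, D. "])
```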
24
 
25
2005 Semantic Federation
  • Search using Concepts above Words
  • Extraction of Concepts from Documents
  • Statistical Index on Community Collections
  • Concept Navigation across Collections


26
2005 Technology Environment
  • Web Portals and statistical NLP
  • Google – statistical linked contexts
  • NLP – statistical generic parsers


  • Fast Processors and Big Disks
  • Gigaflops – Beowulfs and cluster computing
  • Terabytes – RAIDs and literature scaling


27
BeeSpace System
  • Fully Parsed Documents
  • Concepts and Entities auto-generated
  • Distributed Collections from communities


  • Fully Related Concepts
  • Switching across Community Repositories
  • Automatic Links to Entity Databases


28
BeeSpace Session
29
BeeSpace Implementation
  • Commodity PC plus Custom Software
  • $1K Dell Personal Computer
  • $15K Server 1 Gflops 2 TBytes
  • Semantic Indexing, generic and scalable
  • Concept Extraction and Normalization
  • Concept Co-occurrence on Collections
  • 50M articles across 50K repositories
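Concept co-occurrence on a collection can be illustrated with a simplified asymmetric association weight, w(a → b) = co(a, b) / df(a). Here df(a) counts documents containing concept a, and co(a, b) counts documents containing both concepts. This is a Python sketch of the general technique, not the exact weighting used in BeeSpace, and the toy abstracts are invented.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def concept_space(docs: Iterable[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Asymmetric co-occurrence weights w(a -> b) = co(a, b) / df(a)."""
    df: Counter = Counter()  # documents containing each concept
    co: Counter = Counter()  # documents containing each unordered pair
    for concepts in docs:
        df.update(concepts)
        for a, b in combinations(sorted(concepts), 2):
            co[(a, b)] += 1
    weights: Dict[Tuple[str, str], float] = {}
    for (a, b), n in co.items():
        weights[(a, b)] = n / df[a]
        weights[(b, a)] = n / df[b]
    return weights


def related(weights: Dict[Tuple[str, str], float], concept: str,
            k: int = 3) -> List[Tuple[str, float]]:
    """Top-k concepts most strongly associated with `concept`."""
    scored = [(b, w) for (a, b), w in weights.items() if a == concept]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:k]


# Toy abstracts, each reduced to the set of concepts extracted from it.
abstracts = [
    {"foraging", "dance language", "honey bee"},
    {"foraging", "honey bee"},
    {"foraging", "pollen"},
]
weights = concept_space(abstracts)
top = related(weights, "foraging")
```

The asymmetry is deliberate. Every toy abstract mentioning "honey bee" also mentions "foraging" (weight 1.0), but only two of the three "foraging" abstracts mention "honey bee" (weight about 0.67).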
30
Information Retrieval Research
  • Statistical Clustering of Equivalent Phrases
  • Raw phrases insufficient
  • Phrase parsing with normalization
  • Entity recognition with normalization
  • 1998 semantic indexing (concepts from terms)
  • 1999 information spaceflight (categories from documents)
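Phrase normalization maps surface variants to one key before any statistical clustering runs. The Python sketch below uses crude rules as a stand-in for real linguistic normalization: lowercasing, splitting on punctuation, and stripping a trailing "s". It shows only the normalization and grouping step, not the statistical clustering itself.

```python
import re
from collections import defaultdict
from typing import Dict, List


def normalize(phrase: str) -> str:
    """Crude key: lowercase, split on non-alphanumerics, strip a plural 's'."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)


def group_equivalents(phrases: List[str]) -> Dict[str, List[str]]:
    """Group phrases that share the same normalized key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in phrases:
        groups[normalize(p)].append(p)
    return dict(groups)


groups = group_equivalents(["Gene Expressions", "gene expression", "gene-expression", "brain"])
```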
31
CONCEPT SPACES
  • from Objects to Concepts
  • from Syntax to Semantics
  • Infrastructure is Interaction with Abstraction
32
LEVELS OF INDEXES
33
Technology Trends
  • IEEE Computer for January 2002
  • Information Infrastructure for Trends issue


  • Document Representation (Semantic Web)
  • Language Parsing (TIPSTER)
  • Statistical Indexing (TREC)
  • Peer-to-Peer Networking (SETI@home)
  • Vocabulary Switching (UMLS)


34
SCALABLE SEMANTICS
  • Automatic indexing
  • Domain-Independent indexing
  • Statistical clustering

  • Compute Context of
    • concepts within documents
    • documents within repositories
35
COMPUTING CONCEPTS
36
SIMULATING A NEW WORLD
  • Obtain discipline-scale collection
    • MEDLINE from NLM, 10M bibliographic abstracts
    • human classification: Medical Subject Headings
  • Partition discipline into Community Repositories
    • 4 core terms per abstract for MeSH classification
    • 32K nodes with core terms (classification tree)
  • Community is all abstracts classified by core term
    • 40M abstracts containing 280M concepts
    • concept spaces took 2 days on NCSA Origin 2000
  • Simulating World of Medical Communities
    • 10K repositories with > 1K abstracts (1K with > 10K)
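Partitioning by core term puts each abstract into one community per core term. With about 4 core terms per abstract, 10M abstracts would yield roughly 40M abstract-community memberships, which appears to be what the 40M figure counts.

The Python sketch below uses toy data. The ids are made up, and the MeSH headings serve only as labels.

```python
from collections import defaultdict
from typing import Dict, List, Set


def partition(core_terms: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each core term to the ids of all abstracts classified under it.

    An abstract with several core terms joins several communities, so
    community sizes sum to more than the number of abstracts.
    """
    communities: Dict[str, Set[str]] = defaultdict(set)
    for abstract_id, terms in core_terms.items():
        for term in terms:
            communities[term].add(abstract_id)
    return dict(communities)


def larger_than(communities: Dict[str, Set[str]], size: int) -> List[str]:
    """Names of communities with more than `size` abstracts."""
    return sorted(t for t, ids in communities.items() if len(ids) > size)


# Toy input: made-up ids, two core terms each (the slide cites about 4).
sample = {
    "a1": ["Arthritis, Rheumatoid", "Analgesics"],
    "a2": ["Analgesics", "Gastrointestinal Hemorrhage"],
    "a3": ["Arthritis, Rheumatoid", "Analgesics"],
}
communities = partition(sample)
memberships = sum(len(ids) for ids in communities.values())
```

Here 3 abstracts with 2 core terms each give 6 memberships. The same multiplication gives 10M × 4 = 40M at MEDLINE scale.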
37
COMMUNITY  PROCESSING
38
INTERSPACE NAVIGATION
  • Semantic Indexes for Community Repositories


  • Navigating Abstractions within Repository
    • concept space & category map


  • Interactive browsing by Community experts


  • www.canis.uiuc.edu/interspace-prototype
39
Interspace Remote Access Client
40
Navigation in MEDSPACE
  • For a patient with Rheumatoid Arthritis
  • Find a drug that reduces the pain (analgesic)
  • but does not cause stomach (gastrointestinal) bleeding


41
 
42
Concept Navigation
43
Retrieve Document
44
Navigate Document
45
 
46
Concept Navigation
47
 
48
SWITCHING
  • In the Interspace…
    • each Community maintains its own repository

    • Switching is navigating Across repositories

    • use your vocabulary to search another specialty
49
CONCEPT SWITCHING
  • “Concept” versus “Term”
    • set of “semantically” equivalent terms
  • Concept switching
    • region to region (set to set) match
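Concept switching matches a region of one vocabulary to a region of another, so the comparison is between sets of terms rather than single terms. The Python sketch below uses Jaccard overlap as the set-to-set similarity. That measure and the toy vocabularies are assumptions for illustration, not the Interspace algorithm.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# A concept is a set of terms treated as semantically equivalent.
Concept = FrozenSet[str]


def jaccard(a: Concept, b: Concept) -> float:
    """Overlap of two term sets: |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def switch(concept: Concept,
           target: Dict[str, Concept]) -> Optional[Tuple[str, float]]:
    """Match a concept to the most overlapping concept in another repository.

    Returns (target concept name, score), or None if nothing overlaps.
    """
    best_name, best_score = None, 0.0
    for name, terms in target.items():
        score = jaccard(concept, terms)
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_name is not None else None


# Toy vocabularies for two specialties (invented for illustration).
query_concept = frozenset({"nsaid", "anti-inflammatory agent", "analgesic"})
pharmacology = {
    "cox-2 inhibitors": frozenset({"cox-2 inhibitor", "nsaid", "anti-inflammatory agent"}),
    "opioids": frozenset({"opioid", "analgesic", "narcotic"}),
}
match = switch(query_concept, pharmacology)
```

The query concept shares two of four distinct terms with "cox-2 inhibitors" (0.5) but only one of five with "opioids" (0.2). A single-term lookup on "analgesic" would have picked the other region.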
50
Biomedical Session
51
Categories and Concepts
52
Concept Switching
53
Document Retrieval
54
THE NET OF THE 21st CENTURY
  • Beyond Objects to Concepts
  • Beyond Search to Analysis
  • Problem Solving via Cross-Correlating Multimedia Information across the Net


  • Every community has its own special library
  • Every community does semantic indexing


  • The Interspace approximates Cyberspace
55
 
56
2015 Pragmatics Federation
  • Beyond Words and Concepts to Reality
  • Feature Vectors describing Situation
  • Each Individual has Vector (< Community)
  • Discrete Samples into Continuous Monitors
57
2015 Technology Environment
  • Continuous Vector Recording
  • Health Grid – personal lifestyle monitors
  • Peer-to-Peer – beyond Napster and Amazon


  • Individual User Modeling
  • Cohort Grouping – custom clustering
  • Adaptable Interfaces – multiple levels


58
Lifestyle Monitor System
  • Continuous Monitoring
  • Full-spectrum Adaptive Questionnaires
  • Distributed Collections from individuals


  • Situational Analysis
  • Structured Vectors customized for Individuals
  • Population Cohorts for Decision Support


59
Lifestyle Monitor Questions
60
Lifestyle Monitor Session
61
Artificial Intelligence Research
  • Structured Vectors customized for Individuals
  • Raw concepts insufficient
  • Adaptive Concepts for individual situations
  • Structured Vectors for cohort clustering
  • Situational Analysis infrastructure support
  • 2007 Internet Health Monitors prototypes
  • 2011 Population Health Monitors for chronic illness, deployed regionally
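Cohort clustering over structured feature vectors can be sketched with one step of k-means. Each individual is assigned to the nearest cohort centroid, and each centroid is then recomputed as the mean of its members. k-means is a stand-in chosen for brevity, since the slides do not name a clustering method. The three features and all values are invented.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_cohorts(people: Dict[str, Vector],
                   centroids: Dict[str, Vector]) -> Dict[str, str]:
    """Place each individual in the cohort whose centroid is nearest."""
    return {pid: min(centroids, key=lambda c: distance(vec, centroids[c]))
            for pid, vec in people.items()}


def update_centroids(people: Dict[str, Vector], assignment: Dict[str, str],
                     centroids: Dict[str, Vector]) -> Dict[str, List[float]]:
    """Recompute each centroid as the mean of its members; keep it if empty."""
    updated: Dict[str, List[float]] = {}
    for cohort, old in centroids.items():
        members = [people[p] for p, c in assignment.items() if c == cohort]
        if members:
            updated[cohort] = [sum(col) / len(members) for col in zip(*members)]
        else:
            updated[cohort] = list(old)
    return updated


# Hypothetical features: [hours of sleep, thousands of steps, pain score 0-10]
people = {
    "p1": [7.5, 8.0, 1.0], "p2": [5.0, 2.0, 6.0],
    "p3": [7.0, 9.0, 2.0], "p4": [4.5, 1.5, 7.0],
}
centroids = {"active": [7.0, 8.0, 1.0], "limited": [5.0, 2.0, 6.0]}
assignment = assign_cohorts(people, centroids)
new_centroids = update_centroids(people, assignment, centroids)
```

Repeating the two steps until assignments stop changing gives full k-means. A system built around individual customization would more likely weight or select features per person before clustering.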
62
THE DISTRIBUTED WORLD
  • Community Repositories in the Interspace
  • Peer to Peer Networking Infrastructure
  • Every Person performs Every Role



63
FEATURE VECTORS
  • from Concepts to Features
  • from Semantics to Pragmatics
  • Infrastructure is Interaction with Abstraction
64
Towards the Intermind
  • Beyond Concepts to Features
  • Beyond Analysis to Synthesis
  • Problem Solving via Cross-Correlating Universal Knowledge across the Net


  • Every individual has their own special vector
  • Every viewpoint does semantic clustering


  • The Intermind is true Cyberspace
65
Today the Hive
Tomorrow the HiveMind