SIMULATING A NEW WORLD
Obtain discipline-scale collection
- MEDLINE from NLM, 10M bibliographic abstracts
- human classification: Medical Subject Headings
Partition discipline into Community Repositories
- 4 core terms per abstract for MeSH classification
- 32K nodes with core terms (classification tree)
A community is all abstracts classified under one core term
- 40M abstract entries (10M abstracts x 4 core terms) containing 280M concepts
- concept spaces took 2 days on NCSA Origin 2000
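
A minimal sketch of the partitioning step described above, assuming each abstract arrives as an ID paired with its list of core MeSH terms. The function name, record layout, IDs, and terms are hypothetical illustrations, not the project's actual code or data. An abstract with several core terms is placed in several communities, so the membership total counts it once per core term.

```python
from collections import defaultdict

def partition_by_core_term(abstracts):
    """Group abstract IDs into communities, one community per core term."""
    communities = defaultdict(list)
    for abstract_id, core_terms in abstracts.items():
        for term in core_terms:
            # An abstract joins every community named by one of its core terms.
            communities[term].append(abstract_id)
    return dict(communities)

# Toy records (hypothetical IDs and terms, not real MEDLINE data).
abstracts = {
    "a1": ["Neoplasms", "Genes"],
    "a2": ["Neoplasms", "Lung"],
    "a3": ["Genes"],
}

communities = partition_by_core_term(abstracts)

# Memberships count an abstract once per core term, so they exceed len(abstracts).
total_memberships = sum(len(ids) for ids in communities.values())
```

Each resulting list stands in for one community repository, over which a concept space would then be computed.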
Simulating World of Medical Communities
- 10K repositories with > 1K abstracts (1K of them with > 10K abstracts)
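
The size cut-offs above can be illustrated with a short sketch, assuming a mapping from community name to abstract count. The helper name and the toy sizes are hypothetical and do not reproduce the real distribution.

```python
def count_larger_than(sizes, threshold):
    """Count repositories holding strictly more than `threshold` abstracts."""
    return sum(1 for n in sizes.values() if n > threshold)

# Toy repository sizes (hypothetical values).
sizes = {
    "Neoplasms": 25_000,
    "Genes": 4_000,
    "Lung": 1_200,
    "Rare Topic": 300,
}

over_1k = count_larger_than(sizes, 1_000)
over_10k = count_larger_than(sizes, 10_000)
```

Repositories over the larger cut-off are a subset of those over the smaller one, which matches the nesting of the two figures in the bullet.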