Volume 5 Issue 3
The Mathematics Archives
Making Mathematics Easy to Find on the Web
Earl D. Fife
University of Tennessee - Knoxville
Do a search on AltaVista for "algebra". What do you get? Nearly 700,000 hits, of which AltaVista will allow you to view only what it determines is the top 200. Major search engines such as AltaVista, Excite, HotBot, Lycos, and the like continue to provide a valuable service, but with the recent growth of the Internet, topic-specific sites that provide some organization to the topic are increasingly important. It is the goal of the Mathematics Archives to make it easier for the ordinary user to find useful mathematical information on the Web.
The Mathematics Archives (http://archives.math.utk.edu) is a multipurpose site for mathematics on the Internet. The focus is on materials which can be used in mathematics education (primarily at the undergraduate level). Resources available range from shareware and public domain software to electronic proceedings of various conferences, to an extensive collection of annotated links to other mathematical sites.
All materials on the Archives are categorized and cross referenced for the convenience of the user. Several search mechanisms are provided. The Harvest search engine is implemented to provide a full text search of most of the pages on the Archives. The software we house and our list of annotated links to mathematical sites are both categorized by subject matter. Each of these collections has a specialized search engine to assist the user in locating desired material.
Services at the Mathematics Archives are divided up into five broad topics:
All pages present at the Mathematics Archives can be searched using our Harvest search engine. This is useful for material that actually resides on our site, but since Harvest returns pages rather than individual links contained within pages, we have written our own search engine to search our annotated list of links.
- Links organized by Mathematical Topics: This leads to a list of topics which, in turn, are linked to pages of annotated links. We have written a script to search the annotations associated with each of the links and return the annotated link whenever a match is found. A search on "algebra" returns 145 links, and these are annotated so that the search can be refined to a more manageable list. Although this list is considerably shorter than the nearly 700,000 that AltaVista claims to have found, these are links to sites that have been individually viewed, annotated and classified. Furthermore, the assessing of the site is done by a professional mathematician, making it more likely to be properly identified and the information on the site more likely to be reliable and useful.
- Software: Organization of software was the original goal of the Mathematics Archives, and it remains one of our popular services. Our collections of both Windows/DOS and Macintosh software are organized by mathematical topic (algebra, calculus, differential equations, etc.), and each package is reviewed and tested. Reviews are posted along with the package and appropriate hypertext links. All text files associated with the packages are searchable to aid users in finding appropriate packages.
In addition to these two collections, we also mirror several widely used packages available on multiple platforms (such as GAP, MuPad, and the SLATEC science library) and an extensive collection of links to other software sites.
- Other Math Archives Features: This includes one of our most popular pages, Pop Mathematics. If you have ever wondered what it is about mathematics that mathematicians find so interesting, this is the page to view. Here we have included links to sites that have especially interesting content. We know that if we enter e (probably exp(1)) or pi^2/6 into our calculator, we get a decimal number as an answer (actually an approximation). But when we encounter a decimal number (0.543656365691809), how can we find out whether or not it is a decimal approximation to a "special" number? Try the Inverse Symbolic Calculator (available on our Numbers subpage). (Here is the answer.) Would you like to be a part of mathematical history by finding the next largest prime number? Join the Great Internet Mersenne Prime Search (GIMPS) (again, available on the Numbers subpage). Through GIMPS, a high school student discovered the largest known prime number to date.
Another significant link is to the UTK Mathematical Life Sciences Archives organized and run by Louis Gross. This was established during the first year after the Archives' move to the University of Tennessee - Knoxville, and it has become one of the important resources in Mathematical Life Sciences.
Finally, this area of the Archives contains links to services we provide to the mathematical professional community, such as electronic proceedings or poster sessions of conferences (MAA, ICTCM, AMATYC and CTM), pages for professional organizations (AMATYC and SMMEA), and archives of newsgroups.
- Other Links: These are links of interest to people "in the profession". They include lists of mathematics departments and professional societies and organizations as well as pointers to electronic journals and TeX links (TeX is the typesetting program used predominantly by mathematicians).
Technical Aspects
There are three technical aspects of the Mathematics Archives that may be of interest to D-Lib readers: designing a site such as ours so that ftp, gopher and http all work together off of the same basic structure, constructing the search engine for the topics pages, and implementing automated link checking to maintain large collections of links.
A Unified Site Design
Our collections of Windows/DOS and Macintosh software are the two services that were established first at the Mathematics Archives. When we established the Mathematics Archives at our present site in 1993, we implemented both ftp and gopher. Making both services share the same "root directory" is just a matter of setting configurations of ftpd and gopherd properly. When we introduced http, Mosaic had just been released. Now it was possible to have links to images and text all on one page. Users who had web browsers wanted to view http sites, not return to gopher sites. So, to attract these users and keep their interest in the Mathematics Archives, we wanted to keep calls to gopher to a minimum. Furthermore, not all early web browsers handled the gopher protocol as well as they do today.
We addressed the problem with cgi scripts that would read the information contained in the gopher .links files and the files within the .caps subdirectory to generate html pages. This process is best illustrated within the Macintosh software collection. The main page is a static page "http://archives.math.utk.edu/software/.mac.directory.html" housed on the Archives server. Within this page are links to each topical directory within the collection. Within each topical directory (for example calculus) is a static file named .directory.html which is used to link to each package. (In the calculus directory, it looks like this.)
When a package is selected, say 3D-Filmstrip from within the frame on the right, all files associated with the package are listed within the frame on the left. This is the information from the files within the .caps subdirectory of the 3D-Filmstrip directory and from the .links file within the 3D-Filmstrip directory. The script producing this page makes a list of available files, embeds each in an appropriate html anchor, and identifies the action to be taken on the file. For example, it downloads .hqx files, calls a script to read text files, and creates a hypertext link to remote web pages; each action is marked by its own icon. If the file is a text file to be read (e.g., Abstract of 3D-Filmstrip), the script that is called generates an html page consisting of the text of the file and ending with a list of the other files and links associated with the package.
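The idea of turning gopher metadata into an html listing can be sketched as follows. This is only an illustration in Python, not the Archives' actual cgi script. It assumes a simplified .links-style file made of blank-line-separated blocks of Key=Value lines; the keys Name, Path and URL, the helper names, and the reader script path /cgi-bin/readtext.cgi are all hypothetical.

```python
from html import escape
from urllib.parse import quote


def parse_links(text):
    """Parse blank-line-separated blocks of 'Key=Value' lines into dicts.

    The block layout and key names are assumptions for illustration.
    """
    entries, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            # A blank line or comment closes the current block.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def action_for(entry):
    """Decide what to do with an entry: link out, download, or read as text."""
    if "URL" in entry:
        return "link"
    if entry.get("Path", "").lower().endswith(".hqx"):
        return "download"
    return "read"


def render_item(entry):
    """Wrap one entry in an html anchor that reflects its action."""
    name = escape(entry.get("Name", "(unnamed)"))
    kind = action_for(entry)
    if kind == "link":
        href = entry["URL"]
    elif kind == "download":
        href = entry["Path"]
    else:
        # Hypothetical script that displays a text file as an html page.
        href = "/cgi-bin/readtext.cgi?file=" + quote(entry["Path"])
    return f'<li><a href="{escape(href)}">{name}</a> [{kind}]</li>'
```

Calling render_item on each parsed entry and joining the results inside an unordered list would give a page body similar in spirit to the file listing described above.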
At any time during the process of trying to locate a package for a particular task, the user may search all textfiles within the Macintosh collection.
A Search Engine for Links Classified by Topic
For over a year we had maintained a list of links to mathematical sites organized by mathematical topic. It had been a popular page, and it even was the model on which other sites began to organize their links. (See AMS, for example.) However, the list was getting so long, and keeping track of the cross-referencing was becoming so tedious, that we restructured it to allow us to perform searches on links and their annotations (keywords).
We now have one page for each major topic, and each link is listed only on the page of its primary classification. Then, each link and annotation conforms to the same pattern:
- It is an element of an unordered list.
- The URL is followed by one or more .gif images indicating particular features.
- A list of keywords follows.
This simple structure allows us to update the lists easily and provides the user with a reasonably powerful searching mechanism over the large number of links we have. The searching form appears at the bottom of each topics page and of each dynamically produced html page returning the results of a search.
To perform a search on a word, a Perl script performs the following tasks:
- Opens each topics page, one at a time.
- For each topics page, it reads the unordered list of links, discarding the remainder of the page.
- It splits the list using the <li> tag as a delimiter between items.
- For each item, it now uses regular expressions to search for the appearance of the desired word.
- If the word is found, the link is stored in memory.
- Upon completion of searching all links on all pages, an html page is generated listing all of the links found to have contained the desired word.
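The search steps can be sketched roughly as below. The Archives' script is written in Perl; this Python version is only an illustration, and the function names, the case-insensitive substring matching, and the assumption that each topics page holds a single, non-nested unordered list are ours rather than details taken from the original script.

```python
import re
from html import escape


def extract_items(page_html):
    """Return the <li> items of the first unordered list, discarding the rest."""
    match = re.search(r"<ul\b[^>]*>(.*?)</ul>", page_html,
                      re.IGNORECASE | re.DOTALL)
    if not match:
        return []
    # Split the list body using the <li> tag as the delimiter between items.
    parts = re.split(r"<li\b[^>]*>", match.group(1), flags=re.IGNORECASE)
    items = []
    for part in parts[1:]:
        # Drop an optional closing tag so items can be re-emitted uniformly.
        item = re.sub(r"</li>\s*$", "", part.strip(), flags=re.IGNORECASE)
        if item:
            items.append(item)
    return items


def search_pages(pages, word):
    """Collect every list item, from every page, that contains the word."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    hits = []
    for page_html in pages:          # one topics page at a time
        for item in extract_items(page_html):
            if pattern.search(item):
                hits.append(item)    # matching links are kept in memory
    return hits


def results_page(word, hits):
    """Generate an html page listing all of the matching links."""
    body = "\n".join(f"<li>{item}" for item in hits)
    return (f"<html><body><h1>Results for {escape(word)}</h1>\n"
            f"<ul>\n{body}\n</ul>\n</body></html>")
```

Here pages is a list of html strings, one per topics page. Because the whole item is searched, a word matches whether it appears in the link text, the URL, or the keywords.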
Automated Link Checking
Shortly after we began organizing links, we realized that keeping our lists current (i.e., removing broken or outdated links) would be an overwhelming task if done by hand. (Our current count is 4,200 web pages with a total of over 33,000 links.) A Perl script was available from David Sibley for checking the links on a single page. It reads the page, parsing it for links, then for each link found, it performs a HEAD request and, if that fails, performs a GET request. We have modified the script to perform some additional specialized checks for us and used it as the basis for another script to do link checking on multiple pages.
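The HEAD-then-GET strategy for a single page might look like the following in Python. This is a minimal sketch using only the standard library, not the Perl script mentioned above; the function and class names are hypothetical, and treating any status of 400 or above as a failure is an assumption.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_status(url, method, timeout=10):
    """Return the HTTP status for a request, or None if no response arrived."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def check_link(url, fetch=fetch_status):
    """Try a cheap HEAD request first; fall back to GET if HEAD fails."""
    status = fetch(url, "HEAD")
    if status is None or status >= 400:
        status = fetch(url, "GET")
    return status is not None and status < 400


def broken_links(page_html, fetch=fetch_status):
    """Return the absolute http(s) links on a page that fail both requests."""
    collector = LinkCollector()
    collector.feed(page_html)
    return [url for url in collector.links
            if url.startswith("http") and not check_link(url, fetch)]
```

The fetch parameter exists so the logic can be exercised without network access. The GET fallback matters because some servers reject HEAD requests even though the page itself is available.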
Once a week, a cron job runs a script that searches our disk drive for all .html pages and stores a list of them in a file. Before that file is used, a second script sorts it and removes files that also appear on the "exceptions list". (This is a file we use to eliminate pages from the searching procedure because they are test files, they contain only local links, or for various other reasons.) Then each evening, our multiple file checking program is called by cron and performs a check of approximately 1/7 of the files listed. Errors are written to external files named by date and owner of the file. Each morning, if an owner of a file has errors reported during the previous evening's run, that person's error file is automatically emailed to him or her. Each person is now responsible for errors within their own files. These are checked by hand and corrected or removed.
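The weekly rotation can be sketched as below. The article does not say how the list is divided into sevenths or how owners are determined, so the stride-by-weekday split, the function names, and the owner_of callback are all assumptions made for illustration.

```python
from collections import defaultdict


def files_to_check(all_pages, exceptions):
    """Sort the weekly page list and drop anything on the exceptions list."""
    return sorted(set(all_pages) - set(exceptions))


def tonight_slice(pages, weekday):
    """Return roughly 1/7 of the pages for a weekday numbered 0 through 6.

    Taking every seventh page means the whole list is covered once a week.
    """
    return pages[weekday::7]


def group_errors_by_owner(errors, owner_of):
    """Group (page, broken_url) pairs by page owner, one report per owner."""
    report = defaultdict(list)
    for page, url in errors:
        report[owner_of(page)].append(f"{page}: {url}")
    return dict(report)
```

Each evening's run would check tonight_slice(pages, weekday), and each owner's list from group_errors_by_owner corresponds to the per-owner error file that is mailed out the next morning.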
Concluding Remarks
The classification and identification of mathematical links by people within the profession has been a strength of the Mathematics Archives. Implementing mechanisms to make navigation around the Archives easier for the user has required only a modest amount of scripting. Yet the benefit to the user can be considerable. It can yield richer results in searches than are available from the mega-search engines, and the special highlighting of particularly interesting sites (such as through the Pop Mathematics listing) can even result in more fruitful browsing.
Discipline-oriented web sites can provide users with the means of finding the wealth of information on the Internet for their specific discipline. The technical expertise to create such a site is becoming more readily available within professional disciplines as the Internet grows in popularity within the professional community. Although commercialization seems to be taking over the Internet, there is an ever-increasing demand for free sites such as the Mathematics Archives and the services it provides.
Acknowledgements
The Mathematics Archives was created in 1993 with funding from the National Science Foundation (DUE-9351398 & DUE-9550943), The Tennessee Science Alliance, Calvin College, and the Department of Mathematics of the University of Tennessee - Knoxville. The co-directors of the Archives from its inception have been the authors of this article, Earl D. Fife and Larry Husch. At various times and durations since its inception, the following have volunteered their services:
- Przemek Bogacki of Old Dominion University
- John Emert of Ball State University
- Lou Gross of the University of Tennessee - Knoxville
- Al Hibbard of Central College
- Dave Joyce of Clark University
- John St. Clair of Motlow State Community College
- Todd Will of Davidson College
We gratefully acknowledge their support and contributions.
Copyright © 1999 Earl D. Fife and Lawrence Husch