Computers in Genealogy - Sept. 1995
An Information Service for United Kingdom & Ireland Genealogy based on the Internet's World Wide Web
Malcolm Austen, Vivienne Dunstan, Brian Randell,
Alan Stanier, Phil Stringer, and John Woodgate
The World Wide Web
The World Wide Web is a service, indeed a phenomenon, on the Internet that we believe is of very significant potential to genealogists. The staggering growth rate of the "network of networks" called the Internet (currently estimated to be doubling in size every year, and to link five million computers in 90 countries) is now both dwarfed and fuelled by that of the World Wide Web (WWW) service that runs on it.
WWW, or the Web, as it is often called, originated at CERN (the high energy physics research laboratory near Geneva). It was developed in order to provide users of the many disparate computers at CERN with a uniform and convenient means of accessing the various types of information held on these computers. WWW was publicly announced in 1992, but really took off in 1993 when graphical interface software (the Mosaic Web "browser") was developed by NCSA at the University of Illinois, and made available as freeware for UNIX, PCs and Apple Macintosh computers. The main growth was at first in the United States, but in the UK over the last year there has also been a dramatic growth; this has been both via direct access to the Internet, and via commercial networking services such as CompuServe.
Mosaic now has many imitators and competitors - of which the most notable is currently Netscape. There also exist text-only browsers, such as Lynx. However, most browsers provide a graphical interface, and allow the use of a mouse or other pointing device, such as is standard with Microsoft Windows or MacOS. In what follows we couch our description in terms of the use of such an interface.
By using such a browser package any computer user who has a computer of adequate configuration, and means of connecting to the Internet (e.g. via telephone modem to an access provider such as Cityscape or Demon, or to CompuServe), can view documents and databases held on computers all round the world as though their contents have been linked together into what appears to be a single very large document. This "document" can in fact include not just text, but also pictures, sound and video. Users can browse through it, copying or printing off anything of interest to them. They need not even be aware of the fact that multiple computers and computer networks are actually involved. There will be evident delays if and when the user's actions cause large amounts of information to be fetched ("down-loaded"); but with a modern telephone modem, operating at say 14.4 kbaud or, better still, 28.8 kbaud, and a memory adequate to retain recently-accessed information in case it is to be viewed again, the Web is an extremely usable and effective system, especially for mainly text-based information.
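To give a rough feel for the delays mentioned above, the following sketch estimates transfer times at the two modem speeds. It assumes, purely for illustration, that the quoted speeds can be read as bits per second, that each byte costs ten bits on the line (eight data bits plus start and stop bits), and that no compression is in use. The function name and the 50 kbyte example size are hypothetical.

```python
def download_seconds(size_kbytes, line_rate_bps, bits_per_byte=10):
    """Estimate the seconds needed to fetch size_kbytes over a modem link.

    Assumption: bits_per_byte=10 models 8 data bits plus start/stop bits,
    with no compression and no protocol overhead.
    """
    if line_rate_bps <= 0:
        raise ValueError("line rate must be positive")
    return size_kbytes * 1024 * bits_per_byte / line_rate_bps


# Compare the two modem speeds for a hypothetical 50 kbyte page.
for rate in (14400, 28800):
    seconds = download_seconds(50, rate)
    print("%d bit/s: about %.0f seconds for 50 kbytes" % (rate, seconds))
```

Real transfers will differ, since modem compression, line quality and server load all affect the time actually taken.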
The "world-wide document" in fact consists of a vast number of "pages" (that may in fact be larger than the user's screen, in which case scroll bars enable the desired part of the page to be made visible), within which there are visible "hypertext" links. These links are typically words or phrases that are underlined and/or colour-coded. By pointing at such a link and "clicking" on it, the user will cause the presently-visible page to be replaced on the user's screen by the one that the link was in fact pointing to - no matter where the file of information forming this page is actually held.
The computers holding the information that is thus made available are in the main running special Web "server" software. Many companies, academic institutions, and now individuals are running such servers, so as to provide information to the Web, and hence to computer users all over the world.
In fact Web browsers can also connect to and fetch information from computers that are running various other older sorts of servers, such as servers that provide simple file transfers, and so-called "gopher" servers. (The Gopher System is in some ways a precursor to the Web.) A very important point about the Web is that it integrates these various forms of networked information transfer, which users would otherwise have to deal with separately using a number of different programs and techniques. This relieves users from much needless complexity, and enables WWW to take advantage of a whole variety of pre-existing online information sources, knitting them together into what is truly a "web of information".
Nowadays many Internet access providers offer space on a computer running a Web server, that is continuously connected to the Internet, as part of their service to their customers. This enables individuals to make their own information available on the Web without themselves needing to run a server or to have their personal computers remain connected to the Internet. The Web has thus become a very effective instantaneous publishing medium, used by individuals as well as organizations, whose use is growing at an amazing rate.
It is in fact an entirely new publishing medium - one that is providing a host of new opportunities (and challenges) to existing publishing channels. For example, electronic editions of over two hundred newspapers and magazines (from the Financial Times to Time Out) are now being published via the Web, and many societies and organizations (ranging from professional societies such as the co-sponsor of this journal, the British Computer Society, to the Fulham Football Club) are using the Web as a means of providing information to their members and/or the public at large.
But the information flow is not just one way. The latest browsers provide very easy to use (essentially "form-filling") interfaces by means of which users can communicate information (e.g. search requests, orders for products, etc.) to a server. By such means a number of major retailers are already providing home-shopping over the Internet. (There are now facilities for automatically encrypting such communications so that users can, for example, safely provide details of their credit cards in making purchases.) Moreover, Web browsers typically have integrated into them facilities for handling email (clicking on someone's email address will open a window into which one can type a message to be sent to this person) and for reading and responding to Usenet newsgroups (the Internet equivalent to FidoNet bulletin boards or "echoes") - so providing a single all-embracing and very user-friendly communications environment.
Much use of the World Wide Web, and of the underlying Internet, is undertaken by commercial organizations. But the original Internet tradition of making as much information as possible freely available, and of informal cooperative development of new free facilities, remains very much alive. Thus, complementing the hundreds of academic and public libraries that provide free online access via the Internet to their library catalogues, there are now a large number of free subject-specific "virtual libraries" on the Internet. These are Web-based collections of information relevant to particular subjects (ranging from Meteorology to Music, from Finance to Fish, etc.), and thus provide organized and simple means of finding out, often in very considerable depth, about a given subject. There are also a number of "search engines". These are free services that allow automatic searching of the Web to find what information is available on a given topic - such facilities are of course very necessary given the huge and rapid growth of the Web.
It is in this Internet tradition that a group of us in various parts of the British Isles (who, incidentally, have yet to meet each other face to face) have been working to make available what we hope will soon become a large amount of information of relevance to British and Irish genealogy and family history on the World Wide Web. In this article we describe how we are doing this, and illustrate the range of information that we have already made available in or via our virtual library of UK & Ireland genealogy. (Much use of the Web is already being made by genealogists, particularly in the USA. A number of people have already "published" their family trees this way, and there are even experiments on linking such trees together into a single large Web document. However we are not aware of anything quite like the Genealogy Information Service that is the subject of this paper.)
Structure of the UK&I Information Service
Work on the UK&I Genealogy Information Service (known as the GENUKI System for short) started in January 1995 in preparation for the launch of the Usenet newsgroup soc.genealogy.uk+ireland and its associated mailing list GENUKI-L (which in fact became operational in late June). The GENUKI system already involves computers in Belfast, Colchester, Manchester, Newcastle, Oxford and St Andrews, and provides links to many others, most recently one in Alabama. However, as indicated above, from the users' point of view the fact that all these separate computers are involved is irrelevant. Rather, the users will mainly be aware of the logical structuring of the information held in the set of computers.
The method of logical structuring that we have adopted is not one that we have arbitrarily invented for ourselves. Rather it is based closely on the method that has been developed and used by the Family History Library of The Church of Jesus Christ of Latter-day Saints in Salt Lake City. This, of course, is by far the largest genealogy library in existence and one that many genealogists all over the world are familiar with through their use of the microfiche and the CD-ROM copies of the library's catalogue at LDS Family History Centres, and of the excellent Research Guides published by the Library.
The principal means of structuring used in our Information Service is therefore by means of a four-level hierarchy corresponding to locality. The top level corresponds to the UK & Ireland as a whole, while the next level consists of England, Ireland, Scotland, Wales, the Channel Islands and the Isle of Man. This choice is exactly that made by the Family History Library, and has been motivated by considerations of what major archives exist, and how various important sets of official records are organized, not by any political considerations. (Thus Ireland is the term used to cover the Republic of Ireland and Northern Ireland - since their official records and their genealogical traditions are inextricably mixed together.)
The third level of the hierarchy corresponds to counties (equivalently the separate islands comprising the Channel Islands); the fourth level corresponds to towns, parishes, etc., within such counties. (As with the Family History Library catalogue, we have chosen not to introduce the additional locality level that would be required in places if it were desired to represent the grouping of certain parishes into towns.)
At each of the four levels information is organized by subject. We have chosen to restrict ourselves to the set of subject headings that the Family History Library catalogue uses for its UK & Ireland information. However, we introduce such subject headings at each level only as the need arises, since many of these headings may never be needed, especially at the lower levels of the hierarchy.
Over sixty such subject headings are defined, of which the main ones used so far in the GENUKI System are:
Archives and Libraries
Emigration and Immigration
The structuring scheme involves recording details about a given information source, under the appropriate subject heading, at the level that corresponds to the localities that this information relates to. Thus information about the parish registers of Clovelly in Devon is given under UK & Ireland/England/Devon/Clovelly/Church Records, while information about census records relating not just to England but to England, Scotland and Wales, say, is to be found under UK & Ireland/Census. However, taking advantage of WWW's hypertext facilities, since this set of census records is actually organized by county, we have also provided a reference to, for example, the Devon section of the census records under UK & Ireland/England/Devon/Census.
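The way a locality and a subject heading combine into a single path can be sketched in a few lines. This is only an illustration of the scheme described above, not the mechanism actually used by the service; the function name and its parameters are hypothetical.

```python
TOP_LEVEL = "UK & Ireland"


def heading_path(subject, country=None, county=None, place=None):
    """Build a path such as 'UK & Ireland/England/Devon/Census'.

    Levels must be filled from the top down: a county needs a country,
    and a town or parish needs a county.
    """
    levels = [TOP_LEVEL]
    gap_seen = False
    for name in (country, county, place):
        if name is None:
            gap_seen = True
        elif gap_seen:
            raise ValueError("a lower locality level was given without its parent")
        else:
            levels.append(name)
    return "/".join(levels + [subject])


# The three examples from the text.
print(heading_path("Church Records", "England", "Devon", "Clovelly"))
print(heading_path("Census"))
print(heading_path("Census", "England", "Devon"))
```

Raising an error for a skipped level keeps every path aligned with the four-level hierarchy, so that a parish can never be filed directly under a country.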
At present most effort has gone into the provision of information for the higher level localities. For example, in UK & Ireland/Archives and Libraries/Public Record Office users will find the complete text of more than seventy PRO Information Leaflets, while UK & Ireland/Archives and Libraries/LDS Family History Centres provides, amongst other information, a complete address list for all the LDS Family History Centres in the UK & Ireland. Other information at this locality level includes a detailed listing of all the Gibson Guides currently in print, the text and diagrams (map and floor plans) in the Society of Genealogists' leaflet on how to use their library, a number of information leaflets from the Royal Commission on Historical Manuscripts, etc.
At the level of the individual countries we have, for example, details of St. Catherine's House under UK & Ireland/England/Civil Registration, and of the Scottish Record Office and the New Register House and their holdings and facilities under appropriate subject headings under UK & Ireland/Scotland.
Each of the county pages provides access to the relevant section of the Marriage Witness Index that has been assembled by Ted Wildy of New Zealand. (Incidentally, these Marriage Witness Index files are held in a computer in the USA.) Some county pages have been developed much further than others, and for example already contain information such as: (i) details of activities, available publications, etc., of the relevant Family History Society, (ii) the text of information leaflets detailing the holdings of and services provided by local archives and libraries, (iii) contents listings of relevant local publications, (iv) lists of recommended published histories, bibliographies, etc.
Comparatively few pages exist so far at the town/parish level, but those that do sometimes include information such as: (i) complete monumental inscription listings or parish register transcriptions (e.g. for a number of parishes in County Durham, Essex, Glamorgan, Northumberland, and Yorkshire), (ii) details of where the extant original or microfilmed parish registers can be found, (iii) references to published local histories, etc.
Implementation of the GENUKI system
What might be termed the "first page", or "front door", of the GENUKI System is shown in Fig. 1. Pointing at and "clicking" on the underlined phrase "UK & Ireland" will result in the user being presented on his or her screen with a new page, the top part of which is shown in Fig. 2. A few more such actions and the user will be able to reach the page listing the PRO Information Leaflets that are available online. By the use of the scroll bar, the user can reach the part of this page that is shown in Fig. 3. (All these figures have been taken directly from a computer screen and show, albeit only in black and white, exactly what the user of a Netscape browser would see.)
The principal server is at the Manchester Computing Centre at the University of Manchester and is maintained by Phil Stringer. What this in fact means is that a number of files of information, including that corresponding to the front page of the service (Fig. 1), have been placed in a directory that is accessible to a Web server program that is running continuously on one of the computers at Manchester. This directory happens to have been given the name "genuki". As a result the pages are made visible to anyone who uses a browser program and tells it to start obtaining its information from the address "http://www.genuki.org.uk/".
This address is in fact what is called a URL (standing for "Uniform Resource Locator"). Such URLs are central to the operation of the World Wide Web. They consist of a prefix that in effect describes the type of information that is being sought, and then an address that has world-wide validity, down to the level of the particular file or file directory. Within the above URL the part that follows "http://" is the Internet address of the particular machine at Manchester (known as Midas), and the "http" prefix identifies the networking protocol (called the HyperText Transfer Protocol) that is used by the WWW browsers and servers in order to communicate with each other. The underlined links that one sees on Web pages are in fact represented in the computer using such URLs.
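The parts of a URL described above can be seen by taking one apart with Python's standard urllib.parse module. The sketch below uses the front-page address quoted earlier; the labels in the printed output are our own wording.

```python
from urllib.parse import urlsplit

# The front-page address of the service, as quoted in the text.
parts = urlsplit("http://www.genuki.org.uk/")

print("prefix (protocol): ", parts.scheme)   # the part before "://"
print("machine address:   ", parts.netloc)   # the Internet address of the server
print("file or directory: ", parts.path)     # the location on that server
```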
Files that are requested using a URL starting with "http" are expected to be coded using HTML (Hyper-Text Mark-up Language). Such an encoding adds annotations to ordinary ASCII files that indicate how the file is to be formatted for display (e.g. which phrases are headers, what words are to be italicized) and what URL each link word or phrase represents. There are a number of ways such files can be prepared, ranging from direct editing in of such annotations, to automatic conversion of documents from, for example, Microsoft WORD format to HTML format.
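As a small illustration of such annotations, the sketch below builds a fragment of HTML containing a header, an italicized phrase and a link. The page content is invented for the example, and the helper names are hypothetical; only the tags themselves (H1, I, P and A with its HREF attribute) are standard HTML.

```python
from html import escape


def heading(text):
    """Mark text as a top-level header."""
    return "<H1>" + escape(text) + "</H1>"


def italic(text):
    """Mark text to be italicized."""
    return "<I>" + escape(text) + "</I>"


def link(url, text):
    """Attach a URL to a link word or phrase."""
    return '<A HREF="' + escape(url, quote=True) + '">' + escape(text) + "</A>"


# A hypothetical page fragment, assembled from the helpers above.
page = "\n".join([
    heading("Devon"),
    "<P>See the " + italic("Church Records") + " section, or return to "
    + link("http://www.genuki.org.uk/", "UK & Ireland") + ".</P>",
])
print(page)
```

Note that the ampersand in "UK & Ireland" has to be written as a special code within HTML, which is why the text is passed through escape() rather than inserted directly.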
It is also however possible to display documents that are simply in ASCII. It has been decided to do this, for example, for most of the PRO leaflets - the texts of which were obtained by automatic scanning of the paper versions. (There are two reasons for this decision: (i) it was wished to make these texts available also by other networking means, such as FidoNet, and (ii) the effort involved in adding the formatting annotations seems hardly worthwhile given the form of the original leaflets.) However, those leaflets which include figures are, gradually, being made available in HTML form.
There is in fact much more that could be said about HTML, and also about how, through its use, pictures and even videos and soundtracks can be incorporated in pages and thus made available via the Web, but such matters go beyond the intended scope of this paper.
The files in the genuki directory at Manchester include the HTML version of the front page (Fig. 1), and the UK & Ireland page (Fig. 2). However the information shown in Fig. 3 is part of a file held at Oxford, where all the PRO leaflets happen to be kept. A number of sections of the locality hierarchy are held in directories (for convenience in general also called "genuki") on different machines. For example, the Scotland pages are currently held in a computer in St. Andrews, the Devon and Carmarthenshire pages in a computer in Newcastle upon Tyne. (These particular file placements could well change in the future - an eventuality that we have provided for by use of certain coding and addressing conventions. However, these need not concern ordinary users of the Information Service, and we will not go into details of them here.)
We have developed presentation standards for the various different types of page, governing the use and placement of headings for example, in order to try to ensure that the Information Service as a whole has a reasonably consistent "look and feel" - our aim being to help users to find their way around the service and, for example, readily locate similar types of information about different geographical areas.
Country, county and town/parish pages all start with two small pictures (so-called "buttons") - see Figs. 1 and 2. The first of these buttons, the "Up" button, if selected by the user and clicked, will cause the current page to be replaced on his/her screen by the page at the next higher locality. For example, clicking the Up button on the Lancashire page will take the user back to the England page. The second button will instead take the user to a Table of Contents page (part of which is illustrated in Fig. 4). This page is composed almost entirely of links, so that it can be used to find and then, by clicking on the appropriate link, go directly to any of the other pages of the Information Service. At present this Table of Contents is prepared and maintained manually, but we have hopes of automating its generation, and also of providing a detailed automatically-generated alphabetical index to the contents of the Service - something that will be increasingly needed as the Service grows.
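The behaviour of the "Up" button follows directly from the locality hierarchy, as the following sketch suggests. It assumes, for illustration only, that each page is identified by a slash-separated locality path; the function name is hypothetical and this is not a description of how the buttons are actually implemented.

```python
def up(page_path):
    """Return the page at the next higher locality, or None at the top."""
    levels = page_path.split("/")
    if len(levels) <= 1:
        return None  # the top-level page has nowhere higher to go
    return "/".join(levels[:-1])


# The example from the text: Lancashire leads back to England.
print(up("UK & Ireland/England/Lancashire"))
```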
Aside from the Table of Contents page, our policy is to provide actual information on each page, and to avoid having pages whose sole purpose is to provide links to other pages. Another policy is to ensure that the general contents of each page are evident from the text that is likely to be visible on a standard-sized monitor when the page is opened by a user (i.e. text that can be seen without scrolling the page). It is perhaps worth mentioning how these policies have influenced the layout conventions we use for country and county pages. (The problem with these pages is that one of their major functions is to provide links to a possibly large set of constituent counties or towns/parishes.)
Each well-developed country or county page contains, following the initial buttons, some appropriate introductory text, then the list of links to subsidiary localities, and then the set of headed paragraphs on subjects that relate to the country or county as a whole. The list of links, and the set of subject headers, are each alphabetically ordered. However, only the first part of the list of links is likely to be visible without scrolling, so the list is preceded by a prominent link that can be used to jump directly to the set of subject paragraphs. (Each subject paragraph may well include links to further information, but is intended to be informative in itself.)
Contents of the Information Service
Evidently, the GENUKI System is intended just for genealogy-related information, all of which (other than a minimal amount providing links to the rest of the genealogical world) is of relevance to the UK & Ireland. There is no obvious definition of the term "genealogy-related". The rule of thumb we use is that if the information, or something like it, is listed in the LDS Family History Library Catalogue then it is appropriate for the GENUKI System.
As befits an Information Service that is made freely available to all, we try to ensure that all copyright restrictions are properly adhered to. Information that is known to be subject to such restrictions is not incorporated in the GENUKI System without prior clearance from the copyright owners. When they so request, we include appropriate statements about ownership, and about any usage restrictions.
In general we expect most of the information held by the GENUKI System to be in the form of text, either in HTML or ASCII form. Graphics are acceptable, providing they are worth the storage space they consume (a judgement that can be made independently for each server). However, our practice is to provide warnings to readers if a link is to a large graphic that might take quite a while to download (e.g. by indicating its size in kbytes).
Finally, it is worth reiterating that the GENUKI System is, we hope, as valuable for the links it provides to other information elsewhere as for the information that has been specially gathered and placed on one of our server machines. The structuring scheme described earlier will, we trust, enable us to use all these links and all this information to provide a virtual library for UK & Ireland Genealogy that is not only very extensive but also very usable.
Evidently we would like all the information in or accessible via our Information Service to be accurate and up to date. In practice, we will often have to leave the judgement on such issues to the readers - information sources and dates are therefore indicated wherever appropriate and possible.
For information items that are particularly likely to need updating, we try to store just one copy of such information, ideally in a separate page, and if necessary have multiple links to that page. For example, society contact addresses are ideally given just once, in a page that also gives general information about the society - a page that we hope the society or its nominee will take responsibility for updating. In practice, it is inconvenient to have a large number of very small pages of information, so some duplication of information is unavoidable.
Usage of the Information Service
An analysis of recent usage of the Service covering a 41-day period indicated that it, or more exactly its front page, was accessed roughly 400 times per day on average. This does not count accesses made via proxy servers - such as those used by CompuServe and America Online, each of which made about 6000 accesses to the various UKIGIS pages hosted on Midas, and will have serviced multiple page requests from their customers using local copies. (The actual number of different users of such proxy servers, or the amount of use they are making of UKIGIS, cannot be determined.)
Accesses that we do know of to the various UKIGIS pages hosted on Midas were made from about 50 different countries in all. Over 40% can be identified as coming from the USA, and 6-7% each from Australia, Canada and the UK. Even where proxy servers are not involved we cannot identify the sources of access down to individuals, just to machines, some of which may have many users. However the number of different machines from which accesses are known to have been made during this period is about 3,500 (1654 in the USA, 232 in Australia, 253 in Canada, and 489 in the UK).
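The figures quoted above can be put into proportion with a little arithmetic. The sketch below uses only the numbers given in the text; because the total of "about 3,500" machines is itself approximate, the resulting shares should be read as rough indications only.

```python
# Machines from which accesses are known to have been made, by country.
machines = {"USA": 1654, "Australia": 232, "Canada": 253, "UK": 489}
total_machines = 3500  # "about 3,500" in the text, so the shares are approximate

# Share of all known machines, as a percentage rounded to one decimal place.
shares = {country: round(100.0 * count / total_machines, 1)
          for country, count in machines.items()}

# Machines in all the other countries taken together.
elsewhere = total_machines - sum(machines.values())

# Roughly 400 front-page accesses per day over the 41-day period.
front_page_accesses = 400 * 41

for country, share in shares.items():
    print("%-10s %5.1f%% of machines" % (country, share))
print("Other countries: about %d machines" % elsewhere)
print("Front-page accesses in the period: roughly %d" % front_page_accesses)
```

These are shares of machines, not of accesses, so they need not agree with the access percentages quoted at the start of the paragraph.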
As we have sought to explain, the GENUKI System has been designed to facilitate its growth. We thus welcome the involvement of other people and organizations who have genealogical information that they are willing to make freely available to the public via WWW, since it is only by such involvement that the UK & Ireland Information Service will grow to achieve its full potential. For example, we would be happy to augment the address details we already provide on many family history societies with further information about their activities, the publications they have for sale, membership application forms, etc. Similarly, we would welcome information about upcoming (non-commercial) events. It is preferable if such information is provided to us via email, or at least on floppy disk, so that scanning or retyping can be avoided. (Such disks should be posted to Phil Stringer, 40 Broomfields, Denton, Manchester M34 3TH, UK.)
We hope before long to devolve the task of maintaining information about many more of the counties and topics covered in the Information Service to people and organizations with more interest in, and knowledge of, the particular area or topic than any of the present authors possess. This in fact is already happening - since the original brief paper about the Information Service that appeared in the June 1995 issue of Family Tree Magazine was prepared we have received and taken up a number of such offers of assistance.
In some cases the assistance is in contacting potential sources of information; in others it is with the task of preparing pages - either for use on one of the existing servers, or by making the information available as part of the UK & Ireland Information Service through a WWW/Internet link to some other computer on which the information is normally kept.
As such growth occurs there will be a need to provide additional indexing facilities. This task should be facilitated by the control we are exercising over the use of topic terms. However, in addition we are hoping to find some means whereby we can provide a simple enquiry facility for identifying the county or counties containing a town or parish of a given name - in other words a rudimentary equivalent of the US Geographic Name Server.
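One simple form such an enquiry facility might take is a lookup table from place names to counties. The sketch below is only a suggestion of the idea, not a design for the eventual facility; the function names are hypothetical and the three sample entries are a tiny illustrative selection, far from a complete gazetteer.

```python
from collections import defaultdict

# A tiny illustrative sample of (place, county) pairs, not a real data set.
SAMPLE_PLACES = [
    ("Clovelly", "Devon"),
    ("Newport", "Monmouthshire"),
    ("Newport", "Shropshire"),
]


def build_index(places):
    """Map each lower-cased place name to the set of counties containing it."""
    index = defaultdict(set)
    for place, county in places:
        index[place.lower()].add(county)
    return index


def counties_for(index, place_name):
    """Return the counties containing a place of the given name, sorted."""
    return sorted(index.get(place_name.strip().lower(), ()))


index = build_index(SAMPLE_PLACES)
print(counties_for(index, "Newport"))
print(counties_for(index, "clovelly"))
```

Matching on lower-cased names is a deliberate simplification; a real facility would also have to cope with variant spellings and with changes to county boundaries.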
In the long term we would like to see various societies and other official bodies concerned with genealogy here in the British Isles begin to take both active advantage of the Internet and an active role in supporting and guiding the future development of the Information Service. However we believe that what exists now is already of practical value, and would urge genealogists, particularly those that have no experience of computer networking, to seek a demonstration of the system from a colleague, or perhaps a fellow society member, with the necessary computer and networking facilities.
To Access the GENUKI System
Most Internet access schemes now include the provision to their customers of Web browser software such as Mosaic or Netscape. To access the GENUKI System all that is then necessary is to launch such a browser, giving it the starting address:
http://www.genuki.org.uk/
There are no registration procedures to follow, or any other formalities to observe.
However, if anyone wishes first to make enquiries via email, the address that they should send their email to (that of Phil Stringer, who is in overall charge of the GENUKI System) is :
Fig. 1: The Front Page
Fig. 2: The start of the UK & Ireland Page
Fig. 3: Part of the PRO Leaflets Page
Fig. 4: Part of the Table of Contents Page