GENUKI Maintainers' Pages

Version 1.17

Constructing the GENUKI Contents and Site Map Pages

Rosemary Lockie, 2 Sep 2010.

Purpose

The purpose of the software described here is to provide a Contents and Site Map page for each of the 120 counties in the GENUKI hierarchy, with an additional 5 pages for the 4 main regions, viz: England, Ireland, Scotland and Wales, together with a "blanket" or default index for the UK & Ireland as a whole. The design of the Contents and Site Map page was agreed by the GENUKI Trustees, based on a design by Colin Hinson, and used originally for the County of Yorkshire. The output of the software used to construct Contents and Site Map pages is a combination of a chart, and associated descriptive text.

Requirements

Running the software requires a Perl interpreter (v5.6 or above) with HTML::LinkExtor, HTML::TokeParser, and LWP::Simple installed, together with a means within Perl of reading the dBase files used to hold the database. In this case XBase (v0.241) is used. This is a Perl module for reading and writing dbf files locally. Note: this is NOT DBD::XBase, which operates as an SQL driver.

Method

The bulk of the web pages are produced by a series of Perl scripts. Each script accepts one or more (case insensitive) Chapman County Codes as parameters to define the county or counties to be processed.

Caution: if no parameters are supplied, ALL counties will be processed.
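The parameter handling described above can be sketched as follows. This is an illustration in Python, not the Perl of the actual scripts. The names select_counties and ALL_COUNTIES are hypothetical, the county list is deliberately shortened, and rejecting unknown codes is an assumption rather than documented behaviour.

```python
# Hypothetical, shortened list; the real scripts work from GUKCONT.DBF.
ALL_COUNTIES = ["DBY", "LND", "YKS"]


def select_counties(args, all_counties=ALL_COUNTIES):
    """Return the Chapman codes to process, upper-cased.

    With no arguments every county is selected, which is the
    behaviour the caution above warns about.
    """
    if not args:
        return list(all_counties)
    known = set(all_counties)
    wanted = [arg.upper() for arg in args]  # codes are case insensitive
    unknown = [code for code in wanted if code not in known]
    if unknown:
        # Assumption: fail loudly rather than silently skip a mistyped code.
        raise ValueError("unknown Chapman code(s): " + ", ".join(unknown))
    return wanted
```

For example, select_counties(["dby", "Yks"]) selects DBY and YKS, while select_counties([]) selects every county in the list.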

The required sequence of execution of the Perl scripts is as follows:

  1. concache.pl - this uses LWP::ConnCache ("wget") to obtain a copy of the county index, and parish index for each county, to save locally for analysing. It uses the database table GUKCONT.DBF to work out what these indexes are called. The resulting pages will be named CCCindex.html, and (if a separate parish index exists) CCCparishes.html.

    Required: This step is necessary when ANY changes to a county are needed, though depending on what the changes are known to be, one or other of the above indexes may be omitted.

    Excluded: all counties are catered for - no exclusions.

    Caution: beware of downloaded copies of pages having broken (internal) links - be careful not to perpetuate them, editing the CCCindex.html or CCCparishes.html pages by hand if necessary.

    Caution: if a particular site refuses to talk to web robots, the relevant pages will have to be downloaded with a browser, and saved with the appropriate names.

  2. contcss.pl - this generates a style sheet for each county to "draw" the boxes of the chart, using the template CCC-hier.Tpl as a base. The resulting style sheet, CCC-hier.css, becomes the style sheet for that county.

    Required: this step is necessary if there are changes to the county (or chosen) town, or any of the URLs, as the size of "box" needed for the various text items in the chart (names and URLs) may need to be recalculated.

    Excluded: excludes generation of 'YKS' and 'CHI'.

  3. contopic.pl - this analyses the county index pages (CCCindex.html) to produce an XHTML table listing Topics available for that county. The resulting file is named CCCtopics.txt, which will be incorporated into the final result by contgen.pl.

    Required: this step is necessary when topics change for a county.

    Excluded: excludes "CHI", as its index has no "hot links" to its topics, so a CHItopics.txt has been produced manually.

  4. contowns.pl - this analyses county parish pages (typically CCCparishes.html) to produce an XHTML table listing Parishes available for that county. The resulting file is named CCCtowns.txt, and likewise will be incorporated into the final result by contgen.pl. Note: if CCCparishes.html does not exist, CCCindex.html is analysed instead, in which case the script looks for the heading "Towns and Parishes" within that page to find where the links start.

    Required: this step is necessary if parishes are added or deleted, or if their URLs change.

    Excluded: excludes "LND", as it doesn't have a standard Towns and Parishes page. The hand-generated LNDtowns.txt uses the list of "London Parishes & Boroughs" instead. Whilst this might be catered for in a future version of the script, it was decided it was easier to produce the appropriate file by hand.

  5. contgen.pl (last stage) - this generates the CCCcontents.shtml files, using as a template the file CCCcontents.Tpl, which contains a series of tokens to be replaced with values specific to each county, and incorporating the output of contopic.pl and contowns.pl for the "Quick Links" tables.

    Excluded: none.
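The sequence and exclusions listed above can be summarised in a small lookup, which shows at a glance which steps must be done "by hand" for a given county. This is a sketch in Python (the scripts themselves are Perl); STEPS and plan are hypothetical names, and nothing here actually invokes the scripts.

```python
# Script order and excluded counties, as listed in steps 1 to 5 above.
STEPS = [
    ("concache.pl", set()),
    ("contcss.pl", {"YKS", "CHI"}),
    ("contopic.pl", {"CHI"}),
    ("contowns.pl", {"LND"}),
    ("contgen.pl", set()),
]


def plan(county):
    """Split the five steps into scripted ones and ones needing hand work."""
    code = county.upper()
    scripted = [name for name, excluded in STEPS if code not in excluded]
    by_hand = [name for name, excluded in STEPS if code in excluded]
    return scripted, by_hand
```

For CHI, plan("chi") reports contcss.pl and contopic.pl as hand work; for an ordinary county such as DBY the second list is empty.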

Caveat

The Channel Islands, Isle of Man, Greater London, and Yorkshire all require a degree of special handling when producing the output pages. Where possible, the alterations required for each case are catered for within the scripts above, but when this is not possible, that step in the page generation process will be skipped, and it will be necessary to replicate its effect "by hand".

For example, the layout of CHI and YKS Contents differ significantly from the remainder of the counties, and their style sheets had to be created separately, so if there are any major changes needed, such as a change of URL, they may require further hand-editing.

The master index.shtml, ENGcontents.shtml, IRLcontents.shtml, SCTcontents.shtml and WALcontents.shtml cannot be generated using the scripts, so likewise any adjustments will need to be done "by hand".

Description

A set of 120 individual county pages (40 for England, 1 for the Channel Isles, 32 for Ireland, 1 for Isle of Man, 33 for Scotland and 13 for Wales) provide the Contents and Site Map for each county.

The Site Map is provided as a chart containing three levels, corresponding to three of those in the GENUKI hierarchy, viz: Home Page, Region (England, Ireland, Scotland, Wales, Channel Isles and Isle of Man) and Parish (or Town). In each case, a single Parish (or Town) has been chosen to represent Level 3 - most frequently this will be the county town, or other centre of administration, but it may also be the parish which has most information associated with it, or the one often thought of as being the "centre" of the County.

As described above, the bulk of the pages can be generated by Perl scripts, but the code which renders Yorkshire and the Channel Isles charts may require hand-editing. The Isle of Man and Greater London also have special requirements.

There are 5 more pages which describe the contents of the GENUKI site more generally, without reference to a specific county, viz: the site index (index.shtml), plus the 4 main regions - ENGcontents.shtml, IRLcontents.shtml, SCTcontents.shtml, WALcontents.shtml, which also have to be generated "by hand".

This constitutes a total of 125 operative (.shtml) pages, plus the same number of cascading style sheets (.css files).

Additionally, files named CHAcontents.shtml and MANcontents.shtml have been created for testing the special requirements of the Channel Islands and Isle of Man Regions; at present, however, this is all they are used for. They are not required in the "finished" product.

Layout

The layout of all pages is as follows:

The page is headed by the usual GENUKI navigation icons and a heading stating the County/Region followed by the header "GENUKI Contents & Site Map", followed by a brief introduction and a Site Map.

Because the pages exploit cascading style sheets, the whole page can be rendered as text. This eliminates the need for a "long" description to accompany the image, required originally to provide a description of the chart for the visually disabled user.

Each page uses a common cascading style sheet (contents.css), and three custom style sheets for the three levels of the chart. Each of the custom style sheets has associated HTML, which is used to display the contents of the chart boxes.

The First Level is displayed using style definitions contained in UKI-hier.css, and the underlying HTML is assembled by a server side include (SSI) of the file UKI-hier.txt. In combination they render the area of the diagram describing GENUKI provision for the whole of the UK.

The Second Level of the diagram is displayed using style definitions contained in BIG-hier.css, with the underlying HTML provided by a server-side include of one of the following, appropriate to whichever Region the county belongs to, viz:

  1. England: ENG-hier.txt
  2. Ireland: IRL-hier.txt
  3. Scotland: SCT-hier.txt
  4. Wales: WAL-hier.txt
  5. Channel Isles: CHA-hier.txt
  6. Isle of Man: MAN-hier.txt

Note: Channel Islands Level 2 needs to be distinguished from the "Counties" (Level 3) of the Channel Isles (CHI). "CHA" is the chosen code for the Channel Islands in some versions of the Chapman/BSC codes. "MAN" may not be the chosen code for the Isle of Man in anyone's book, but it's a convenient way to distinguish the Region "Isle of Man" from the County "IOM".
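The choice of Level 1 and Level 2 includes can be expressed as a simple mapping from Region to file name. The sketch below is in Python, for illustration only. REGION_INCLUDE and level_includes are hypothetical names, and the directive is assumed to take the same form as the other SSIs on the page, with the file in the same directory.

```python
# Level 2 include files, one per Region, as listed above.
REGION_INCLUDE = {
    "England": "ENG-hier.txt",
    "Ireland": "IRL-hier.txt",
    "Scotland": "SCT-hier.txt",
    "Wales": "WAL-hier.txt",
    "Channel Isles": "CHA-hier.txt",
    "Isle of Man": "MAN-hier.txt",
}


def level_includes(region):
    """Return the SSI directives for Levels 1 and 2 of the chart."""
    return [
        '<!--#include virtual="UKI-hier.txt" -->',  # Level 1: whole of UK & Ireland
        '<!--#include virtual="%s" -->' % REGION_INCLUDE[region],  # Level 2
    ]
```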

The Third Level of the diagram relates to an individual county. It will be displayed using style definitions contained in CCC-hier.css. As the content is unique for each county, there is no advantage to be gained by specifying its accompanying HTML as a server-side include, so whilst there is a separate template for its generation, it will be included as part of the resultant CCCcontents.Tpl.

The Isle of Man .css can be generated - it's the same as the others, but with the upper (County) portion removed - but the Channel Islands and Yorkshire .css and .txt have been created/edited "by hand".

The remainder of a typical CCCcontents.shtml page is assembled using a further sequence of generic server side includes, followed by a county-specific "Table of Contents" providing fast access to the more popular areas.

SSIs are as follows:

<!--#include virtual="navigation.txt" --> (Navigation and Search) Form input for GENUKI Search Engine

Note: the Gazetteer section (below) was omitted originally from IRL and Irish Contents pages, as the Gazetteer didn't cover Ireland. It still doesn't cover CHI, so CHIcontents.shtml may require tweaking.

<!--#include virtual="gazetteer.txt" --> (Gazetteer) Form input for Gazetteer - no special requirements

<!--#include virtual="surnames.txt" --> (Find a Surname Interest List) All counties catered for - no special requirements

<!--#include virtual="emaillist.txt" --> (Join a Mailing List) All counties catered for - no special requirements

<!--#include virtual="societies.txt" --> (Join a Society) All counties catered for - no special requirements

<!--#include virtual="copyright.txt" --> (Copyright statement and Valid HTML referer/icon) All counties catered for - no special requirements

The remainder of the page contains a table providing fast access to particular areas of each level in the structure. The first 3 columns are the same for every file, and provide a handy overview of the most relevant topics within the upper levels of the GENUKI structure. The 4th and 5th columns are county-specific. Column 4 lists topics (subject headings) which may be available on the county index page, and column 5 provides a scrollable list of parishes. Note: some counties do not have a set of parish pages, so this column will be blank. All, however, should have a list of Topics.

The above two lists are generated by the Perl scripts contowns.pl and contopic.pl, from cached copies of the County and Parish indexes for each county. Local copies are obtained initially by running the Perl script concache.pl.
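As an illustration of the kind of analysis involved, the sketch below collects the links that follow a "Towns and Parishes" heading in a cached page, as contowns.pl is described as doing when no separate parish index exists. It is written in Python with the standard html.parser module. It is not the Perl code of the scripts, which use HTML::LinkExtor and HTML::TokeParser, and the names LinksAfterHeading and links_after are hypothetical.

```python
from html.parser import HTMLParser


class LinksAfterHeading(HTMLParser):
    """Collect (href, text) pairs for links that follow a given heading text."""

    def __init__(self, heading):
        super().__init__()
        self.heading = heading.lower()
        self.started = False  # True once the heading text has been seen
        self.href = None      # href of the <a> currently open, if any
        self.text = []        # text fragments inside the open <a>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.started:
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if not self.started and self.heading in data.lower():
            self.started = True
        elif self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            self.links.append((self.href, "".join(self.text).strip()))
            self.href = None


def links_after(html, heading="Towns and Parishes"):
    """Return the links found after the first occurrence of the heading."""
    parser = LinksAfterHeading(heading)
    parser.feed(html)
    parser.close()
    return parser.links
```

Links that appear before the heading, such as navigation icons, are ignored, which is the point of looking for the heading first.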

Special Cases

Channel Islands
Isle of Man
London (Towns & Parishes -> Parishes & Boroughs)
Yorkshire

As stated earlier, CHAcontents.html and MANcontents.html are not for use in the "finished" product. They are, however, essential for testing the layout of the special cases of CHIcontents.shtml and IOMcontents.shtml, the difference being that the "UKI" and "BIG" portions of the diagram (Levels 1 and 2) are hard-wired into the .html file (rather than supplied by SSI #include virtual).

YKScontents.shtml doesn't need a test-bed as it uses a standard "BIG" diagram, so there is no corresponding test file. Any changes affecting the underlying YKScontents.shtml may however need evaluation.

Whilst some of the files associated with CHI and YKS can be generated automatically, others cannot. Whilst the IOM is also a special case, associated scripts cater for this by "chopping off" the top of the diagram defined by CCC-hier.css, and excluding the corresponding definitions within IOMcontents.shtml.

Likewise, where appropriate, the Perl scripts specifically exclude CHI and YKS from the generation, so no unique files should be overwritten accidentally (famous last words?!).

Greater London (LND) has Parishes & Boroughs, rather than Towns & Parishes. This is catered for within the script which generates LNDcontents.shtml. It is also necessary to produce a list of Parishes for the "Quick Links" table by hand, since the method used to identify parishes for other counties won't work.

Counties without Parishes

The following counties had no parish pages (in 2009 when last checked).

ARL
CAI
FER
INV
KID
OXF
ROC
ROS
SHI
SLI
SUT
WIC

In addition, the LIM "Towns and Townlands" is a page external to the GENUKI/LIM site, so it would be inappropriate to cater for it within the present automation. In any case, when last checked (26 May 2009) the said link was broken.

List of Web Pages

ABDcontents.html
AGYcontents.html
ANScontents.html
ANTcontents.html
ARLcontents.html
ARMcontents.html
AYRcontents.html
BANcontents.html
BDFcontents.html
BEWcontents.html
BKMcontents.html
BREcontents.html
BRKcontents.html
BUTcontents.html
CAEcontents.html
CAIcontents.html
CAMcontents.html
CARcontents.html
CAVcontents.html
CGNcontents.html
CHIcontents.html
CHScontents.html
CLAcontents.html
CLKcontents.html
CMNcontents.html
CORcontents.html
CULcontents.html
CWLcontents.html
DBYcontents.html
DENcontents.html
DEVcontents.html
DFScontents.html
DNBcontents.html
DONcontents.html
DORcontents.html
DOWcontents.html
DURcontents.html
DUBcontents.html
ELNcontents.html
ESScontents.html
FERcontents.html
FIFcontents.html
FLNcontents.html
GALcontents.html
GLAcontents.html
GLScontents.html
HAMcontents.html
HEFcontents.html
HRTcontents.html
HUNcontents.html
INVcontents.html
IOMcontents.html
KCDcontents.html
KENcontents.html
KERcontents.html
KIDcontents.html
KIKcontents.html
KKDcontents.html
KRScontents.html
LANcontents.html
LDYcontents.html
LINcontents.html
LEIcontents.html
LETcontents.html
LEXcontents.html
LIMcontents.html
LKScontents.html
LNDcontents.html
LOGcontents.html
LOUcontents.html
MAYcontents.html
MDXcontents.html
MEAcontents.html
MERcontents.html
MGYcontents.html
MLNcontents.html
MOGcontents.html
MONcontents.html
MORcontents.html
NAIcontents.html
NFKcontents.html
NBLcontents.html
NTHcontents.html
NTTcontents.html
OFFcontents.html
OKIcontents.html
OXFcontents.html
PEEcontents.html
PEMcontents.html
PERcontents.html
RADcontents.html
RFWcontents.html
ROScontents.html
ROCcontents.html
ROXcontents.html
RUTcontents.html
SALcontents.html
SELcontents.html
SFKcontents.html
SHIcontents.html
SLIcontents.html
SOMcontents.html
SRYcontents.html
SSXcontents.html
STIcontents.html
STScontents.html
SUTcontents.html
TIPcontents.html
TYRcontents.html
WARcontents.html
WATcontents.html
WEMcontents.html
WEScontents.html
WEXcontents.html
WICcontents.html
WIGcontents.html
WILcontents.html
WLNcontents.html
WORcontents.html
YKScontents.html

Diagnostic Scripts

The following scripts were used to check results obtained from the above 4 operational scripts - in effect, comparing the links extracted against the list of links found by the GENUKI Spider - http://www.genuki.org.uk/org/PageStats/links.csv.

conlinks.pl - filters the ~/org/PageStats/links.csv file to produce a subset of links to be used by concheck.pl, after concache.pl and contowns.pl, as a means to check that the results from contowns.pl produce meaningful URLs.
*** Diagnostic only ***

concheck.pl - verifies the results from contopic.pl are valid, and included within the ~/org/PageStats/links.csv file. Note: uses a subset of the file produced by conlinks.pl
*** Diagnostic only ***

conurls.pl - checks URLs in GUKCONT.DBF for appearance in (local copy of) ~/org/PageStats/links.csv file, thus checking validity (or otherwise) of GUKCONT links.
*** Diagnostic only ***

All the diagnostic scripts operate best with an up-to-date copy of the file ~/org/PageStats/links.csv (or its equivalent).
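The underlying check, whether a URL appears in the list of links found by the GENUKI Spider, amounts to a set-membership test. The sketch below is in Python, for illustration only. The layout of links.csv is not described here, so the assumption that the URL sits in the first column is exactly that, an assumption; the function names are hypothetical.

```python
import csv
import io


def load_known_urls(csv_text, column=0):
    """Return the set of URLs found in one column of a links CSV.

    Assumption: one link per row, with the URL in the given column.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader if len(row) > column}


def missing_urls(candidates, known):
    """Return the candidate URLs that do not appear in the known set."""
    return [url for url in candidates if url not in known]
```

Any URL reported by missing_urls would then need checking by hand, since it may be broken or may simply postdate the copy of links.csv in use.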

The Templates

CCCcontents.Tpl - The main template, used by contgen.pl to produce the main body of CCCcontents.(s)html files

CCC-hier.Tpl - Template for producing CCC-hier.css files, used by contcss.pl

CCC-text.Tpl - Template for the underlying HTML which accompanies CCC-hier.css. This is merged with CCCcontents.Tpl by contgen.pl. It is separate to CCCcontents.Tpl to enable alternatives to be merged, when custom CCC-hier.css (CHI and YKS) are needed.
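Token replacement of the kind contgen.pl performs on CCCcontents.Tpl can be sketched as below. This is Python rather than the Perl of the actual script. The token syntax of the templates is not documented here, so the %%NAME%% form, the TOKEN pattern and the function fill_template are all hypothetical.

```python
import re

# Hypothetical token syntax: an upper-case name between double percent signs.
TOKEN = re.compile(r"%%([A-Z]+)%%")


def fill_template(template, values):
    """Replace every token in the template with its county-specific value."""

    def replace(match):
        name = match.group(1)
        if name not in values:
            # Stop rather than leave an unreplaced token in a generated page.
            raise KeyError("no value for token " + name)
        return values[name]

    return TOKEN.sub(replace, template)
```

With values taken from one database record, fill_template("%%COUNTY%% (%%CTY%%)", {"COUNTY": "Derbyshire", "CTY": "DBY"}) would give "Derbyshire (DBY)".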

The Database Table

The variables relating to each county are held in a database (in dBase format). There is a copy provided in the Contents folder - filenames GUKCONT.DBF and GUKCONT.MDX. The index file is for the convenience of editing only, and is not used for the web page generation.

The fields in this database (dBase format) are as follows:

Field  Field Name  Type       Width  Description
1      ROT         Logical    1      Yes/No allows selective generation
2      PDX         Logical    1      Whether the County has parish pages
3      CTY         Character  3      Chapman eg DBY
4      COUNTY      Character  26     Name of County eg Derbyshire
5      COUNTRY     Character  2      Country (England, Scotland, &c)
6      TOWN        Character  24     The county (or chosen major) Town
7      COUNTYURL   Character  70     URL of main county index page
8      PARISHURL   Character  80     URL of list of Parishes page
9      TOWNURL     Character  70     URL of Town (Field 6) index page
10     WHEREURL    Character  70     URL of more detailed place list
11     GENEALOGY   Character  24     URL of county Genealogy section
12     SOCIETIES   Character  24     URL of county Societies section
13     GRIDREF     Character  8      OS Gridref of major town
14     TOWNDIST    Character  24     Town used to start Gazetteer search
15     MAINTAINER  Character  30     Name of current Maintainer

Note: Field 15, the Maintainer's name, is an aide-memoire and is NOT used in the finished product.
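Because dBase records are fixed-width, the field list above is enough to split a raw record into named values. The sketch below does this in Python, purely as an illustration; the scripts themselves read the table through the Perl XBase module. FIELDS, RECORD_LENGTH and decode_record are hypothetical names, and the latin-1 encoding is an assumption.

```python
# (name, type, width) for each field, in table order; L = Logical, C = Character.
FIELDS = [
    ("ROT", "L", 1), ("PDX", "L", 1), ("CTY", "C", 3), ("COUNTY", "C", 26),
    ("COUNTRY", "C", 2), ("TOWN", "C", 24), ("COUNTYURL", "C", 70),
    ("PARISHURL", "C", 80), ("TOWNURL", "C", 70), ("WHEREURL", "C", 70),
    ("GENEALOGY", "C", 24), ("SOCIETIES", "C", 24), ("GRIDREF", "C", 8),
    ("TOWNDIST", "C", 24), ("MAINTAINER", "C", 30),
]

# A dBase record is one deletion-flag byte followed by the fixed-width fields.
RECORD_LENGTH = 1 + sum(width for _, _, width in FIELDS)


def decode_record(raw, encoding="latin-1"):
    """Split one raw record into a dict; return None for a deleted record."""
    if len(raw) != RECORD_LENGTH:
        raise ValueError("expected %d bytes, got %d" % (RECORD_LENGTH, len(raw)))
    if raw[0:1] == b"*":  # '*' marks a record flagged as deleted
        return None
    record, pos = {}, 1
    for name, kind, width in FIELDS:
        text = raw[pos:pos + width].decode(encoding).strip()
        # Logical fields hold T/Y for true; anything else is treated as false.
        record[name] = (text.upper() in ("T", "Y")) if kind == "L" else text
        pos += width
    return record
```

The ROT and PDX values come back as booleans, so a generator could test record["PDX"] to decide whether a county has parish pages to list.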