GENUKI Maintainers' Pages
Version 1.20
Broken links, Re-directs and other problems
Background
The spider was replaced in August 2010 by a new version. The output of the old spider will remain available for a time and is referenced from pages generated by the new spider reporting mechanism.
The new spider is run at the start of each calendar month and its output is stored in a database held at genuki.org.uk. Maintainers can ask the system administrator to run the spider at other times. For reporting and maintenance purposes, GENUKI pages are grouped into logical sections, and the spider can be run against any subset of those sections, including a single section.
Analysis
A short explanation of how the spider works is available elsewhere.
Reporting
Spider reports required by maintainers are generated when requested by means of cgi scripts which read the database and create a web page containing the required data. The data in each report is therefore the most recent available at the time the report is requested.
The most recent spider report is available as a normal web page, but if a more up-to-date report is needed, it can be refreshed by invoking the spider cgi script.
Several different spider reports, each with its own format and content, are available:
- Overall report
- Section report
- Files report
- Problem reports:
  - Spider page access problems.
  - Broken links.
  - Re-directs.
  - Spider access timeouts.
  - HTML problems.
  - Missing GENUKI logo.
  - Gazetteer problems.
  - Church database problems.
  - Spider directives.
Because the reports are web pages generated dynamically from a database, they can take a noticeable amount of time to complete, depending on the server load at the time. Most reports include links to subordinate reports, which are also generated dynamically. Returning from a subordinate report to a superior report with the browser's "back button" therefore causes the superior report to be regenerated.
To avoid this delay, users of the dynamically-generated overall spider report are recommended to open subordinate reports in a new window or new tab. For the same reason, it might be worth saving a "web archive" copy of the spider report page(s). Alternatively, use the static web page, which is a copy of the most recent dynamically-generated overall spider report.
Overall Report
The overall report is generated by invoking the spider cgi script. The resulting web page is organised by the logical sections in which GENUKI pages are contained for maintenance purposes.
Each line in the report consists of the following fields:
- Section: the name of the section being reported. In the case of counties, this is the County Name. The link in this field invokes the section cgi script to provide a dynamic report summarising the section.
- Maintainer: the name of the maintainer with responsibility for this section.
- Last checked: date the spider was last run against this section.
- Time: how long the spider took to run against this section.
- Files: total number of files scanned by the spider in this section. The link in this field invokes the section_files cgi script to provide a dynamic report summarising the files contained within the section.
- Size: total size of section files.
- Last updated: the most recent date a file was added to, or replaced in, this section.
- Problems: the number of problems detected by the spider in this section. Each field is colour-coded to show how long the problems have existed: after 3 months (90 days) the background becomes amber, and after 6 months (180 days) it becomes red. The link in this field invokes the section_problems cgi script to provide a dynamic report showing which files have problems and an indication of why.
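The colour coding of the Problems field amounts to a simple threshold rule on the age of a problem. The sketch below is illustrative only and is not taken from the spider's code: the function name `problem_colour`, the label `"none"` for problems younger than 90 days, and the reading of "after 90 days" as "more than 90 days" are all assumptions.

```python
from datetime import date

AMBER_AFTER_DAYS = 90   # 3 months, as stated in the field description
RED_AFTER_DAYS = 180    # 6 months, as stated in the field description


def problem_colour(first_detected: date, today: date) -> str:
    """Return the background colour for a problem first seen on first_detected."""
    age_days = (today - first_detected).days
    if age_days > RED_AFTER_DAYS:
        return "red"
    if age_days > AMBER_AFTER_DAYS:
        return "amber"
    # Hypothetical label: the page does not name the default background.
    return "none"
```

For example, `problem_colour(date(2024, 1, 1), date(2024, 4, 1))` describes a problem that is 91 days old and therefore falls in the amber band.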
Section Report
The section report provides a summary of the section as follows:
- Names of the maintainers of the section, the section's gazetteer entries, and the section's church database entries.
- Number of towns/parishes in the section, with a link to a list of the section's towns and parishes. The link in this field invokes the gazextract cgi script to provide a dynamic report listing the urls of the section's town and parish files. This list is obtained from those gazetteer entries for the section which have been flagged as primary entries.
- Summary of the old spider report for the section.
- Summary of the new spider report for the section. This contains the information from the section's line in the overall report (see above) presented in a different format. The links in these fields invoke the section_files cgi script to provide a dynamic report summarising the files contained within the section.
- Summary of the gazetteer report for the section listing the County town, County town location, and County centre location. The links in the latter two fields use the showmap cgi script to display a map centred on the given location.
- Summary of the church database report for the section (the flags used when generating a church database entry for the section).
Section Files Report
The section files report provides a list of the urls of the requested files. The file type requested can be one of the following:
- html files, including:
  - Section home pages
  - Town/parish pages
  - Parish map pages
- image files
- css files
- javascript files
- other files
- non-genuki files
Redirects of genuki pages are also reported here for:
- explicit redirects
- implicit redirects (different base address returned)
The section report also includes the section problems report for all problems in the section.
Section Problems Report
The section problems report provides a list of the urls of the problem files. The problem type requested can be one of the following (see the column headings of the Overall Report):
- Spider: links the spider is unable to resolve.
- Link: links in error. Examples are links pointing to files which are not found at the designated location, links omitting the required trailing /, and links pointing to files which time out while being read by the spider.
- Redirect: links which result in a browser re-direct.
- Timeout: links which result in a timeout whilst trying to connect.
- HTML: pages which fail the HTML validation test.
- Logo: pages which have no detectable GENUKI logo.
- Gaz: invalid or faulty gazetteer entries.
- Church: invalid or faulty church database entries.
- Info: pages which use a spider directive.
The resulting web page can be sorted into several different orders using the sort order directives at the top of the page:
- Page: originating page name (default).
- Link: destination page name.
- Problem: problem type.
- Date: date the problem was first detected.
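The four sort orders correspond to sorting the same list of problem records on a different field. A minimal sketch follows, assuming a record with one attribute per sort order; the `Problem` class, its field names and the example.org urls are hypothetical and do not reflect the spider's database layout.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Problem:
    page: str              # originating page name
    link: str              # destination page name
    problem: str           # problem type, e.g. "Link" or "Redirect"
    first_detected: date   # date the problem was first detected


# One key function per sort order offered at the top of the report.
SORT_KEYS = {
    "page": lambda p: p.page,
    "link": lambda p: p.link,
    "problem": lambda p: p.problem,
    "date": lambda p: p.first_detected,
}


def sort_problems(problems, order="page"):
    """Return the problems sorted by the chosen order (default: page)."""
    return sorted(problems, key=SORT_KEYS[order])


# Made-up records for illustration only.
EXAMPLE = [
    Problem("b.html", "http://example.org/x", "Redirect", date(2024, 3, 1)),
    Problem("a.html", "http://example.org/y", "Link", date(2024, 5, 1)),
]
```

Calling `sort_problems(EXAMPLE, "date")` puts the longest-standing problem first, which is a convenient order for working through the amber and red entries.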
Correcting Errors
Maintainers are expected to deal promptly with the errors highlighted in the reports, and it is recommended that maintainers take a structured approach to dealing with and correcting the problems notified by the spider.
1 - Confirm the error
Because a timeout error, or even a redirect error, detected by the spider's analysis might only be temporary, some errors documented in the spider reports might have disappeared by the time the maintainer looks at the reports. Before spending a significant amount of time dealing with the causes of certain types of errors, particularly but not just timeouts, maintainers should verify that the errors still occur.
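One way to confirm an error is to request the reported link again and look at what comes back. The following is a rough sketch using only the Python standard library, not the spider's own logic: the names `NoRedirect`, `classify_status` and `recheck`, the 10 second timeout and the result labels are all assumptions.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib following redirects so that they can be reported."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError with the 3xx code.
        return None


def classify_status(status):
    """Map an HTTP status (or None for no response) to a rough label."""
    if status is None:
        return "timeout or connection failure"
    if 300 <= status < 400:
        return "redirect"
    if status >= 400:
        return "broken link"
    return "ok"


def recheck(url, timeout=10.0):
    """Request url once, without following redirects, and classify the result."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code   # 3xx, 4xx and 5xx responses arrive here
    except OSError:
        status = None       # timeouts, DNS failures, refused connections
    return classify_status(status)
```

A link that the report lists as a timeout but for which `recheck` now returns `"ok"` was probably a temporary failure and may need no further action.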
2 - "404 Not Found" errors
The spider starts from a position of knowledge of all the pages that constitute the complete GENUKI website and these get checked individually on each spider run. New pages added to GENUKI since the last spider run are detected during checking and, at the end of a current run, these are added to the list of pages that constitute the complete GENUKI website. These pages are checked on the next spider run.
But when web pages which are on the list of pages that constitute the complete GENUKI website have disappeared, either because they were deleted deliberately or because of a problem, the spider cannot find them. In either situation these pages appear as a "404 Not Found" error in the "Page" section of spider problems.
If the spider simply deleted the missing web page from the list of pages that constitute the complete GENUKI website, then maintainers wouldn't be alerted to pages that had gone missing because of a real problem. Therefore, these missing pages are indicated by a button marked by a red icon with an 'X' on it. Clicking this button deletes all references to the problem as well as the reference to the missing page. This action must be taken for each GENUKI web page that has been deleted deliberately. If the web page was deleted by mistake then it will need to be reloaded.
There are no authorisation checks against use of this button, and any maintainer can use any such 'delete' button. So, please don't use it on pages for which you aren't responsible.
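The bookkeeping described in this step can be summarised in a few lines. This is a sketch of the behaviour as described here, not the spider's implementation: the function names are hypothetical, and the set `live_pages` stands in for the real HTTP checks the spider performs.

```python
def run_spider(known_pages, live_pages, discovered_pages):
    """Simulate one spider run over the list of known pages.

    known_pages:      urls on the list before this run
    live_pages:       urls that currently respond (stand-in for HTTP checks)
    discovered_pages: new urls found by following links during this run
    Returns (updated_known_pages, not_found).
    """
    # Known pages that no longer respond are reported as "404 Not Found".
    not_found = {url for url in known_pages if url not in live_pages}
    # Missing pages stay on the list, so they are reported again next run;
    # newly discovered pages are added at the end of the run.
    updated = set(known_pages) | set(discovered_pages)
    return updated, not_found


def confirm_deleted(known_pages, url):
    """Equivalent of the red 'X' button: forget a deliberately deleted page."""
    return set(known_pages) - {url}
```

The key point is that `run_spider` never removes a page on its own: only the explicit `confirm_deleted` step does, which is why a page deleted by mistake keeps appearing in the report until it is reloaded or the button is used.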
3 - Link errors and re-directs
Maintainers should check for, and correct:
- Link errors (failed links from files in the section to other files).
- Redirects (links from files in the section which result in a browser redirect to another file).
For each re-direct error:
- Check each original link in the re-direct list and see if the page that comes back contains the expected content. If not, handle it in the same way as a failed link.
- Try the url of the redirected page. Does that give what is wanted? If so, change the link in your page to the url of the redirected page.
- Either work out what the exact url of the redirected page should be and use that, or instruct the spider not to report it as a problem (add class="ibr" in the link on your page - see the advanced guide to the spider for more details).
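When several links on a page have been marked with class="ibr", it can be useful to list them so that the markings are reviewed from time to time. The sketch below uses Python's standard html.parser and is only an illustration: the names `LinkCollector` and `list_links` are hypothetical, and how the spider itself interprets the class is covered in the advanced guide, not here.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, marked) pairs for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # A class attribute may hold several space-separated names.
        classes = (attributes.get("class") or "").split()
        self.links.append((href, "ibr" in classes))


def list_links(html_text):
    """Return every link in html_text with a flag for class="ibr"."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links
```

For the fragment `<a href="http://example.org/old" class="ibr">Old</a>`, `list_links` returns `[("http://example.org/old", True)]`.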
4 - Check before uploading
It's often hard to remember to check for errors and make the necessary changes to correct them. And if your county is fairly quiet, the time between new additions and changes might become extended. The dates and colours in the spider output can be useful in reminding maintainers of just how long it's been since they conducted their regular GENUKI maintenance tasks.
Before uploading your pages to the web it's wise to check all links. If you choose to run a link checker, either on your own computer or as a service on the internet, the maintenance software page gives further advice on the products and services available.
If you've checked all your links before uploading web pages then it's unlikely that the spider will detect any errors.
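As an illustration of what a pre-upload check involves, the sketch below looks for links within your own section that point at files you are not about to upload. It is not a substitute for the link checkers listed on the maintenance software page. The function name, the site-relative paths and the assumption that a link ending in / is served by index.html are all hypothetical, and external links are skipped, not tested.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


def missing_local_targets(page_path, html_text, site_files):
    """Return the local hrefs on a page whose target is not in site_files.

    page_path:  site-relative path of the page, e.g. "eng/LAN/index.html"
    site_files: set of site-relative paths that will exist after upload
    """
    collector = HrefCollector()
    collector.feed(html_text)
    collector.close()
    page_dir = posixpath.dirname(page_path)
    missing = []
    for href in collector.hrefs:
        parts = urlsplit(href)
        if parts.scheme or parts.netloc or not parts.path:
            continue  # external link, mailto:, or same-page fragment
        base = "" if parts.path.startswith("/") else page_dir
        joined = posixpath.join(base, parts.path.lstrip("/"))
        if joined == "" or joined.endswith("/"):
            joined += "index.html"  # assumption: directories serve index.html
        if posixpath.normpath(joined) not in site_files:
            missing.append(href)
    return missing
```

Any href returned by `missing_local_targets` is a candidate for a "404 Not Found" in the next spider run unless the target file is uploaded as well.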
Redirects are flagged as a problem because they often indicate a broken link that has been hidden. When a web site has been restructured, it is common practice, often on Local Government web sites, to use redirects to point incoming browsers to replacement pages in order to avoid reporting a broken link to the user.
In these cases, the redirects will be removed at some future date and this will result in real broken links. So, it is wise to examine redirects and check if they represent pages that have been moved to a new location. Correcting the link by using the new location will avoid having to correct a broken link in future when the redirect is removed.
However, not all redirects fall into this category, and some redirects are genuine and have been provided for a good reason.