How to Check Links
Help and Guidance 2021: Modified Page: Version 1.1
The GENUKI system contains many thousands of pages, most with links to other pages, both internally within the GENUKI system and externally to remote sites. A large part of the reputation and credibility of GENUKI is built on the assumption that these links will work when selected by the end user. This How to gives you the full background on checking links. There is a quick How to Fix Links as well.
The elements of the checking process
There are several key components to the GENUKI link checking system:
- Each time a page of information is saved the system extracts all the links from the content and stores these links on a master list of links.
- The system will then automatically check all new and existing links in the system every 4 weeks. NB. In order to spread the load on the GENUKI web server, the checking of links will be done in small batches every hour, but any specific link will be rechecked every 4 weeks.
- All link errors and redirected links are stored in the link checking database, thereby allowing analysis of these broken links at any time.
- Each time a maintainer opens the edit view of a page in the system, any broken links it contains will be shown at the head of the edit view of the page.
- Each time an edited page is saved, any new links added by a Maintainer will be submitted for checking at the next batch run.
- Each maintainer can also view a complete list of broken links on all the pages that they are responsible for.
- Each maintainer has the facility to tell the system to ignore selected links during the checking process, either on a temporary or permanent basis.
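The first component above (extracting links when a page is saved and adding them to a master list) can be sketched as follows. This is only an illustration using the Python standard library, not the GENUKI implementation: the names `LinkExtractor`, `extract_links` and `record_links`, and the shape of the master list, are assumptions made for the example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page body."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in the HTML, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def record_links(master_list, node, html):
    """Add links not yet on the master list and return the new ones.

    master_list is a hypothetical dict keyed by (node, url); a
    last_checked value of None marks a link for the next batch run.
    """
    new_links = []
    for url in extract_links(html):
        key = (node, url)
        if key not in master_list:
            master_list[key] = {"last_checked": None}
            new_links.append(url)
    return new_links
```

Saving the same page twice adds nothing new the second time, which mirrors the idea that only newly added links are submitted for checking.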
Automatic checking of links
The GENUKI system selects every hour up to 1000 new or existing links to be checked. This is done by checking the GENUKI site or the remote sites and evaluating the HTTP response codes. Any response other than a normal response is logged by the system for subsequent reporting. There are various types of links analysed by this process:
- links internal to the page being checked
- links to another page within the GENUKI system
- links to a remote site outside the GENUKI system
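As a rough sketch of the selection described above, the following picks up to 1000 links that are new or were last checked at least 4 weeks ago, and sorts each link into one of the three types. The host names in `GENUKI_HOSTS`, the function names and the data layout are assumptions for illustration; the real checker may work quite differently.

```python
from datetime import datetime, timedelta
from urllib.parse import urlsplit

BATCH_SIZE = 1000                       # links per hourly run (from the text)
RECHECK_INTERVAL = timedelta(weeks=4)   # every link is rechecked every 4 weeks
GENUKI_HOSTS = {"www.genuki.org.uk", "genuki.org.uk"}  # assumed host names

# With 24 runs a day over 28 days, this schedule can cover at most
# 24 * 28 * 1000 = 672,000 links per 4-week cycle.


def select_batch(links, now, batch_size=BATCH_SIZE):
    """Pick the links due for checking.

    links maps url -> datetime of last check, or None if never checked.
    Never-checked links come first, then the longest-unchecked ones.
    """
    due = [
        url for url, last in links.items()
        if last is None or now - last >= RECHECK_INTERVAL
    ]
    due.sort(key=lambda url: (links[url] is not None, links[url] or datetime.min))
    return due[:batch_size]


def classify_link(url):
    """Return 'same-page', 'internal' or 'external' for an http(s) or relative link.

    Other schemes such as mailto: are not handled in this sketch.
    """
    parts = urlsplit(url)
    if not parts.netloc and not parts.path and parts.fragment:
        return "same-page"      # e.g. "#top"
    if not parts.netloc or parts.netloc.lower() in GENUKI_HOSTS:
        return "internal"       # relative link or a GENUKI host
    return "external"
```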
Information available to a GENUKI Maintainer
Editing a page
When a maintainer edits a page they will see at the head of the page a list of its broken links. This is a useful reminder if you are editing the page for other reasons.
Profile: My errors
Within your profile when you are logged in you can access the "My Errors" page from the page menu bar.
This will show a small table with the number of link errors in each section (usually County) that you maintain. Click on the number shown and you will be taken to the Broken Links report.
For each broken link, the system lists:
- Section - This is the GENUKI section under which the page is maintained. This is normally a country or county.
- Node - This is the url of the page containing the broken link. If the GENUKI alias has not been set, the system will display the node number. The title of the page is also shown in brackets on the line below.
- Field Name - This is the field name containing the broken link.
- Broken Link - This is the url of the link that is broken. The visible text "fragment" containing the broken url is also shown in brackets on the line below.
- Int/Ext - This shows whether the link is internal or external to GENUKI. Some link issues are internal to GENUKI, but the majority will be external.
- Code - This is the HTTP status code sent by the target system when the link was checked. NB. Currently status codes of 200, 206, 302, and 304 are the only valid codes. All other codes are regarded as "broken". See below for a list of the more common error codes, and how to deal with them.
- Error - This is a description of the HTTP status code explained above.
- Author - This is the name of the author of the page. This is normally the GENUKI maintainer for the page.
- Last Checked - This is the date and time that the link was last checked.
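To make the columns above concrete, here is a minimal sketch of one row of the report as a Python record, together with the "valid code" rule from the Code column. The class and field names are hypothetical and simply mirror the column headings; they are not taken from the GENUKI database.

```python
from dataclasses import dataclass
from datetime import datetime

# Codes treated as valid according to the text; anything else is "broken".
VALID_CODES = {200, 206, 302, 304}


@dataclass
class BrokenLinkRow:
    section: str          # country or county
    node: str             # url (or node number) of the page holding the link
    field_name: str       # field containing the link
    broken_link: str      # url that failed
    internal: bool        # True for Int, False for Ext
    code: int             # HTTP status code returned
    error: str            # description of the status code
    author: str           # normally the page's maintainer
    last_checked: datetime


def is_broken(code):
    """A link is regarded as broken unless its code is on the valid list."""
    return code not in VALID_CODES
```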
The final column on the Broken Links report contains a number of useful "operations" for the Maintainer to use:
- Edit node - this option allows the Maintainer to directly edit the node containing the broken link
- Recheck - this option submits the chosen link for re-checking at the next batch run
- Ignore link - this option allows the link to be transferred from the Broken Links report to the Ignored Links report (see section on "Ignored Links" below).
- Redirect - this option allows the Maintainer to immediately change the existing link to the redirected link, without having to manually edit the node and manually correct the link.
Thus the Broken Links report, through its more comprehensive details about each error, and provision of "useful operations", is usually more convenient to use than the error reports given at the head of relevant individual node edit screens.
The Broken Links report simply lists all the errors for a given Maintainer. The error report also has some extra facilities to enable Maintainers to focus their work on a specific area:
- Filtering - this facility enables the Maintainer to filter the list on the key fields: section, node (i.e. url of the page containing the broken link), field name, url of broken link, and status code. The filtering options are exposed by clicking on the "Filter Items" link at the top of the page. Simply enter characters into one or more filter fields, and the system will retrieve any records containing those characters. NB. The filter fields are case-sensitive.
- Simultaneous actions on multiple links - it is now possible to carry out the same action on multiple links simultaneously. This is done by selecting the required rows by ticking the checkbox at the start of each line, and then choosing the required action by clicking on the "Update Options" links at the top of the page. Allowed actions currently include moving items to the Ignored List, and re-submitting items for re-checking. More information on these actions is described below.
- A pager - this is a facility shown at the bottom of the page that splits a large report into multiple chunks, and then allows the Maintainer to step forwards or backwards through multiple pages (50 items at a time). Note: Use of this facility takes you back to the unfiltered list.
- Recheck all links - this is a button at the top of the report that allows a Maintainer to submit all their current broken links for re-checking the next time the batch job is run (usually within the hour).
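The filtering and paging behaviour described above can be pictured with the short sketch below. It assumes each report row is a plain dict keyed by field name; `filter_rows` and `page` are illustrative names, not part of the GENUKI system.

```python
PAGE_SIZE = 50  # items shown per page, as stated above


def filter_rows(rows, **criteria):
    """Keep rows where every given field contains the given characters.

    The match is a case-sensitive substring test, like the report's filters.
    """
    return [
        row for row in rows
        if all(str(text) in str(row.get(field, ""))
               for field, text in criteria.items())
    ]


def page(rows, number, size=PAGE_SIZE):
    """Return page `number` (counting from 0) of the report."""
    start = number * size
    return rows[start:start + size]
```

For example, `filter_rows(rows, section="Dev", code="404")` would keep only rows whose section contains "Dev" and whose code contains "404"; a section entered as "dev" would not match "Devon".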
Correction of a broken link
There are many reasons why a link may show up on the Broken Links error report. However, there are four basic scenarios:
- Link is still correct - this scenario occurs when the link is still valid, but for some reason the remote system has responded with an error message (e.g. the remote system was undergoing maintenance). Assuming that the remote system was only temporarily unavailable, the broken link will normally disappear from the error report the next time this link is checked in the 4-week cycle. The Maintainer has the option of manually resubmitting the link for re-checking as soon as the remote system is available once again. Once a link has been resubmitted for re-checking, it will temporarily disappear from the error report until the re-checking process is complete. (It may then subsequently return to the error report if there is still a problem with the link).
- Link is no longer valid - this scenario occurs when the page being requested has been removed or changed on the remote system for whatever reason. In this case the GENUKI maintainer must either remove or correct the link manually. Once the link on the GENUKI page has been corrected, the system will submit the link for re-checking in a forthcoming batch run. NB. It will temporarily disappear from the error report until the re-checking process is complete. (It may then subsequently return to the error report if the correction is unsuccessful).
- Link has been moved - this scenario occurs when the page being requested has been permanently moved to a different part of the remote system, but the owner of the remote system has provided a "redirected" link. In this case the GENUKI maintainer must change the link to the redirected link (assuming the Maintainer is happy with the redirected page). The simplest way to do this is to select the "Redirect" operation on the Broken Links report; alternatively, the link can be corrected by manually editing the node. Once the link on the GENUKI page has been corrected, the system will submit the link for re-checking in a forthcoming batch run. NB. The link will temporarily disappear from the error report until the re-checking process is complete. (It may then subsequently return to the error report if the redirection is unsuccessful).
- Link will never be valid - this scenario occurs when the maintainer has entered a non-existent link, usually by mistake / typing error. Again, once the link on the GENUKI page has been corrected, the system will submit the link for re-checking in a forthcoming batch run. It will temporarily disappear from the error report until the re-checking process is complete (it may then subsequently return to the error report if the correction is unsuccessful).
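All four scenarios share the same behaviour on the error report: a link that has been resubmitted or corrected drops off the report until its re-check completes, and returns only if the new result is still an error. A minimal sketch of that lifecycle, assuming each link is a dict with hypothetical `code` and `pending` keys:

```python
# Codes treated as valid according to the report description above.
VALID_CODES = {200, 206, 302, 304}


def resubmit(link):
    """Queue the link for the next batch run; it is hidden until then."""
    link["pending"] = True


def complete_check(link, code):
    """Store the result of the re-check and make the link reportable again."""
    link["code"] = code
    link["pending"] = False


def on_error_report(link):
    """A link is listed only if it is not awaiting a re-check and its code is not valid."""
    return not link["pending"] and link["code"] not in VALID_CODES
```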
HTTP Status Codes
Each time a link is checked the GENUKI system sends an automatic request to the target system/website (internal or external), and the target system automatically sends a response message and status code to GENUKI. The most common codes are:
- 200 OK - This is the standard response for successful HTTP requests.
- 206 Partial Content - The remote server has delivered only part of the content. This is also OK, as the GENUKI link checking process does not require a full page to be sent.
- 304 Not Modified - The page has not changed since it was last requested. This is also OK.
- 301 Moved Permanently - The requested link has been moved permanently to a new location. All future requests should be directed to the new URL given. The Maintainer must then decide whether to accept the suggested redirection (see "Correction of a broken link" above), or to remove the link from the text completely.
- 302 Found - According to Wikipedia this code is used in different ways by different sites. However, this code is another category of redirect (temporary or permanent). Therefore, as for status 301, the Maintainer must decide whether to accept the redirection, or to remove the link completely.
- 400 Bad Request - The server cannot or will not process the request due to something that is perceived to be a client (ie. GENUKI) error. As the GENUKI link checking process is meant to be an automatic process run in the background, it is unlikely that the Maintainer can resolve this problem. (Please contact Phil or Ken in this case).
- 403 Forbidden - The request was a valid request, but the server is refusing to respond to it. One easily fixed cause is that the page linked to has not had its "Publish" flag set. If the Maintainer cannot access the remote system by clicking on the link, then it is likely that the link will have to be changed manually. However, if the link works by clicking on it, but the GENUKI link checking process generates an error, it is likely that the remote site has been configured to deter automated requests. The recommendation is to classify this link as a permanently ignored link (see section on "Ignored Links" below).
- 404 Not Found - The requested resource could not be found but may be available somewhere else, or again at the requested location some time in the future. The most likely solution is for the Maintainer to correct the link to one that works, when this is possible.
- 500 Internal Server Error - A generic error message, given when an unexpected condition was encountered and no more specific message is suitable. It is suggested that the Maintainer submits this for re-checking a day or two later to see if the remote server is available again.
- 503 Service Unavailable - It is suggested that the Maintainer submits this for re-checking a day or two later to see if the remote server is available again.
- 999 Name or Service not known - This error occurs when the link checking process is unable to find a valid remote site to communicate with. This link will probably have to be corrected or removed manually by the Maintainer. The error message will normally be shown as "Name or Service not known", but if the link checker determines a more specific error, then this specific error message will be shown instead.
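The guidance in this list can be summarised as a small lookup, sketched below. The descriptions come from Python's standard `http.HTTPStatus`; code 999 is not a standard HTTP status, so it is handled separately. The short action labels and the function names are invented for this example.

```python
from http import HTTPStatus

# Summary of the advice given for each code in the list above.
SUGGESTED_ACTION = {
    200: "ok",
    206: "ok",
    304: "ok",
    301: "accept redirect or remove link",
    302: "accept redirect or remove link",
    400: "contact the GENUKI administrators",
    403: "fix link, or ignore permanently if it works when clicked",
    404: "correct the link",
    500: "recheck in a day or two",
    503: "recheck in a day or two",
    999: "correct or remove the link",
}


def describe(code):
    """Return a short description of a status code."""
    if code == 999:  # non-standard code used by the link checker
        return "Name or Service not known"
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "Unknown status code"


def suggest(code):
    """Return the suggested action, or a fallback for codes not listed."""
    return SUGGESTED_ACTION.get(code, "investigate manually")
```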
Ignored Links
Maintainers can mark an error as an "ignored link".
This is to enable a Maintainer to remove one or more links from their Broken Links report.
Ignored links can be classified as temporary or permanent:
- Temporarily ignored links will continue to be re-checked every 4 weeks, and will still be regarded as broken links by the system whether they actually work or not.
For example, this could be used when a target system is unavailable for a short period, but is likely to return in the near future.
Or where a reported link does work when initially checked but after being submitted for re-checking again appears on the error report.
- Permanently ignored links will no longer be re-checked by the system every 4 weeks, nor will they appear on any error report (but see below).
For example, this could be used when a link is correctly available when clicked on by a user, but for some reason the target system returns an error (see status code 403 above) when checked by the GENUKI link checking process.
It is anticipated that the permanent option will be used sparingly in exceptional cases where the links work but persist in being returned as errors.
- The choice between temporary and permanent can be made for single or multiple items by using the "Update Options" menu.
- Alternatively, using the Ignore option available against single items under Operations will by default place the entry into the temporary ignored section.
- Any temporary and permanent Ignored items that are outstanding at any time can be viewed for each county in columns J & K on the Statistics & Errors report.
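The difference between the two kinds of ignored link comes down to two questions: is the link still re-checked, and does it still appear on the Broken Links report? The sketch below encodes the rules as described above; the enum and function names are hypothetical.

```python
from enum import Enum


class Ignore(Enum):
    NONE = "none"
    TEMPORARY = "temporary"
    PERMANENT = "permanent"


def is_rechecked(ignore):
    """Temporarily ignored links stay in the 4-week cycle; permanent ones do not."""
    return ignore is not Ignore.PERMANENT


def on_broken_links_report(ignore, broken):
    """Only broken links that are not ignored remain on the Broken Links report."""
    return broken and ignore is Ignore.NONE


def default_ignore_from_operations():
    """The single-item Ignore operation places the entry in the temporary section."""
    return Ignore.TEMPORARY
```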