GENUKI Maintainers' Pages
Version 1.15
How to Direct the Spider
Meta Statements
There will always be a few pages that the spider needs to treat in a special way, and for which we do not want specific errors to be reported. For HTML files this is achieved by placing one or more <meta> statements in the HTML code of each such page. The <meta> statements should be placed in the head section (<head>) of the HTML page, and the name "genuki" has been chosen for use in all GENUKI meta statements.
The syntax of a GENUKI meta statement is as follows:
<meta name="genuki" content="genuki-directives">
where genuki-directives are one or more directives which instruct the spider to carry out the specified action with respect to the page.
The spider treats the GENUKI meta statement and its attributes case-insensitively. The directives instructing the spider appear in the content attribute of the meta statement. If more than one directive is required, the directives can be separated by spaces within a single content attribute, or multiple meta statements can be used.
The GENUKI directives that the spider recognises are:
- logo=n There is no GENUKI logo on this page, but don't report it as a problem.
- link_check=n Do not validate any of the links on this page. This may be of use if the page is a contents listing generated automatically. Note: use of this directive does, of course, prevent any of the links on this page from being considered as candidates by the discovery process for inclusion in the list of GENUKI web pages.
- html_check=n Do not check the HTML for syntax errors.
Examples of the use of GENUKI meta statements follow:
- <meta name="genuki" content="logo=n"> Do not report the absence of a GENUKI logo on this page.
- <meta name="genuki" content="link_check=n html_check=n"> Do not validate any of the links, nor check the HTML syntax, on this page.
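To make the rules above concrete, here is a minimal sketch of how a program might collect GENUKI directives from a page. It is not the spider's actual code: the class name GenukiMetaParser and the function genuki_directives are hypothetical, and the sketch assumes that case-insensitivity extends to the directive text itself, which the rules above do not state explicitly.

```python
from html.parser import HTMLParser


class GenukiMetaParser(HTMLParser):
    """Collect directives from <meta name="genuki" content="..."> statements.

    Hypothetical helper for illustration; not the spider's real implementation.
    """

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag names and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # Compare the name attribute's value case-insensitively.
        if attr_map.get("name", "").lower() != "genuki":
            return
        # Assumption: directives are lower-cased before use.
        # Directives are separated by spaces inside the content attribute.
        for token in attr_map.get("content", "").lower().split():
            key, sep, value = token.partition("=")
            if sep:
                self.directives[key] = value


def genuki_directives(html_text):
    """Return a dict such as {"logo": "n"} for all GENUKI meta statements."""
    parser = GenukiMetaParser()
    parser.feed(html_text)
    return parser.directives
```

Because each meta statement simply adds to the same dictionary, a page that lists its directives in one content attribute and a page that spreads them over several meta statements produce the same result.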
Href Class Attributes
The finest level of control over the spider is exercised at the link level, where maintainers can direct the spider to treat an individual link in a non-standard way.
A link is specified in HTML as an anchor. Because the syntax of an anchor is fully defined, the spider reuses an existing anchor attribute that was defined for another purpose: the class attribute. The class attribute is normally used to attach style sheet rules or JavaScript behaviour to an element, but this reuse is unlikely to cause problems: multiple classes can be specified on the same element, separated by spaces, if the need arises.
The classes and their meanings are as follows:
- class="nlc" Use this to direct the spider not to carry out link checking on this link.
- class="ibr" Use this to direct the spider to ignore a base re-direct on this link, i.e., when the returned base address is different to that specified in the link.
Examples:
- <a href="http://a.link.that.might/fail" class="nlc">click me</a>
- <a href="http://a.link.that.might.be/redirected" class="ibr">click me</a>
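The following is a minimal sketch of how the class attribute on each link could be turned into per-link flags. It is an illustration only, not the spider's own code: the names LinkClassParser and link_flags and the keys of the returned dictionaries are hypothetical. It also assumes the class names are matched exactly as written above, in lower case.

```python
from html.parser import HTMLParser

# The two classes described above; any other class is left to style sheets.
SPIDER_CLASSES = {"nlc", "ibr"}


class LinkClassParser(HTMLParser):
    """Record each link's href with the spider classes it carries.

    Hypothetical helper for illustration; not the spider's real implementation.
    """

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, set of spider classes)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        href = attr_map.get("href")
        if href is None:
            return  # an anchor without href is not a link
        # An element may carry several classes, separated by spaces.
        classes = set(attr_map.get("class", "").split())
        self.links.append((href, classes & SPIDER_CLASSES))


def link_flags(html_text):
    """Return one dict per link describing how it should be treated."""
    parser = LinkClassParser()
    parser.feed(html_text)
    return [
        {
            "href": href,
            "check_link": "nlc" not in flags,
            "ignore_base_redirect": "ibr" in flags,
        }
        for href, flags in parser.links
    ]
```

A link written as class="nlc external" would still be recognised here, since only the classes the spider knows about are kept and the rest are ignored.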