How to use Regular Expressions in Bulk Operations

          Help and Guidance 2021: New Page: Version 1.1

Introduction


If you want to run a regex (Regular Expression) search as part of a Bulk Operation, you probably already know something of what that entails. Regex enables much more specific and complex search and replace strings. It allows for wildcards (i.e. parts of the search string that match any character) and provides access to characters that are not usually allowed in simple search methods.

It therefore has the potential to be very dangerous! We offer a couple of external links to give you more background or to refresh your knowledge.

If you have an editing need that you feel could be solved by regex, but you are not familiar with the technique, you are advised to raise the issue on the maintainers' group.

A worked example in GENUKI


For most maintainers, the most likely reason to use regex is to correct broken external links. What follows is a real example that came up in May 2021.

The problem

The error report for County Armagh showed over 200 'Not Found' (404) errors for the same domain spread over several parish pages.

One of these was the existing link:

http://www.craigavonhistoricalsociety.org.uk/rev/haddendoctoratsea.html

The correct url was now shown as:

http://www.craigavonhistoricalsociety.org.uk/rev/haddendoctoratsea.php

In other words, .html had become .php.

The page name (haddendoctoratsea in this example, highlighted in red in the original page) is the variable part: it differs for each page targeted within that domain. A simple VBO, which searches for one fixed string, would not identify each separate instance and therefore could not correct them all in one process. With regex there is a way of doing so.
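To illustrate why a fixed search string falls short, here is a minimal Python sketch. The sample content and the second page name (otherpage) are invented for illustration and are not taken from the County Armagh pages.

```python
# Hypothetical page content with two broken links on the same domain.
# "otherpage" is an invented page name used only for illustration.
sample = (
    '<a href="http://www.craigavonhistoricalsociety.org.uk/rev/haddendoctoratsea.html">A</a>\n'
    '<a href="http://www.craigavonhistoricalsociety.org.uk/rev/otherpage.html">B</a>\n'
)

# A literal search string matches only the one page it names.
literal_search = "craigavonhistoricalsociety.org.uk/rev/haddendoctoratsea.html"
literal_replace = "craigavonhistoricalsociety.org.uk/rev/haddendoctoratsea.php"
fixed = sample.replace(literal_search, literal_replace)

# Count the links that still end in .html after the literal replacement.
remaining = fixed.count('.html"')
print(remaining)
```

Each distinct page name would need its own literal search, which is impractical for over 200 links.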

The Regex Approach

This is how regex was used in this example.

On the Bulk Ops page:

  • The first step in the process, after setting up county, content type etc, is to Choose Nodes. Inserting "craigavonhistoricalsociety" in the Topic Content / Contains box produces a list of the 10 parish pages which contain these links.
  • Select them all and click Search & Replace
  • On the next page click on Options and then tick the box "The search and replace fields contain regular expressions. Enclose the search pattern in slashes." The enclosing slashes apply ONLY to the Search box.
  • In the Search box insert 
    • /craigavonhistoricalsociety\.org\.uk\/rev\/(.*?)\.html\"/
  • In the Replace box insert
    • craigavonhistoricalsociety.org.uk/rev/$1.php"
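  • Note that the domain and path part of these urls (craigavonhistoricalsociety.org.uk/rev/, highlighted in the original page) is all that needs to be changed when this example is used as a guide to your own regex processes for correcting link suffixes from html to php. In the Search box, keep the backslashes in front of the dots and slashes.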
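The search and replace pair above can be tried out away from the live pages. The sketch below uses Python's re module purely as a stand-in for the Bulk Ops regex engine, which is an assumption: Python needs no enclosing slashes and writes the captured group as \1 where the Replace box uses $1. The sample content, the page name otherpage and the example.org link are invented for illustration.

```python
import re

# Same pattern as the Search box, without the enclosing slashes.
# (.*?) captures the page name; the escaped dots match literal dots only.
pattern = re.compile(r'craigavonhistoricalsociety\.org\.uk/rev/(.*?)\.html"')

# Python writes the captured group as \1 where the Replace box uses $1.
replacement = r'craigavonhistoricalsociety.org.uk/rev/\1.php"'

# Hypothetical sample content; "otherpage" and example.org are invented.
sample = (
    '<a href="http://www.craigavonhistoricalsociety.org.uk/rev/haddendoctoratsea.html">A</a>\n'
    '<a href="http://www.craigavonhistoricalsociety.org.uk/rev/otherpage.html">B</a>\n'
    '<a href="http://www.example.org/rev/unrelated.html">C</a>\n'
)

# subn returns the new text and the number of replacements made.
fixed, count = pattern.subn(replacement, sample)
print(count)
print(fixed)
```

Because the domain is part of the pattern, links to other sites that also end in .html are left alone.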

The Warning Again!

Do not run a full bulk operations process using regex without first testing it on at least a single instance, or a couple if you're the nervous sort! If in doubt, ask.
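One way to check a pattern before touching any page is to list what it would match and what each match would become. The following is a rough sketch in Python, used here only as a stand-in for the Bulk Ops regex engine (Python writes the captured group as \1 rather than $1). The preview function is a hypothetical helper, not part of Bulk Ops.

```python
import re

pattern = re.compile(r'craigavonhistoricalsociety\.org\.uk/rev/(.*?)\.html"')
replacement = r'craigavonhistoricalsociety.org.uk/rev/\1.php"'

def preview(text):
    """Return (before, after) pairs for every match, leaving text unchanged."""
    return [(m.group(0), m.expand(replacement)) for m in pattern.finditer(text)]

# A single instance copied from one page is enough for a first test.
sample = '<a href="http://www.craigavonhistoricalsociety.org.uk/rev/haddendoctoratsea.html">A</a>'

pairs = preview(sample)
for before, after in pairs:
    print(before, "->", after)
```

If the list is empty or shows an unexpected match, the pattern needs another look before it goes anywhere near a bulk operation.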

Example prepared by Gareth Hicks