Test and Expand Regular Expressions

What are “regular expressions”?

A regular expression lets you build patterns from a set of special characters; these patterns can then be compared with, for example, a string of text entered by a user. Depending on whether there is a match, the program can take appropriate action and execute the relevant code.
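As a minimal illustration in JavaScript (the language this program is written in), here is the "compare a pattern with user input, then act" idea. The question, the function name feedback and the messages are made up for the example.

```javascript
// Hypothetical question: "What is the capital of France?"
const pattern = /Paris/;

// Choose the feedback depending on whether the pattern matches
function feedback(answer) {
  // RegExp.prototype.test() returns true if the pattern is found anywhere in the string
  return pattern.test(answer) ? "Correct!" : "Not quite, try again.";
}

console.log(feedback("It is Paris.")); // Correct!
console.log(feedback("Lyon"));         // Not quite, try again.
```

Note that the match is case sensitive by default, so feedback("paris") would also be rejected.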

In CAL (Computer-Aided Learning), regular expressions can be used to match a student's answer against predetermined patterns in order to trigger appropriate feedback. Regular expressions are regularly (!) discussed on the WebCT Forum, e.g. see "Short answer with punctuation".

For a comprehensive presentation of regular expressions, see these links:

http://www.regular-expressions.info/

http://gnosis.cx/publish/programming/regular_expressions.html

Mastering Regular Expressions (Jeffrey E. F. Friedl, O'Reilly)

Matching a String to a regExp

If you are not familiar with regExps, the best way to start is to use one of the examples provided in the drop-down list.

Select [bc]at from the list (first item). The regular expression is copied to the regExp text area, and the predetermined character string Batman's cat ate the bat is automatically copied to the String area. Click the Match String to regExp button and look at the Results area. The regExp pattern looks for either b or c immediately followed by at. Because the regExp search has been programmed as "global", it finds two results in the submitted String: cat and bat.

A regExp search can be done as case sensitive or insensitive. The default in this program is Case Sensitive. This explains why, in the previous example, [bc]at won't find a match for Bat in Batman. Uncheck the Case Sensitive box and try again...
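In JavaScript, "global" and "case insensitive" correspond to the standard g and i flags of a regular expression. The sketch below reproduces the two runs described above; it illustrates the standard flags and is not the program's own code.

```javascript
const text = "Batman's cat ate the bat";

// g = global: collect every match, not just the first one
const caseSensitive = text.match(/[bc]at/g);
// i = ignore case: "Bat" at the start of "Batman" now matches too
const caseInsensitive = text.match(/[bc]at/gi);

console.log(caseSensitive);   // [ 'cat', 'bat' ]
console.log(caseInsensitive); // [ 'Bat', 'cat', 'bat' ]
```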

Please note, however, that many programs that accept regular-expression input (e.g. WebCT tests) do not perform a global match: they stop searching as soon as the first match is found. In my example above, the [bc]at regexp would only match cat and then stop looking. This is how Henk Schotel's original program Testaregex works.
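The same non-global behaviour can be seen in JavaScript by leaving out the g flag: match() then returns only the first match, together with its position.

```javascript
const text = "Batman's cat ate the bat";

// Without the g flag, the search stops at the first match
const first = text.match(/[bc]at/);
console.log(first[0]);    // cat
console.log(first.index); // 9 (position of "cat" in the string)
```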

The second regExp in the list, a (2|two)-year-old girl, searches the string for either 2 or two. The predetermined string a two-year-old girl is matched, and so would be a 2-year-old girl.
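A quick check of this alternation with JavaScript's standard RegExp object. As a side effect, the parenthesized group also records which alternative was actually matched.

```javascript
const pattern = /a (2|two)-year-old girl/;

console.log(pattern.test("a two-year-old girl"));   // true
console.log(pattern.test("a 2-year-old girl"));     // true
console.log(pattern.test("a three-year-old girl")); // false

// Element [1] of the match holds the text captured by the group
console.log("a 2-year-old girl".match(pattern)[1]); // 2
```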

To understand the other examples and to write your own regular expressions, go to the links mentioned above.

Generating Sentences from a regExp

In some CAL programs, when an open-ended or semi-open-ended question is asked, it can be desirable to match the student's answer against a set of possible correct answers. This can be achieved with regular expressions, where the pipe character | is used to match multiple options. We had an example of this in the a (2|two)-year-old girl expression above. By combining parenthesized multiple options and even nesting parentheses within parentheses, it is possible to build regular expressions which will match a fairly large number of "accepted answers".
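Here is a sketch of checking answers against one expression with nested parentheses. It assumes the whole answer has to match, so the pattern is anchored with ^ (start of string) and $ (end of string); whether a particular CAL tool, such as WebCT, adds anchors for you is something to check in that tool's own documentation. The pattern itself is a made-up example.

```javascript
// Nested parentheses: (girl|(little |)child)
// ^ and $ force the whole answer to match, not just part of it
const accepted = /^a (2|two)-year-old (girl|(little |)child)$/;
// The same pattern without anchors also accepts answers with extra text
const unanchored = /a (2|two)-year-old (girl|(little |)child)/;

console.log(accepted.test("a 2-year-old girl"));             // true
console.log(accepted.test("a two-year-old little child"));   // true
console.log(accepted.test("a two-year-old girlfriend"));     // false
console.log(unanchored.test("a two-year-old girlfriend"));   // true
```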

However, when I wanted to achieve the seemingly trivial task of generating "acceptable sentences" from "multiple-option regular expressions", I found that the problem did not seem to have been addressed very often (at least not on Internet user groups). I did find one solution, but it was written in Perl and did not take nested parentheses into account (see Credits). That's how I came to write this modest Expand a regular expression program.

From the example list, select this item: The little (black |white |)cat was (sitting|lying) on the (red|blue|old) mat. Then click on the Generate Sentences from regExp button... et voilà!
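The program's own generation routine is not reproduced on this page. To illustrate the general idea only, here is a minimal recursive sketch that handles literal text, (nested) parentheses, the pipe and backslash escapes, and builds every combination. The function name expand is hypothetical, and unlike the program described here, this sketch treats a pipe outside parentheses as an ordinary top-level alternative.

```javascript
// Hypothetical sketch: expand a regExp made only of literal text,
// (nested) parentheses, the pipe | and backslash escapes into
// every sentence it matches.
function expand(pattern) {
  let pos = 0;

  // One or more sequences separated by "|": union of their sentences
  function parseAlternatives() {
    const results = parseSequence();
    while (pattern[pos] === "|") {
      pos++; // skip the pipe
      results.push(...parseSequence());
    }
    return results;
  }

  // Literals and groups up to "|", ")" or the end: all combinations, in order
  function parseSequence() {
    let partials = [""];
    while (pos < pattern.length && pattern[pos] !== "|" && pattern[pos] !== ")") {
      let options;
      if (pattern[pos] === "(") {
        pos++; // skip "("
        options = parseAlternatives();
        if (pattern[pos] !== ")") throw new Error("Unbalanced parentheses");
        pos++; // skip ")"
      } else if (pattern[pos] === "\\") {
        if (pos + 1 >= pattern.length) throw new Error("Dangling backslash");
        options = [pattern[pos + 1]]; // escaped character is taken literally
        pos += 2;
      } else {
        options = [pattern[pos]]; // ordinary literal character
        pos++;
      }
      // Cartesian product: extend every partial sentence with every option
      const next = [];
      for (const p of partials) for (const o of options) next.push(p + o);
      partials = next;
    }
    return partials;
  }

  const sentences = parseAlternatives();
  if (pos < pattern.length) throw new Error("Unbalanced parentheses"); // stray ")"
  return sentences;
}

const sentences = expand(
  "The little (black |white |)cat was (sitting|lying) on the (red|blue|old) mat\\."
);
console.log(sentences.length); // 18
console.log(sentences[0]);     // The little black cat was sitting on the red mat.
```

The number of sentences is the product of the number of options in each set of parentheses, here 3 × 2 × 3 = 18, which is why nested or numerous options grow quickly.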

If you want to write your own regular expressions from which this program can generate all acceptable sentences, here are some guidelines.

Start with simple, un-nested parentheses; expressions with nested parentheses can quickly get out of hand. Note, however, that the program checks that your parentheses are correctly balanced and displays a warning if they are not.
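A balance check like the one mentioned can be done with a simple depth counter. This is a minimal sketch with a hypothetical function name, not the program's own check; it skips escaped parentheses but, for simplicity, does not special-case parentheses inside square brackets.

```javascript
// Hypothetical helper: true if every "(" has a matching ")" after it.
// Escaped parentheses \( and \) are literal text, so they are skipped.
function parenthesesBalanced(pattern) {
  let depth = 0;
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\") {
      i++; // skip the escaped character
    } else if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth < 0) return false; // ")" with no open "("
    }
  }
  return depth === 0; // false if any "(" is left open
}

console.log(parenthesesBalanced("The little (black |(very )white |)cat")); // true
console.log(parenthesesBalanced("The little (black |white cat"));          // false
```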

You can have two sorts of options: compulsory and optional (!). The little (black|white) cat etc. requires cat to be qualified with either black or white. But The little (black |white |)cat etc. accepts The little black cat, The little white cat and The little cat. The trick is to add an "empty" option inside your parentheses, i.e. a pipe | with nothing between it and the adjacent parenthesis. Note that The little (black |white |)cat and The little (|black |white )cat are equivalent regExps; the only difference you will notice is the order in which the program lists the generated sentences.

Using parentheses

The pipe character | is used to build regular expressions that have a number of possible variants. The expression blue|red|white is a perfectly valid regexp which will match the presence of blue, red or white in the input String. However, my program cannot generate alternative sentences from that regexp unless you enclose the variants in parentheses: pipe characters outside parentheses are either rejected or treated as ordinary text, as the examples below show. In any case, it is a good habit to enclose variants in parentheses when writing regular expressions, because the pipe has the lowest precedence of all regexp operators: One colour is blue|red means "One colour is blue" or "red", not "One colour is" followed by blue or red.
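The precedence point can be checked directly with JavaScript's RegExp: without parentheses, the pipe splits the whole expression, so a bare colour word is enough to match.

```javascript
// The pipe has the lowest precedence: this means
// "One colour is blue"  OR  "red"  OR  "white"
const unparenthesized = /One colour is blue|red|white/;
// Parentheses restrict the alternatives to the colour word
const parenthesized = /One colour is (blue|red|white)/;

console.log(unparenthesized.test("red"));               // true
console.log(parenthesized.test("red"));                 // false
console.log(parenthesized.test("One colour is white")); // true
```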

Examples

regExp: blue|white|red
Results: Cannot generate sentences ...

regExp: (blue|red|white)
Results:
blue
red
white

regExp: One colo(u|)r is blue|red|white\.
Results:
One colour is blue|red|white.
One color is blue|red|white.

regExp: One colo(u|)r is (blue|red|white)\.
Results:
One colour is blue.
One colour is red.
One colour is white.
One color is blue.
One color is red.
One color is white.

Be careful where you put spaces!

If all options within a set of parentheses are compulsory, the closing parenthesis should be followed by a space (or a punctuation mark). But if one of the parenthesized options is left blank, i.e. if your set of options is "optional" (as defined above), you should type the space after each option except the blank one, and you should not type a space after the closing parenthesis. Otherwise, the blank option leaves two consecutive spaces in the generated sentence (The little  dog).

Examples

Correct:
The little (black|white) dog was sitting on the mat\.
The little (black |white |)dog was sitting on the mat\.
The little (|black |white )dog was sitting on the mat\.

Not correct:
The little (black|white|) dog was sitting on the mat.
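The effect of a misplaced space shows up when both forms are tested against the sentence without a colour. For the demonstration, the patterns are anchored with ^ and $ so that the whole sentence must match.

```javascript
// Correct: the space belongs to each non-empty option
const correct = /^The little (black |white |)dog was sitting on the mat\.$/;
// Not correct: with the empty option, two spaces are required after "little"
const incorrect = /^The little (black|white|) dog was sitting on the mat\.$/;

console.log(correct.test("The little dog was sitting on the mat."));    // true
console.log(incorrect.test("The little dog was sitting on the mat."));  // false
console.log(incorrect.test("The little  dog was sitting on the mat.")); // true (double space)
```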

As it stands, the program should accept any depth of nested parentheses (but please let me know of any bugs or limits you find). Because the generation routine is meant to generate all possible sentences from a regExp, the metacharacters allowed in such a regExp are restricted to parentheses (), the pipe |, the ? character, square brackets [] and the escape character \.
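Before generating, a pattern can be screened for metacharacters outside that restricted set. Here is a minimal sketch with a hypothetical function name; it only reports the position of the first unsupported, unescaped character and does not claim to mirror the program's own checks.

```javascript
// Hypothetical helper: index of the first unescaped metacharacter that a
// sentence generator cannot expand, or -1 if the pattern only uses ( ) | ? [ ] \
// Deliberately strict: characters inside [...] are checked too.
function findUnsupported(pattern) {
  const unsupported = "^$*+.{}";
  for (let i = 0; i < pattern.length; i++) {
    if (pattern[i] === "\\") {
      i++; // the next character is escaped, so it is literal
    } else if (unsupported.includes(pattern[i])) {
      return i;
    }
  }
  return -1;
}

console.log(findUnsupported("colou?r"));                 // -1
console.log(findUnsupported("The (big|small) cat\\.")); // -1
console.log(findUnsupported("The .* cat"));              // 4
```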

Using square brackets to specify a range of characters

[bcr]at will match bat, cat or rat
b[aeiou]t will match bat, bet, bit, bot or but
[F-H]at will match Fat, Gat or Hat
1[1-4] will match 11, 12, 13 or 14

My program can also generate "sentences" from regular expressions containing ranges of characters. Be cautious with square brackets, however: each bracketed range multiplies the number of alternative "sentences" (three [a-z] ranges in a row already give 26 × 26 × 26 = 17,576), and the program may grind to a halt.
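To make the arithmetic concrete, here is a small sketch that lists the characters a simple bracket expression stands for, assuming only single characters and ranges such as a-z (negated classes like [^abc] are not handled). The function name expandClass is hypothetical.

```javascript
// Hypothetical helper: list the characters a simple [...] class stands for.
// Pass the text between the brackets, e.g. "F-H" for [F-H].
function expandClass(body) {
  const chars = [];
  for (let i = 0; i < body.length; i++) {
    if (i + 2 < body.length && body[i + 1] === "-") {
      // A range such as F-H: every character code from F to H
      const start = body.charCodeAt(i);
      const end = body.charCodeAt(i + 2);
      for (let c = start; c <= end; c++) chars.push(String.fromCharCode(c));
      i += 2; // skip "-" and the end of the range
    } else {
      chars.push(body[i]); // single character (a leading or trailing "-" is literal)
    }
  }
  return chars;
}

console.log(expandClass("F-H"));          // [ 'F', 'G', 'H' ]
console.log(expandClass("aeiou").length); // 5

// Each class multiplies the number of sentences: [a-z][a-z][a-z]
console.log(expandClass("a-z").length ** 3); // 17576
```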

Using the ? metacharacter

Any character, square-bracketed range of characters or parenthesized group can be followed by the ? metacharacter to make it optional. The ? metacharacter has the same effect as an empty option in a (a|b|) expression, as seen in the examples below.

colou?r and colo(u|)r will both match/generate color or colour

the (little |large )?cat and the (|little |large )cat will both generate the cat, the little cat and the large cat
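The equivalence can be verified by testing both forms on the same strings. The patterns are anchored with ^ and $ here so that the whole string must match.

```javascript
const withQuestionMark = /^the (little |large )?cat$/;
const withEmptyOption = /^the (|little |large )cat$/;

for (const s of ["the cat", "the little cat", "the large cat", "the big cat"]) {
  // Both patterns give the same answer for each string
  console.log(s, withQuestionMark.test(s), withEmptyOption.test(s));
}
// the cat true true
// the little cat true true
// the large cat true true
// the big cat false false
```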

Escaping characters

If you want the String to be matched to contain characters that are part of the regular-expression syntax, you have to "escape" them, i.e. type a backslash before them. For instance, if you want a full stop to be matched at the end of a "sentence" in the input String, you have to type \. (and not just ., which the regexp engine would interpret as "any character").
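The difference between . and \. is easy to see in JavaScript. When a pattern is built from a JavaScript string rather than written as a /…/ literal, the backslash must itself be doubled inside the string literal.

```javascript
const unescaped = /on the mat./;  // "." means "any character"
const escaped = /on the mat\./;   // "\." means a literal full stop

console.log(unescaped.test("He sat on the matt")); // true ("t" matches ".")
console.log(escaped.test("He sat on the matt"));   // false
console.log(escaped.test("He sat on the mat."));   // true

// Built from a string, the backslash must be doubled in the literal
const fromString = new RegExp("on the mat\\.");
console.log(fromString.source === escaped.source); // true
```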

Here is the list of special characters that need to be escaped:

^$()[]{}*.+?|\

Because the backslash is a special character itself, it needs to be escaped. If you want a \ to be matched in your input String, you'll have to type \\ in the regexp.
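When the text to be matched comes from elsewhere (e.g. a model answer typed by a teacher), every character in the list above can be escaped automatically. Here is a common JavaScript idiom, wrapped in a function with a hypothetical name.

```javascript
// Put a backslash in front of every character in the list ^$()[]{}*.+?|\
function escapeRegExp(text) {
  return text.replace(/[\^$()[\]{}*.+?|\\]/g, "\\$&"); // $& = the matched character
}

const literal = "Is 1+1 (really) 2?";
const pattern = new RegExp(escapeRegExp(literal));

console.log(escapeRegExp(literal));                 // Is 1\+1 \(really\) 2\?
console.log(pattern.test("Q: Is 1+1 (really) 2?")); // true
```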

Final warning

Some regular-expression metacharacters, such as the wildcard . and the repetition operators * and +, can match a very large number of character strings, or even an infinite number.

For example, the expression .* will match absolutely anything.

Naturally, my sentence generator cannot handle such regular expressions, since the output would be potentially infinite.

Partial Match

In CAL programs, when a fairly long answer is expected from the student, it may be advisable to try to find out how much of their input is correct. For instance, if the answer "The black cat sat on the mat." is expected and the student types "The black cat ate the mat", it is better to display the "so far so good" string "The black cat " than simply to reject the whole answer as wrong. This is where the Partial Match part of this program can help.

To test it, select The little (black |white |)cat was (sitting|lying) on the (red|blue) mat\. from the example drop-down list. In the String area, delete red and replace it with white. Clicking the Match String to regExp button returns a No Match message, whereas clicking Partial Match displays the "so far so good" string.

Because this part of the program is based on generating all acceptable sentences, it compares the "student's answer" (String) with each of those sentences and offers the sentence closest to that String.
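The program's own comparison routine is not shown on this page. Assuming that "closest" means the sentence sharing the longest beginning with the String (an assumption made for this sketch), the idea can be illustrated as follows. The list of sentences is hard-coded here, whereas the program generates it from the regExp; the function names are hypothetical.

```javascript
// Length of the common beginning of two strings
function commonPrefixLength(a, b) {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// Hypothetical helper: the longest "so far so good" beginning of the answer,
// compared with every acceptable sentence
function partialMatch(answer, sentences) {
  let best = "";
  for (const s of sentences) {
    const n = commonPrefixLength(answer, s);
    if (n > best.length) best = answer.slice(0, n);
  }
  return best;
}

const accepted = ["The black cat sat on the mat.", "The white cat sat on the mat."];
console.log(JSON.stringify(partialMatch("The black cat ate the mat", accepted)));
// "The black cat "
```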

A word of warning.

The Partial Match part of the program is of limited use as it stands, because it does not work the way a standard regular-expression match does. However, if you program in JavaScript™ yourself, you are welcome to use my routines, e.g. to improve the input/error analysis in your own or existing CAL software (such as Hot Potatoes™). If you would like a copy of my JS program, use the contact address.

Credits

The text in the regExp, String and Results text areas is displayed in a monospace font: Lucida Console if it is installed on your system (it is more legible than standard Courier), otherwise Courier.