Monday, February 22, 2010

Regular Expressions with Maxscript

The other day I was working on a script that required the use of regular expressions. For those unaware, according to Wikipedia:

http://en.wikipedia.org/wiki/Regular_expression

In computing, regular expressions, also referred to as regex or regexp, provide a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.
The following examples illustrate a few specifications that could be expressed in a regular expression:
  • The sequence of characters "car" in any context, such as "car", "cartoon", or "bicarbonate"
  • The word "car" when it appears as an isolated word
  • The word "car" when preceded by the word "blue" or "red"
  • A dollar sign immediately followed by one or more digits, and then optionally a period and exactly two more digits
Regular expressions can be much more complex than these examples.

This is super useful whenever one wants to do anything beyond the basic trimming, appending etc. of strings.

Imagine that we are writing a tool to help with naming objects. According to the naming conventions at our studio, objects must be named in the following format: assetname_partname_001
There are a few clunky ways we could attack this using standard maxscript, but regular expressions provide a powerful and flexible way to do it.

Prior to Max 9, if the tool didn't exist in maxscript then you were kinda out of luck. Maxscript does not have a built-in regular expression function. However, with the addition of DotNet in Max 9, we have the ability to use the .NET Framework classes!

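-- get a reference to the .NET Regex class so we can call its static methods
rx = dotNetClass "System.Text.RegularExpressions.Regex"

-- two blocks of letters/digits, each ending in an underscore, then three digits
pattern = "^([A-Za-z0-9]+_){2}\d{3}"

-- the name we want to test
s = "robot_leftArm_001"

-- match returns a Match object; its success property is true or false
if (rx.match s pattern).success then
(
  print "success"
)
else
(
  print "fail!"
)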

Let's go through it line by line.

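First up we use dotNetClass to grab a reference to the .NET Regex class and store it in the variable rx. Nothing is instantiated here; we will be calling the class's static methods directly.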
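Worth noting: dotNetClass hands back the class itself (for calling static methods), while dotNetObject builds an actual instance. Here is a minimal sketch of both. The "^robot_" pattern and the rxObj variable are made up for this example; IsMatch is a standard method on the .NET Regex class that comes in both a static and an instance form.

```maxscript
-- dotNetClass: a reference to the class itself, used for static methods
rx = dotNetClass "System.Text.RegularExpressions.Regex"
print (rx.IsMatch "robot_leftArm_001" "^robot_")

-- dotNetObject: an actual instance, built with the pattern passed to the constructor
rxObj = dotNetObject "System.Text.RegularExpressions.Regex" "^robot_"
print (rxObj.IsMatch "robot_leftArm_001")
```

Both calls should print true. The instance form can be handy if the same pattern is tested against a lot of strings, since the pattern only has to be passed in once.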

Next we build our RegEx pattern. I'm not going to go over all the options available here as this post would be 100 pages long. Check out this excellent site for tutorials on RegEx syntax: http://www.regular-expressions.info/index.html

Here are the options I've used in my pattern string one by one.
  1. First is the ^ character. This anchors the match to the start of the string.
  2. Next is a section enclosed in parentheses. This groups it so that it is treated as a single block.
  3. Inside the parentheses is a section enclosed in square brackets, [A-Za-z0-9]. This matches any one character in these ranges: all the upper- and lowercase letters as well as the digits 0-9.
  4. The + says "match one or more of the preceding term".
  5. Next is the underscore character. We want each block of text to end with an underscore.
  6. The {2} option requires the parenthesized block to match exactly twice in a row.
  7. \d is a special option that means "match any digit 0-9". We then repeat it 3 times with the {3} option. Note that there is no $ at the end of the pattern, so anything is allowed to follow those three digits.
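To see the pattern above in action, here is a small sketch that runs it against a handful of names. The test names and the strictPattern variable (the same pattern with a $ end anchor added) are my own additions for illustration, not part of the original script.

```maxscript
rx = dotNetClass "System.Text.RegularExpressions.Regex"

-- the pattern from this post
pattern = "^([A-Za-z0-9]+_){2}\d{3}"
-- hypothetical stricter variant: $ forces the match to run to the end of the string
strictPattern = pattern + "$"

-- made-up names to test against
testNames = #("robot_leftArm_001", "robot_leftArm_01", "robot_001", "robot_leftArm_0012", "robot_left_arm_001")

for n in testNames do
(
  -- prints the name, then the result for each pattern
  format "% : original=% strict=%\n" n (rx.IsMatch n pattern) (rx.IsMatch n strictPattern)
)
```

Only robot_leftArm_001 should pass both patterns. robot_leftArm_0012 passes the original pattern but fails the anchored one, because without $ the match is allowed to stop after the first three digits.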

(rx.match s pattern).success

Lastly, rx.match returns a .NET Match object, and we can read its success property to get a boolean true/false.
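If all you need is the true/false answer, the .NET Regex class also has a static IsMatch method that returns the boolean directly. Here is a minimal sketch that wraps it in a function and checks every object in the scene. The isValidAssetName function name is hypothetical, and the pattern is the one from this post.

```maxscript
-- hypothetical helper: true if the name follows assetname_partname_001
fn isValidAssetName nameStr =
(
  local rx = dotNetClass "System.Text.RegularExpressions.Regex"
  rx.IsMatch nameStr "^([A-Za-z0-9]+_){2}\d{3}"
)

-- report every scene object whose name breaks the convention
for obj in objects where not (isValidAssetName obj.name) do
(
  format "Bad name: %\n" obj.name
)
```

One thing to watch out for with patterns in plain MaxScript strings: \d is passed through untouched, but sequences like \t and \n are treated as escape characters, so those need a doubled backslash.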

I hope this brief intro is useful.

Thanks to Richard Anemma and Greg Boyington for useful tips!
