Pattern Matching

Regular Expressions

Recall that language processors use regular expressions to specify the set of legal tokens of a programming language. Scanners use these regular expressions to "spell check" programs. Regular expressions have many other uses, too, and most programming languages provide library support for them. In Scala, regular expressions can be represented by strings or by instances of the Regex class.

A literal string that contains no special characters (such as . * + ? | or parentheses) can be viewed as a regular expression that matches only itself:

scala> val dogPattern = "dog"
dogPattern: String = dog

scala> "dog".matches(dogPattern)
res0: Boolean = true

scala> "cat".matches(dogPattern)
res1: Boolean = false
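Two details of String.matches are easy to miss. First, it succeeds only when the whole string matches. Second, a string matches itself literally only when it contains no special characters. The sketch below is our own illustration. It uses the standard java.util.regex.Pattern.quote to force literal matching, and the variable names are hypothetical:

```scala
import java.util.regex.Pattern

val dogPattern = "dog"

// String.matches succeeds only when the WHOLE string matches the pattern
val exact     = "dog".matches(dogPattern)     // true
val substring = "hotdog".matches(dogPattern)  // false: "hot" is left over

// To search for the pattern anywhere inside a string, convert it to a Regex
val somewhere = dogPattern.r.findFirstIn("hotdog").isDefined  // true

// "." means "any character", so the pattern "a.b" also matches "axb"
val dotIsWild = "axb".matches("a.b")                  // true
// Pattern.quote escapes every special character in its argument
val dotQuoted  = "axb".matches(Pattern.quote("a.b"))  // false
val quotedSelf = "a.b".matches(Pattern.quote("a.b"))  // true
```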

We can build regular expressions from literal strings using these operators:

A?      A is optional (zero or one A)
A|B     A or B
AB      A followed by B
A+      one or more A
A*      zero or more A
(A)     A, as a capturing group (see below)
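Here is a quick way to check each operator above. The sketch pairs one small pattern per operator with test strings and the result we expect from String.matches. The patterns and strings are our own examples. The "cadog" case shows that | binds more loosely than concatenation, so "cat|dog" means (cat)|(dog):

```scala
// (pattern, input, expected result of input.matches(pattern))
val checks: List[(String, String, Boolean)] = List(
  ("colou?r", "color",  true),   // ?  : u is optional
  ("colou?r", "colour", true),
  ("cat|dog", "dog",    true),   // |  : either alternative
  ("cat|dog", "cadog",  false),  // not ca(t|d)og: | has the lowest precedence
  ("ab",      "ab",     true),   // AB : concatenation
  ("ha+",     "h",      false),  // +  : at least one a is required
  ("ha+",     "haaa",   true),
  ("ba*",     "b",      true),   // *  : zero a's is fine
  ("(ab)+",   "ababab", true),   // () : the group is repeated as a unit
  ("(ab)+",   "aba",    false)
)

// true when every expectation agrees with String.matches
val allAgree = checks.forall { case (p, s, expected) => s.matches(p) == expected }
```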

For example:

scala> val petPattern = "dog|cat"
petPattern: String = dog|cat

scala> "dog".matches(petPattern)
res2: Boolean = true

scala> "cat".matches(petPattern)
res3: Boolean = true

scala> val barkPattern = "(woof)+"
barkPattern: String = (woof)+

scala> "woofwoofwoof".matches(barkPattern)
res4: Boolean = true

Be careful: a quantifier (+, *, or ?) applies only to the single character or parenthesized group immediately to its left. Here + repeats only the w:

scala> val meowPattern = "meow+"
meowPattern: String = meow+

scala> "meowmeowmeow".matches(meowPattern)
res5: Boolean = false

scala> "meowwwww".matches(meowPattern)
res6: Boolean = true
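To repeat the whole word, wrap it in parentheses so that the quantifier applies to the group. This is a minimal sketch that contrasts the two patterns:

```scala
val wRepeated    = "meow+"    // + applies only to the final w
val meowRepeated = "(meow)+"  // + applies to the whole group "meow"

val a = "meowmeowmeow".matches(wRepeated)     // false
val b = "meowmeowmeow".matches(meowRepeated)  // true
val c = "meowwwww".matches(meowRepeated)      // false: "www" is not another "meow"
```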

Be careful with blank spaces, too. A space in a pattern is an ordinary character that matches exactly one space:

scala> val namePattern = "(Mr |Ms )?(Jones|Smith|Rogers)"
namePattern: String = (Mr |Ms )?(Jones|Smith|Rogers)

scala> "Mr Smith".matches(namePattern)
res7: Boolean = true

scala> "Smith".matches(namePattern)
res8: Boolean = true

scala> "Ms    Jones".matches(namePattern)
res9: Boolean = false

scala> "MsRogers".matches(namePattern)
res10: Boolean = false
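Suppose we want to accept any number of spaces after the title, but still require at least one. Then " +" can replace the single space. The sketch below shows one possible pattern. Which inputs should be accepted is our assumption:

```scala
// Title group: Mr or Ms followed by one or more spaces, all optional
val flexibleName = "((Mr|Ms) +)?(Jones|Smith|Rogers)"

val manySpaces = "Ms    Jones".matches(flexibleName)  // true
val noTitle    = "Smith".matches(flexibleName)        // true
val noSpace    = "MsRogers".matches(flexibleName)     // false: at least one space is required
```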

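Square brackets define a character class, which matches any single character in the set. A range such as a-z stands for every character from a to z. Java's regular expression syntax, which Scala uses, also provides predefined classes such as \d (a digit), \s (a whitespace character), and \w (a letter, digit, or underscore):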

scala> val alphaNumPattern = "[a-zA-Z0-9]+"
alphaNumPattern: String = [a-zA-Z0-9]+

scala> "Agent007".matches(alphaNumPattern)
res11: Boolean = true
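The predefined classes and negated classes work the same way. The sketch below uses our own test strings. The backslashes are doubled because these are ordinary string literals, as explained further below:

```scala
val digitsOnly = "\\d+"     // \d: a digit, same as [0-9]
val wordChars  = "\\w+"     // \w: a letter, digit, or underscore
val nonDigits  = "[^0-9]+"  // ^ right after [ negates the class

val year    = "2024".matches(digitsOnly)      // true
val agentOk = "agent_007".matches(wordChars)  // true
val hyphen  = "agent-007".matches(wordChars)  // false: '-' is not a word character
val letters = "abc".matches(nonDigits)        // true
```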

Here's a more complex example:

scala> val expPattern = "[0-9]+\\s*(\\+|\\*)\\s*[0-9]+"
expPattern: String = [0-9]+\s*(\+|\*)\s*[0-9]+

scala> "23*  42".matches(expPattern)
res12: Boolean = true

scala> "999   +    999".matches(expPattern)
res13: Boolean = true

scala> "2*3".matches(expPattern)
res14: Boolean = true

In this example we need to put a backslash (\) in front of + and * to tell the regular expression engine to treat them as literal characters, not as regular expression operators. \s is a predefined character class that matches a single whitespace character (e.g., a space, tab, or newline). We have to double every backslash because, in an ordinary string literal, Scala treats a backslash as the start of an escape sequence (e.g., \n for newline, \t for tab). Writing \\ puts a single backslash into the string, and that backslash is then passed on to the regular expression engine.
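We can check both points directly. Escaping is a two-stage process, and dropping the regex escapes makes the pattern invalid. The sketch below relies on java.util.regex rejecting a quantifier that has nothing to repeat. String.matches compiles the pattern, so the error appears when matches is called:

```scala
import java.util.regex.PatternSyntaxException

// In an ordinary string literal, "\\" produces ONE backslash character
val escapedPlus = "\\+"
val plusLength  = escapedPlus.length  // 2: a backslash followed by '+'

// Without the regex escapes, "(+|*)" asks the engine to repeat nothing,
// which java.util.regex rejects with a PatternSyntaxException
val unescaped = "[0-9]+\\s*(+|*)\\s*[0-9]+"
val rejected: Boolean =
  try { "2*3".matches(unescaped); false }
  catch { case _: PatternSyntaxException => true }
```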

We can avoid double backslashes by using triple-quoted (raw) strings, in which Scala does not interpret escape sequences:

scala> val expPattern2 = """[0-9]+\s*(\+|\*)\s*[0-9]+"""
expPattern2: String = [0-9]+\s*(\+|\*)\s*[0-9]+
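The two literals produce exactly the same characters, so they describe the same pattern. This minimal check confirms it:

```scala
val expPattern  = "[0-9]+\\s*(\\+|\\*)\\s*[0-9]+"
val expPattern2 = """[0-9]+\s*(\+|\*)\s*[0-9]+"""

val samePattern  = expPattern == expPattern2                // true: identical strings
val stillMatches = "999   +    999".matches(expPattern2)   // true
```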

Capturing Groups

Representing regular expressions as strings can be limiting. We can also represent regular expressions as instances of Scala's scala.util.matching.Regex class:

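class Regex {   // simplified excerpt of scala.util.matching.Regex
   // returns an iterator over every substring of source that matches this pattern
   def findAllIn(source: CharSequence): Regex.MatchIterator
   // returns a copy of target in which every match is replaced by replacement
   def replaceAllIn(target: CharSequence, replacement: String): String
   // etc.
}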
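Here is a short usage sketch of these methods, plus findFirstIn, on our own sample strings:

```scala
import scala.util.matching.Regex

val digitRun: Regex = "[0-9]+".r

// findAllIn yields each matching substring, from left to right
val allNumbers = digitRun.findAllIn("3 cats, 12 dogs, 0 fish").toList  // List("3", "12", "0")

// replaceAllIn substitutes every match
val masked = digitRun.replaceAllIn("call 555 1234", "#")  // "call # #"

// findFirstIn returns an Option: None when nothing matches
val firstNumber = digitRun.findFirstIn("no digits here")  // None
```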

We can create an instance of Regex from a string by calling the string's r method:

val numPattern = "(0|[1-9][0-9]*)".r
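A Regex can also serve as a pattern in a case clause. In that position it must match the entire string, and the text captured by each group is bound to a variable. The sketch below illustrates this. The functions describe and parse are hypothetical examples:

```scala
val numPattern = "(0|[1-9][0-9]*)".r

// As a pattern, a Regex must match the WHOLE string;
// the text captured by group 1 is bound to n
def describe(s: String): String = s match {
  case numPattern(n) => s"number $n"
  case _             => "not a number"
}

// With two groups, the pattern binds two variables
val assignment = "([a-z]+)=([0-9]+)".r
def parse(s: String): Option[(String, Int)] = s match {
  case assignment(name, value) => Some((name, value.toInt))
  case _                       => None
}
```

For example, describe("042") yields "not a number", because a number other than 0 may not start with 0.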

For example, assume we want to look at each legal token in a program written in language L. Assume tokens in L consist of numbers, identifiers, and operator symbols. We can define all of the tokens using a single regular expression:

val tokenPattern = "(0|[1-9][0-9]*)|([a-zA-Z][a-zA-Z0-9]*)|(\\+|\\*|-|/)".r

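Notice that we put each token category in its own pair of parentheses, so each category forms a separate capturing group: group 1 for numbers, group 2 for identifiers, and group 3 for operators.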

The findAllIn method creates an iterator that allows us to iterate through all of the tokens in an expression:

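val tokens = tokenPattern.findAllIn("12*x + 23.01 - pi")

scala> for (next <- tokens) println(next)
12
*
x
+
23
0
1
-
pi

Notice that findAllIn silently skips characters that are not part of any token (here the spaces and the decimal point). It also splits 01 into 0 and 1, because the number pattern allows 0 only on its own.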
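To find out which category each token belongs to, we can use findAllMatchIn. It yields Match objects instead of plain strings, and group(i) returns null when group i took no part in the match. The sketch below is our own, and the function classify is hypothetical:

```scala
val tokenPattern = "(0|[1-9][0-9]*)|([a-zA-Z][a-zA-Z0-9]*)|(\\+|\\*|-|/)".r

// Pairs each token with its category, decided by which group matched
def classify(input: String): List[(String, String)] =
  tokenPattern.findAllMatchIn(input).map { m =>
    val kind =
      if (m.group(1) != null) "number"
      else if (m.group(2) != null) "identifier"
      else "operator"
    (kind, m.matched)
  }.toList
```

For instance, classify("x1 + 7") gives List(("identifier", "x1"), ("operator", "+"), ("number", "7")).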

Here's a simple scanner: Scan.scala

Regex.replaceAllIn is useful for form letters:

var letter = "Dear NAME1, I've decided to leave you and date NAME2 instead. I hope we can be friends, NAME1."
letter = ("NAME1".r).replaceAllIn(letter, "John")
letter = ("NAME2".r).replaceAllIn(letter, "Steve")

letter //> res21: String = Dear John, I've decided to leave you and date Steve instead. I hope we can be friends, John.
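replaceAllIn also has an overload that takes a function from each Match to its replacement, so all placeholders can be filled in one pass. In a replacement string, "$" and "\" are special because "$" refers to a group. Regex.quoteReplacement protects values that contain them. This is a sketch under our own assumptions: the lookup table values and the placeholder pattern NAME[0-9]+ are hypothetical:

```scala
import scala.util.matching.Regex

val template = "Dear NAME1, I've decided to leave you and date NAME2 instead. I hope we can be friends, NAME1."
val values   = Map("NAME1" -> "John", "NAME2" -> "Steve")  // hypothetical lookup table

// One regex finds every placeholder; the function chooses each replacement
val placeholder = "NAME[0-9]+".r
val letter = placeholder.replaceAllIn(template, (m: Regex.Match) => Regex.quoteReplacement(values(m.matched)))

// Without quoteReplacement, "$20" would be read as a reference to group 2
val price = "AMOUNT".r.replaceAllIn("Total: AMOUNT", Regex.quoteReplacement("$20"))  // "Total: $20"
```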

Case Classes

A case class is simply a class declared with the keyword case:

case class Exp(arg1: Int, op: Char, arg2: Int)

Scala automatically generates many extra features for a case class:

·       toString, equals, hashCode, and copy methods

·       a companion object with apply and unapply methods.
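This minimal sketch exercises the generated methods on the Exp class defined above:

```scala
case class Exp(arg1: Int, op: Char, arg2: Int)

val e1 = Exp(2, '+', 3)  // no "new" needed: this calls Exp.apply
val e2 = Exp(2, '+', 3)

val shown    = e1.toString                 // "Exp(2,+,3)"
val equal    = e1 == e2                    // true: compares fields, not references
val sameHash = e1.hashCode == e2.hashCode  // true: consistent with equals
val product  = e1.copy(op = '*')           // Exp(2,*,3); e1 itself is unchanged
```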

The companion object's apply method calls the class constructor, which is why case class instances can be created without new. A constructor takes the values of an object's fields as input and returns an object containing those fields. For example:

Exp(2, '+', 3) // = Exp.apply(2, '+', 3)

Unapply does the opposite: given an object, it extracts the object's fields.
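To see the idea without compiler magic, we can write the same kind of companion object by hand for an ordinary class. The class Pair is hypothetical. The code that Scala generates for a case class differs in detail, and Scala 2 and Scala 3 differ here too, so treat this only as an approximation:

```scala
// A hypothetical ordinary (non-case) class
class Pair(val first: Int, val second: Int)

object Pair {
  // apply: fields -> object
  def apply(first: Int, second: Int): Pair = new Pair(first, second)
  // unapply: object -> fields (Some signals a successful extraction)
  def unapply(p: Pair): Option[(Int, Int)] = Some((p.first, p.second))
}

val p      = Pair(1, 2)       // calls Pair.apply
val fields = Pair.unapply(p)  // Some((1, 2))
```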

Unapply is implicitly invoked by case clauses. For example:

  def execute(exp: Exp) =
     exp match
        case Exp(a1, '+', a2) => Some(a1 + a2)
        case Exp(a1, '*', a2) => Some(a1 * a2)
        case _ => None

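In this function, a1 is bound to exp.arg1 and a2 is bound to exp.arg2, while exp.op is compared with the literal '+' or '*'. If neither of these cases applies, the wildcard pattern _ matches anything and the function returns None.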

scala> val exp1 = Exp(3, '+', 4)
exp1: Exp = Exp(3,+,4)

scala> val exp2 = Exp(9, '*', 9)
exp2: Exp = Exp(9,*,9)

scala> execute(exp1)
res15: Option[Int] = Some(7)

scala> execute(exp2)
res16: Option[Int] = Some(81)
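A case can also carry a guard (if ...), which adds a condition to the pattern, and any input that no case accepts falls through to the wildcard. This sketch adds a hypothetical division case to execute to show both:

```scala
case class Exp(arg1: Int, op: Char, arg2: Int)

def execute(exp: Exp): Option[Int] = exp match {
  case Exp(a1, '+', a2)            => Some(a1 + a2)
  case Exp(a1, '*', a2)            => Some(a1 * a2)
  // hypothetical extra case: the guard rejects division by zero
  case Exp(a1, '/', a2) if a2 != 0 => Some(a1 / a2)
  case _                           => None  // any other operator, or division by zero
}
```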

Pattern-Driven Programming

In data-driven programming (e.g., polymorphism), the flow of control is determined by the data at run time rather than by explicit branches written by the programmer. More specifically, it is determined by the class of the object that receives a method call:

Employee e = new Programmer(); // subsumption
e.print(); // calls Programmer.print

Data also determines the flow of control in pattern-driven programming. The difference is that the flow depends on which pattern the data matches rather than on the data's class. This idea is embodied in the match/case expression:

  def execute(exp: Exp) =
     exp match
        case Exp(a1, '+', a2) => Some(a1 + a2)
        case Exp(a1, '*', a2) => Some(a1 * a2)
        case _ => None
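The contrast between the two styles can be sketched side by side in Scala. All class and function names here are hypothetical. In the first style the runtime class selects the method. In the second, a single function selects a case, and a case can test field values as well as the class:

```scala
// Data-driven: the method that runs is chosen by the object's runtime class
abstract class Employee { def describe: String }
class Programmer extends Employee { def describe = "writes code" }
class Manager extends Employee { def describe = "runs meetings" }

// Pattern-driven: one function chooses a case by the shape and contents of the data
sealed trait Staff
case class Dev(name: String, language: String) extends Staff
case class Lead(name: String, reports: Int) extends Staff

def describeStaff(s: Staff): String = s match {
  case Dev(n, "Scala") => s"$n writes Scala"  // matches a field's value, not just the class
  case Dev(n, lang)    => s"$n writes $lang"
  case Lead(n, 0)      => s"$n leads nobody yet"
  case Lead(n, k)      => s"$n leads $k people"
}
```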

Prolog, Datalog, and Proplog interpreters use pattern-driven control. Goals are matched to appropriate facts and rules using a sophisticated pattern matching algorithm called unification.
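Unification can be sketched in a few dozen lines of Scala, and the sketch itself uses pattern matching. This is a minimal first-order version under our own assumptions: the term representation (Var, Fn) and the names walk, unify, and Subst are hypothetical. Like most Prolog systems by default, it omits the occurs check. A real Prolog interpreter also renames clause variables and backtracks over alternative clauses, and this sketch does neither:

```scala
sealed trait Term
case class Var(name: String) extends Term
case class Fn(name: String, args: List[Term]) extends Term  // constants are Fn with no args

type Subst = Map[String, Term]  // variable name -> bound term

// Follow variable bindings until reaching an unbound variable or a function term
def walk(t: Term, s: Subst): Term = t match {
  case Var(n) if s.contains(n) => walk(s(n), s)
  case _                       => t
}

// Returns the extended substitution if a and b can be made equal, else None
def unify(a: Term, b: Term, s: Subst): Option[Subst] =
  (walk(a, s), walk(b, s)) match {
    case (Var(x), Var(y)) if x == y => Some(s)
    case (Var(x), t)                => Some(s + (x -> t))  // no occurs check
    case (t, Var(y))                => Some(s + (y -> t))
    case (Fn(f, as), Fn(g, bs)) if f == g && as.length == bs.length =>
      // unify the arguments pairwise, threading the substitution through
      as.zip(bs).foldLeft(Option(s)) {
        case (Some(acc), (x, y)) => unify(x, y, acc)
        case (None, _)           => None
      }
    case _ => None  // different functors or arities
  }
```

For example, unifying the goal parent(X, bob) with the fact parent(alice, Y) binds X to alice and Y to bob.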

Labs

·       Regular Expression Labs