Recall that language processors use regular expressions to specify the set of legal tokens of a programming language. Scanners use these regular expressions to "spell check" programs. Regular expressions have many other uses, too, and most programming languages provide library support for them. In Scala, regular expressions can be represented by strings or by instances of the Regex class.
A literal string can be viewed as a regular expression that only matches itself:
scala> val dogPattern = "dog"
dogPattern: String = dog
scala> "dog".matches(dogPattern)
res0: Boolean = true
scala> "cat".matches(dogPattern)
res1: Boolean = false
We can build regular expressions from literal strings using these operators:
A?......A is optional
A|B.....A or B
AB......A followed by B
A+......one or more A
A*......zero or more A
(A).....A (as a capturing group, see below)
For example:
scala> val petPattern = "dog|cat"
petPattern: String = dog|cat
scala> "dog".matches(petPattern)
res2: Boolean = true
scala> "cat".matches(petPattern)
res3: Boolean = true
scala> val barkPattern = "(woof)+"
barkPattern: String = (woof)+
scala> "woofwoofwoof".matches(barkPattern)
res4: Boolean = true
Be careful. The scope of a quantifier (+, *, and ?) is the regular expression immediately to its left:
scala> val meowPattern = "meow+"
meowPattern: String = meow+
scala> "meowmeowmeow".matches(meowPattern)
res5: Boolean = false
scala> "meowwwww".matches(meowPattern)
res6: Boolean = true
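To apply a quantifier to a whole word rather than to its last character, wrap the word in parentheses. The following is a small sketch contrasting the two patterns; the names groupedMeow, repeatsWord, stretchesW, and plainMeow are made up for this illustration.

```scala
// "meow+" quantifies only the final w; "(meow)+" quantifies the whole word.
val meowPattern = "meow+"
val groupedMeow = "(meow)+"

val repeatsWord = "meowmeowmeow".matches(groupedMeow) // true
val stretchesW  = "meowwwww".matches(groupedMeow)     // false
// A single "meow" satisfies both patterns.
val plainMeow = "meow".matches(meowPattern) && "meow".matches(groupedMeow) // true
```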
Be careful. Blank spaces are considered regular expressions that match themselves:
scala> val namePattern = "(Mr |Ms )?(Jones|Smith|Rogers)"
namePattern: String = (Mr |Ms )?(Jones|Smith|Rogers)
scala> "Mr Smith".matches(namePattern)
res7: Boolean = true
scala> "Smith".matches(namePattern)
res8: Boolean = true
scala> "Ms  Jones".matches(namePattern) // two spaces, so no match
res9: Boolean = false
scala> "MsRogers".matches(namePattern)
res10: Boolean = false
Regular expressions can also use character classes, which match any single character from a set or range:
scala> val alphaNumPattern = "[a-zA-Z0-9]+"
alphaNumPattern: String = [a-zA-Z0-9]+
scala> "Agent007".matches(alphaNumPattern)
res11: Boolean = true
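Character classes can be combined with the other operators, and a leading ^ inside the brackets negates the class. The sketch below illustrates both; the pattern names capitalized and noDigits are hypothetical.

```scala
val capitalized = "[A-Z][a-z]*" // one uppercase letter, then zero or more lowercase letters
val noDigits    = "[^0-9]+"     // one or more characters that are not digits

val upperFirst  = "Scala".matches(capitalized)  // true
val lowerFirst  = "scala".matches(capitalized)  // false
val lettersOnly = "Agent".matches(noDigits)     // true
val withDigits  = "Agent007".matches(noDigits)  // false
```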
Here's a more complex example:
scala> val expPattern = "[0-9]+\\s*(\\+|\\*)\\s*[0-9]+"
expPattern: String = [0-9]+\s*(\+|\*)\s*[0-9]+
scala> "23* 42".matches(expPattern)
res12: Boolean = true
scala> "999 + 999".matches(expPattern)
res13: Boolean = true
scala> "2*3".matches(expPattern)
res14: Boolean = true
In this example we need to put a backslash (i.e., "\") in front of + and * to tell the matches method to interpret them literally, not as regular expression operators. "\s" is a pre-defined regular expression that matches a single whitespace character (e.g., tab, space, or newline). We need the additional backslash because, in an ordinary string literal, a character preceded by a backslash is interpreted as an escape sequence (e.g., \n, \t), so a literal backslash must itself be written as \\.
We can avoid double backslashes by using raw (triple-quoted) strings:
scala> val expPattern2 = """[0-9]+\s*(\+|\*)\s*[0-9]+"""
expPattern2: String = [0-9]+\s*(\+|\*)\s*[0-9]+
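As a quick check, the escaped and triple-quoted spellings denote the same string. The sketch below also shows a shorter variant that uses the pre-defined digit class \d and a character class for the operator; the names escaped, raw, and shorter are chosen for this illustration only.

```scala
val escaped = "[0-9]+\\s*(\\+|\\*)\\s*[0-9]+"
val raw     = """[0-9]+\s*(\+|\*)\s*[0-9]+"""
val samePattern = escaped == raw // true: both literals contain the same characters

// \d matches one digit; inside brackets, + and * are ordinary characters.
val shorter = """\d+\s*[+*]\s*\d+"""
val stillMatches  = "23 * 42".matches(shorter) // true
val rejectsMinus  = "23 - 42".matches(shorter) // false
```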
Representing regular expressions as strings can be limiting. We can also represent regular expressions as instances of Scala's scala.util.matching.Regex class:
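// Simplified sketch of the interface, not the library source.
class Regex {
  // iterator over the sequence of all matches
  // (the library returns a Regex.MatchIterator, which is an Iterator[String])
  def findAllIn(text: CharSequence): Iterator[String] = ???
  // result of replacing all matches by sub
  def replaceAllIn(text: CharSequence, sub: String): String = ???
  // etc.
}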
We can create an instance of Regex from a string as follows:
val numPattern = "(0|[1-9][0-9]*)".r
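A Regex instance offers more than a yes-or-no answer. The sketch below uses findFirstIn to locate a match inside a longer string, and uses the Regex as a pattern in a match expression, where each capturing group binds one variable. The helper describe and the sample strings are hypothetical.

```scala
import scala.util.matching.Regex

val numPattern: Regex = "(0|[1-9][0-9]*)".r

// findFirstIn searches anywhere in the text and returns the first match, if any.
val first: Option[String]   = numPattern.findFirstIn("room 42, floor 7") // Some(42)
val missing: Option[String] = numPattern.findFirstIn("no digits here")   // None

// Used as a pattern, a Regex must match the whole string.
def describe(s: String): String = s match {
  case numPattern(n) => "number " + n
  case _             => "not a number"
}
```

Under these definitions, describe("42") yields "number 42", while describe("042") yields "not a number" because the pattern does not allow a leading zero.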
For example, assume we want to look at each legal token in a program written in language L. Assume tokens in L consist of numbers, identifiers, and operator symbols. We can define all of the tokens using a single regular expression:
val tokenPattern = "(0|[1-9][0-9]*)|([a-zA-Z][a-zA-Z0-9]*)|(\\+|\\*|-|/)".r
Notice that we group each token category using parentheses.
The findAllIn method creates an iterator that allows us to iterate through all of the tokens in an expression:
val tokens = tokenPattern.findAllIn("12*x + 23.01 - pi")
scala> for (next <- tokens) println(next)
12
*
x
+
23
0
1
-
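pi
Note that 23.01 yields three tokens: the period matches none of the token categories, so findAllIn skips it, and the pattern splits 01 into 0 and 1 because a number may not have a leading zero.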
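Because each token category sits in its own capturing group, a scanner can also report which category a token belongs to. The following is a minimal sketch, assuming the same token pattern; the helper classify and the labels "number", "identifier", and "operator" are invented for this example. It relies on findAllMatchIn, which yields Match objects, and on Match.group, which returns null for a group that did not take part in the match.

```scala
import scala.util.matching.Regex

val tokenPattern: Regex =
  "(0|[1-9][0-9]*)|([a-zA-Z][a-zA-Z0-9]*)|(\\+|\\*|-|/)".r

// Label a match by checking which capturing group took part in it.
def classify(m: Regex.Match): String =
  if (m.group(1) != null) "number"
  else if (m.group(2) != null) "identifier"
  else "operator"

// Pair each token with its category.
val labeled: List[(String, String)] =
  tokenPattern.findAllMatchIn("12*x + pi").map(m => (m.matched, classify(m))).toList
// List((12,number), (*,operator), (x,identifier), (+,operator), (pi,identifier))
```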
Here's a simple scanner: Scan.scala
Regex.replaceAllIn is useful for form letters:
var letter = "Dear NAME1, I've decided to leave you and date NAME2 instead. I hope we can be friends, NAME1."
letter = ("NAME1".r).replaceAllIn(letter, "John")
letter = ("NAME2".r).replaceAllIn(letter, "Steve")
letter //> res21: String = Dear John, I've decided to leave you and date Steve instead. I hope we can be friends, John.
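With many placeholders, one pass over the letter may be more convenient than one call per name. The sketch below assumes placeholders of the form NAME followed by digits and uses the overload of replaceAllIn that takes a function from Match to String. The names names, placeholder, template, and filled are hypothetical. Regex.quoteReplacement guards against names that contain $ or \, which replaceAllIn would otherwise treat specially.

```scala
import scala.util.matching.Regex

val names = Map("NAME1" -> "John", "NAME2" -> "Steve")
val placeholder: Regex = "NAME[0-9]+".r

val template =
  "Dear NAME1, I've decided to leave you and date NAME2 instead. I hope we can be friends, NAME1."

// Look up each placeholder; leave unknown placeholders unchanged.
val filled: String = placeholder.replaceAllIn(
  template,
  (m: Regex.Match) => Regex.quoteReplacement(names.getOrElse(m.matched, m.matched))
)
```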
A case class is simply a class declared using the word "case":
case class Exp(arg1: Int, op: Char, arg2: Int)
Scala automatically generates many extra features for a case class:
· toString, equals, hashCode, and copy methods
· a companion object with apply and unapply methods.
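The generated methods can be seen in a short session. This is only an illustrative sketch; the value names e1, e2, e3, and so on are made up.

```scala
case class Exp(arg1: Int, op: Char, arg2: Int)

val e1 = Exp(2, '+', 3)
val e2 = Exp(2, '+', 3)

val text = e1.toString                     // "Exp(2,+,3)"
val structurallyEqual = e1 == e2           // true: equals compares fields, not references
val sameHash = e1.hashCode == e2.hashCode  // true: equal objects have equal hash codes
val e3 = e1.copy(op = '*')                 // Exp(2,*,3): a copy with one field changed
```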
The apply method of a case class calls the class constructor. A constructor takes as its input the fields of an object and returns an object containing those fields. For example:
Exp(2, '+', 3) // = Exp.apply(2, '+', 3)
The unapply method does the opposite: given an object, it extracts the object's fields.
It is invoked implicitly by case clauses. For example:
def execute(exp: Exp) =
  exp match
    case Exp(a1, '+', a2) => Some(a1 + a2)
    case Exp(a1, '*', a2) => Some(a1 * a2)
    case _ => None
In this function a1 is bound to exp.arg1, a2 is bound to exp.arg2, and exp.op is matched with '+' or '*'.
scala> val exp1 = Exp(3, '+', 4)
exp1: Exp = Exp(3,+,4)
scala> val exp2 = Exp(9, '*', 9)
exp2: Exp = Exp(9,*,9)
scala> execute(exp1)
res15: Option[Int] = Some(7)
scala> execute(exp2)
res16: Option[Int] = Some(81)
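Regular expressions and case classes fit together naturally: a Regex with capturing groups can turn a string into an Exp, which execute can then evaluate. The following is a minimal sketch under that assumption; the names expRegex and parse are hypothetical, and the sketch does not guard against numbers too large for Int.

```scala
case class Exp(arg1: Int, op: Char, arg2: Int)

// Three capturing groups: left operand, operator, right operand.
val expRegex = """([0-9]+)\s*(\+|\*)\s*([0-9]+)""".r

// Each group is bound to one variable when the whole string matches.
def parse(s: String): Option[Exp] = s match {
  case expRegex(a, op, b) => Some(Exp(a.toInt, op.head, b.toInt))
  case _                  => None
}

def execute(exp: Exp): Option[Int] = exp match {
  case Exp(a1, '+', a2) => Some(a1 + a2)
  case Exp(a1, '*', a2) => Some(a1 * a2)
  case _                => None
}

val seven = parse("3 + 4").flatMap(execute) // Some(7)
val bad   = parse("3 - 4")                  // None: - is not matched by the pattern
```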
In data-driven programming (e.g., polymorphism), the flow of control is determined by data, not programmers. More specifically, flow is determined by the class instantiated by the data:
Employee e = new Programmer(); // subsumption
e.print(); // calls Programmer.print
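The snippet above is written in Java. A rough Scala equivalent is sketched below, with a hypothetical describe method standing in for print so that the result can be inspected as a value.

```scala
class Employee {
  def describe: String = "employee"
}

class Programmer extends Employee {
  override def describe: String = "programmer"
}

val e: Employee = new Programmer // subsumption: the static type is Employee
val chosen = e.describe          // "programmer": the runtime class selects the method
```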
Data determines the flow of control in pattern-driven programming, too. The difference is that it is the patterns instantiated by the data that determine the flow. This idea is embodied in the match/case expression:
def execute(exp: Exp) =
  exp match
    case Exp(a1, '+', a2) => Some(a1 + a2)
    case Exp(a1, '*', a2) => Some(a1 * a2)
    case _ => None
Prolog, Datalog, and Proplog interpreters use pattern-driven control. Goals are matched to appropriate facts and rules using a sophisticated pattern matching algorithm called unification.