Posts Tagged ‘specs2’
Playing with Scala Parser Combinator
Notice
The work in this article is in progress. But you are free to read it in its current state and free to post comments. If you want to see the change set, go to the bottom of the page.
Maybe you are of those people who gets headache when writing a parser. LL or LALR grammars, lex and yacc/bison are completely unknown or are some sort of old phantoms from the past, that you are trying to forget since you have leaved the university. You cannot imagine that creating a parser can be as easy as 1-2-3. But, did you take a look at parser combinators?
A really simple parser
Parser combinators comes directly from functional programming to help you create complex parsers in a declarative way. From the Scala point of view, it looks like writing almost directly an EBNF grammar.
Here, we start with a really simple parser that recognizes the string "hello"
only.
import scala.util.parser.combinator.RegexParsers object ReallySimpleParser1 extends RegexParsers { def hello = "hello" }
First, notice that your parser combinator must be declared has an object
(a kind of mix between a singleton and a purely static class). It has to extends one of the object derived from Parsers
. RegexParsers
is one of them. JavaTokenParsers
is another one, with some predefine parsers to recognize Java-compliant identifiers, numbers, strings, etc.
Second, you can see that the pattern to recognize is declared through the method hello
. The method hello
is a Parser
and more exactly a Parser[String]
. A Parser[+T]
is a function that takes an input and returns a ParseResult[T]
. A ParseResult
is a class that has three children:
Success
: the parser has recognized the input. The instance ofSuccess
contains the result and the remaining input.Failure
: the parser has not recognized the input and will back-track. The instance contains an error message and the remaining input.Error
: something fatal occurred. The instance contains an error message and the remaining input. No back-track is proceeded.
Usually, to use your parser combinator, you have to call the method parseAll
. It takes a Parser
(like our method hello
above) and a char sequence as parameters. parseAll
returns a ParseResult
.
Below are two tests as it would appear in tests based on specs2. They show the use of our really simple parser:
import org.specs2.mutable._ import org.specs2.matcher.ParserMatchers class ReallySimpleParser1Test extends Specification with ParserMatchers { val parsers = ReallySimpleParser1 import ReallySimpleParser1.hello "hello" should { "succeed to recognize 'hello'" { hello must succeedOn("hello").withResult("hello") } "fail to recognize 'world'" { hello must failOn("world") } }
Parse numbers
RegexParsers
is used for parsers based on regular expressions. The previous example does not show this.
Below is an example that recognizes numbers. It uses a regular expression to do so:
object NumberParser extends RegexParsers { def number: Parser[Int] = """-?\d""".r ^^ { _.toInt } }
The number parser is more exactly defined in the method number
with the regular expression -?\d+
where -?
represents the optional sign, \d
represents a digit, and thus \d+
a series of at least one digit. The regular expression is followed by a function. The operation ^^
says that if the previous pattern is recognized then apply the following function. The function have to take a string in parameter, representing the recognized sequence, and can return a value of any kind (limited to what has been declared for the method). Otherwise, the method that defines the parser returns implicitly a Parser[String]
.
number must succeedOn("42").withResult(42) number must succeedOn("-24").withResult(-24) number must failOn("hello")
Yet another simple example to parse a sequence
object ReallySimpleParser2 extends RegexParsers { def sentence = hello ~ world def hello = "hello" def world = "world" }
a ~ b
parse a sequence made ofa
thenb
a | b
introduce an alternative parser that parsesa
orb
a?
introduce an optional parsera*
introduce on optional and repeatable parsera+
introduce a repeatable parsera ~> b
like~
but ignore the left member (a
)a <~ b
like~
but ignore the right member (b
)
Parse an arithmetic expression and evaluate it
Remember my previous article on Scala? The one talking about pattern matching? Do you remember that it ends with a way to evaluate arithmetic expressions based on a kind of AST? Here I present the missing part of this article: the parse of an arithmetic expression to generate the AST.
sealed abstract class ExpressionSymbol case class Const(value: Int) extends ExpressionSymbol case object X extends ExpressionSymbol case class Add(x: ExpressionSymbol, y: ExpressionSymbol) extends ExpressionSymbol case class Mul(x: ExpressionSymbol, y: ExpressionSymbol) extends ExpressionSymbol
object ExpressionParser extends RegexParsers { def expression = operation1 def operation1: Parser[ExpressionSymbol] = operation2 ~ rep("+" ~ operation2) ^^ { case op ~ list => list.foldLeft(op) { case (x, "+" ~ y) => Add(x, y) } } def operation2: Parser[ExpressionSymbol] = operand ~ rep("*" ~ operand) ^^ { case op ~ list => list.foldLeft(op) { case (x, "*" ~ y) => Mul(x, y) } } def operand: Parser[ExpressionSymbol] = (constant | variable) def variable: Parser[ExpressionSymbol] = "x" ^^ { _ => X } def constant: Parser[ExpressionSymbol] = """-?\d+""".r ^^ { s => Const(s.toInt) } def apply(input: String): Option[ExpressionSymbol] = parseAll(expression, input) match { case Success(result, _) => Some(result) case NoSuccess(_, _) => None } }
There is a method apply
that aims to simplify the use of the parser. It only takes a string in parameter. It parses it and returns an Option
that indicates if the parse has succeeded or failed.
Changeset
- [2012-08-25] initial publication
- [2012-09-07] added more explanation for the first example + show longer version for the first test