Kerflyn's Blog

Well… It's a blog!

Posts Tagged ‘grammar

Playing with Scala Parser Combinator

Notice
The work in this article is in progress. But you are free to read it in its current state and free to post comments. If you want to see the change set, go to the bottom of the page.

Maybe you are of those people who gets headache when writing a parser. LL or LALR grammars, lex and yacc/bison are completely unknown or are some sort of old phantoms from the past, that you are trying to forget since you have leaved the university. You cannot imagine that creating a parser can be as easy as 1-2-3. But, did you take a look at parser combinators?

A really simple parser

Parser combinators comes directly from functional programming to help you create complex parsers in a declarative way. From the Scala point of view, it looks like writing almost directly an EBNF grammar.

Here, we start with a really simple parser that recognizes the string "hello" only.

import scala.util.parser.combinator.RegexParsers

object ReallySimpleParser1 extends RegexParsers {
  def hello = "hello"
}

First, notice that your parser combinator must be declared has an object (a kind of mix between a singleton and a purely static class). It has to extends one of the object derived from Parsers. RegexParsers is one of them. JavaTokenParsers is another one, with some predefine parsers to recognize Java-compliant identifiers, numbers, strings, etc.

Second, you can see that the pattern to recognize is declared through the method hello. The method hello is a Parser and more exactly a Parser[String]. A Parser[+T] is a function that takes an input and returns a ParseResult[T]. A ParseResult is a class that has three children:

  • Success: the parser has recognized the input. The instance of Success contains the result and the remaining input.
  • Failure: the parser has not recognized the input and will back-track. The instance contains an error message and the remaining input.
  • Error: something fatal occurred. The instance contains an error message and the remaining input. No back-track is proceeded.

Usually, to use your parser combinator, you have to call the method parseAll. It takes a Parser (like our method hello above) and a char sequence as parameters. parseAll returns a ParseResult.

Below are two tests as it would appear in tests based on specs2. They show the use of our really simple parser:

import org.specs2.mutable._
import org.specs2.matcher.ParserMatchers

class ReallySimpleParser1Test extends Specification with ParserMatchers {
  val parsers = ReallySimpleParser1

  import ReallySimpleParser1.hello

  "hello" should {
    "succeed to recognize 'hello'" {
      hello must succeedOn("hello").withResult("hello")
    }
    "fail to recognize 'world'" {
      hello must failOn("world")
    }
}

Parse numbers

RegexParsers is used for parsers based on regular expressions. The previous example does not show this.

Below is an example that recognizes numbers. It uses a regular expression to do so:

object NumberParser extends RegexParsers {
  def number: Parser[Int] = """-?\d""".r ^^ { _.toInt }
}

The number parser is more exactly defined in the method number with the regular expression -?\d+ where -? represents the optional sign, \d represents a digit, and thus \d+ a series of at least one digit. The regular expression is followed by a function. The operation ^^ says that if the previous pattern is recognized then apply the following function. The function have to take a string in parameter, representing the recognized sequence, and can return a value of any kind (limited to what has been declared for the method). Otherwise, the method that defines the parser returns implicitly a Parser[String].

number must succeedOn("42").withResult(42)
number must succeedOn("-24").withResult(-24)
number must failOn("hello")

Yet another simple example to parse a sequence

object ReallySimpleParser2 extends RegexParsers {
  def sentence = hello ~ world
  def hello = "hello"
  def world = "world"
}
  • a ~ b parse a sequence made of a then b
  • a | b introduce an alternative parser that parses a or b
  • a? introduce an optional parser
  • a* introduce on optional and repeatable parser
  • a+ introduce a repeatable parser
  • a ~> b like ~ but ignore the left member (a)
  • a <~ b like ~ but ignore the right member (b)

Parse an arithmetic expression and evaluate it

Remember my previous article on Scala? The one talking about pattern matching? Do you remember that it ends with a way to evaluate arithmetic expressions based on a kind of AST? Here I present the missing part of this article: the parse of an arithmetic expression to generate the AST.

sealed abstract class ExpressionSymbol
case class Const(value: Int) extends ExpressionSymbol
case object X extends ExpressionSymbol
case class Add(x: ExpressionSymbol, y: ExpressionSymbol) extends ExpressionSymbol
case class Mul(x: ExpressionSymbol, y: ExpressionSymbol) extends ExpressionSymbol
object ExpressionParser extends RegexParsers {
  def expression = operation1
  def operation1: Parser[ExpressionSymbol] = operation2 ~ rep("+" ~ operation2) ^^ {
    case op ~ list => list.foldLeft(op) {
      case (x, "+" ~ y) => Add(x, y)
    }
  }
  def operation2: Parser[ExpressionSymbol] = operand ~ rep("*" ~ operand) ^^ {
    case op ~ list => list.foldLeft(op) {
      case (x, "*" ~ y) => Mul(x, y)
    }
  }
  def operand: Parser[ExpressionSymbol] = (constant | variable)
  def variable: Parser[ExpressionSymbol] = "x" ^^ { _ => X }
  def constant: Parser[ExpressionSymbol] = """-?\d+""".r ^^ { s => Const(s.toInt) }

  def apply(input: String): Option[ExpressionSymbol] = parseAll(expression, input) match {
    case Success(result, _) => Some(result)
    case NoSuccess(_, _) => None
  }
}

There is a method apply that aims to simplify the use of the parser. It only takes a string in parameter. It parses it and returns an Option that indicates if the parse has succeeded or failed.

Changeset

  • [2012-08-25] initial publication
  • [2012-09-07] added more explanation for the first example + show longer version for the first test

Written by fsarradin

2012/08/25 at 20:42