Kerflyn's Blog

Well… It's a blog!

Posts Tagged ‘Scala

Playing with Scala Parser Combinator

The work in this article is in progress. But you are free to read it in its current state and free to post comments. If you want to see the change set, go to the bottom of the page.

Maybe you are of those people who gets headache when writing a parser. LL or LALR grammars, lex and yacc/bison are completely unknown or are some sort of old phantoms from the past, that you are trying to forget since you have leaved the university. You cannot imagine that creating a parser can be as easy as 1-2-3. But, did you take a look at parser combinators?

A really simple parser

Parser combinators comes directly from functional programming to help you create complex parsers in a declarative way. From the Scala point of view, it looks like writing almost directly an EBNF grammar.

Here, we start with a really simple parser that recognizes the string "hello" only.

import scala.util.parser.combinator.RegexParsers

object ReallySimpleParser1 extends RegexParsers {
  def hello = "hello"

First, notice that your parser combinator must be declared has an object (a kind of mix between a singleton and a purely static class). It has to extends one of the object derived from Parsers. RegexParsers is one of them. JavaTokenParsers is another one, with some predefine parsers to recognize Java-compliant identifiers, numbers, strings, etc.

Second, you can see that the pattern to recognize is declared through the method hello. The method hello is a Parser and more exactly a Parser[String]. A Parser[+T] is a function that takes an input and returns a ParseResult[T]. A ParseResult is a class that has three children:

  • Success: the parser has recognized the input. The instance of Success contains the result and the remaining input.
  • Failure: the parser has not recognized the input and will back-track. The instance contains an error message and the remaining input.
  • Error: something fatal occurred. The instance contains an error message and the remaining input. No back-track is proceeded.

Usually, to use your parser combinator, you have to call the method parseAll. It takes a Parser (like our method hello above) and a char sequence as parameters. parseAll returns a ParseResult.

Below are two tests as it would appear in tests based on specs2. They show the use of our really simple parser:

import org.specs2.mutable._
import org.specs2.matcher.ParserMatchers

class ReallySimpleParser1Test extends Specification with ParserMatchers {
  val parsers = ReallySimpleParser1

  import ReallySimpleParser1.hello

  "hello" should {
    "succeed to recognize 'hello'" {
      hello must succeedOn("hello").withResult("hello")
    "fail to recognize 'world'" {
      hello must failOn("world")

Parse numbers

RegexParsers is used for parsers based on regular expressions. The previous example does not show this.

Below is an example that recognizes numbers. It uses a regular expression to do so:

object NumberParser extends RegexParsers {
  def number: Parser[Int] = """-?\d""".r ^^ { _.toInt }

The number parser is more exactly defined in the method number with the regular expression -?\d+ where -? represents the optional sign, \d represents a digit, and thus \d+ a series of at least one digit. The regular expression is followed by a function. The operation ^^ says that if the previous pattern is recognized then apply the following function. The function have to take a string in parameter, representing the recognized sequence, and can return a value of any kind (limited to what has been declared for the method). Otherwise, the method that defines the parser returns implicitly a Parser[String].

number must succeedOn("42").withResult(42)
number must succeedOn("-24").withResult(-24)
number must failOn("hello")

Yet another simple example to parse a sequence

object ReallySimpleParser2 extends RegexParsers {
  def sentence = hello ~ world
  def hello = "hello"
  def world = "world"
  • a ~ b parse a sequence made of a then b
  • a | b introduce an alternative parser that parses a or b
  • a? introduce an optional parser
  • a* introduce on optional and repeatable parser
  • a+ introduce a repeatable parser
  • a ~> b like ~ but ignore the left member (a)
  • a <~ b like ~ but ignore the right member (b)

Parse an arithmetic expression and evaluate it

Remember my previous article on Scala? The one talking about pattern matching? Do you remember that it ends with a way to evaluate arithmetic expressions based on a kind of AST? Here I present the missing part of this article: the parse of an arithmetic expression to generate the AST.

sealed abstract class ExpressionSymbol
case class Const(value: Int) extends ExpressionSymbol
case object X extends ExpressionSymbol
case class Add(x: ExpressionSymbol, y: ExpressionSymbol) extends ExpressionSymbol
case class Mul(x: ExpressionSymbol, y: ExpressionSymbol) extends ExpressionSymbol
object ExpressionParser extends RegexParsers {
  def expression = operation1
  def operation1: Parser[ExpressionSymbol] = operation2 ~ rep("+" ~ operation2) ^^ {
    case op ~ list => list.foldLeft(op) {
      case (x, "+" ~ y) => Add(x, y)
  def operation2: Parser[ExpressionSymbol] = operand ~ rep("*" ~ operand) ^^ {
    case op ~ list => list.foldLeft(op) {
      case (x, "*" ~ y) => Mul(x, y)
  def operand: Parser[ExpressionSymbol] = (constant | variable)
  def variable: Parser[ExpressionSymbol] = "x" ^^ { _ => X }
  def constant: Parser[ExpressionSymbol] = """-?\d+""".r ^^ { s => Const(s.toInt) }

  def apply(input: String): Option[ExpressionSymbol] = parseAll(expression, input) match {
    case Success(result, _) => Some(result)
    case NoSuccess(_, _) => None

There is a method apply that aims to simplify the use of the parser. It only takes a string in parameter. It parses it and returns an Option that indicates if the parse has succeeded or failed.


  • [2012-08-25] initial publication
  • [2012-09-07] added more explanation for the first example + show longer version for the first test


Written by fsarradin

2012/08/25 at 20:42

From Optional to Monad with Guava

Guava is a kind of Swiss Army knife API proposed by Google for Java 5 and more. It contains many functionalities that help the developer in its every day life. Guava proposes also a basic functional programming API. You can represent almost statically typed functions with it, map functions on collections, compose them, etc. Since its version 10.0, Guava has added a new class called Optional. It’s an equivalent to the types Option in Scala or Maybe in Haskell. Optional aims to represent the existence of a value or not. In a sense, this class is a generalization of the Null Object pattern.

In this article, we see the Optional class in action through two use cases involving java.util.Map instances. The first use case is a very basic one. It shows how to use the Optional class. The second use case is based on a problem I’ve met in a real project. We’ll see if Optional is helpful in this case.

Optional vs. null

Suppose that you want to get a value from a Map. The value is supposed to be located under the key "a", but you aren’t sure. In the case where the key "a" doesn’t exist, the Map implementation is supposed to return null. But, you want to continue with a default value.

Here, you have two solutions. This one

Integer value = map.get("a");
if (value == null) {
    value = DEFAULT;

And, this one

Integer value = map.get("a");
process(value == null ? DEFAULT : value);

In order to use the Optional class, we need to define a new accessor for Map instances.

public static <K, V> Optional<V> getFrom(Map<K, V> map, K key) {

We notice that you can instantiate Optional by different ways: 1/ by a call to Optional.absent() if there is nothing to return (except the Optional instance), 2/ by of call to Optional.of(value) if you want to return a value. It sounds logical that an accessor to Map returns an Optional, because you aren’t sure that the given key exists in the Map instance.

Below is the same program but using Optional class.

process(getFrom(map, "a").or(DEFAULT));

With this code, we don’t see null anymore.

Going further: Optional to the limit

Now, we suppose that we develop a application based on a set of products differentiated by a unique identifier. The products are organized by product identifier, supplier, city, and country. In order to stock these products, imbricated Map are used: the first level uses the country, the second level uses the city, the third level uses the supplier, and the fourth level uses the product identifier to give access to the product. Here is the code you get with no use of Optional class (please, don’t do this at work! Seriously, don’t do this!)

public Product getProductFrom(
  Map<String, Map<String, Map<String, Map<String, Product>>>> productsByCountry,
  String country, String city, String supplier, String code) {

    Map<String, Map<String, Map<String, Product>>> productsByCity = productsByCountry.get(country);
    if (productsByCity != null) {
        Map<String, Map<String, Product>> productsBySupplier = productsByCity.get(city);
        if (productsBySupplier != null) {
            Map<String, Product> productsByCode = productsBySupplier.get(supplier);
            if (productsByCode != null) {
                return productByCode.get(code);
    return null;

This doesn’t sounds great, is it?

Now, below is the solution using Optional class.

public Optional<Product> getProductFrom(
  Map<String, Map<String, Map<String, Map<String, Product>>>> productsByCountry,
  String country, String city, String supplier, String code) {

    Optional<Map<String, Map<String, Map<String, Product>>>> productsByCity
        = getFrom(productsByCountry, country);
    if (productsByCity.isPresent()) {
        Optional<Map<String, Map<String, Product>>> productsBySupplier
            = getFrom(productsByCity.get(), city);
        if (productsBySupplier.isPresent()) {
            Optional<Map<String, Product>> productsByCode
                = getFrom(productsBySupplier.get(), supplier);
            if (productsByCode.isPresent()) {
                return getFrom(productByCode.get(), code);
    return Optional.absent();

It looks ugly too!

In a view to simplified this implementation, I propose to introduce the notion of monad.

Option(al) monad

A monad is a programming structure with two operations: unit and bind. Applied to the Optional class, unit converts a value into an Optional instance and bind applied to an Optional a function from value to Optional. Below, you’ve there definitions in Java:

public class OptionalMonad {
    public static <T> Optional<T> unit(T value) {
        return Optional.of(value);

    public static <T, U> Optional<U> bind(
            Optional<T> value,
            Function<T, Optional<U>> function) {
        if (value.isPresent()) return function.apply(value.get());
        else return Optional.absent();

Notice that bind checks the presence of a value before to apply the function.

Now, to be used by our example in the section above, we need to define a function that will be used as parameter for the bind operator. This function is based on the method getFrom, previously defined, which gives access to a value of a Map from a given key.

public static <K, V> Function<Map<K, V>, Optional<V>> getFromKey(final K key) {
    return new Function<Map<K, V>, Optional<V>>() {
        public Optional<V> apply(Map<K, V> map) {
            return getFrom(map, key);

Here is the new code

public Optional<Product> getProductFrom(
  Map<String, Map<String, Map<String, Map<String, Product>>>> productsByCountry,
  String country, String city, String supplier, String code) {

    Optional<Map<String, Map<String, Map<String, Product>>>> productsByCity
        = bind(unit(productsByCountry), Maps2.<String, Map<String, Map<String, Map<String, Product>>>>getFromKey(country));
    Optional<Map<String, Map<String, Product>>> productsBySupplier
        = bind(productsByCity, Maps2.<String, Map<String, Map<String, Product>>>getFromKey(city));
    Optional<Map<String, Product>> productsByCode
        = bind(productsBySupplier, Maps2.<String, Map<String, Product>>getFromKey(supplier));
    Optional<Product> product
        = bind(productsByCode, Maps2.<String, Product>getFromKey(code));

    return product;

Or more directly

public Optional<Product> getProductFrom(
  Map<String, Map<String, Map<String, Map<String, Product>>>> productsByCountry,
  String country, String city, String supplier, String code) {

  return bind(bind(bind(bind(unit(productsByCountry),
    Maps2.<String, Map<String, Map<String, Map<String, Product>>>>getFromKey(country)),
    Maps2.<String, Map<String, Map<String, Product>>>getFromKey(city)),
    Maps2.<String, Map<String, Product>>getFromKey(supplier)),
    Maps2.<String, Product>getFromKey(code));

OK! It’s weird too. We have to ‘fight’ with inline recursive calls of the bind method and also with Java generics. The Java type inference system isn’t sufficiently powerful to guess them. To reduce occurrences of generics in this code, we could have written a specific version of getFromKey method for each part of the given Map: getFromCountry, getFromCity, getFromSupplier, etc. This moves and distributes the complexity of the code in those methods.

  Map<String, Map<String, Map<String, Map<String, Product>>>>,
  Optional<Map<String, Map<String, Map<String, Product>>>>
getFromCountry(String country) {
  // type inference system knows what to do here
  return Maps2.getFromKey(country);

// declaration of the other methods here
// ...

public Optional<Product> getProductFrom(
  Map<String, Map<String, Map<String, Map<String, Product>>>> productsByCountry,
  String country, String city, String supplier, String code) {

  return bind(

The positive side lies into the fact that the if imbrications have disappeared and the code is linear. Notice that due to the bind operator, if one of those call to getFromKey method returns an Optional.absent(), then all the following call to bind will also return an Optional.absent().

Is there an happy end for this?

It’s difficult to do something better in Java by using the Optional class (unless you have better solution). By considering another JVM language, you’ve below a solution in Scala.

def getProductFrom(products: Map[String, Map[String, Map[String, Map[String, Product]]]],
                   country: String, city: String, supplier: String, code: String): Option[Product] = {
  for (
    cities <- products.get(country);
    suppliers <- cities.get(city);
    codes <- vendors.get(supplier);
    product <- codes.get(code)
  ) yield product

Here, the for structure provides syntactic sugar to do something close to imperative programming. But in fact, each line in this for structure does the same thing as our previous implementations using the bind operator. The get method here acts the same way as our getFrom method by returning an instance of the type Option. The type Option in Sala is equivalent to the type Optional in Guava. So when you call our Scala version of getProductFrom, if all your parameters appears in the Map, it returns a product. But if one of the parameter isn’t present, you get the default value.

In Other JVM languages proposes the safe-navigation or null-safe operator value?.method(), like in Groovy, in Fantom, in Gosu, etc. It checks if the value isn’t null before calling the method.

// get a product or null
product = products?.get(country)?.get(city)?.get(supplier)?.get(code)

For the kind of problems presented in this post, the null-safe operator does a better job than the monad approach. In fact, monads represent a programming structure with a really wider scope. Associated with types other than Optional, you can extend the application field of monads and get the necessary expressivity to explore non-determinism, concurrent programming, handle of errors, etc.


So, we’ve seen the class Optional provided by Guava in two use cases. For the simple case, we’ve seen how Optional helps to make null reference disappears. But for the complex case, we’ve seen that Optional alone provides no real advantage. Optional class can make the approach a little easier if we introduce the notion of Optional monad. Thereby, not only the null references disappear but also the recursive if structure. But, we have to fight with the Java generics. And even after this, the code is still hard to read.

It seems clear that to explore a recursive data structure made of Map with Java, you have to choose between to fight with if statements or to fight with generics. But, Java may be not the good language for this kind of problem. In other hand, you should ask yourself if using such a structure is a judicious choice?

EDIT 2012-02-23: simplified getFrom method + corrected the returned type from getFromCountry method.

Written by fsarradin

2011/12/05 at 00:16

Instantiate in Java with the Scala’s Way

Case classes in Scala facilitate the instantiation by removing the use of the keyword new. It isn’t a big invention, but it helps to have a source code more readable. Especially when you imbricate these instantiations one in another. The force of Scala’s case classes is to provide a way to practice symbolic computation — in my post named Playing with Scala’s pattern matching, read the section Advanced pattern matching: case class.

Suppose that you want to declare a type that represents a pair of elements. We don’t know the type of these elements and they may have different types. In Scala, you’ll write something like this:

case class Pair[T1, T2](first: T1, second: T2)

// usage: val myPair = Pair(1, 2)

The implementation in Java below is a (hugely) simplified equivalent to the implementation in Scala:

public class Pair<T1, T2> {
    private T1 first;
    private T2 second;

    public Pair(T1 first, T2 second) {
        this.first = first;
        this.second = second;

    public static <T1, T2> Pair<T1, T2> Pair(T1 first, T2 second) {
        return new Pair(first, second);

Here is an example where I design a binary tree structure with the help of the Java version of Pair. Notice that with this approach, I don’t have to use the diamond declaration in the instantiation (ie. Pair<Integer, Pair<String, ...>>). It’s automatically determined by the type inference algorithm.

Pair<Integer, Pair<String, Pair<Boolean, Object>>> tree
  = Pair(1,
        Pair(true, new Object())));

Now, you can say it’s really helpful!

Written by fsarradin

2011/11/09 at 00:22

Posted in Programming

Tagged with , , ,

Playing with Scala’s pattern matching

How many times have you been stuck in your frustration because you were unable to use strings as entries in switch-case statements. Such an ability would be really useful for example to analyze the arguments of your application or to parse a file, or any content of a string. Meanwhile, you have to write a series of if-else-if statements (and this is annoying). Another solution is to use a hash map, where the keys are those strings and values are the associated reified processes, for example a Runnable or a Callable in Java (but this is not really natural, long to develop, and boring too).

If a switch-case statement that accepts strings as entries would be a revolution for you, the Scala’s pattern matching says that this is not enough! Indeed, there are other cases where a series of if-else-if statements would be generously transformed into a look-alike switch-case statement. For example, it would be really nice to simplify a series of instanceof and cast included in if-else-if to execute the good process according to the type of a parameter.

In this post, we see the power of the Scala’s pattern matching in different use cases.

Notice that the pattern matching is not something new. It is something that appears already in some other languages like OCaml, Haskell. A kind of pattern matching also exits in Prolog, but it uses a far different mechanism (unification in this case).

Traditional approach

There is an approach to the Scala’s pattern matching that looks similar to the switch-case structure in C and Java: each case entry use an integer or any scalar type. Here is an example:

def toYesOrNo(choice: Int): String = choice match {
    case 1 => "yes"
    case 0 => "no"
    case _ => "error"

So if  you enter toYesOrNo(1), Scala says “yes”. And if you enter toYesOrNo(2), Scala says “error”. Notice that the symbol _ is used to formulate the default case.You also have to remark that there is no need to use the break statement.

But if you want that Scala says “yes” when you enter toYesOrNo(1), toYesOrNo(2), or toYesOrNo(3), you will write the function like this:

def toYesOrNo(choice: Int): String = choice match {
    case 1 | 2 | 3 => "yes"
    case 0 => "no"
    case _ => "error"

Now, you can use a string for each case entry. Using strings is interesting when you want to parse options of your applications:

def parseArgument(arg: String) = arg match {
    case "-h" | "--help" => displayHelp
    case "-v" | "--version" => displayVerion
    case whatever => unknownArgument(whatever)

So if you enter parseArgument(“-h”) or parseArgument(“–help”), Scala calls the displayHelp function. And if you enter parseArgument(“huh?”), Scala calls unknownArgument(“huh?”).

Notice that I have not used _ for the default case. Instead, I have put an identifier as its value is used in the associated instruction.

Typed pattern

Sometimes in Java, you have to deal with an instance of something that appears like an Object or any high level classes or interfaces. After checking for nullity, you have to use series of if-else statements and instanceof-cast to check the class or the interface that the instance extends or implements, before using it and processing the instance correctly. This the case when you have to override the equals() method in Java. Here is the way Scala sees the thing:

def f(x: Any): String = x match {
    case i:Int => "integer: " + i
    case _:Double => "a double"
    case s:String => "I want to say " + s
  • f(1) → “integer: 1”
  • f(1.0) → “a double”
  • f(“hello”) → “I want to say hello”

Is not it better than a succession of if+instanceof+cast?

This kind of pattern matching is useful to walk through a structure using the composite design pattern. For example, you can use it to explore the DOM of an XML document or the object model of a JSON message.

Functional approach to pattern matching

A side effect of such a matching is that it provides an alternative way to design functions. For example, consider the factorial function. If you choose the recursive version, usually, you would define it like this:

def fact(n: Int): Int =
    if (n == 0) 1
    else n * fact(n - 1)

But you can use Scala’s pattern matching in this case:

def fact(n: Int): Int = n match {
    case 0 => 1
    case n => n * fact(n - 1)

Notice the use of the variable n in this case. It is matched with any value that does not appears in the preceding cases. You have to understand that the n in the last case is not the same as the one used in the function signature. If you wanted to use the parameter n and not a new variable with the same name, you have to delimit the variable name with back-quote in the case definition, like this: `n`

Here is a simple way to build you factorial function in Scala:

def fact(n) = (1 to n).foldLeft(1) { _ * _ }

Pattern matching and collection: the look-alike approach

Pattern matching may be applied to the collections. Below is a function that computes the length of a list without pattern matching:

def length[A](list : List[A]) : Int = {
    if (list.isEmpty) 0
    else 1 + length(list.tail)

Here is the same function with pattern matching:

def length[A](list : List[A]) : Int = list match {
    case _ :: tail => 1 + length(tail)
    case Nil => 0

In this function, there are two cases. The last one checks if the list is empty with the Nil value. The first one checks if there is at least one element is the list. The notation _ :: tail should be understood “a list with whatever head followed by a tail”. Here the tail can be Nil (ie. empty list) or a non-empty list.

To go further, we will use this approach with tuples in order to improve our method parserArgument() above:

  def parseArgument(arg : String, value: Any) = (arg, value) match {
    case ("-l", lang) => setLanguageTo(lang)
    case ("-o" | "--optim", n : Int) if ((0 < n) && (n <= 5)) => setOptimizationLevelTo(n)
    case ("-o" | "--optim", badLevel) => badOptimizationLevel(badLevel)
    case ("-h" | "--help", null) => displayHelp()
    case bad => badArgument(bad)

Notice first the use of the operator | that allows to match alternative forms of arg inside the tuple. Notice also the use of two patterns for options -o and --optim. These patterns are distinguished by the use of a guard condition. The guard conditions help you to get fine tuned pattern matching when pattern matching alone is not enough.

Advanced pattern matching: case class

Case classes are classes with part of their behavior predefined in order to make easier their construction and their use in a pattern. Case classes enables you to manipulate parameterized symbols for example, parsed by a compiler or used by your internal Domain Specific Language (DSL).

The example below shows how to use case classes and pattern matching in order to build simple mathematical expressions, evaluate them, and compute their derivative. First, lets define the symbol to represent expressions: the variable X, constants, addition, multiplication, and the negative operator (for the fun!). Here sealed means that there is no other children of the class Expression outside of the namespace.

  sealed abstract class Expression
  case class X() extends Expression
  case class Const(value : Int) extends Expression
  case class Add(left : Expression, right : Expression) extends Expression
  case class Mult(left : Expression, right : Expression) extends Expression
  case class Neg(expr : Expression) extends Expression

Now, lets define a function to evaluate expressions with a given value for the variable by using pattern matching.

  def eval(expression : Expression, xValue : Int) : Int = expression match {
    case X() => xValue
    case Const(cst) => cst
    case Add(left, right) => eval(left, xValue) + eval(right, xValue)
    case Mult(left, right) => eval(left, xValue) * eval(right, xValue)
    case Neg(expr) => - eval(expr, xValue)

Lets try the eval() function:

val expr = Add(Const(1), Mult(Const(2), Mult(X(), X())))  // 1 + 2 * X*X
assert(eval(expr, 3) == 19)

Now, we define a function that compute the (unreduced) derivative of an expression:

  def deriv(expression : Expression) : Expression = expression match {
    case X() => Const(1)
    case Const(_) => Const(0)
    case Add(left, right) => Add(deriv(left), deriv(right))
    case Mult(left, right) => Add(Mult(deriv(left), right), Mult(left, deriv(right)))
    case Neg(expr) => Neg(deriv(expr))

Lets try the deriv() function:

val df = deriv(expr)

Here is what you get in df:

// = 0 + (0 * X*X + 2 * (1*X + X*1)) = 4 * X
assert(eval(df, 3), 12)

Other advanced pattern matching features

There are some special notations used in the Scala’s pattern matching. Scala allows to put aliases on patterns or on parts of a pattern. The alias is put before the part of the pattern, separated by @. For example, in the expression address @ Address(_, _, "Paris", "France"), we want any addresses in Paris/France, but we do not care about its content after the match is done. So after, we just use the alias address.

There is also a specific notation to match sequences with _*. It matches zero, one or more elements in a sequence to its end.

In Scala, pattern matching does not appears only after a the function match. You can put pattern matching in any closure and also in a catch block.

You can provide your own behavior for pattern matching. This is called extractor. To do so, you have to provide your own implementation of the unapply() method (and/or unapplySeq() for a sequence).


In this post, we have seen different ways to use the pattern matching with Scala. The main advantage of such a feature is to provide an easy way to build alternative structures based the matching of scalar values, strings, collections, but also types and parameterized symbols. For me, pattern matching is one of the sexiest alternatives to if statements! A good use of pattern matching can make your program really readable and helps you to build an internal DSL.

Written by fsarradin

2011/02/14 at 18:52

Posted in Programming

Tagged with , ,