Friday, October 30, 2009

Groupby - collection processing

Iterator and Iterable have most of the most useful methods when dealing with collections. Fold, Map, Filter are probably the most common. But other very useful methods include grouped/groupBy, sliding, find, forall, foreach, and many more. I want to cover Iterable's groupBy method in this topic.

This is a Scala 2.8 and later method. It is similar to partition in that it allows the collection to be divided (or partitioned). Partition takes a method with returns a boolean and partitions the collection into two depending on a result. GroupBy takes a function that returns an object and returns a Map with the key being the return value. This allows an arbitrary number of partitions to be made from the collection.

Here is the method signature:
  1. def groupBy[K](f : (A) => K) : Map[K, This]

A bit of context is require to understand the three Type parameters A, K and This. This method is defined in a super class of collections called TraversableLike (I will briefly discuss this in the next topic.) TraversableLike takes two type parameters: the type of the collection and the type contained in the collection. Therefore in this method definition, 'This' refers to the collection type (List for example) and A refers to contained type (perhaps Int). Finally K refers to the type returned by the function and are the keys of the groups formed by the method.

  1. scala> val groups = (1 to 20).toList groupBy {
  2.      | case i if(i<5) => "g1"
  3.      | case i if(i<10) => "g2"
  4.      | case i if(i<15) => "g3"
  5.      | case _ => "g4"
  6.      | }
  7. res4: scala.collection.Map[java.lang.String,List[Int]] = Map(g1 -> List(1, 2, 3, 4), g2 -> List(5, 6, 7, 8, 9), g3 -> List(10, 11, 12, 13, 14), g4 -> List(15, 16, 17, 18, 19, 20))
  8. scala> groups.keySet
  9. res6: scala.collection.Set[java.lang.String] = Set(g1, g2, g3, g4)
  10. scala> groups("g1")
  11. res7: List[Int] = List(1, 2, 3, 4)
  12. scala> val mods = (1 to 20).toList groupBy ( _ % 4 )
  13. mods: scala.collection.Map[Int,List[Int]] = Map(1 -> List(1, 5, 9, 13, 17), 2 -> List(2, 6, 10, 14, 18), 3 -> List(3, 7,
  14.  11, 15, 19), 0 -> List(4, 8, 12, 16, 20))
  15. scala> mods.keySet
  16. res9: scala.collection.Set[Int] = Set(1, 2, 3, 0)
  17. scala> mods(1)
  18. res11: List[Int] = List(1, 5, 9, 13, 17)


  1. def groupBy[K](f : (A) => K) : Map[K, This]

    I understand that K is a generic parameter, dictating that the input function to groupBy must be one that yields a K, and that the resulting Map has K as the key type.

    I don't understand what (A) means, and what This means.

  2. Excellent questions.

    I will add more context to that snippet. Take list as the example.

    List[A] extends TraversableLike[A,List]

    groupBy is part of TraversableLike.

    def groupBy[K](f : (A) => K) : Map[K, This]

    A is the type contained in the List, and This is List for list.

    If you are dealing with Array it is This would be Array.

    So List(1,2,3) groupBy {
    case 1 => "one"
    case 2 => "two"
    case _ => "other"

    would return a Map[String, List[Int]]

    I hope that helps clear it up. I will update the post to add in this information.

  3. I really like this blog, but I find it irritating that I can't find the author of the post anywhere on the page.

  4. should A be the type of the list member? in that case Int, This is the type of list itself, in that case List[Int]

  5. Curse you daniel for wanting me to be accountable for my blog ;) I've added a little about me widget for you and others.

  6. Jesse,
    Thanks so much for a clear post.
    I'm trying to follow a presentation by Odersky and I needed to understand groupBy to keep up with him - or at least if lagging far behind to at least keep him in sight.
    Thanks again for sharing.
    Lloyd Muccio

  7. Scala is amazing! Thanks! wonderful topic.