
Thursday, May 27, 2010

zipWithIndex

A common desire is to have access to the index of an element when using collection methods like foreach, filter, foldLeft/Right, etc. Fortunately, there is a simple way:

List('a','b','c','d').zipWithIndex

But wait!

Doesn't that trigger an extra iteration through the collection? Indeed it does, and that is where views help.

List('a','b','c','d').view.zipWithIndex

When using a view, the collection is only traversed when required, so no extra pass is made and no intermediate collection of (element, index) pairs is built.
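To see the laziness directly, here is a minimal sketch assuming Scala 2.13 or later. The counter `evaluated` and the other value names are hypothetical, and the side-effecting map exists only to record which elements are actually pulled through the view:

```scala
// Counts how many elements the mapping function has actually seen.
var evaluated = 0

val letters = List('a', 'b', 'c', 'd')

// Describe the pipeline: map (with a counting side effect), then zipWithIndex.
// Nothing is evaluated at this point.
val indexed = letters.view.map { c => evaluated += 1; c.toUpper }.zipWithIndex

// Building the view did no work, so this is still 0.
val beforeRequest = evaluated

// Requesting only the first two pairs pulls only two elements through map.
val firstTwo = indexed.take(2).toList
val afterRequest = evaluated
```

Building `indexed` does no work; only the two elements requested by `take(2)` reach the mapping function.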

Here are some examples of zipWithIndex:
  scala> val list = List('a','b','c','d')
  list: List[Char] = List(a, b, c, d)
  /*
  I like to use functions constructed with case clauses
  in order to clearly label the index. The alternative is
  to use e._1 for the value and e._2 for the index
  */
  scala> list.view.zipWithIndex foreach {case (value,index) => println(value,index)}
  (a,0)
  (b,1)
  (c,2)
  (d,3)
  // alternative syntax without a case clause
  scala> list.view.zipWithIndex foreach {e => println(e._1,e._2)}
  (a,0)
  (b,1)
  (c,2)
  (d,3)
  /*
  The function passed to foldLeft has 2 parameters (accumulator, nextValue);
  for foldRight the order is reversed (nextValue, accumulator).
  A case clause lets you expand nextValue into (value, index), but watch the brackets!
  */
  scala> (list.view.zipWithIndex foldLeft 0) {case (acc,(value,index)) => acc + value.toInt + index}
  res14: Int = 400
  // alternative syntax without a case clause
  scala> (list.view.zipWithIndex foldLeft 0) {(acc,e) => acc + e._1.toInt + e._2}
  res23: Int = 400
  /*
  Alternative foldLeft operator. The thing I like about this
  syntax is that it has the initial accumulator value on the left,
  in the same position as the accumulator parameter in the function.
  The other thing I like about it is that visually you can see that it starts with
  "" and then folds left
  */
  scala> ("" /: list.view.zipWithIndex) {
       | case (acc, (value, index)) if index % 2 == 0 => acc + value
       | case (acc, _) => acc
       | }
  res15: java.lang.String = ac
  /*
  This example filters based on the index, then uses map to remove the index.
  force simply forces the view to be processed. (I love these collections!)
  */
  scala> list.view.zipWithIndex.filter { _._2 % 2 == 0 }.map { _._1}.force
  res29: Seq[Char] = List(a, c)
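The transcript above was recorded on Scala 2.8. On Scala 2.13, `force` on a view and the `/:` operator are both deprecated, so a sketch of the same examples, assuming 2.13 or later, uses an explicit `toList` and `foldLeft` instead. It also shows `collect`, which combines the filter and the map into one step. The value names are illustrative only:

```scala
val list = List('a', 'b', 'c', 'd')

// foldLeft destructuring (accumulator, (value, index)); 'a'.toInt is 97.
val sum = list.view.zipWithIndex.foldLeft(0) {
  case (acc, (value, index)) => acc + value.toInt + index
}

// Keep the characters at even indexes; foldLeft replaces the deprecated /:.
val evens = list.view.zipWithIndex.foldLeft("") {
  case (acc, (value, index)) if index % 2 == 0 => acc + value
  case (acc, _)                                => acc
}

// filter on the index, map the index away, then convert explicitly with toList.
val filtered = list.view.zipWithIndex.filter(_._2 % 2 == 0).map(_._1).toList

// collect performs the filter and the map in a single step.
val collected = list.view.zipWithIndex.collect {
  case (value, index) if index % 2 == 0 => value
}.toList
```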

Tuesday, November 3, 2009

Non-strict (lazy) Collections

This topic discusses non-strict collections. There is a fair bit of confusion with regard to non-strict collections: what to call them and what they are. First, let's clear up some definitions.

Before Scala 2.8, collections had a projection method that returns a non-strict collection, so in pre-2.8 code non-strict collections are often called projections.
In Scala 2.8+ the method was renamed to view, so in Scala 2.8+ "view" sometimes refers to a non-strict collection (and sometimes to an implicit conversion, as in a view bound).
Another name for non-strict collections I have seen is "lazy collections."

All of these labels refer to the same thing: "non-strict collections," which is the functional programming term and the one I will use for the rest of this topic.

As an excellent complement to this topic, please take a look at "Strict Ranges?" by Daniel Sobral.

One way to think of non-strict collections is as pull collections: a programmer builds up a sequence of functions, and evaluation is only performed when a result is requested.
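To make the pull model concrete, here is a minimal sketch assuming Scala 2.13 or later. The names `log`, `mapped` and `pipeline` are hypothetical, and the logging side effects are there only to show when each step runs:

```scala
import scala.collection.mutable.ListBuffer

// Records every step the pipeline actually performs.
val log = ListBuffer.empty[String]

// Describe the pipeline; on a view nothing runs yet.
val mapped = (1 to 5).view.map { n => log += s"map $n"; n * 10 }
val pipeline = mapped.filter { n => log += s"filter $n"; n > 15 }

val logBeforePull = log.toList

// Pull one result: elements flow through map and filter one at a time
// until the first one passes the filter.
val first = pipeline.head
```

Nothing runs while the pipeline is described; `head` then pulls elements through both steps, one element at a time, and stops as soon as the first result is available (after elements 1 and 2).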

Note: I am intentionally adding side effects to the processes in order to demonstrate where processing takes place. In practice the collections should (ideally) be immutable, and the processing of the collections really should be side-effect free. Otherwise you are almost guaranteed to end up with a bug that is nearly impossible to find.

Example of processing a strict collection:
  scala> var x=0
  x: Int = 0
  scala> def inc = {
       | x += 1
       | x
       | }
  inc: Int
  scala> var list =  List(inc _, inc _, inc _)
  list: List[() => Int] = List(<function0>, <function0>, <function0>)
  scala> list.map (_()).head
  res0: Int = 1
  scala> list.map (_()).head
  res1: Int = 4
  scala> list.map (_()).head
  res2: Int = 7

Notice how each time the expression is evaluated, x is incremented 3 times, once for each element in the list. This demonstrates that the mapping function is applied to every element of the list even though only head is requested. This is strict behaviour.

Example of processing a non-strict collection with Scala 2.7.5.
For Scala 2.8, change projection => view.
  scala> var x=0
  x: Int = 0
  scala> def inc = {
       | x += 1
       | x
       | }
  inc: Int
  scala> var list =  List(inc _, inc _, inc _)
  list: List[() => Int] = List(<function0>, <function0>, <function0>)
  scala> list.projection.map (_()).head
  res0: Int = 1
  scala> list.projection.map (_()).head
  res1: Int = 2
  scala> list.projection.map (_()).head
  res2: Int = 3
  scala> list.projection.map (_()).head
  res3: Int = 4

Here you can see that only one element of the list is computed for each head request. That is the idea behind non-strict collections, and it can be useful when dealing with large collections and very expensive operations. This also demonstrates why side effects are so crazy dangerous!
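On current Scala the projection method has been replaced by view. A minimal sketch, assuming Scala 2.13 or later, contrasts the strict and non-strict versions with a call counter; `calls` and `expensive` are hypothetical names. It also shows that a view does not cache its results, which is one more way side effects inside a view can surprise you:

```scala
// Counts how many times the mapping function runs.
var calls = 0
def expensive(n: Int): Int = { calls += 1; n * n }

val numbers = List(1, 2, 3)

// Strict: map builds the whole new List before head is taken.
val strictHead = numbers.map(expensive).head
val strictCalls = calls // 3

calls = 0
// Non-strict: only the element that head needs is computed.
val lazySquares = numbers.view.map(expensive)
val lazyHead = lazySquares.head
val lazyCalls = calls // 1

// A view stores the recipe, not the results: asking again recomputes.
val lazyHeadAgain = lazySquares.head
val callsAfterSecondHead = calls // 2
```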

More examples (Scala 2.8):
  scala> var x=0
  x: Int = 0
  scala> def inc = { x +=1; x }
  inc: Int
  // strict processing of a range, then obtain the 6th element
  // this runs inc for every element in the range
  scala> (1 to 10).map( _ + inc).apply(5)
  res2: Int = 12
  scala> x
  res3: Int = 10
  // reset for comparison
  scala> x = 0
  x: Int = 0
  // now the same process, but non-strict:
  // you get a different answer because only one
  // element is calculated
  scala> (1 to 10).view.map( _ + inc).apply(5)
  res6: Int = 7
  // verify that x was incremented only once
  scala> x
  res7: Int = 1
  // reset for comparison
  scala> x = 0
  x: Int = 0
  // force forces strict processing,
  // so now we have the same answer as if we had not used view
  scala> (1 to 10).view.map( _ + inc).force.apply(5)
  res9: Int = 12
  scala> x
  res10: Int = 10
  // reset for comparison
  scala> x = 0
  x: Int = 0
  // only the first 5 elements are computed
  scala> (1 to 10).view.map( _ + inc).take(5).mkString(",")
  res9: String = 2,4,6,8,10
  scala> x
  res10: Int = 5
  // reset for comparison
  scala> x = 0
  x: Int = 0
  // takeWhile keeps only two elements, but x ends at 5 (not 3) and the values
  // are 5,7 (not 2,4): this view evaluated some elements more than once,
  // apparently once to find where takeWhile stops and again to build the string
  scala> (1 to 10).view.map( _ + inc).takeWhile( _ < 5).mkString(",")
  res11: String = 5,7
  scala> x
  res12: Int = 5
  // reset for comparison
  scala> x = 0
  x: Int = 0
  // inc is called twice for each element, but only the last 5 elements
  // are computed, so x ends at 10, not 20
  scala> (1 to 10).view.map( _ + inc).map( i => inc ).drop(5).mkString(",")
  res16: String = 2,4,6,8,10
  scala> x
  res17: Int = 10
  scala> x = 0
  x: Int = 0
  // define this for-comprehension in a method so that
  // the REPL doesn't call toString on the result value and,
  // as a result, force the whole view to be processed
  scala> def add = for( i <- (1 to 10).view ) yield i + inc
  add: scala.collection.IndexedSeqView[Int,IndexedSeq[_]]
  scala> add.head
  res5: Int = 2
  // for-comprehensions are also non-strict if the generator is non-strict
  scala> x
  res6: Int = 1
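The takeWhile line in the transcript (5,7, with x at 5) was recorded on the Scala 2.8 view implementation. As a minimal sketch, assuming Scala 2.13 collections, where a view's takeWhile pulls elements through a single iterator, the same pipeline is expected to evaluate each element at most once. `x` and `inc` mirror the transcript; the other value names are illustrative:

```scala
var x = 0
def inc: Int = { x += 1; x }

// takeWhile stops at the first element that fails the predicate:
// elements 1 and 2 map to 2 and 4, element 3 maps to 6 and stops the pass.
val whileSmall = (1 to 10).view.map(_ + inc).takeWhile(_ < 5).mkString(",")
val callsForTakeWhile = x

x = 0
// A for-comprehension over a view desugars to view.map, so it is lazy too.
def add = for (i <- (1 to 10).view) yield i + inc
val firstAdded = add.head
val callsForHead = x
```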