Daily scala: non-strict

Monday, January 18, 2010

Introducing Streams

Streams are a special type of Iterable/Traversable whose elements are not evaluated until they are requested. Streams are normally constructed with functions, often recursive ones, rather than by listing their elements.

A Scala List basically consists of the head element and the rest of the list. A Scala Stream consists of the head of the stream and a function that can construct the rest of the Stream. It is a bit more than this, because each element is evaluated only once and then stored in memory for future accesses. That should become clearer after some examples.
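As a rough sketch of that idea (not the library's actual implementation), a memoizing cons cell can be written with a by-name parameter and a lazy val. The names MyStream, MyCons and MyEmpty are made up for illustration.

```scala
// Hypothetical names; this only illustrates "head plus a function for the rest".
sealed trait MyStream[+A]
case object MyEmpty extends MyStream[Nothing]

final class MyCons[A](val head: A, rest: => MyStream[A]) extends MyStream[A] {
  // lazy val: the by-name argument runs on first access, then the result is cached
  lazy val tail: MyStream[A] = rest
}

var evaluations = 0
val s = new MyCons[Int](0, { evaluations += 1; new MyCons[Int](1, MyEmpty) })

val before = evaluations // the tail has not been requested yet
s.tail
s.tail                   // the second access reuses the cached tail
val after = evaluations
```

Here before is 0 and after is 1: the block that builds the rest of the stream runs once, on first request.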

As usual I created these examples with the Scala 2.8 REPL but I think most if not all should work in 2.7.

scala> import Stream.cons          
import Stream.cons
/*
Because streams are very functional in nature, it is recommended that methods from the Stream object are used to create them.
This is a really boring example of creating a Stream.  Anything this simple should be a List.
The important part is that cons takes the value at that point and a function that returns the rest of the
stream, NOT another stream.
*/
scala> val stream1 = cons(0,cons(1,Stream.empty))
stream1: Stream.Cons[Int] = Stream(0, ?)
scala> stream1 foreach {print _}                 
01
/*
This illustrates the similarity in design between Stream and List. Again, the difference is that the entire list is created up front, whereas in a stream the second argument of cons is not evaluated until it is requested
*/
scala> new ::(0, new ::(1,List.empty))
res35: scala.collection.immutable.::[Int] = List(0, 1)
/*
To drive home the similarities, here is an alternative declaration of a
stream that looks very much like declaring a list
*/
scala> val stream2 = 0 #:: 1 #:: Stream.empty    
stream2: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> stream2 foreach {print _}
01
scala> 0 :: 1 :: Nil
res36: List[Int] = List(0, 1)
/*
A little more interesting now.  Accessing the second element will run the function.  Notice that it is not evaluated until it is requested
*/
scala> val stream3 = cons (0, {    
     | println("getting second element")
     | cons(1,Stream.empty)
     | })
stream3: Stream.Cons[Int] = Stream(0, ?)
scala> stream3(0)
res56: Int = 0
// function is evaluated
scala> stream3(1)
getting second element
res57: Int = 1
/* 
The function is only evaluated once.
Important! This means that all evaluated elements of a Stream are kept in memory, so
it can cause an OutOfMemoryError if the stream is large
*/
scala> stream3(1)
res58: Int = 1
scala> stream3(1)
res59: Int = 1
/*
This creates an infinite stream and then forces resolution of all elements
*/
scala> Stream.from(100).force            
java.lang.OutOfMemoryError: Java heap space
// Alternative demonstration of laziness
scala> val stream4 = 0 #:: {println("hi"); 1} #:: Stream.empty
stream4: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> stream4(1)
hi
res2: Int = 1
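
Instead of forcing an infinite stream, the usual approach is to bound it first. The following is a small sketch, assuming the same Stream API used above (on Scala 2.13 and later Stream is deprecated in favour of LazyList, which offers the same methods). The value names are made up for illustration.

```scala
// Only the first five elements are ever evaluated
val firstFive = Stream.from(100).take(5).toList

// find stops at the first match, so the infinite stream is never fully evaluated
val firstOver = Stream.from(100).map(_ * 2).find(_ > 205)
```

firstFive is List(100, 101, 102, 103, 104) and firstOver is Some(206).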

A very common way to construct a Stream is to define a recursive method. Each recursive call constructs a new element in the stream. The method may or may not have a guard condition that terminates the stream.

// construct a stream of random elements
scala> def make : Stream[Int] = Stream.cons(util.Random.nextInt(10), make)
make: Stream[Int]
scala> val infinite = make
infinite: Stream[Int] = Stream(3, ?)
scala> infinite(5)
res10: Int = 6
scala> infinite(0)
res11: Int = 3
// Once evaluated each element does not change
scala> infinite(5)
res13: Int = 6
// this function makes a stream that does terminate
scala> def make(i:Int) : Stream[String] = {                  
     | if(i==0) Stream.empty                                 
     | else Stream.cons(i + 5 toString, make(i-1))           
     | }
make: (i: Int)Stream[String]
scala> val finite = make(5)                       
finite: Stream[String] = Stream(10, ?)
scala> finite foreach print _                     
109876
// One last demonstration of making a stream object
scala> Stream.cons("10", make(2))
res18: Stream.Cons[String] = Stream(10, ?)
/*
This method is dangerous as it forces the entire stream to be evaluated, so it should never be called on an infinite stream
*/
scala> res18.size
res19: Int = 3
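
The same recursive pattern can also be written with the #:: operator shown earlier. This is only a sketch; powersOfTwo and countdown are hypothetical names, and Stream is assumed to be available (it is deprecated in favour of LazyList from Scala 2.13).

```scala
// No guard condition: an infinite stream, so always bound it before consuming it
def powersOfTwo(n: Int): Stream[Int] = n #:: powersOfTwo(n * 2)

// Guard condition: the stream terminates when i reaches 0
def countdown(i: Int): Stream[Int] =
  if (i == 0) Stream.empty else i #:: countdown(i - 1)

val firstPowers = powersOfTwo(1).take(5).toList
val allCountdown = countdown(3).toList
```

firstPowers is List(1, 2, 4, 8, 16) and allCountdown is List(3, 2, 1).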

This is only an introduction. I hope to add a few more topics that focus on Streams because they can be very powerful, but it is also harder to recognize where they should be used instead of a standard collection.

Tuesday, November 3, 2009

Non-strict (lazy) Collections

This topic discusses non-strict collections. There is a fair bit of confusion regarding non-strict collections: what to call them and what they are. First let's clear up some definitions.

Before Scala 2.8, collections have a projection method, which returns a non-strict collection. So non-strict collections are often called projections in pre-2.8 Scala.
In Scala 2.8+ the method was renamed to view, so in Scala 2.8+ "view" sometimes refers to a non-strict collection (and sometimes to an implicit conversion of a class).
Another name I have seen for non-strict collections is "lazy collections."

All of those labels refer to the same thing: "non-strict collections", which is the functional programming term and the one I will use for the rest of this topic.

As an excellent addition to this topic please take a look at Strict Ranges? by Daniel Sobral.

One way to think of non-strict collections is as pull collections. A programmer can essentially form a sequence of functions, and the evaluation is only performed on request.

Note: I am intentionally adding side effects to the processes in order to demonstrate where processing takes place. In practice the collections should be immutable (ideally) and the processing of the collections really should be side-effect free. Otherwise you are almost guaranteed to find yourself with a bug that is almost impossible to find.
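
The "pull" idea can be sketched as follows, assuming Scala 2.8 or later (where the method is called view). The counter and the name slowDouble are hypothetical and exist only to show when work happens.

```scala
var calls = 0
def slowDouble(i: Int): Int = { calls += 1; i * 2 }

// Building the pipeline does no work: it only records the functions to apply
val pipeline = (1 to 1000000).view.map(slowDouble).map(_ + 1)
val callsBeforePull = calls

// Requesting one element pulls just that element through both functions
val first = pipeline.head
val callsAfterPull = calls
```

callsBeforePull is 0, first is 3, and callsAfterPull is 1, even though the range has a million elements.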

Example of processing a strict collection:

scala> var x=0
x: Int = 0
scala> def inc = {
     | x += 1
     | x
     | }
inc: Int
scala> var list =  List(inc _, inc _, inc _)
list: List[() => Int] = List(<function0>, <function0>, <function0>)
scala> list.map (_()).head
res0: Int = 1
scala> list.map (_()).head
res1: Int = 4
scala> list.map (_()).head
res2: Int = 7

Notice how each time the expression is called x is incremented 3 times, once for each element in the list. This demonstrates that map is being called for every element of the list even though only head is being calculated. This is strict behaviour.

Example of processing a non-strict collection with Scala 2.7.5.
For Scala 2.8 change projection => view.

scala> var x=0
x: Int = 0
scala> def inc = {
     | x += 1
     | x
     | }
inc: Int
scala> var list =  List(inc _, inc _, inc _)
list: List[() => Int] = List(<function0>, <function0>, <function0>)
scala> list.projection.map (_()).head
res0: Int = 1
scala> list.projection.map (_()).head
res1: Int = 2
scala> list.projection.map (_()).head
res2: Int = 3
scala> list.projection.map (_()).head
res3: Int = 4

Here you can see that only one element in the list is being calculated for the head request. That is the idea behind non-strict collections, and it can be useful when dealing with large collections and very expensive operations. This also demonstrates why side effects are so dangerous!
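
One more consequence worth seeing: unlike a Stream, a view does not remember its results, so every traversal runs the function again. A minimal sketch, assuming Scala 2.8 or later (the counter is only there to make the repeated work visible):

```scala
var calls = 0
val v = List(1, 2, 3).view.map { i => calls += 1; i * 10 }

val firstPass = v.toList
val callsAfterFirst = calls

// Traversing the same view again recomputes every element
val secondPass = v.toList
val callsAfterSecond = calls
```

Both passes produce List(10, 20, 30), but calls goes from 3 to 6. If the function had side effects that changed its result, the two passes would disagree.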

More examples (Scala 2.8):

scala> var x=0
x: Int = 0
scala> def inc = { x +=1; x }
inc: Int
// strict processing of a range and obtain the 6th element
// this will run inc for every element in the range
scala> (1 to 10).map( _ + inc).apply(5)
res2: Int = 12
scala> x
res3: Int = 10
// reset for comparison
scala> x = 0
x: Int = 0
// now non-strict processing but the same process
// you get a different answer because only one
// element is calculated
scala> (1 to 10).view.map( _ + inc).apply(5)
res6: Int = 7
// verify that x was incremented only once
scala> x
res7: Int = 1
// reset for comparison
scala> x = 0
x: Int = 0
// force forces strict processing
// now we have the same answer as if we did not use view
scala> (1 to 10).view.map( _ + inc).force.apply(5)
res9: Int = 12
scala> x
res10: Int = 10
// reset for comparison
scala> x = 0
x: Int = 0
// only the first 5 elements are computed
scala> (1 to 10).view.map( _ + inc).take(5).mkString(",")
res9: String = 2,4,6,8,10
scala> x
res10: Int = 5
// reset for comparison
scala> x = 0
x: Int = 0
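// only the first two elements pass the predicate, but x == 5, not 2:
// the view does not cache results, so the elements appear to be computed once
// to test the predicate (3 calls to inc) and again to build the string (2 more calls),
// which is why the output is 5,7 rather than 2,4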
scala> (1 to 10).view.map( _ + inc).takeWhile( _ < 5).mkString(",")
res11: String = 5,7
scala> x
res12: Int = 5
// reset for comparison
scala> x = 0
x: Int = 0
// inc is called twice for each element, but only the last 5 elements are computed, so
// x == 10, not 20
scala> (1 to 10).view.map( _ + inc).map( i => inc ).drop(5).mkString(",")
res16: String = 2,4,6,8,10
scala> x
res17: Int = 10
scala> x = 0                                             
x: Int = 0
// define this for-comprehension in a method so that
// the repl doesn't call toString on the result value and
// as a result force the full list to be processed
scala> def add = for( i <- (1 to 10).view ) yield i + inc
add: scala.collection.IndexedSeqView[Int,IndexedSeq[_]]
scala> add.head                                          
res5: Int = 2
// for-comprehensions will also be non-strict if the generator is non-strict
scala> x
res6: Int = 1
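
A for-comprehension with a single generator and a yield is rewritten by the compiler into a call to map, which is why it inherits the non-strictness of the view. A small sketch, assuming Scala 2.8 or later (square and the counter are hypothetical names used only for illustration):

```scala
var calls = 0
def square(i: Int): Int = { calls += 1; i * i }

// Equivalent to (1 to 10).view.map(i => square(i))
val squares = for (i <- (1 to 10).view) yield square(i)
val callsBefore = calls

// Only the three requested elements are computed
val firstThree = squares.take(3).toList
val callsAfter = calls
```

callsBefore is 0, firstThree is List(1, 4, 9), and callsAfter is 3.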