Showing posts with label collections. Show all posts
Showing posts with label collections. Show all posts

Tuesday, April 13, 2010

Creating Custom Traversable implementations

One of the most talked about features of Scala 2.8 is the improved Collections libraries. Creating your own implementation is trivial, however if you want your new collection to behave the same way as all the included libraries there are a few tips you need to be aware of.

Note: All of these examples can either be ran in the REPL or put in a file and ran

Starting with the simple implementation:
  1. import scala.collection._
  2. import scala.collection.generic._
  3. class MyColl[A](seq : A*) extends Traversable[A] {
  4.     // only abstract method in traversable is foreach... easy :) 
  5.   def foreach[U](f: A => U) = util.Random.shuffle(seq.toSeq).foreach(f)
  6. }

This is a silly collection I admit but it is custom :).

This example works but if you test the result of a map operation (or any other operation that returns a new instance of the collection) you will notice it is not an instance of MyColl. This is expected because unless otherwise defined Traversable will return a new instance of Traversable.

To demonstrate run the following tests:
  1. val c = new MyColl(1, 2, 3)
  2. println (c mkString ",")
  3. println(c mkString ",")
  4. println(c drop 1 mkString ",")
  5. // this two next assertions fail (see following explanation)
  6. assert(c.drop(1).isInstanceOf[MyColl[_]])
  7. assert((c map {_ + 1}).isInstanceOf[MyColl[_]])

Both assertions will fail. The reason for these failures is because the collection is immutable which dictates by necessity that a new object must be returned from filter/map/etc... Since the Traversable trait returns instances of Traversable these two assertions fail. The easiest way to make these methods return an instance of MyColl is to make the following changes/additions.
  1. import scala.collection._
  2. import scala.collection.generic._
  3. /*
  4. Adding GenericTraversableTemplate will delegate the creation of new
  5. collections to the companion object.  Adding the trait and
  6. companion object causes all the new collections to be instances of MyColl
  7. */
  8. class MyColl[A](seq : A*) extends Traversable[A] 
  9.                              with GenericTraversableTemplate[A, MyColl] {
  10.   override def companion = MyColl
  11.   def foreach[U](f: A => U) = util.Random.shuffle(seq.toSeq).foreach(f)
  12. }
  13. // The TraversableFactory trait is required by GenericTraversableTemplate
  14. object MyColl extends TraversableFactory[MyColl] {
  15. /* 
  16. If you look at the signatures of many methods in TraversableLike they have an
  17. implicit parameter canBuildFrom.  This allows one to define how the returned collections
  18. are built.  For example one could make a list's map method return a Set
  19. In this case we define the default canBuildFrom for MyColl
  20. */
  21.   implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, MyColl[A]] = new GenericCanBuildFrom[A]
  22. /*  
  23. The method that builds the new collection.  This is a simple implementation
  24. but it works.  There are other implementations to assist with implementation if
  25. needed
  26. */
  27.   def newBuilder[A] = new scala.collection.mutable.LazyBuilder[A,MyColl[A]] {
  28.     def result = {
  29.       val data = parts.foldLeft(List[A]()){(l,n) => l ++ n}
  30.       new MyColl(data:_*)
  31.     }
  32.   }
  33. }

Now instances of MyColl will be created by the various filter/map/etc... methods and that is fine as long as the new object is not required at compile-time. But suppose we added a method to the class and want that accessible after applying methods like map and filter.

Adding val o : MyColl[Long] = c map {_.toLong} to the assertions will cause a compilation error since statically the class returned is Traversable[Long]. The fix is easy.

All that needs to be done is to add with TraversableLike[A, MyColl[A]] to MyColl and we are golden. There may be other methods as well but this works and is simple.

Note that the order in which the traits are mixed in is important. TraversableLike[A, MyColl[A]] must be mixed in after Traversable[A]. The reason is that we want methods like map and drop to return instances of MyColl (statically as well as dynamically). If the order was reversed then those methods would return Traversable event though statically the actual instances would still be MyColl.
  1. import scala.collection._
  2. import scala.collection.generic._
  3. class MyColl[A](seq : A*) extends Traversable[A]
  4.                              with GenericTraversableTemplate[A, MyColl] 
  5.                              with TraversableLike[A, MyColl[A]] {
  6.   override def companion = MyColl
  7.   def foreach[U](f: A => U) = util.Random.shuffle(seq.toSeq).foreach(f)
  8. }
  9. object MyColl extends TraversableFactory[MyColl] {  
  10.   implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, MyColl[A]] = new GenericCanBuildFrom[A]
  11.   def newBuilder[A] = new scala.collection.mutable.LazyBuilder[A,MyColl[A]] {
  12.     def result = {
  13.       val data = parts.foldLeft(List[A]()){(l,n) => l ++ n}
  14.       new MyColl(data:_*)
  15.     }
  16.   }
  17. }

Now add in a new method to demonstrate that the new collection works as desired and we are done.

The following is the complete implementation with the tests. You can put it in a file and run scala <filename> or paste it into a REPL
  1. import scala.collection._
  2. import scala.collection.generic._
  3. import scala.collection.mutable.{ Builder, ListBuffer }
  4. class MyColl[A](seq : A*) extends Traversable[A]
  5.                              with GenericTraversableTemplate[A, MyColl] 
  6.                              with TraversableLike[A, MyColl[A]] {
  7.   override def companion = MyColl
  8.   def foreach[U](f: A => U) = util.Random.shuffle(seq.toSeq).foreach(f)
  9.   def sayhi = println("hi!")
  10. }
  11. object MyColl extends TraversableFactory[MyColl] {  
  12.   implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, MyColl[A]] = new GenericCanBuildFrom[A]
  13.   def newBuilder[A] = new ListBuffer[A] mapResult (x => new MyColl(x:_*))
  14. }
  15. val c = new MyColl(1, 2, 3)
  16. println (c mkString ",")
  17. println(c mkString ",")
  18. assert(c.drop(1).isInstanceOf[MyColl[_]])
  19. assert((c map {_ + 1}).isInstanceOf[MyColl[_]])
  20. val o : MyColl[Int] = c filter {_ < 2}
  21. println(o mkString "," )
  22. o.sayhi

Wednesday, March 31, 2010

Variant Positions 2

This is a continuation of post: Variant Positions 1

...

My first attempt at the Verified was to make it a mutable object (my Java tourettes kicking in). But it cannot be covariant and mutable. Look at the code to see why:
  1. class Verified[+A <: V,V](assertion : (V) => Booleanprivate var value : A){
  2.     assert(assertion(value))
  3.     
  4.     def a = value
  5. // update is illegal.  See the example below
  6.     def update[ B >: A <: V](a : B) = value = a
  7. }
  8. def notNull(obj : AnyRef) = obj != null
  9. val v = new Verified(notNull, "hi")
  10. /*
  11. Up to this point everything looks ok but the next line
  12. will assign an object to value which is a reference to a String
  13. */
  14. update (new Object())

For Verified to be mutable A must be invariant. If you look at the Mutable collections in Scala they are all invariant.

Here is an interesting example of both invariant and covariant type parameters in a class hierarchy:
  1. scala> class X[+A](val x :A)
  2. defined class X
  3. scala> class Y[A](var a: A) extends X[A](a)
  4. defined class Y
  5. scala> val x: X[Any] = new Y[String]("hi")
  6. x: X[Any] = Y@1732a4df
  7. scala> x.asInstanceOf[Y[String]].a="ho"

This example is perfectly legal because no matter how X[Any] is used no illegal assignment in Y can occur. The interesting thing is that the object can be used in covariant usecases when only X is required. This is now the collections in Scala can work.

Here is a little example of collections invariance and covariance in action. In List the parameter is covariant but in Buffer it is invariant
  1. scala> def printList(l : List[Any]) = print(l mkString " :: ")
  2. printList: (l: List[Any])Unit
  3. scala> val buffer = Buffer(1,2,3)
  4. buffer: scala.collection.mutable.Buffer[Int] = ArrayBuffer(1, 2, 3)
  5. scala> printList(buffer)
  6. 1 :: 2 :: 3
  7. /*
  8. ++ is part of Iterable.  Since Iterable is covariant ++ 
  9. returns a new buffer it does not modify the existing buffer
  10. All mutators are only defined on invariant traits
  11. */
  12. scala> buffer ++ List(4)
  13. res16: scala.collection.mutable.Buffer[Int] = ArrayBuffer(1, 2, 3, 4)
  14. scala> res16 eq buffer
  15. res17: Boolean = false
  16. /*
  17. buffer defines += to add an element to the buffer
  18. so res27 is the same buffer with an extra element
  19. */
  20. scala> buffer += 10
  21. res27: buffer.type = ArrayBuffer(1, 2, 3, 10)
  22. scala> res27 eq buffer
  23. res28: Boolean = true

Thursday, March 4, 2010

Zip with a constant value

A simple tip for zipping a List (or other collection) with a single value.

  1. scala> Stream.continually("h") zip List(1,2,3,4)
  2. res2: scala.collection.immutable.Stream[(java.lang.String, Int)] = Stream((h,1), ?)
  3. scala> res2 mkString ","
  4. res3: String = (h,1),(h,2),(h,3),(h,4)
  5. scala> List(1,2,3,4) zip Stream.continually("h")
  6. res4: List[(Int, java.lang.String)] = List((1,h), (2,h), (3,h), (4,h))

Monday, January 18, 2010

Introducing Streams

Streams are a special type of Iterable/Traversable whose elements are not evaluated until they are requested. Streams are normally constructed as functions.

A Scala List basically consists of the head element and the rest of the list. A Scala Stream is the head of the stream and a function that can construct the rest of the Stream. It is a bit more than this because each element is evaluated only once and stored in memory for future evaluations. That should be more clear after some examples.

As usual I created these examples with the Scala 2.8 REPL but I think most if not all should work in 2.7.

  1. scala> import Stream.cons          
  2. import Stream.cons
  3. /*
  4. Because streams are very functional in nature it is recommended that methods from the Stream object are used for creation
  5. This is a real boring example of creating a Stream.  Anything this simple should be a list.
  6. The important part is that cons take the value at the point and a function to return the rest of the
  7. stream NOT another stream.  
  8. */
  9. scala> val stream1 = cons(0,cons(1,Stream.empty))
  10. stream1: Stream.Cons[Int] = Stream(0, ?)
  11. scala> stream1 foreach {print _}                 
  12. 01
  13. /*
  14. This illustrates the similarity in design between Stream and list, again the difference is the entire list is created in a stream the second argument of cons is not evaluated until it is requested
  15. */
  16. scala> new ::(0, new ::(1,List.empty))
  17. res35: scala.collection.immutable.::[Int] = List(0, 1)
  18. /*
  19. To drive home the point of the similarities.  Here is an alternative declaration of a
  20. stream most similar to declaring a list
  21. */
  22. scala> val stream2 = 0 #:: 1 #:: Stream.empty    
  23. stream2: scala.collection.immutable.Stream[Int] = Stream(0, ?)
  24. scala> stream2 foreach {print _}
  25. 01
  26. scala> 0 :: 1 :: Nil
  27. res36: List[Int] = List(0, 1)
  28. /*
  29. A little more interesting now.  The accessing the second element will run the function.  Notice it is not evaluated until request
  30. */
  31. scala> val stream3 = cons (0, {    
  32.      | println("getting second element")
  33.      | cons(1,Stream.empty)
  34.      | })
  35. stream3: Stream.Cons[Int] = Stream(0, ?)
  36. scala> stream3(0)
  37. res56: Int = 0
  38. // function is evaluated
  39. scala> stream3(1)
  40. getting second element
  41. res57: Int = 1
  42. /* 
  43. Function is only evaluated once.  
  44. Important! This means that all elements in a Stream are loaded into a memory so
  45. it can cause a OutOfMemoryError if the stream is large
  46. */
  47. scala> stream3(1)
  48. res58: Int = 1
  49. scala> stream3(1)
  50. res59: Int = 1
  51. /*
  52. This creates an infinate stream then forces resolution of all elements
  53. */
  54. scala> Stream.from(100).force            
  55. java.lang.OutOfMemoryError: Java heap space
  56. // Alternative demonstration of laziness
  57. scala> val stream4 = 0 #:: {println("hi"); 1} #:: Stream.empty
  58. stream4: scala.collection.immutable.Stream[Int] = Stream(0, ?)
  59. scala> stream4(1)
  60. hi
  61. res2: Int = 1

A very common way to construct a Stream is to define a recursive method. Each recursive call constructs a new element in the stream. The method may or may not have a guard condition that terminates the stream.
  1. // construct a stream of random elements
  2. scala> def make : Stream[Int] = Stream.cons(util.Random.nextInt(10), make)
  3. make: Stream[Int]
  4. scala> val infinate = make                                                
  5. infinate: Stream[Int] = Stream(3, ?)
  6. scala> infinate(5)                                  
  7. res10: Int = 6
  8. scala> infinate(0)
  9. res11: Int = 3
  10. // Once evaluated each element does not change
  11. scala> infinate(5)
  12. res13: Int = 6
  13. // this function makes a stream that does terminate
  14. scala> def make(i:Int) : Stream[String] = {                  
  15.      | if(i==0) Stream.empty                                 
  16.      | else Stream.cons(i + 5 toString, make(i-1))           
  17.      | }
  18. make: (i: Int)Stream[String]
  19. scala> val finite = make(5)                       
  20. finite: Stream[String] = Stream(10, ?)
  21. scala> finite foreach print _                     
  22. 109876
  23. // One last demonstration of making a stream object
  24. scala> Stream.cons("10", make(2))
  25. res18: Stream.Cons[String] = Stream(10, ?)
  26. /*
  27. This method is dangerous as it forces the entire stream to be evaluated
  28. */
  29. scala> res18.size
  30. res19: Int = 3


This is only an introduction. I hope to add a few more topics that focus on Streams because they can be very powerful but are also more challenging to recognize where they should be used instead of a standard collection.

Friday, December 18, 2009

Tuples are (not?) Collections

Tuples are quite handy but a potential annoyance is that at a glance they seem list-like but appearances can be deceiving. The normal collection methods are not supported by Tuples. For example suppose you like the simple syntax: (1,2,3) better than List(1,2,3). With the Tuple you cannot do the maps, filters, etc... (with good reason) but there is a use-case for being able to convert a Tuple to a Traversable.

Word of warning. Unlike collections Tuples are very often not homogeneous. IE you do not have Tuple2[Int] you have Tuple2[A,B]. So the best you can do is to map a tuple to an Iterator[Any].

Note: this was done with Scala 2.8 so results may be slightly different with 2.7. But I believe the syntax is valid.
  1. // the productIterator method gives access to an iterator over the elements
  2. scala> (1,2,3).productIterator.map {_.toString} mkString (",")
  3. res1: String = 1,2,3
  4. scala> (1,2,3).productIterator foreach {println _}            
  5. 1
  6. 2
  7. 3
  8. // if you know the tuple is heterogeneous then you can use partial functions
  9. // for casting the elements to a particular type.  
  10. scala> (1,2,3).productIterator map {case i:Int => i + 2} foreach {println _}
  11. 3
  12. 4
  13. 5
  14. // To get a full Traversable out of the deal you use one of the many
  15. // to* methods to convert to Seq, List, Array, etc...
  16. scala> (1,2,3).productIterator.toList map {case i:Int => i + 2}      
  17. res15: List[Int] = List(3, 4, 5)
  18. // and if you want you can use an implicit to clean up the syntax a bit
  19. // Problem with this is you need an implicit for each Tuple length
  20. scala> implicit def tupleToTraversable[T](t:(T,T,T)) = t.productIterator.toList map { case e:T => e}
  21. warning: there were unchecked warnings; re-run with -unchecked for details
  22. tupleToTraversable: [T](t: (T, T, T))List[T]
  23. scala> (1,2,3) foreach {println _}
  24. 1
  25. 2
  26. 3
  27. /* 
  28. EDIT:  Dan pointed out that the methods I am using are inherited from the
  29. Product super class of Tuple.  So you can do something similar as follows.
  30. Note:  I am using the same name as the previous method so that they don't interfer 
  31. with one another
  32. */
  33. scala> implicit def tupleToTraversable[T](t:Product) = t.productIterator.toList map { case e:T => e} 
  34. warning: there were unchecked warnings; re-run with -unchecked for details
  35. tupleToTraversable: [T](t: Product)List[T]
  36. scala> (1,2,3) foreach {println _}
  37. 1
  38. 2
  39. 3
  40. // this is interesting... it does cast to int unless required
  41. scala> (1,2,'t') foreach {println _}
  42. 1
  43. 2
  44. t
  45. // lets verify we are getting correct conversion
  46. scala> def inc(l:List[Int]) = l map {_ + 1} 
  47. inc: (l: List[Int])List[Int]
  48. scala> inc ( (1,2,3))                              
  49. res4: List[Int] = List(2, 3, 4)
  50. // yep this I expected
  51. scala> inc ( (1,2,'t'))
  52. java.lang.ClassCastException: java.lang.Character cannot be cast to java.lang.Integer
  53. at scala.runtime.BoxesRunTime.unboxToInt(Unknown Source)
  54. at $anonfun$inc$1.apply(< console>:7)
  55. at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:238)
  56. at .< clinit>(< console>)
  57. scala> def inc(l:List[Int]) = l foreach {println _}
  58. inc: (l: List[Int])Unit
  59. scala> def p(l:List[Int]) = l foreach {println _}  
  60. p: (l: List[Int])Unit
  61. scala> p ( (1,2,'t'))                              
  62. 1
  63. 2
  64. t

Monday, December 7, 2009

Scala 2.8 Traversable and IndexedSequence

This topic is really just a collection of examples using the Scala 2.8 API. The examples are mostly based on Strings. But because Strings are in fact an IndexedSeq all of these examples also work with Lists arrays and the first several work with all collections.

These examples are all based on first Traversable. The basis of all collections and therefore will work with all collections. And the next set of examples work with IndexedSeq and most will work with Seq as well. Sets, Maps and others still need to be examined.

This is not an exhaustive set of examples and not all are even that interesting (sorry :-) ). But I let out the most trivial unless theay are being compared to others.

Traversable:
  1. scala> val license = """Subject of Agreement. ?Product?, as referred to in this Agreement, shall be the binary software package ?Sun VirtualBox,? which Product allows for creating multiple virtual computers, each with different operating systems (?Guest Computers?), on a physical computer with a specific operating system (?Host Computer?), to allow for installing and executing these Guest Computers simultaneously. The Product consists of executable files in machine code for the Solaris, Windows, Linux, and Mac?OS?X operating systems as well as other data files as required by the executable files at run-time and documentation in electronic form. The Product includes all documentation and updates provided to You by Sun under this Agreement and the terms of this Agreement will apply to all such documentation and updates unless a different license is provided with an update or documentation."""
  2. res0: RandomAccessSeq[Char] = 
  3. license: java.lang.String = ...
  4. scala> license count (_ == 'u')
  5. res1: Int = 37
  6. scala> license split ' ' count {_ == "and"
  7. res3: Int = 6
  8. scala> license ++ " this is silly!"
  9. res0: RandomAccessSeq[Char] = ...
  10. scala> license dropWhile { _ != 'y'} drop 1 takeWhile { _ != 'y'}
  11. res6: Seq[Char] = RandomAccessSeqTW( , s, o, ...
  12. scala> license forall {_ != 'z'}
  13. res8: Boolean = true
  14. scala> (1 until 5).init
  15. res0: scala.collection.immutable.Range = Range(1, 2, 3)
  16. scala> (1 until 5).head
  17. res1: Int = 1
  18. scala> (1 until 5).tail
  19. res2: scala.collection.immutable.IndexedSeq[Int] = Range(2, 3, 4)
  20. scala> (1 until 5).last
  21. res3: Int = 4
  22. scala> (1 until 5).headOption
  23. res4: Option[Int] = Some(1)
  24. scala> (1 until 5).lastOption
  25. res6: Option[Int] = Some(4)
  26. scala> license max   
  27. res8: Char = y
  28. scala> license min
  29. res9: Char =  
  30. scala> List() nonEmpty 
  31. res10: Boolean = false
  32. scala> List() isEmpty 
  33. res11: Boolean = true
  34. scala> license partialMap {case c if ('a' to 'z' contains c) => c}
  35. res13: scala.collection.immutable.IndexedSeq[Any] = IndexedSeq(u, b, j, e, c ...)
  36. scala> 1 to 10000 by 2 product 
  37. res19: Int = 481029393
  38. scala> 1 to 10000 by 2 sum                   
  39. res23: Int = 25000000
  40. scala> license partition {"aeiou" contains _}
  41. res20: (StringString) = (ueoeeeouaeeeoiieeeaeeia ....
  42. scala> license span {_ != 'a'}
  43. res22: (StringString) = (Subject of Agreement. ?Product?, ,as referred to in this Agreement, shall be the binary software package ?Sun VirtualBox,? which Product allows for creating multiple virtual computers, each with differ ...
  44. scala> val consonants = license withFilter {c => !("aeiuo" contains c)}
  45. consonants: scala.collection.immutable.StringOps#WithFilter = scala.collection.TraversableLike$WithFilter@78e8a591
  46. scala> consonants foreach print _                                      
  47. Sbjct f Agrmnt. ?Prdct?, s rfrrd t n ths Agrmnt, shll b th bnry sftwr pckg ?Sn VrtlBx,? whch Prdct llws fr crtng mltpl vrtl cmptrs, ch wth dffrnt prtng systms (?Gst Cmptrs?), n  physcl cmptr wth  spcfc prtng systm (?Hst Cmptr?), t llw fr nstllng nd xctng ths Gst Cmptrs smltnsly. Th Prdct cnssts f xctbl fls n mchn cd fr th Slrs, Wndws, Lnx, nd Mc?OS?X prtng systms s wll s thr dt fls s rqrd by th xctbl fls t rn-tm nd dcmnttn n lctrnc frm. Th Prdct nclds ll dcmnttn nd pdts prvdd t Y by Sn ndr ths Agrmnt nd th trms f ths Agrmnt wll pply t ll sch dcmnttn nd pdts nlss  dffrnt lcns s prvdd wth n pdt r dcmnttn.


IndexedSequense additions to Traversable:
  1. scala> '!' +: license    
  2. res33: String = !Subject of Agreement
  3. scala> "hello" +: license
  4. res32: scala.collection.immutable.IndexedSeq[Any] = IndexedSeq(hello, S, u, b, j, e, c, t,  , o, f,  , A, g, r, e, e, m, e, n, t, .,  , ?, P, r, o, d, u, c, t, ?, ,,  , a, s,  , r, e, f, e, r, r, e, d,  , t, o,  , i, n,  , t, h, i, s,  , A, g, r, e, e, m, e, n, t, ,,  , s, h, a, l, l,  , b, e,  , t, h, e,  , b, i, n, a, r, y,  , s, o, f, t, w, a, r, e,  , p, a, c, k, a, g, e,  , ?, S, u, n,  , V, i, r, t, u, a, l, B, o, x, ,, ?,  , w, h, i, c, h,  , P, r, o, d, u, c, t,  , a, l, l
  5. scala> license(11)
  6. res35: Char = A
  7. scala> license diff ("aeiou")
  8. res47: String = Sbjct f Agreement. ?Product?, s referred to n this Agreement, shall be the binary software package ?Sun VirtualBox,? which Product allows for creating multiple virtual computers, each with different operating systems (?Guest Computers?), on a physical computer with a specific operating system (?Host Computer?), to allow for installing and executing these Guest Computers simultaneously. The Product consists of executable files in machine code for the Solaris, Windows, Linux, and Mac?OS?X operating systems as well as other data files as required by the executable files at run-time and documentation in electronic form. The Product includes all documentation and updates provided to You by Sun under this Agreement and the terms of this Agreement will apply to all such documentation and updates unless a different license is provided with an update or documentation.
  9. scala> license dropRight(30)
  10. res48: String = Subject of Agreement. ?Product?, as referred to in this Agreement, shall be the binary software package ?Sun VirtualBox,? which Product allows for creating multiple virtual computers, each with different operating systems (?Guest Computers?), on a physical computer with a specific operating system (?Host Computer?), to allow for installing and executing these Guest Computers simultaneously. The Product consists of executable files in machine code for the Solaris, Windows, Linux, and Mac?OS?X operating systems as well as other data files as required by the executable files at run-time and documentation in electronic form. The Product includes all documentation and updates provided to You by Sun under this Agreement and the terms of this Agreement will apply to all such documentation and updates unless a different license is provided wi
  11. scala> license takeRight(10)
  12. res49: String = mentation.
  13. scala> license indexWhere { "aeiou" contains _} 
  14. res4: Int = 1
  15. scala> license lastIndexWhere { "aeiou" contains _}
  16. res6: Int = 873
  17. scala> List(1,2,3) flatten {0 to _}
  18. res7: List[Int] = List(0, 1, 0, 1, 2, 0, 1, 2, 3)
  19. scala> ' ' +: license zip license indexWhere {case (last, c) => "" + last + c == "an"}       
  20. res14: Int = 343
  21. scala> license indexOfSlice ("and")
  22. res17: Int = 342
  23. scala> ('a' to 'z') ++ ('A' to 'Z'
  24. res25: scala.collection.IndexedSeqView[Char,IndexedSeq[_]] = IndexedSeqViewA(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z)
  25. scala> license intersect res25
  26. res26: String = SubjectofAgrmnPdasihlywpkVBxvGCHTWLMOXqY
  27. scala> license diff res25     
  28. res27: String =   eeet. ?rouct?,  referred to n tis Agreement, shal be the binar softare acage ?Sun irtualo,? which Product allows for creating multiple irtual computers, each with different operating systems (?uest omputers?), on a physical computer with a specific operating system (?ost Computer?), to allow for installing and executing these Guest Computers simultaneously. he Product consists of executable files in machine code for the Solaris, indows, inux, and ac?S? operating systems as well as other data files as reuired by the executable files at run-time and documentation in electronic form. The Product includes all documentation and updates provided to ou by Sun under this Agreement and the terms of this Agreement will apply to all such documentation and updates unless a different license is provided with an update or documentation.
  29. scala> license filter res25
  30. < console>:7: error: type mismatch;
  31.  found   : scala.collection.IndexedSeqView[Char,IndexedSeq[_]]
  32.  required: (Char) => Boolean
  33.        license filter res25
  34.                       ^
  35. scala> license filter {res25 contains _}
  36. res29: String = SubjectofAgreementProductasreferredtointhisAgreementshallbethebinarysoftwarepackageSunVirtualBoxwhichProductallowsforcreatingmultiplevirtualcomputerseachwithdifferentoperatingsystemsGuestComputersonaphysicalcomputerwithaspecificoperatingsystemHostComputertoallowforinstallingandexecutingtheseGuestComputerssimultaneouslyTheProductconsistsofexecutablefilesinmachinecodefortheSolarisWindowsLinuxandMacOSXoperatingsystemsaswellasotherdatafilesasrequiredbytheexecutablefilesatruntimeanddocumentationinelectronicformTheProductincludesalldocumentationandupdatesprovidedtoYoubySununderthisAgreementandthetermsofthisAgreementwillapplytoallsuchdocumentationandupdatesunlessadifferentlicenseisprovidedwithanupdateordocumentation
  37. scala> license filterNot {res25 contains _}
  38. res30: String =   . ??,      ,       ? ,?        ,      (? ?),          (? ?),          .            , , ,  ??                -     .                                        .
  39. scala> license removeDuplicates
  40. res31: String = Subject ofAgrmn.?Pd,asihlywpkVBxv(GC)HTWLMOXq-Y
  41. scala> license segmentLength ({_ > 'A'},2) 
  42. res37: Int = 5
  43. scala> license drop 2 span {_ > 'A'} _1
  44. res41: String = bject
  45. scala> license sortWith Ordering.Char          
  46. res45: String =                                                                                                                                   (()),,,,,,,,,-....??????????AAAABCCCGGHLMOPPPPSSSSSTTVWXYaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbccccccccccccccccccccccccccccccddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeefffffffffffffffffggggggggggghhhhhhhhhhhhhhhhhhhhhiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiijkllllllllllllllllllllllllllllllllllmmmmmmmmmmmmmmmmmmmmmmnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnoooooooooooooooooooooooooooooooooooooooooooooooopppppppppppppppppppqrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrsssssssssssssssssssssssssssssssssssssssssssssssttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuvvvwwwwwwwwwwxxxxxyyyyyyyyy
  47. scala> 'a' to 'z' union ('A' to 'Z')
  48. scala> "hello world" patch (6, "goof", 5)
  49. res51: String = hello goof
  50. scala> "hello world" updated(0,'H')
  51. res54: String = Hello world

Thursday, November 12, 2009

Iterator.Sliding

This topic requires Scala 2.8 or higher.

Iterator and Iterable have most of the most useful methods when dealing with collections. Fold, Map, Filter are probably the most common. But other very useful methods include grouped/groupBy, sliding, find, forall, foreach, and many more. I want to cover Iterator's sliding method.

The def sliding[B >: A](size : Int, step : Int) : GroupedIterator[B] method provides a sliding window on the iterator. The first parameter is the size of the window and the second is the number of elements to skip each time the window is moved. If step == size then this method is the same as grouped. Step defaults to one

Examples:
  1. // notice resulting lists are all 3 long and increases by 1
  2. scala> (1 to 5).iterator.sliding(3).toList
  3. res18: List[Sequence[Int]] = List(List(1, 2, 3), List(2, 3, 4), List(3, 4, 5))
  4. // step is 3 so now there are only 2 groups
  5. scala> (1 to 5).iterator.sliding(4, 3).toList
  6. res19: List[Sequence[Int]] = List(List(1, 2, 3, 4), List(4, 5))
  7. // withPartial(false) makes the partial group be ignored
  8. scala> (1 to 5).iterator.sliding(4, 3).withPartial(false).toList
  9. res21: List[Sequence[Int]] = List(List(1, 2, 3, 4))
  10. scala> val it2 = Iterator.iterate(20)(_ + 5)
  11. it2: Iterator[Int] = non-empty iterator
  12. // withPadding fills in the partial group
  13. // notice with padding is by-name (it is executed each time it is called)
  14. scala> (1 to 5).iterator.sliding(4, 3).withPadding(it2.next).toList
  15. res23: List[Sequence[Int]] = List(List(1, 2, 3, 4), List(4, 5, 20, 25))

Tuesday, November 3, 2009

Non-strict (lazy) Collections

This topic discusses non-strict collections. There is a fair bit of confusion with regards to non-strict collections: what to call them and what are they. First lets clear up some definitions.

Pre Scala 2.8 collections have a projection method. This returns a non-strict collections. So often non-strict collections are called projections in Pre Scala 2.8
Scala 2.8+ the method was changed to view, so in Scala 2.8+ view sometimes refers to non-strict collections (and sometime to a implicit conversion of a class)
Another name for non-strict collections I have seen is "lazy collections."

All those labels are for the same thing "non-strict collections" which is the functional programming term and which I will use for the rest of this topic.

As an excellent addition to this topic please take a look at Strict Ranges? by Daniel Sobral.

One way to think of non-strict collections are pull collections. A programmer can essentially form a sequence of functions and the evaluation is only performed on request.

Note: I am intentionally adding side effects to the processes in order to demonstrate where processing takes place. In practice the collectionsshould be immutable (ideally) and the processing the collections really should be side-effect free. Otherwise almost guaranteed you will find yourself with a bug that is almost impossible to find.

Example of processing a strict collection:
  1. scala> var x=0
  2. x: Int = 0
  3. scala> def inc = {
  4.      | x += 1
  5.      | x
  6.      | }
  7. inc: Int
  8. scala> var list =  List(inc _, inc _, inc _)
  9. list: List[() => Int] = List(<function0><function0><function0>)
  10. scala> list.map (_()).head
  11. res0: Int = 1
  12. scala> list.map (_()).head
  13. res1: Int = 4
  14. scala> list.map (_()).head
  15. res2: Int = 7

Notice how each time the expression is called x is incremented 3 times. Once for each element in the list. This demonstrates that map is being called for every element of the list even though only head is being calculated. This is strict behaviour.

Example of processing a non-strict collection with Scala 2.7.5
For Scala 2.8 change project => view.
  1. scala> var x=0
  2. x: Int = 0
  3. scala> def inc = {
  4.      | x += 1
  5.      | x
  6.      | }
  7. inc: Int
  8. scala> var list =  List(inc _, inc _, inc _)
  9. list: List[() => Int] = List(<function0><function0><function0>)
  10. scala> list.projection.map (_()).head
  11. res0: Int = 1
  12. scala> list.projection.map (_()).head
  13. res1: Int = 2
  14. scala> list.projection.map (_()).head
  15. res2: Int = 3
  16. scala> list.projection.map (_()).head
  17. res3: Int = 4

Here you can see that only one element in the list is being calculated for the head request. That is the idea behind non-strict collections and can be useful when dealing with large collections and very expensive operations. This also demonstrates why side-effects are so crazy dangerous!.

More examples (Scala 2.8):
  1. scala> var x=0
  2. x: Int = 0
  3. scala> def inc = { x +=1; x }
  4. inc: Int
  5. // strict processing of a range and obtain the 6th element
  6. // this will run inc for every element in the range
  7. scala> (1 to 10).map( _ + inc).apply(5)
  8. res2: Int = 12
  9. scala> x
  10. res3: Int = 10
  11. // reset for comparison
  12. scala> x = 0
  13. x: Int = 0
  14. // now non-strict processing but the same process
  15. // you get a different answer because only one
  16. // element is calculated
  17. scala> (1 to 10).view.map( _ + inc).apply(5)
  18. res6: Int = 7
  19. // verify that x was incremented only once
  20. scala> x
  21. res7: Int = 1
  22. // reset for comparison
  23. scala> x = 0
  24. x: Int = 0
  25. // force forces strict processing
  26. // now we have the same answer as if we did not use view
  27. scala> (1 to 10).view.map( _ + inc).force.apply(5)
  28. res9: Int = 12
  29. scala> x
  30. res10: Int = 10
  31. // reset for comparison
  32. scala> x = 0
  33. x: Int = 0
  34. // first 5 elements are computed only
  35. scala> (1 to 10).view.map( _ + inc).take(5).mkString(",")
  36. res9: String = 2,4,6,8,10
  37. scala> x
  38. res10: Int = 5
  39. // reset for comparison
  40. scala> x = 0
  41. x: Int = 0
  42. // only first two elements are computed
  43. scala> (1 to 10).view.map( _ + inc).takeWhile( _ < 5).mkString(",")
  44. res11: String = 5,7
  45. scala> x
  46. res12: Int = 5
  47. // reset for comparison
  48. scala> x = 0
  49. x: Int = 0
  50. // inc is called 2 for each element but only the last 5 elements are computed so
  51. // x only == 10 not 20
  52. scala> (1 to 10).view.map( _ + inc).map( i => inc ).drop(5).mkString(",")
  53. res16: String = 2,4,6,8,10
  54. scala> x
  55. res17: Int = 10
  56. scala> x = 0                                             
  57. x: Int = 0
  58. // define this for-comprehension in a method so that
  59. // the repl doesn't call toString on the result value and
  60. // as a result force the full list to be processed
  61. scala> def add = for( i <- (1 to 10).view ) yield i + inc
  62. add: scala.collection.IndexedSeqView[Int,IndexedSeq[_]]
  63. scala> add.head                                          
  64. res5: Int = 2
  65. // for-comprehensions will also be non-strict if the generator is non-strict
  66. scala> x
  67. res6: Int = 1

Monday, November 2, 2009

Scala 2.8 Collections Overview

This topic introduces the basics of the collections API introduced in Scala 2.8. It is NOT intended to teach everything one needs to know about the collections API. It is intended only to provide context for future discussions about the new features of 2.8 collections. For example I am planning on how to introduce custom collection builders.

Disclaimer: I am not an expert on the changes to 2.8. But I think it is important to have a basic awareness of the changes. So I am going to try to provide a simplified explanation. Corrections are most certainly welcome. I will incorporate them into the post as soon as I get them.

If you want more information you can view the improvement document: Collections SID.

Pre Scala 2.8 there were several criticisms about the inconsistencies on the collections API.

Here is a Scala 2.7 example:
  1. scala> "Hello" filter ('a' to 'z' contains _)
  2. res0: Seq[Char] = ArrayBuffer(e, l, l, o)
  3. scala> res0.mkString("")
  4. res1: String = ello

Notice how after the map function the return value is a ArrayBuffer, not a String. The mkString function is required to convert the collection back to a String.
Here is the code in Scala28:
  1. scala> "Hello" filter ('a' to 'z' contains _)         
  2. res2: String = ello

In Scala28 it recognizes that the original collection is a String and therefore a String should be returned. This sort of issue was endemic in Scala 2.7 and several classes had hacks to work around the issue. For example List overrode the implementations for many of its methods so a List would be returned. But other classes, like Map, did not. So depending on what class you were using you would have to convert back to the original collection type. Maps are another good example:
  1. scala> Map(1 -> "one",                                         
  2.      |     2 -> "two") map {case (key, value) => (key+1,value)}
  3. res3: Iterable[(Int, java.lang.String)] = ArrayBuffer((2,one), (3,two))

But if you use filter in map then you got a Map back.

I don't want to go into detail how this works, but basically each collection is associated with a Builder which is used to construct new instances of the collection. So the superclasses can perform the logic and simply delegate the construction of the classes to the Builder. This has the effect of both reducing the amount of code duplication in the Scala code base (not so important to the API) but most importantly making the return values of the various methods consistent.

It does make the API slightly more complex.

So where in pre Scala 2.8 there was a hierarchy (simplified):
                    Iterable[A]
|
Collection[A]
|
Seq[A]
In 2.8 we now have:
               TraversableLike[A,Repr]   
/ \
IterableLike[A,Repr] Traversable[A]
/ \ /
SeqLike[A,Repr] Iterable[A]
\ /
Seq[A]
The *Like classes have most of the implementation code and are used to compose the public facing API: Seq, List, Array, Iterable, etc... From what I understand these traits are publicly available to assist in creating new collection implementations (and so the Scaladocs reflect the actual structure of the code.)

Friday, October 30, 2009

Groupby - collection processing

Iterator and Iterable have most of the most useful methods when dealing with collections. Fold, Map, Filter are probably the most common. But other very useful methods include grouped/groupBy, sliding, find, forall, foreach, and many more. I want to cover Iterable's groupBy method in this topic.

This is a Scala 2.8 and later method. It is similar to partition in that it allows the collection to be divided (or partitioned). Partition takes a method with returns a boolean and partitions the collection into two depending on a result. GroupBy takes a function that returns an object and returns a Map with the key being the return value. This allows an arbitrary number of partitions to be made from the collection.

Here is the method signature:
  1. def groupBy[K](f : (A) => K) : Map[K, This]

A bit of context is require to understand the three Type parameters A, K and This. This method is defined in a super class of collections called TraversableLike (I will briefly discuss this in the next topic.) TraversableLike takes two type parameters: the type of the collection and the type contained in the collection. Therefore in this method definition, 'This' refers to the collection type (List for example) and A refers to contained type (perhaps Int). Finally K refers to the type returned by the function and are the keys of the groups formed by the method.

Examples:
  1. scala> val groups = (1 to 20).toList groupBy {
  2.      | case i if(i<5) => "g1"
  3.      | case i if(i<10) => "g2"
  4.      | case i if(i<15) => "g3"
  5.      | case _ => "g4"
  6.      | }
  7. res4: scala.collection.Map[java.lang.String,List[Int]] = Map(g1 -> List(1, 2, 3, 4), g2 -> List(5, 6, 7, 8, 9), g3 -> List(10, 11, 12, 13, 14), g4 -> List(15, 16, 17, 18, 19, 20))
  8. scala> groups.keySet
  9. res6: scala.collection.Set[java.lang.String] = Set(g1, g2, g3, g4)
  10. scala> groups("g1")
  11. res7: List[Int] = List(1, 2, 3, 4)
  12. scala> val mods = (1 to 20).toList groupBy ( _ % 4 )
  13. mods: scala.collection.Map[Int,List[Int]] = Map(1 -> List(1, 5, 9, 13, 17), 2 -> List(2, 6, 10, 14, 18), 3 -> List(3, 7,
  14.  11, 15, 19), 0 -> List(4, 8, 12, 16, 20))
  15. scala> mods.keySet
  16. res9: scala.collection.Set[Int] = Set(1, 2, 3, 0)
  17. scala> mods(1)
  18. res11: List[Int] = List(1, 5, 9, 13, 17)