Daily scala: Scala-IO Core: Long Traversable

The LongTraversable trait is one of the most important types in Scala IO. The Input trait provides a uniform way of creating views on the data (as a String, a byte array, or a LongTraversable of something like bytes).

LongTraversable is a scala.collection.Traversable with some extra capabilities. A few of the salient points of LongTraversable are:

It is a lazy/non-strict collection similar to Stream. In other words, you can perform operations like map, flatMap, filter, collect, etc. without accessing the resource.
Methods like slice and drop will (if the resource allows it) skip the dropped bytes without reading them.
Each usage of the LongTraversable will typically open and close the underlying resource.
Has methods that one typically finds in Seq, for example zip, apply and containsSlice.
Has methods that take or return Longs instead of Ints, such as ldrop, lslice, ltake and lsize.
Has a limitFold method that allows fold-like behaviour with extra features such as skipping and early termination.
Can be converted to an AsyncLongTraversable, whose methods return Futures instead of blocking the program.
Can be converted to a Process object for advanced data processing pipelines.
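To make the points about laziness and per-traversal resource handling more concrete, here is a toy model in plain Scala. It is not scala-io's implementation: LazyBytesView and LazyBytesDemo are hypothetical names invented for this sketch (ldrop and ltake only mirror the names above), and it assumes the resource can be reopened through a function returning a java.io.InputStream. Transformations only record what to do; each call to toList opens the stream, skips the dropped bytes with InputStream.skip, reads what it needs and closes the stream again.

```scala
import java.io.{ByteArrayInputStream, InputStream}

// Hypothetical toy model, NOT scala-io code: a lazy view over a byte resource.
// Building a pipeline never touches the resource; only toList does.
final class LazyBytesView[A] private (
    open: () => InputStream,  // how to (re)open the underlying resource
    skip: Long,               // number of bytes to skip before reading
    limit: Option[Long],      // maximum number of elements to yield
    transform: Byte => A      // element-wise (1:1) mapping
) {
  // map is 1:1, so dropping/taking elements after it equals dropping/taking bytes before it.
  def map[B](f: A => B): LazyBytesView[B] =
    new LazyBytesView(open, skip, limit, transform.andThen(f))

  // Long-based drop: only recorded here, later performed with InputStream.skip.
  def ldrop(n: Long): LazyBytesView[A] =
    new LazyBytesView(open, skip + n, limit.map(l => math.max(0L, l - n)), transform)

  // Long-based take: keeps the smaller of any existing limit and n.
  def ltake(n: Long): LazyBytesView[A] =
    new LazyBytesView(open, skip, Some(limit.fold(n)(l => math.min(l, n))), transform)

  // Forces one traversal: open, skip, read only what is needed, close.
  def toList: List[A] = {
    val in = open()
    try {
      var remaining = skip
      while (remaining > 0) {
        val skipped = in.skip(remaining)
        if (skipped > 0) remaining -= skipped
        else if (in.read() == -1) remaining = 0 // end of stream reached
        else remaining -= 1                     // skip made no progress, consumed one byte by reading
      }
      val bytes = Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte)
      val limited = limit.fold(bytes)(l => bytes.take(math.min(l, Int.MaxValue.toLong).toInt))
      limited.map(transform).toList
    } finally in.close()
  }
}

object LazyBytesView {
  def apply(open: () => InputStream): LazyBytesView[Byte] =
    new LazyBytesView(open, 0L, None, (b: Byte) => b)
}

// Small demo: counts how often the (in-memory) resource is opened.
object LazyBytesDemo {
  var opens = 0
  val data: Array[Byte] = Array.tabulate[Byte](100)(i => i.toByte)
  val source = LazyBytesView(() => { opens += 1; new ByteArrayInputStream(data) })
  // Describes "skip 10 bytes, take 5, double them"; nothing is opened yet.
  val doubled = source.ldrop(10).ltake(5).map(_ * 2)
}
```

In this sketch, building `doubled` leaves `opens` at 0, and every call to `toList` opens the resource exactly once. A fuller model would also need filter, flatMap and friends, which break the simple byte-offset arithmetic used here.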

Example usage:

import scalax.io._
import java.net.URL
 
val file1 = Resource.fromURL(new URL("http://www.scala-lang.org"))
val file2 = Resource.fromURL(new URL("http://www.camptocamp.com"))
 
// scala-io versions > 0.4.1 will have the method
//   Resource.fromURLString("http://xyz.com")
// but earlier versions use an overloaded method: 
//   Resource.fromURL("http://www.scala-lang.org")
 
// A simple example of comparing all bytes in
// one file with those of another.
// combining zip with sliding is a good way to perform operations
// on sections of two (or more) files
val zipped = file1.bytes.zip(file2.bytes) map {
  case (file1Byte, file2Byte) =>
    file2Byte < file1Byte
}
 
// take the first 5 results and load them into memory
val fiveBytes = zipped.take(5).force
 
// for debugging in the REPL, let's display them
fiveBytes mkString ","
 
// Add a line number to each line in a file
//
// Note: since the methods of an Input object return LongTraversableView objects,
// none of the zip examples open the file. To read it you must call
// force or some other method that forces a read to take place.
val addedLineNumbers = file1.lines().zipWithIndex.map {
  case (line,idx) => idx+" "+line
}
// print out the second group of 5 lines
addedLineNumbers.drop(5).take(5) foreach println
 
// check if file1 starts with file2
file1.bytes.startsWith(file2.bytes)
 
// the number of consecutive lines, starting at index 0, that contain "<"
file1.lines().segmentLength(_ contains "<",0)
 
// check if all lines in file1 are the same as in file2 ignoring case
file1.lines().corresponds(file2.lines())(_ equalsIgnoreCase _)
 
// check if file1 has the same bytes as file2
file1.bytes.sameElements(file2.bytes)
 
// a silly example, but it shows that the argument being
// compared can be any collection, not just a LongTraversable
file1.bytes.sameElements(1 to 30)
 
// use sliding to visit each block of 1008 bytes;
// map splits the window into two parts, block and checksum
val blocks = file1.bytes.sliding(1008,1008).map{_ splitAt 1000}
 
// grouped is sliding(size,size) so the following is equivalent
val blocks2 = file1.bytes.grouped(1008).map{_ splitAt 1000}
 
blocks2 foreach {
  case (block,checksum) =>
    // verify checksum and process
    println(block take 5)
}
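The block/checksum loop above leaves the verification step open. As a sketch only, assuming each 8-byte trailer is the CRC32 of the preceding 1000-byte block stored as a big-endian Long (the post does not specify any checksum format), the same grouped + splitAt pattern can be written against a plain Scala Iterator[Byte] using java.util.zip.CRC32 and java.nio.ByteBuffer. BlockChecksum, encode and verify are hypothetical names.

```scala
import java.nio.ByteBuffer
import java.util.zip.CRC32

// Hypothetical record layout (an assumption, not from scala-io):
// 1000 data bytes followed by an 8-byte big-endian CRC32 of those bytes.
object BlockChecksum {
  val BlockSize = 1000
  val ChecksumSize = 8

  // CRC32 of a block; the value always fits in the low 32 bits of a Long.
  def crc(block: Array[Byte]): Long = {
    val c = new CRC32()
    c.update(block, 0, block.length)
    c.getValue()
  }

  // Builds one record (block + trailer); handy for producing test data.
  def encode(block: Array[Byte]): Array[Byte] =
    block ++ ByteBuffer.allocate(ChecksumSize).putLong(crc(block)).array()

  // Same shape as grouped(1008).map(_ splitAt 1000) in the post, on an Iterator.
  // Yields true for each window whose trailer matches its block.
  def verify(bytes: Iterator[Byte]): Iterator[Boolean] =
    bytes.grouped(BlockSize + ChecksumSize).map { window =>
      val (block, checksum) = window.splitAt(BlockSize)
      checksum.length == ChecksumSize &&
        ByteBuffer.wrap(checksum.toArray).getLong() == crc(block.toArray)
    }
}
```

Because verify returns an Iterator, windows are only checked as they are consumed, which fits the lazy style of the LongTraversable examples.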

The limitFold method can be quite useful for processing only a portion of a file when you don't know ahead of time what the indices of that portion are:

import scalax.io._
import java.net.URL
 
val in:Input = Resource.fromURL(new URL("http://www.camptocamp.com"))
 
/**
 * Skip the first 10 bytes, then sum a random number
 * of the following bytes (at most 20 of them)
 */
in.bytes.drop(10).take(20).limitFold(0) {
  case (acc, next) if scala.util.Random.nextBoolean() => End(acc + next)
  case (acc, next) => Continue(acc + next)
}
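To illustrate what limitFold does, here is a minimal sketch of a limitFold-style function over a plain Iterator. FoldResult, Continue and End are defined locally and only mirror the names used above; they are not scala-io's classes, and this version covers early termination only, not the skip feature mentioned in the list earlier.

```scala
// Hypothetical stand-ins for scala-io's fold results (not the library's code).
sealed trait FoldResult[+A] { def acc: A }
final case class Continue[+A](acc: A) extends FoldResult[A]
final case class End[+A](acc: A) extends FoldResult[A]

object LimitFold {
  // Folds over the iterator, stopping as soon as op returns End.
  // Elements after that point are never pulled from the iterator,
  // so the rest of the underlying source is never read.
  def limitFold[A, U](it: Iterator[A])(init: U)(op: (U, A) => FoldResult[U]): U = {
    var acc = init
    var done = false
    while (!done && it.hasNext) {
      op(acc, it.next()) match {
        case Continue(a) => acc = a
        case End(a)      => acc = a; done = true
      }
    }
    acc
  }
}
```

For example, summing until the total reaches 6 over the elements 1 to 6 returns 6 and pulls only the first three elements. With scala-io itself you would call limitFold directly on the LongTraversable, as in the example above.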

Monday, August 6, 2012
