Friday, October 5, 2012

Scala-IO Core: Unmanaged Resources

Scala-IO is designed around automatically closing a resource each time it is accessed, so that a programmer cannot unintentionally leave resources open in the face of exceptions or other unexpected situations. However, there are cases where the Scala-IO API is desired but the resource management is not. The classic case is reading from System.in or writing to System.out. Unmanaged resources exist to satisfy this use-case.

Since unmanaged resources are a less common use-case, there is no factory object for them like there is for normal managed Resources.  Instead, certain objects can be converted to unmanaged resources using the JavaConverters implicit methods as follows:
// JavaConverters implicit methods are required to create the
// unmanaged resources
scala> import scalax.io.JavaConverters._
import scalax.io.JavaConverters._
 
// now that JavaConverters are in scope we can convert
// System.out to an Output object and use it as any normal
// output object except that the stream will not be closed.
// WritableByteChannels can also be converted
scala> System.out.asUnmanagedOutput.write("Not closing right?")
Not closing right?
 
// See, still not closed
scala> System.out.asUnmanagedOutput.write("Still not closed?")
Still not closed?
 
scala> import java.io._
import java.io._
 
// another demonstration of converting an output stream to an
// unmanaged Output object.  This is frowned upon unless
// unavoidable.
scala> val fout = new FileOutputStream("somefile.txt")
fout: java.io.FileOutputStream = java.io.FileOutputStream@23987721
 
scala> val foutOutput = fout.asUnmanagedOutput
foutOutput: scalax.io.Output = scalax.io.unmanaged.WritableByteChannelResource@36b54a77
 
scala> foutOutput.write("Hello ")
 
scala> foutOutput.write("World!")
 
scala> fout.close
 
// see the output object is broken now because the stream is closed
scala> foutOutput.write("boom")
No Main Exception
---
class java.nio.channels.ClosedChannelException(null)
...
 
// The mirror image is converting an input stream to an Input object
scala> val fin = new FileInputStream("somefile.txt")
fin: java.io.FileInputStream = java.io.FileInputStream@4fcbc4de
 
scala> val chars = fin.asUnmanagedInput.chars
chars: scalax.io.LongTraversable[Char] = LongTraversable(...)
 
// normally a LongTraversable will close the resource
// but this LongTraversable is obtained from an unmanaged Input
// so it can be used multiple times without closing the resource
scala> chars.head
res19: Char = H
 
scala> chars.head
res20: Char = e
 
scala> chars.head
res21: Char = l
 
scala> chars.head
res22: Char = l
 
// don't forget to close
scala> fin.close
 
// quick demo of using channels
// the following is a major anti-pattern and is
// here mainly for completeness
scala> val fchannel = new RandomAccessFile("somefile.txt", "rw").getChannel
fchannel: java.nio.channels.FileChannel = sun.nio.ch.FileChannelImpl@1e10cb60
 
scala> val fInput2 = fchannel.asUnmanagedInput
fInput2: scalax.io.Input = scalax.io.unmanaged.ReadableByteChannelResource@679a339e
 
scala> println(fInput2.string)
Hello World!
 
scala> fchannel.isOpen
res13: Boolean = true
 
scala> val fOutput = fchannel.asUnmanagedOutput
fOutput: scalax.io.Output = scalax.io.unmanaged.WritableByteChannelResource@9cc8b91
 
scala> fOutput.write("hi there")
 
scala> println(fInput2.string)
hi thererld!
 
// don't forget to close
scala> fchannel.close
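
The difference between the two kinds of resource can be sketched without Scala-IO at all. The following is a minimal illustration using only java.io. The names `withManaged`, `withUnmanaged` and `TrackingStream` are hypothetical helpers invented for this sketch, not part of the Scala-IO API; they only stand in for what a managed or unmanaged resource does around each operation.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}

object UnmanagedSketch {
  // A stream that records whether close() was called, so the effect is visible.
  final class TrackingStream extends ByteArrayOutputStream {
    var closed = false
    override def close(): Unit = { closed = true; super.close() }
  }

  // Hypothetical "managed" access: open, use, and always close (even on exceptions).
  def withManaged[A](open: => OutputStream)(use: OutputStream => A): A = {
    val out = open
    try use(out) finally out.close()
  }

  // Hypothetical "unmanaged" access: use the stream and leave it open.
  def withUnmanaged[A](out: OutputStream)(use: OutputStream => A): A = use(out)

  def main(args: Array[String]): Unit = {
    val managed = new TrackingStream
    withManaged(managed)(_.write("hello".getBytes("UTF-8")))
    println("managed closed: " + managed.closed)

    val unmanaged = new TrackingStream
    withUnmanaged(unmanaged)(_.write("hello".getBytes("UTF-8")))
    withUnmanaged(unmanaged)(_.write(" again".getBytes("UTF-8")))
    println("unmanaged closed: " + unmanaged.closed)
  }
}
```

In the unmanaged case the caller stays responsible for closing the stream, which is exactly why the transcript above ends each demonstration with an explicit close.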

Wednesday, September 26, 2012

Scala-IO Core: To Resource Converters

In order to simplify integration with existing libraries, most commonly Java libraries, Scala-IO provides a JavaConverters object with implicit methods that add as*** methods (asInput, asOutput, asSeekable, etc...) to several types of objects.  It is the same pattern as in the scala.collection.JavaConverters object.

These methods can be used instead of the Resource.from*** methods to produce slightly nicer-looking code.

There is one warning. When using JavaConverters instead of Resource.from*** for creating Input/Output/Seekable/etc... objects, the chances of falling into the trap of creating non-reusable resources or causing a resource leak are increased. See: scala-io-core-reusable-resources for more details on this.
// JavaConverters object contains the implicit methods
// which add the as*** methods to the applicable _normal_ objects
import scalax.io.JavaConverters._
 
// Several objects can be converted to input objects using
// asInput.  URLs, File, RandomAccessFile, InputStream, Traversable[Byte]
// Array[Byte], ReadableByteChannel
val input = new URL("http://www.camptocamp.com").asInput
 
// simple demonstration using the newly created input object
println(input.bytes.size)
 
// asSeekable can only be applied to a few objects at the
// moment.  Including RandomAccessFile, SeekableByteChannel,
// File and perhaps in future mutable Sequences
val file = new java.io.File("somefile.txt").asSeekable
 
// demonstrate a seekable method.  This method
// ensures the file is empty
file.truncate(0)
 
// write hi using the Output  API
file.write("hi :)")
 
// output the file to the console to see the results
println(file.string)
 
// asUnmanaged*** creates Unmanaged resources for
// operations.  The Unmanaged Resource post will
// discuss this in more detail but essentially the
// resource is not closed and thus is useful when
// dealing with underlying objects that should not
// be closed, like System.out
val unmanagedOutput = System.out.asUnmanagedOutput
 
// prints to standard out: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
unmanagedOutput.writeStrings(1 to 10 map (_.toString), ", ")
 
// prints: Hello World
// This demonstrates how a simple Array or Traversable can be easily used
// as an Input object.  The array is Hello world encoded as normal latin1
println(Array(72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100).asInput.string)
 
// prints: Hll Wrld
// Raw Strings and Reader objects cannot currently be converted to an Input object
// but can be converted to ReadChars object (Essentially the character API of Input)
println("Hello World".asReadChars.chars.filterNot(c => "aeiou" contains c) mkString)
 
// Similarly Writer objects cannot be Output objects since they can't write
// bytes so asWriteChars is used to be able to have the string writing
// capabilities of Scala-IO
new java.io.FileWriter("somefile.txt").asWriteChars write "Hi"
 
// prints: Hi
println(new java.io.FileReader("somefile.txt").asReadChars.string)

Wednesday, September 19, 2012

Scala-IO Core: Reusable Resources


One aspect of resources in Scala-IO that can cause problems is the construction of resource objects.  The factory methods that are provided in the Resource object each have a lazy parameter for opening the underlying resource.  However, a common error is to pass an already open resource to the method, which causes multiple problems.

Consider:
import scalax.io._
   
val stream = new java.io.FileOutputStream("somefile.txt")
val output = Resource.fromOutputStream(stream) 
output write "hey "
 
// boom, stream was closed last write
// and the stream cannot be reopened.
output write "how's it going?"

In the example above the stream is created and opened at the definition of stream (it is a val).  This has two effects:

  1. the stream is open, so if the resource object is not closed you will have a resource leak
  2. since the stream is already open, the resource can only be used once because it will be closed after each use.

The correct way to create the resource is to change val to def so that the stream is only created on demand and therefore there is no chance of a resource leak.  The following is the correct example:
import scalax.io._
 
def stream = new java.io.FileOutputStream("somefile.txt")
val output = Resource.fromOutputStream(stream)
 
output write "hey "
// the second write will now work.  However, since
// the underlying resource is a FileOutputStream that is
// reopened for each write, the file will contain just "how's it going?"
output write "how's it going?"
 
println(Resource.fromFile("somefile.txt").string)
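
Why def works where val fails comes down to by-name parameters. The sketch below is not Scala-IO code: `ReopeningOutput` is a hypothetical stand-in for a resource whose factory takes its opener lazily, and `opened` simply counts how often the opener runs. It uses only the standard library.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}

object ByNameSketch {
  // Hypothetical resource: `opener` is by-name, so it is re-evaluated on every write.
  final class ReopeningOutput(opener: => OutputStream) {
    def write(s: String): Unit = {
      val out = opener // evaluates the by-name argument now
      try out.write(s.getBytes("UTF-8")) finally out.close()
    }
  }

  var opened = 0

  // A def: every evaluation creates a brand new stream.
  def freshStream: OutputStream = { opened += 1; new ByteArrayOutputStream() }

  def main(args: Array[String]): Unit = {
    val output = new ReopeningOutput(freshStream)
    println("opened after construction: " + opened)
    output.write("hey ")
    output.write("how's it going?")
    println("opened after two writes: " + opened)
  }
}
```

Had the stream been bound to a val first and that val passed in, the opener would hand back the same, already closed, instance on the second write, which is the failure mode described above.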

This anti-pattern is also a risk when using the converter methods in the JavaConverters object. (A future post will look into this in more detail.) The following example shows the anti-pattern in effect:
import scalax.io._
import JavaConverters._
 
val output = new java.io.FileOutputStream("somefile.txt").asOutput
 
output write "hey "
// Next line will cause exception because
// stream is closed.
output write "how's it going?"

The asOutput method can only be applied to an object (at the time of this writing) and therefore the resulting object has all of the negative characteristics mentioned above. It is therefore recommended that asOutput/asInput/etc... only be used on one-time-use resources (like InputStream) within a scope, and that the result not be passed out to an external method, so that it is easy to view the entirety of the operation.

Thursday, September 13, 2012

Scala-IO Core: ReadChars and WriteChars

The Input and Output objects of Scala-IO assume that the underlying data is composed of bytes.  However, another common pattern is to have the underlying data be composed of characters instead of bytes, for example java.io.Reader and java.io.Writer.  While it is possible to decompose the output into Bytes and construct an Input object from the decorated object, ReadChars and WriteChars can be used in this situation to reduce the work needed to interact with such resources.

ReadChars and WriteChars are traits that contain the character and string methods of Input and Output.  The primary difference is that the Charset is defined by the underlying resource rather than supplied at the method invocation site.  

Compare two methods:

Input:
def chars(implicit codec: Codec = Codec.default): LongTraversable[Char]
ReadChars:
def chars: LongTraversable[Char]

You will notice that the ReadChars method does not have the codec parameter because no translation is required there, unlike in Input, which requires the characters to be created from raw bytes.

Not many examples are needed to explain these concepts, but here are a few showing how to create ReadChars and WriteChars objects:

import scalax.io._
import JavaConverters._
 
// JavaConverters has asReadChars and asWriteChars
// for converting some objects to ReadChars and WriteChars
// (The JavaConverters post will explain
// more about the JavaConverters object)
"Hello World".asReadChars.chars.size
 
val writer = new java.io.StringWriter()
// Resource object can be used to create ReadChars and WriteChars objects
Resource.fromWriter(writer).write("Yeee Hawww!")
println(Resource.fromReader(new java.io.StringReader(writer.toString)).lines().size)
 
// ReadChars and WriteChars can be obtained from
// InputResource and OutputResource object respectively
// SeekableByteChannelResource[SeekableByteChannel]
// (returned by fromFile) implements both traits and
// therefore has both methods
val fileResource = Resource.fromFile("somefile.txt")
 
implicit val codec = Codec.UTF8
// clear any old data in file
fileResource.truncate(0)
 
// Codec used is declared when calling writer or reader
// methods
fileResource.writer.write("hi")
 
println(fileResource.reader.chars.size)

Thursday, August 30, 2012

On Vacation

I am getting a lot of emails about Scala-IO and my posts.  Just want to let everyone know I am on vacation until September 10th or so.  I have some posts in the works but they won't be done here where I have virtually no internet.

Back soon.

Sunday, August 19, 2012

Scala-IO Core: Seekable

At the same level of abstraction as Input and Output is the fine trait called Seekable.  As the name implies it provides random access style methods for interacting with a resource.  The example that comes immediately to mind is a random access file.

The design of Seekable largely mimics the scala.collection.Seq patch and insert methods.  Not much more to say beyond getting into some examples:
import scalax.io._
import java.io.File
 
// For this example let's explicitly specify a codec
implicit val codec = scalax.io.Codec.UTF8
 
val file: Seekable =  Resource.fromFile(new File("scala-io.out"))
 
// delete all data from the file
// or more specifically only keep the data from the beginning of the
// file up to (but not including) the first byte
file truncate 0
 
val seedData = "Entering some seed data into the file"
file write seedData
 
// verify seed data was correctly written (in REPL)
file.string
 
 
// write "first" after character 9
// if the file is < 9 characters an underflow exception is thrown
// if the patch extends past the end of the file then the file is extended
// Note: The offset is always dependent on type of data being written
// for example if the data written is a string it will be 9 characters
// if the data is bytes it will be 9 bytes
file patch (9, "first",OverwriteAll)
 
// dumping to REPL will show: Entering firstseed data into the file
file.string
 
// patch at position 0 and replace 100 bytes with new data
file.patch(0,seedData, OverwriteSome(100))
 
// dumping to REPL will show the unchanged seedData once again
file.string
 
// Overwrite only 4 bytes starting at byte 9.
// The extra bytes will be inserted.
// In other words the word "some" of the seed data will
// be replaced with "second"
// Warning: This is an overwrite and an insert
// inserts are expensive since it requires copying the data from
// the index to end of file.  If small enough it is done in
// memory but a temporary file is required for big files.
file.patch(9,"second".getBytes(), OverwriteSome(4))
 
// dumping to REPL will show: Entering second seed data into the file
file.string
 
// reset file
file.patch(0,seedData, OverwriteSome(100))
 
// Replace 9 bytes with the 5 bytes that are provided
// In other words: replace "some seed" with "third"
file.patch(9,"third".getBytes(), OverwriteSome(9))
 
// dumping to REPL will show: Entering third data into the file
file.string
 
// reset file
file.patch(0,seedData, OverwriteSome(100))
 
// Insert a string at start of file
file.insert(0, "newInsertedData ")
 
// dumping to REPL will show: newInsertedData Entering some seed data into the file
file.string
 
// reset file
file.patch(0,seedData, OverwriteSome(100))
 
//add !! to end of file
file.append("!!")
 
// dumping to REPL will show: Entering some seed data into the file!!
file.string

IMPORTANT: Each time truncate, patch or insert is called, a new connection to the file is opened and closed. Use the Processor API to perform multiple operations within one connection:
import scalax.io._
import java.io.File
 
val file: Seekable =  Resource.fromFile(new File("scala-io.out"))
 
for {
  p <- file.seekableProcessor
  seekable = p.asSeekable
} {
  seekable truncate 0
  seekable write "hi"
  seekable append " world"
 
  // one can do patch, insert etc...
  // or move the cursor to the correct position and write
  // this is essentially a patch(0, "Hi", OverwriteAll)
  seekable.position = 0
  seekable write "Hi"
 
  seekable patch (3, "W", OverwriteAll)
 
  // dumping to REPL will show: Hi World
  println (seekable.string)
}
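
Because Seekable's patch is modelled on Seq.patch, the strings quoted in the comments of the first example can be reproduced on an in-memory sequence. This is only a sketch using the standard library: the `patch` helper is hypothetical, and mapping OverwriteAll and OverwriteSome(n) onto the "replaced" count of Seq.patch is an interpretation of the comments above, not a statement about how Scala-IO implements them.

```scala
object PatchSketch {
  val seed = "Entering some seed data into the file"

  // Seq.patch(from, replacement, replaced): drop `replaced` elements starting
  // at `from` and splice `replacement` in their place.
  def patch(s: String, from: Int, replacement: String, replaced: Int): String =
    s.toVector.patch(from, replacement.toVector, replaced).mkString

  def main(args: Array[String]): Unit = {
    // like OverwriteAll: replace as many characters as the new data contains
    println(patch(seed, 9, "first", "first".length))
    // like OverwriteSome(4): replace 4 characters, insert the rest
    println(patch(seed, 9, "second", 4))
    // like OverwriteSome(9): replace 9 characters with only 5
    println(patch(seed, 9, "third", 9))
    // like insert: replace nothing
    println(patch(seed, 0, "newInsertedData ", 0))
  }
}
```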

Tuesday, August 14, 2012

Scala-IO Core: Output - OutputConverter

As mentioned in the last post on Output, it is possible to write arbitrary objects to an output object and have it serialized to disk.

The way this is handled in Scala-IO is via OutputConverters.  If you are familiar with the type-class pattern then how this works should be very clear.  For a very quick introduction you can read: http://www.sidewayscoding.com/2011/01/introduction-to-type-classes-in-scala.html.

The clue is in the signature of write:
def write[T](data: T)(implicit writer: OutputConverter[T]): Unit

The last parameter is the object that defines how the data is serialized.  The OutputConverter trait essentially converts an object into bytes and has a few built-in implementations in its companion object for types like Int, Float, Byte, Char, etc...

Since the parameter is implicit the compiler will search for an implementation that satisfies the requirements (that the OutputConverter has the type parameter T).  This allows:
import scalax.io._
 
val output:Output = Resource.fromFile("scala-io.out")
 
output write 3
 
// and
 
output write Seq(1,2,3)
 
// one can be more explicit and declare the OutputConverter
output.write(3)(OutputConverter.IntConverter)

The last line in the example shows the explicit declaration of the OutputConverter to use when writing the data. This indicates how one can provide their own converter.

Since the parameter is implicit there are two ways that custom OutputConverters can be used.
  • Define an implicit object for the type to be written. In this case any of the ways implicits can be defined can be used, for example as an implicit value in scope or in the companion object of the class to be written (serialized)
  • Explicitly declare the converter to use at the method call site
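
Before looking at each option with the real API, the mechanism itself can be sketched with the standard library only. Everything named here (`ToBytes`, `write`, `MyData`, the in-memory sink) is a hypothetical stand-in chosen for illustration; it mirrors the shape of OutputConverter and Output.write but is not the Scala-IO implementation.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer

object TypeClassSketch {
  // Hypothetical stand-in for OutputConverter: knows how to turn a T into bytes.
  trait ToBytes[T] { def toBytes(data: T): Array[Byte] }

  object ToBytes {
    // Instances in the type class companion are found by implicit search.
    implicit val intToBytes: ToBytes[Int] = new ToBytes[Int] {
      def toBytes(data: Int): Array[Byte] = ByteBuffer.allocate(4).putInt(data).array()
    }
  }

  // A user-defined type whose companion object supplies its own converter.
  final case class MyData(name: String)
  object MyData {
    implicit val converter: ToBytes[MyData] = new ToBytes[MyData] {
      def toBytes(data: MyData): Array[Byte] = data.name.getBytes("UTF-8")
    }
  }

  // Hypothetical stand-in for Output.write: the converter is an implicit parameter.
  def write[T](sink: ByteArrayOutputStream, data: T)(implicit conv: ToBytes[T]): Unit =
    sink.write(conv.toBytes(data))

  def main(args: Array[String]): Unit = {
    val sink = new ByteArrayOutputStream()
    write(sink, 3)                     // ToBytes.intToBytes is found implicitly
    write(sink, MyData("jesse"))       // MyData.converter is found in the companion
    write(sink, 3)(ToBytes.intToBytes) // or the converter is passed explicitly
    println("bytes written: " + sink.size())
  }
}
```

The compiler looks in the local scope and in the companion objects of both the type class and the data type, which is why both of the options listed above work without any extra imports at the call site.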

First let's examine the use-case where the object is from a different library and therefore we cannot create a companion object for the object.
import java.util.Date
import scalax.io._
import Resource._
import OutputConverter._
val file = fromFile("scala-io.out")
 
// Simplest design pattern is to create a new implicit object in scope
implicit object DateConverter extends OutputConverter[Date] {
  def sizeInBytes = 8
  def toBytes(data: Date) = LongConverter.toBytes(data.getTime)
}
 
file.write(java.util.Calendar.getInstance().getTime())
 
// display result in REPL
file.byteArray
 
// naturally the write method can have the converter
// explicitly declared if you don't want to make the
// object implicit
file.write(java.util.Calendar.getInstance().getTime())(DateConverter)

The second case is where you are implementing the class yourself and can therefore add a companion object.
For this next bit to work you need to paste it into a file and run that, or use the paste mechanism of the REPL (type :paste into the REPL and press enter):
import scalax.io._
import Resource._
import OutputConverter._
 
class MyData(val name:String)
object MyData {
  implicit object Converter extends OutputConverter[MyData] {
    def sizeInBytes = 1
    def toBytes(data: MyData) = data.name.getBytes("UTF8")
  }
}
val file = fromFile("scala-io.out")
 
// let's quickly truncate the file to make sure we are dealing with
// an empty file (this is a method on Seekable)
file.truncate(0)
 
file write (new MyData("jesse"))
 
// display result in REPL
file.string

Wednesday, August 8, 2012

Scala-IO Core: Output

The Output object is the primary trait for writing data to a resource. The basic usage is very simple but can get more complex when one wishes to serialize objects.

Let's start with the basic usage:
import scalax.io._
import scalax.io.Resource
 
val out:Output = Resource.fromOutputStream(new java.io.FileOutputStream("daily-scala.out"))
val in:Input = Resource.fromFile("daily-scala.out")
 
// Write some bytes to the output object.
// Each write will typically overwrite
// the previous data; the processing API
// can be used if you do not want this behaviour
out write "data".getBytes()
out write Array[Byte](1,2,3)
 
// print out file in REPL
in.byteArray
 
// Writing strings needs a Codec
// for encoding the strings.  The default is UTF8
// but the default is easily overridden.
out write "howdy"
 
// printout file in REPL
in.string
 
out.write("howdy")(Codec.UTF8)
 
// printout file in REPL
in.string
 
implicit val defaultCodec: Codec = Codec.UTF8
 
// The implicit codec will be used instead of the
// default codec
out write "hi there"
 
// printout file in REPL
in.string
 
// write all strings in a collection with default separator ("")
out writeStrings Seq("it","was","a","dark","and","stormy","night")
 
// printout file in REPL
in.string
 
// write all strings in sequence with a space as the separator
out.writeStrings(Seq("it","was","a","dark","and","stormy","night"), " ")

A common need is to write several times to a single Output without overwriting the data. To do this one can use the processing API. Future posts will look at the processing API in more detail, but for now here is a simple example:
import scalax.io._
 
val output:Output = Resource.fromOutputStream(new java.io.FileOutputStream("daily-scala.out"))
val in:Input = Resource.fromFile("daily-scala.out")
 
// Output processors are used when one
// needs to perform batch writes on an
// output object. When a processor object
// is used a "processing" pipeline is
// created and the operations are performed
// in batch form.
 
// The following example will write 2 lines
// to the output object (a file in this case)
// there are a few ways to use outputProcessors. 
 
// This example is the pattern most developers will
// likely be most comfortable with:
for{
  // create a processor (signalling the start of a batch process)
  processor <- output.outputProcessor
  // create an output object from it
  out = processor.asOutput
}{
  // all writes to out will be on the same open output stream/channel
  out.write("first write\n")
  out.write("second write")
}
 
// show data in REPL
in.string
 
// As will be shown in the future, a processor is typically lazy
// if created with map and flatMap calls.
// The next example is another way to do the multiple writes.
 
// first create processor
val processor = for{
    // create the processor
    out <- output.outputProcessor
    // perform write calls
    _ <- out.write("second time first write\n")
    _ <- out.write("second time second write")
} yield {}
// at this point the writes have not occurred because
// processor contains the processing pipeline
 
// show data in REPL
in.string
 
processor.execute  // execute processor
 
// show data in REPL
in.string

Monday, August 6, 2012

Scala-IO Core: Long Traversable

The LongTraversable trait is one of the most important objects in Scala IO. Input provides a uniform way of creating views on the data (as a string or byte array or LongTraversable of something like bytes.)

LongTraversable is a scala.collection.Traversable with some extra capabilities. A few of the salient points of LongTraversable are:
  • It is a lazy/non-strict collection similar to Stream. In other words, you can perform operations like map, flatMap, filter, collect, etc... without accessing the resource
  • Methods like slice and drop will (if possible for the resource) skip the dropped bytes without reading them
  • Each usage of the LongTraversable will typically open and close the underlying resource.
  • Has methods that one typically finds in Seq.  For example: zip, apply, containsSlice
  • Has methods that take or return Longs instead of Ints like ldrop, lslice, ltake, lsize
  • Has limitFold method that allows fold like behaviour with extra features like skip and early termination
  • Can be converted to an AsyncLongTraversable which has methods that return Futures instead and won't block the program
  • Can be converted to a Process object for advanced data processing pipelines
Example usage:
import scalax.io._
import java.net.URL
 
val file1 = Resource.fromURL(new URL("http://www.scala-lang.org"))
val file2 = Resource.fromURL(new URL("http://www.camptocamp.com"))
 
// scala-io versions > 0.4.1 will have method
//   Resource.fromURLString("http://xyz.com")
// but earlier versions use an overloaded method:
//   Resource.fromURL("http://www.scala-lang.org")
 
// A simple example of comparing all bytes in
// one file with those of another.
// combining zip with sliding is a good way to perform operations
// on sections of two (or more) files
val zipped = file1.bytes.zip(file2.bytes) map {
  case (file1Byte, file2Byte) =>
    file2Byte < file1Byte
}
 
// take the first 5 results and load them into memory
val fiveBytes = zipped.take(5).force
 
// for debug in REPL lets print them out
fiveBytes mkString ","
 
// Add a line number to each line in a file
//
// Note:  Since methods in an Input object return LongTraversableView objects
// all zip examples do not open the file.  To do that you must call
// force or some other method that forces a read to take place.
val addedLineNumbers = file1.lines().zipWithIndex.map {
  case (line,idx) => idx+" "+line
}
// print out second group of 5 lines
 
addedLineNumbers.drop(5).take(5) foreach println
 
// check if file 1 startsWith file 2
file1.bytes.startsWith(file2.bytes)
 
// The number of consecutive lines starting at 0 containing <
file1.lines().segmentLength(_ contains "<",0)
 
// check if all lines in file1 are the same as in file2 ignoring case
file1.lines().corresponds(file2.lines())(_ equalsIgnoreCase _)
 
// Check if file1 has the same bytes as file2
file1.bytes.sameElements(file2.bytes)
 
// silly example but shows that value
// being compared can be any traversable
file1.bytes.sameElements(1 to 30)
 
// use sliding to visit each 1008 bytes.
// map splits the window into two parts, block and checksum
val blocks = file1.bytes.sliding(1008,1008).map{_ splitAt 1000}
 
// grouped is sliding(size,size) so the following is equivalent
val blocks2 = file1.bytes.grouped(1008).map{_ splitAt 1000}
 
blocks2 foreach {
  case (block,checksum) =>
    // verify checksum and process
    println(block take 5)
}
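
The lazy, non-strict behaviour described in the list above can be observed with a plain standard-library Iterator. This is only an analogy: Iterator is used here as a stand-in for LongTraversable, and `countingSource` and `reads` are hypothetical names that make it visible when the "resource" is actually touched.

```scala
object LazySketch {
  var reads = 0

  // Hypothetical "resource": counts every element that is actually read.
  def countingSource(limit: Int): Iterator[Int] =
    Iterator.range(0, limit).map { n => reads += 1; n }

  // Building the pipeline performs no reads; map and filter are deferred.
  def pipeline(limit: Int): Iterator[Int] =
    countingSource(limit).map(_ * 2).filter(_ % 3 == 0)

  def main(args: Array[String]): Unit = {
    val p = pipeline(1000000)
    println("reads after building the pipeline: " + reads)

    // Only now is the source consumed, and only as far as needed for 5 results.
    val firstFive = p.take(5).toList
    println(firstFive)
    println("reads after forcing five results: " + reads)
  }
}
```

Just as with force in the examples above, nothing is read until a strict operation such as toList asks for results, and only a small prefix of the million-element source is ever visited.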

The limitFold method can be quite useful to process only a portion of the file if you don't know ahead of time what the indices of the portion are:
import scalax.io._
import java.net.URL
 
val in:Input = Resource.fromURL(new URL("http://www.camptocamp.com"))
 
/**
 * Skip first 10 bytes and sum a random number of bytes up
 * to 20 bytes
 */
in.bytes.drop(10).take(20).limitFold(10) {
  case (acc, next) if util.Random.nextBoolean => End(acc + next)
  case (acc, next) => Continue(acc + next)
}
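
To make the early-termination idea concrete, here is a minimal sketch of a fold that can stop before the input is exhausted. It is written against a standard-library Iterator and defines its own hypothetical `Step`, `Continue` and `End` types, so it is not the Scala-IO implementation and it leaves out the skip feature mentioned earlier.

```scala
object LimitFoldSketch {
  // Hypothetical result type mirroring the Continue/End idea used above.
  sealed trait Step[+A]
  final case class Continue[A](acc: A) extends Step[A]
  final case class End[A](acc: A) extends Step[A]

  // A fold that stops consuming input as soon as the function returns End.
  def limitFold[A, B](it: Iterator[A], init: B)(f: (B, A) => Step[B]): B = {
    var acc = init
    var done = false
    while (!done && it.hasNext) {
      f(acc, it.next()) match {
        case Continue(a) => acc = a
        case End(a)      => acc = a; done = true
      }
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    // Sum 1, 2, 3, ... and stop as soon as the running total reaches 10.
    // The source is infinite, so termination depends entirely on End.
    val total = limitFold(Iterator.from(1), 0) { (acc, next) =>
      if (acc + next >= 10) End(acc + next) else Continue(acc + next)
    }
    println(total)
  }
}
```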

Thursday, August 2, 2012

Scala-IO Core: Resource, Input

Just a note: all these examples have been tested in REPL so go ahead and fire up the sbt console in the example project and try these out.

Resource


Resource is the fundamental component of Scala-IO. A Resource is essentially anything that has a simple open/close lifecycle. The Resource trait handles the lifecycle for the developer, allowing them to focus on the IO logic.

In the typical use-case one of the Resource subclasses will be used. They are more useful in general because they will have one of the higher-level traits mixed in, like Input or Output.

The most typical way to create a Resource is with the Resource object, which is a factory for creating Resource objects from various types of Java objects.

While Resource is the foundation trait, Input and Output are the traits most commonly used: the user-facing traits, if you will.

Here are a few examples of creating Resources:
import scalax.io.Resource
 
Resource.fromURL("http://www.camptocamp.com")
Resource.fromURL(new java.net.URL("http://www.camptocamp.com"))
 
Resource.fromFile("file")
Resource.fromFile(new java.io.File("file"))
 
Resource.fromRandomAccessFile(new java.io.RandomAccessFile("file", "rw"))
 
Resource.fromByteChannel(new java.io.FileInputStream("file").getChannel)
Resource.fromReadableByteChannel(new java.io.FileInputStream("file").getChannel)
Resource.fromWritableByteChannel(new java.io.FileOutputStream("file").getChannel)
 
Resource.fromInputStream(new java.io.FileInputStream("file"))
Resource.fromOutputStream(new java.io.FileOutputStream("file"))
 
Resource.fromReader(new java.io.FileReader("file"))
Resource.fromWriter(new java.io.FileWriter("file"))
 
Resource.fromClasspath("scalax/io/Resource.class")
Resource.fromClasspath("scalax/io/Resource.class", classOf[Resource[_]])
There are advanced usages of Resource that we will get into in later posts. At the moment I want to focus on Input, Output and Seekable Traits. In later posts we will look at how to integrate with legacy Java APIs and how to access the underlying resource using the loan pattern.

Input


The Input trait provides methods for accessing the data of the underlying resource in various ways: as bytes, strings, lines, etc...

There are two basic types of methods. Methods that return LongTraversable objects and methods that load the entire Resource into memory. For example: string and byteArray load the entire resource into memory while bytes and chars return a LongTraversable.

What is a LongTraversable? That will be the next post :-). Summarized, it is a specialized Lazy/non-strict Traversable.
import scalax.io._
val input:Input = Resource.fromURL("http://www.scala-lang.org")
 
// The simplest way to read data is to get the bytes from an Input object
val bytes: LongTraversable[Byte] = input.bytes
 
// you can also get the characters and strings from an Input object but you need a codec for decoding the bytes
val chars: LongTraversable[Char] = input.chars(Codec.UTF8)
 
// The default encoding is UTF-8 so one can leave off the codec if desired
val defaultChars = input.chars
 
// Notice that the () are left off input.chars.  That is because the codec
// parameter is implicit.  This allows the codec to be defined once
// and be used by all chars calls
implicit val codec = Codec.ISO8859
val iso8859chars = input.chars
 
// The read method in the java InputStream API returns the bytes
// as an integer. For the similar behaviour you can call bytesAsInts
val bytesAsInts = input.bytesAsInts
 
// There are two useful methods for loading all data into memory
val string = input.string
val array = input.byteArray
 
// We will revisit it in more detail later but there is an efficient copy method
// for copying the data from the Input object to any Output
// The copy method detects type of the Input Object and the Output
// object and intelligently chooses a method for copying that is
// as efficient as possible.
// For example if the Input is based on a FileInputStream and
// Output is based on a FileOutputStream the FileChannel transferTo
// method is used so that the OS will perform the copy
// efficiently
input copyDataTo Resource.fromFile("file")
 
// Lines Example 1
// Another useful method is the lines() method which more or less
// does what it implies.  The default behaviour will be to Auto
// detect the line ending.  For efficiency it finds the end of the first
// line and then assumes the rest of the file has the same ending. 
// The terminator is removed by default.
input.lines()
 
// Lines Example 2
// This example explicitly declares the EOL terminator and
// requests that the terminator is not stripped from the line
import Line.Terminators._
input.lines(terminator=NewLine, includeTerminator=true)
 
// Lines Example 3
// Lastly, lines is not limited to just the standard line endings;
// arbitrary separators can be used.  This last example
// demonstrates parsing the input into lines using %% as the
// EOL terminator.  Additionally, it shows the explicit use
// of the Codec to override the in-scope implicit Codec
input.lines(terminator=Custom("%%"))(Codec.UTF8)

Friday, July 27, 2012

Scala-IO Getting Started

For the next several posts you will need to have Scala-IO installed, and you should probably have an sbt project as well.

There are currently two Scala-IO 0.4 releases:

  • Scala-IO 0.4-seq - A version of Scala-IO 0.4 without the Akka dependency and therefore without async support
  • Scala-IO 0.4 - The full version, which contains an Akka dependency
The Scala 2.10 versions will have no Akka dependency but can optionally use Akka.
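
For an existing sbt project, the dependency declaration might look roughly like the sketch below. The group ID, artifact names and version strings are assumptions based on the 0.4.1 release and are not stated in this post, so check them against the Getting Started page before relying on them.

```scala
// build.sbt (sketch only; coordinates and versions are assumptions)
libraryDependencies ++= Seq(
  // Core IO: Input, Output, Resource
  "com.github.scala-incubator.io" %% "scala-io-core" % "0.4.1",
  // File API: Path and friends
  "com.github.scala-incubator.io" %% "scala-io-file" % "0.4.1"
)

// For the variant without Akka, the version would presumably
// be "0.4.1-seq" instead of "0.4.1".
```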

So getting started:

Download the example project from the docs website (http://jesseeichar.github.com/scala-io-doc/latest):
  • Go to Getting Started and follow the instructions for downloading and running the example project.  The following goes through the steps from the 0.4.1 instructions.


sbt console
 
scala> import scalax.io._
import scalax.io._
 
scala> import java.net.URL
import java.net.URL
 
scala> Resource.fromURL(new URL("http://www.scala-lang.com")).
     | lines().async.size.onComplete(println)
res3: akka.dispatch.Future[Int] = akka.dispatch.DefaultPromise@363adfb4
 
scala> Right(770)

The last line (Right(770)) is not a command to enter; it is the result of the asynchronous call.
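
The Right wrapper appears because the callback receives an Either: Right carries the successful value and, by the usual convention, Left would carry the failure. The following is a minimal sketch of handling both cases using only the standard library. The describe helper is hypothetical, and the assumption that the failure side is a Throwable comes from the Akka 2.0 Future API rather than from this post.

```scala
// Hypothetical helper: turns the Either handed to the callback into a message.
// Assumes the failure side is a Throwable, as in Akka 2.0's Future.onComplete.
def describe(result: Either[Throwable, Int]): String = result match {
  case Right(lineCount) => "line count: " + lineCount
  case Left(error)      => "request failed: " + error.getMessage
}

println(describe(Right(770)))
println(describe(Left(new java.io.IOException("timeout"))))
```

A function like this could be passed to onComplete in place of println.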

Thursday, July 26, 2012

Introducing Scala-IO


This is the start of a series of posts on Scala-IO.  Scala-IO is, as the name implies, a library for performing input and output operations with Scala.  There are 4 main facets to the library:


  • Basic IO - Reading and writing to some underlying resource.  The current implementation is Java based and thus allows reading and writing to resources like java.io Readers, Writers, Channels, Streams, etc.
  • File API - A library loosely modeled on the Java 7 nio.file API, with an additional simple Unix-like DSL for traversing and searching the filesystem.  It has a pluggable architecture, which allows plugins for systems like WebDAV or Zip filesystems to be addressed in a similar manner to the local filesystem.  The included implementation is for the local file system and is implemented on the java.io APIs
  • Asynchronous Access - Both synchronous and asynchronous options are available throughout the APIs, so either programming model can be used easily.
    • In the 2.10.x+ versions the future implementations are pluggable, but no additional libraries are required if that is preferred
    • In 2.9.x versions there are two different dependencies: one with asynchronous APIs implemented on Akka and one without any asynchronous APIs
  • Processor API - An API for defining complex IO processes declaratively.
This series will normally look at one small, simple IO operation each day (or so) rather than offering only a few in-depth articles.  This is necessary because of my limited available time.

With the introduction done, let's look at two small examples:

Read File with Core API (not File API):
import scalax.io._
 
// Create a resource from a file.  This is a convenience method and the string can be
// a relative path or an absolute path.  While this is convenient it is not portable,
// so it is recommended to use the File API to create a Path object and read from that
val input:Input = Resource.fromFile("someFile")
 
// read all bytes into an in memory array
input.byteArray
 
// skip first 5 bytes and take the next 5
// force the operation to take place.
// The bytes is a ResourceView which is a LongTraversableView,
// meaning it will evaluate lazily until the data is forced
// or requested
input.bytes.drop(5).take(5).force
 
// read all bytes into a string
// note: codec can be passed implicitly as well
input.string(Codec.UTF8)
Same thing but with File API:
import scalax.file.Path
// Codec lives in scalax.io and is needed for the string call below
import scalax.io.Codec
 
val path = Path("file")
 
// read all bytes into an in memory array
path.byteArray
 
// skip first 5 bytes and take the next 5
// force the operation to take place.
// The bytes is a ResourceView which is a LongTraversableView,
// meaning it will evaluate lazily until the data is forced
// or requested
path.bytes.drop(5).take(5).force
 
// read all bytes into a string
// note: codec can be passed implicitly as well
path.string(Codec.UTF8)
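
The lazy evaluation described in the comments above can be illustrated with an ordinary collection view from the standard library. This is only an analogy, under the assumption that a ResourceView defers work in a comparable way. It does not use Scala-IO, and the touched counter is a hypothetical device for observing when elements are actually computed.

```scala
// Standard-library analogy only; no Scala-IO types are involved.
var touched = 0 // counts how many elements have actually been computed

// Building the pipeline does no work yet because views are lazy
val pending = (1 to 10).view.map { n => touched += 1; n }.drop(5).take(5)
val touchedBeforeForcing = touched
println(touchedBeforeForcing) // prints 0

// Converting to a strict collection forces the evaluation
val forced = pending.toList
println(forced) // prints List(6, 7, 8, 9, 10)
```

With Scala-IO the same idea means that no bytes are read from the underlying resource until the view is forced or its data is requested.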