Friday, October 5, 2012

Scala-IO Core: Unmanaged Resources

Scala-IO is designed around automatically closing resources each time a resource is accessed, so that a programmer cannot unintentionally leave resources open in the face of exceptions or other unexpected situations. However, there are cases where the Scala-IO API is desired but the resource management is not. The classic case is reading from System.in or writing to System.out. Unmanaged resources exist to satisfy this use-case.

Since unmanaged resources are a less common use-case, there is no factory object like there is for normal managed Resources.  Instead, certain objects can be converted to unmanaged resources using the JavaConverters implicit methods as follows:
// JavaConverters implicit methods are required to create the
// unmanaged resources
scala> import scalax.io.JavaConverters._
import scalax.io.JavaConverters._
 
// now that JavaConverters are in scope we can convert
// System.out to an Output object and use it as any normal
// output object except that the stream will not be closed
// WritableByteChannels can also be converted
scala> System.out.asUnmanagedOutput.write("Not closing right?")
Not closing right?
 
// See, still not closed
scala> System.out.asUnmanagedOutput.write("Still not closed?")
Still not closed?
 
scala> import java.io._
import java.io._
 
// another demonstration of converting an output stream to an
// unmanaged Output object.  This is frowned upon unless
// unavoidable.
scala> val fout = new FileOutputStream("somefile.txt")
fout: java.io.FileOutputStream = java.io.FileOutputStream@23987721
 
scala> val foutOutput = fout.asUnmanagedOutput
foutOutput: scalax.io.Output = scalax.io.unmanaged.WritableByteChannelResource@36b54a77
 
scala> foutOutput.write("Hello ")
 
scala> foutOutput.write("World!")
 
scala> fout.close
 
// see the output object is broken now because the stream is closed
scala> foutOutput.write("boom")
No Main Exception
---
class java.nio.channels.ClosedChannelException(null)
...
 
// The mirror image is converting an input stream to an Input object
scala> val fin = new FileInputStream("somefile.txt")
fin: java.io.FileInputStream = java.io.FileInputStream@4fcbc4de
 
scala> val chars = fin.asUnmanagedInput.chars
chars: scalax.io.LongTraversable[Char] = LongTraversable(...)
 
// normally a LongTraversable will close the resource
// but this LongTraversable is obtained from an unmanaged Input
// so it can be used multiple times without closing the resource
scala> chars.head
res19: Char = H
 
scala> chars.head
res20: Char = e
 
scala> chars.head
res21: Char = l
 
scala> chars.head
res22: Char = l
 
// don't forget to close
scala> fin.close
 
// quick demo of using channels
// the following is a major anti-pattern and is
// here mainly for completeness
scala> val fchannel = new RandomAccessFile("somefile.txt", "rw").getChannel
fchannel: java.nio.channels.FileChannel = sun.nio.ch.FileChannelImpl@1e10cb60
 
scala> val fInput2 = fchannel.asUnmanagedInput
fInput2: scalax.io.Input = scalax.io.unmanaged.ReadableByteChannelResource@679a339e
 
scala> println(fInput2.string)
Hello World!
 
scala> fchannel.isOpen
res13: Boolean = true
 
scala> val fOutput = fchannel.asUnmanagedOutput
fOutput: scalax.io.Output = scalax.io.unmanaged.WritableByteChannelResource@9cc8b91
 
scala> fOutput.write("hi there")
 
scala> println(fInput2.string)
hi thererld!
 
// don't forget to close
scala> fchannel.close
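
The difference between the managed and unmanaged styles can be sketched with plain java.io, without Scala-IO on the classpath. This is only an illustration of the idea, not how Scala-IO implements it: TrackedStream, writeManaged and writeUnmanaged are hypothetical names invented for this sketch.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}

object UnmanagedSketch {
  // Hypothetical stream that records whether close() was called.
  class TrackedStream extends ByteArrayOutputStream {
    var closed = false
    override def close(): Unit = { closed = true; super.close() }
  }

  // Managed style (assumption): open the stream, write, always close.
  def writeManaged[S <: OutputStream](open: => S)(text: String): S = {
    val out = open
    try out.write(text.getBytes("UTF-8"))
    finally out.close()
    out
  }

  // Unmanaged style (assumption): write and flush, but leave closing
  // to whoever owns the stream (as with System.out).
  def writeUnmanaged(out: OutputStream)(text: String): Unit = {
    out.write(text.getBytes("UTF-8"))
    out.flush()
  }

  def main(args: Array[String]): Unit = {
    val managed = writeManaged(new TrackedStream)("hello")
    println(managed.closed) // prints: true

    val unmanaged = new TrackedStream
    writeUnmanaged(unmanaged)("hello ")
    writeUnmanaged(unmanaged)("world")
    println(unmanaged.closed)            // prints: false
    println(unmanaged.toString("UTF-8")) // prints: hello world
  }
}
```

In the unmanaged style the caller remains responsible for closing the stream, which is why the examples above end with an explicit close.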

Wednesday, September 26, 2012

Scala-IO Core: To Resource Converters

In order to simplify integration with existing libraries, most commonly Java libraries, Scala-IO provides a JavaConverters object with implicit methods that add as*** methods (asInput, asOutput, asSeekable, etc...) to several types of objects.  It is the same pattern as in the scala.collection.JavaConverters object.

These methods can be used instead of the Resource.from*** methods to produce slightly nicer-looking code.

There is one warning. When using JavaConverters instead of Resource.from*** to create Input/Output/Seekable/etc... objects, the chance of falling into the trap of creating non-reusable resources or causing a resource leak is increased. See: scala-io-core-reusable-resources for more details on this.
// JavaConverters object contains the implicit methods
// which add the as*** methods to the applicable _normal_ objects
import scalax.io.JavaConverters._
 
// Several objects can be converted to input objects using
// asInput.  URLs, File, RandomAccessFile, InputStream, Traversable[Byte]
// Array[Byte], ReadableByteChannel
val input = new URL("http://www.camptocamp.com").asInput
 
// simple demonstration using the newly created input object
println(input.bytes.size)
 
// asSeekable can only be applied to a few objects at the
// moment.  Including RandomAccessFile, SeekableByteChannel,
// File and perhaps in future mutable Sequences
val file = new java.io.File("somefile.txt").asSeekable
 
// demonstrate a seekable method.  This method
// ensures the file is empty
file.truncate(0)
 
// write hi using the Output  API
file.write("hi :)")
 
// output the file to the console to see the results
println(file.string)
 
// asUnmanaged*** creates Unmanaged resources for
// operations.  The Unmanaged Resource post will
// discuss this in more detail but essentially the
// resource is not closed and thus is useful when
// dealing with underlying objects that should not
// be closed like System.out
val unmanagedOutput = System.out.asUnmanagedOutput
 
// prints to standard out: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
unmanagedOutput.writeStrings(1 to 10 map (_.toString), ", ")
 
// prints: Hello World
// This demonstrates how a simple Array or Traversable can be easily used
// as an Input object.  The array is Hello world encoded as normal latin1
println(Array(72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100).asInput.string)
 
// prints: Hll Wrld
// Raw Strings and Reader objects cannot currently be converted to an Input object
// but can be converted to a ReadChars object (Essentially the character API of Input)
println("Hello World".asReadChars.chars.filterNot(c => "aeiou" contains c) mkString)
 
// Similarly Writer objects cannot be Output objects since they can't write
// bytes so asWriteChars is used to be able to have the string writing
// capabilities of Scala-IO
new java.io.FileWriter("somefile.txt").asWriteChars write "Hi"
 
// prints: Hi
println(new java.io.FileReader("somefile.txt").asReadChars.string)
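
For readers unfamiliar with the converter pattern, here is a minimal sketch of how an object of implicit methods can add an as*** method to an existing type. It does not use Scala-IO: TextConverters, ByteArrayAsText and asLatin1Text are hypothetical names made up for this illustration, and the sketch assumes Scala 2.10 or later for implicit classes.

```scala
object ConverterSketch {
  // Hypothetical converters object, playing the role that
  // JavaConverters plays in the examples above.
  object TextConverters {
    // Once imported, every Array[Byte] gains an asLatin1Text method.
    implicit class ByteArrayAsText(bytes: Array[Byte]) {
      def asLatin1Text: String = new String(bytes, "ISO-8859-1")
    }
  }

  def main(args: Array[String]): Unit = {
    // Without this import the method below does not compile.
    import TextConverters._

    val hello = Array[Byte](72, 101, 108, 108, 111)
    println(hello.asLatin1Text) // prints: Hello
  }
}
```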

Wednesday, September 19, 2012

Scala-IO Core: Reusable Resources


One aspect of resources in Scala-IO that can cause problems is the construction of resource objects.  The factory methods that are provided in the Resource object each have a lazy parameter for opening the underlying resource.  However, a common error is to pass an already open resource to the method, which causes multiple problems.

Consider:
import scalax.io._
   
val stream = new java.io.FileOutputStream("somefile.txt")
val output = Resource.fromOutputStream(stream) 
output write "hey "
 
// boom, stream was closed by the last write
// and the stream cannot be reopened.
output write "how's it going?"

In the example above the stream is created and opened at the definition of stream (it is a val).  This has two effects:

  1. the stream is open, and if the resource object is never used (and therefore never closed) you will have a resource leak
  2. since the stream is already open, the resource can only be used once because it is closed after its first use.

The correct way to create the resource is to change val to def so that the stream is only created on demand and there is no chance of a resource leak.  The following is the correct example:
import scalax.io._
 
def stream = new java.io.FileOutputStream("somefile.txt")
val output = Resource.fromOutputStream(stream)
 
output write "hey "
// the second write will now work.  However since
// the underlying resource is a FileOutputStream
// (which overwrites) the file will contain just "how's it going?"
output write "how's it going?"
 
println(Resource.fromFile("somefile.txt").string)
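
The val versus def distinction comes down to how by-name ("lazy") parameters are evaluated. The following sketch reproduces the behaviour with plain Scala and no Scala-IO dependency. Conn and Managed are hypothetical stand-ins invented for this illustration; they are not Scala-IO classes.

```scala
object ByNameSketch {
  // Hypothetical connection that refuses writes once it is closed.
  final class Conn {
    private var isOpen = true
    val buffer = new StringBuilder
    def write(s: String): Unit = {
      if (!isOpen) throw new java.io.IOException("connection closed")
      buffer.append(s)
    }
    def close(): Unit = { isOpen = false }
  }

  // Hypothetical managed resource: `opener` is a by-name parameter,
  // so it is evaluated again on every write, and the connection it
  // yields is closed after each use.
  final class Managed(opener: => Conn) {
    def write(s: String): Unit = {
      val c = opener
      try c.write(s)
      finally c.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Anti-pattern: a val is evaluated once, so every write sees
    // the same (already closed) connection.
    val shared = new Conn
    val broken = new Managed(shared)
    broken.write("hey ")
    println(scala.util.Try(broken.write("how's it going?")).isFailure) // prints: true

    // Fix: a def yields a fresh connection on every evaluation.
    var created = 0
    def fresh = { created += 1; new Conn }
    val working = new Managed(fresh)
    working.write("hey ")
    working.write("how's it going?")
    println(created) // prints: 2
  }
}
```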

This anti-pattern is also a risk when using the converter methods in the JavaConverters object. (A future post will look into this in more detail.) The following example shows the anti-pattern in effect:
import scalax.io._
import JavaConverters._
 
val output = new java.io.FileOutputStream("somefile.txt").asOutput
 
output write "hey "
// Next line will cause exception because
// stream is closed.
output write "how's it going?"

At the time of this writing, the asOutput method can only be applied to an already constructed object, so the resulting resource has all of the negative characteristics mentioned above. It is therefore recommended that asOutput/asInput/etc... only be used on one-time-use resources (like InputStream), within a single scope, and not passed out to an external method, so that the entire operation is easy to see at a glance.

Thursday, September 13, 2012

Scala-IO Core: ReadChars and WriteChars

The Input and Output objects of Scala-IO assume that the underlying data is composed of bytes.  However, another common pattern is for the underlying data to be composed of characters instead of bytes, for example java.io.Reader and java.io.Writer.  While it is possible to convert the characters to bytes and construct an Input or Output object from the decorated object, ReadChars and WriteChars can be used in this situation to reduce the work needed to interact with such resources.

ReadChars and WriteChars are traits that contain the character and string methods of Input and Output.  The primary difference is that the Charset is defined by the underlying resource rather than supplied at the method invocation site.  

Compare two methods:

Input:
def chars(implicit codec: Codec = Codec.default): LongTraversable[Char]
ReadChars:
def chars: LongTraversable[Char]
You will notice that the ReadChars method does not have the codec parameter because no translation is required, unlike in Input, which requires the characters to be created from raw bytes.
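
The reason a byte source needs a codec and a character source does not can be seen with plain java.io. This is a small sketch that does not use Scala-IO; readAll is a hypothetical helper written only for this illustration.

```scala
import java.io.{ByteArrayInputStream, InputStreamReader, Reader, StringReader}
import java.nio.charset.StandardCharsets

object CodecSketch {
  // Hypothetical helper: drains a Reader into a String.
  // No charset is involved because a Reader already yields characters.
  def readAll(reader: Reader): String = {
    val sb = new StringBuilder
    var c = reader.read()
    while (c != -1) { sb.append(c.toChar); c = reader.read() }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    // "h\u00e9llo" is 5 characters but 6 bytes in UTF-8.
    val bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8)

    // Byte source: the result depends on the charset used to decode.
    val utf8 = readAll(new InputStreamReader(
      new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))
    val latin1 = readAll(new InputStreamReader(
      new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1))
    println(utf8.length)   // prints: 5
    println(latin1.length) // prints: 6

    // Character source: nothing to decode, so no charset parameter.
    println(readAll(new StringReader("h\u00e9llo")).length) // prints: 5
  }
}
```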

Not many examples are needed to explain these concepts, but here are a few showing how to create ReadChars and WriteChars objects:

import scalax.io._
import JavaConverters._
 
// JavaConverters has asReadChars and asWriteChars
// for converting some objects to ReadChars and WriteChars
// (The JavaConverters post will explain
// more about the JavaConverters object)
"Hello World".asReadChars.chars.size
 
val writer = new java.io.StringWriter()
// Resource object can be used to create a ReadChars and WriteChars
// (write returns Unit, so the result is not assigned to a val)
Resource.fromWriter(writer).write("Yeee Hawww!")
println(Resource.fromReader(new java.io.StringReader(writer.toString)).lines().size)
 
// ReadChars and WriteChars can be obtained from
// InputResource and OutputResource object respectively
// SeekableByteChannelResource[SeekableByteChannel]
// (returned by fromFile) implements both traits and
// therefore has both methods
val fileResource = Resource.fromFile("somefile.txt")
 
implicit val codec = Codec.UTF8
// clear any old data in file
fileResource.truncate(0)
 
// Codec used is declared when calling writer or reader
// methods
fileResource.writer.write("hi")
 
println(fileResource.reader.chars.size)

Thursday, August 30, 2012

On Vacation

I am getting a lot of emails about Scala-IO and my posts.  Just want to let everyone know I am on vacation until September 10th or so.  I have some posts in the works but they won't be done here where I have virtually no internet.

Back soon.

Sunday, August 19, 2012

Scala-IO Core: Seekable

At the same level of abstraction as Input and Output is the fine trait called Seekable.  As the name implies it provides random access style methods for interacting with a resource.  The example that comes immediately to mind is a random access file.

The design of Seekable largely mimics the scala.collection.Seq patch and insert methods.  Not much more to say beyond getting into some examples:
import scalax.io._
import java.io.File
 
// For this example let's explicitly specify a codec
implicit val codec = scalax.io.Codec.UTF8
 
val file: Seekable =  Resource.fromFile(new File("scala-io.out"))
 
// delete all data from the file
// or more specifically only keep the data from the beginning of the
// file up to (but not including) the first byte
file truncate 0
 
val seedData = "Entering some seed data into the file"
file write seedData
 
// verify seed data was correctly written (in REPL)
file.string
 
 
// write "first" after character 9
// if the file is < 9 characters an underflow exception is thrown
// if the patch extends past the end of the file then the file is extended
// Note: The offset is always dependent on type of data being written
// for example if the data written is a string it will be 9 characters
// if the data is bytes it will be 9 bytes
file patch (9, "first",OverwriteAll)
 
// dumping to REPL will show: Entering firstseed data into the file
file.string
 
// patch at position 0 and replace 100 bytes with new data
file.patch(0,seedData, OverwriteSome(100))
 
// dumping to REPL will show the unchanged seedData once again
file.string
 
// Overwrite only 4 bytes starting at byte 9.
// the extra bytes will be inserted
// In other words the "some" word of seed data will
// be replaced with second
// Warning: This is an overwrite and an insert
// inserts are expensive since it requires copying the data from
// the index to end of file.  If small enough it is done in
// memory but a temporary file is required for big files.
file.patch(9,"second".getBytes(), OverwriteSome(4))
 
// dumping to REPL will show: Entering second seed data into the file
file.string
 
// reset file
file.patch(0,seedData, OverwriteSome(100))
 
// Replace 9 bytes with the 5 bytes that are provided
// In other words: replace "some seed" with "third"
file.patch(9,"third".getBytes(), OverwriteSome(9))
 
// dumping to REPL will show: Entering third data into the file
file.string
 
// reset file
file.patch(0,seedData, OverwriteSome(100))
 
// Insert a string at start of file
file.insert(0, "newInsertedData ")
 
// dumping to REPL will show: newInsertedData Entering some seed data into the file
file.string
 
// reset file
file.patch(0,seedData, OverwriteSome(100))
 
//add !! to end of file
file.append("!!")
 
// dumping to REPL will show: Entering some seed data into the file!!
file.string
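
For comparison, the same edits can be expressed with the standard library's patch method on an in-memory string, which is the method Seekable is modelled on. This sketch does not touch Scala-IO or the file system. Note that the standard patch takes the number of elements to replace as its third argument rather than an OverwriteAll/OverwriteSome value.

```scala
object PatchSketch {
  val seed = "Entering some seed data into the file"

  def main(args: Array[String]): Unit = {
    // Replace as many characters as are written (like OverwriteAll):
    // the 5 characters "some " become "first".
    println(seed.patch(9, "first", 5))
    // prints: Entering firstseed data into the file

    // Replace 4 characters with 6 (like OverwriteSome(4)); the rest shifts right.
    println(seed.patch(9, "second", 4))
    // prints: Entering second seed data into the file

    // Replace 9 characters with 5 (like OverwriteSome(9)); the rest shifts left.
    println(seed.patch(9, "third", 9))
    // prints: Entering third data into the file

    // Replacing 0 characters is an insert.
    println(seed.patch(0, "newInsertedData ", 0))
    // prints: newInsertedData Entering some seed data into the file
  }
}
```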

IMPORTANT: Each time truncate, patch or insert is called, a new connection to the file is opened and closed. To perform multiple operations within one connection, use the Processor API:
import scalax.io._
import java.io.File
 
val file: Seekable =  Resource.fromFile(new File("scala-io.out"))
 
for {
  p <- file.seekableProcessor
  seekable = p.asSeekable
} {
  seekable truncate 0
  seekable write "hi"
  seekable append " world"
 
  // one can do patch, insert etc...
  // or move the cursor to the correct position and write
  // this is essentially a patch(0, "Hi", OverwriteAll)
  seekable.position = 0
  seekable write "Hi"
 
  seekable patch (3, "W", OverwriteAll)
 
  // dumping to REPL will show: Hi World
  println (seekable.string)
}

Tuesday, August 14, 2012

Scala-IO Core: Output - OutputConverter

As mentioned in the last post on Output, it is possible to write arbitrary objects to an output object and have it serialized to disk.

The way this is handled in Scala-IO is via OutputConverters.  If you are familiar with the type-class pattern then how this works should be very clear to you.  For a very quick introduction you can read: http://www.sidewayscoding.com/2011/01/introduction-to-type-classes-in-scala.html.

The clue is in the signature of write:
def write[T](data: T)(implicit writer: OutputConverter[T]): Unit

The last parameter is the object that defines how the data is serialized.  The OutputConverter trait essentially converts an object into bytes and has a few built-in implementations in its companion object for types like Int, Float, Byte, Char, etc...

Since the parameter is implicit the compiler will search for an implementation that satisfies the requirements (that the OutputConverter has the type parameter T).  This allows:
import scalax.io._
 
val output:Output = Resource.fromFile("scala-io.out")
 
output write 3
 
// and
 
output write Seq(1,2,3)
 
// one can be more explicit and declare the OutputConverter
output.write(3)(OutputConverter.IntConverter)
The last line in the example shows the explicit declaration of the OutputConverter to use when writing the data. This indicates how one can provide their own converter.

Since the parameter is implicit there are two ways that custom OutputConverters can be used:
  • Define an implicit object for the type to be written. Any of the usual ways of defining implicits can be used, for example an implicit value in scope or an implicit in the companion object of the type to be written (serialized)
  • Explicitly declare the converter to use at the method call site
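
Both options can be seen in a stripped-down type-class sketch that does not depend on Scala-IO. Encoder, encode and ReversedString are hypothetical names standing in for OutputConverter, write and a custom converter; this shows the general pattern only and is not Scala-IO's implementation.

```scala
object TypeClassSketch {
  // Hypothetical type class standing in for OutputConverter.
  trait Encoder[T] { def toBytes(data: T): Array[Byte] }

  object Encoder {
    // Instances in the companion object are found by the compiler
    // without any import.
    implicit object IntEncoder extends Encoder[Int] {
      def toBytes(data: Int): Array[Byte] =
        java.nio.ByteBuffer.allocate(4).putInt(data).array()
    }
    implicit object StringEncoder extends Encoder[String] {
      def toBytes(data: String): Array[Byte] = data.getBytes("UTF-8")
    }
  }

  // Same shape as the write signature: the encoder is an implicit parameter.
  def encode[T](data: T)(implicit encoder: Encoder[T]): Array[Byte] =
    encoder.toBytes(data)

  // A non-implicit alternative that must be passed explicitly.
  object ReversedString extends Encoder[String] {
    def toBytes(data: String): Array[Byte] = data.reverse.getBytes("UTF-8")
  }

  def main(args: Array[String]): Unit = {
    // Implicit resolution picks IntEncoder and StringEncoder.
    println(encode(3).toList)                   // prints: List(0, 0, 0, 3)
    println(new String(encode("hi"), "UTF-8"))  // prints: hi

    // Explicit converter at the call site overrides the implicit one.
    println(new String(encode("hi")(ReversedString), "UTF-8")) // prints: ih
  }
}
```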

First let's examine the use-case where the object is from a different library and therefore we cannot create a companion object for the object.
import java.util.Date
import scalax.io._
import Resource._
import OutputConverter._
val file = fromFile("scala-io.out")
 
// Simplest design pattern is to create a new implicit object in scope
implicit object DateConverter extends OutputConverter[Date] {
  def sizeInBytes = 8
  def toBytes(data: Date) = LongConverter.toBytes(data.getTime)
}
 
file.write(java.util.Calendar.getInstance().getTime())
 
// display result in REPL
file.byteArray
 
// naturally the write method can have the converter
// explicitly declared if you don't want to make the
// object implicit
file.write(java.util.Calendar.getInstance().getTime())(DateConverter)

The second case is where you are implementing the class and therefore can add a companion object.
For this next example to work, paste it into a file and run that, or use the paste mechanism of the REPL (type :paste into the REPL and press enter):
import scalax.io._
import Resource._
import OutputConverter._
 
class MyData(val name:String)
object MyData {
  implicit object Converter extends OutputConverter[MyData] {
    def sizeInBytes = 1
    def toBytes(data: MyData) = data.name.getBytes("UTF8")
  }
}
val file = fromFile("scala-io.out")
 
// let's quickly truncate the file to make sure we are dealing with
// an empty file (this is a method on Seekable)
file.truncate(0)
 
file write (new MyData("jesse"))
 
// display result in REPL
file.string