Friday, October 5, 2012

Scala-IO Core: Unmanaged Resources

Scala-IO is designed around closing resources automatically each time a resource is accessed. This ensures that a programmer cannot unintentionally leave resources open when exceptions or other unexpected situations occur. However, there are cases where the Scala-IO API is wanted but the resource management is not. The classic case is reading from System.in or writing to System.out, which should not be closed after each use. Unmanaged resources exist for this use-case.

Unmanaged resources are a less common use-case, so there is no factory object for them like there is for normal managed Resources. Instead, certain objects can be converted to unmanaged resources using the implicit methods in the JavaConverters object.
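The difference between managed and unmanaged resources can be sketched without Scala-IO. The classes below (SimpleOutput, ManagedOutput, UnmanagedOutput, TrackingStream) are hypothetical and are not part of the library. The managed version opens and closes a fresh stream for every write. The unmanaged version writes to a stream that someone else owns, such as System.out, and never closes it.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration only: these classes are not part of Scala-IO.
trait SimpleOutput {
  def write(s: String): Unit
}

// Managed: opens a fresh stream for every write and always closes it afterwards.
class ManagedOutput(open: () => OutputStream) extends SimpleOutput {
  def write(s: String): Unit = {
    val out = open()
    try out.write(s.getBytes(UTF_8))
    finally out.close()
  }
}

// Unmanaged: wraps a stream owned by someone else (e.g. System.out)
// and never closes it, so it can be written to any number of times.
class UnmanagedOutput(out: OutputStream) extends SimpleOutput {
  def write(s: String): Unit = {
    out.write(s.getBytes(UTF_8))
    out.flush()
  }
}

// In-memory stream that records whether close() was called.
class TrackingStream extends ByteArrayOutputStream {
  var closed = false
  override def close(): Unit = { closed = true; super.close() }
}

// Unmanaged: one shared stream, still open after both writes.
val console = new TrackingStream
val unmanaged = new UnmanagedOutput(console)
unmanaged.write("hello ")
unmanaged.write("world")

// Managed: every write gets, and closes, its own stream.
val opened = ArrayBuffer.empty[TrackingStream]
val managed = new ManagedOutput(() => { val s = new TrackingStream; opened += s; s })
managed.write("a")
managed.write("b")
```

Closing System.out after the first write would make later console output disappear, which is why the unmanaged form is the right fit for it.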

Wednesday, September 26, 2012

Scala-IO Core: To Resource Converters

In order to simplify integration with existing libraries, most commonly Java libraries, Scala-IO provides a JavaConverters object with implicit methods that add as*** methods (asInput, asOutput, asSeekable, etc...) to several types of objects.  It is the same pattern as in the scala.collection.JavaConverters object.

These methods can be used instead of the Resource.from*** methods to produce slightly nicer-looking code.
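Here is a minimal sketch of the enrichment ("pimp my library") pattern that converter objects like this rely on. All the names are hypothetical: SimpleInput, SimpleConverters and asSimpleInput only mimic the shape of an asInput-style method and are not Scala-IO's implementation. A Scala class parameter cannot be by-name, so the implicit class receives an already-constructed stream. As a result, the converted input can only be read once. This is the trap described in the warning that follows.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.io.{Codec, Source}

// Hypothetical names throughout; not Scala-IO's JavaConverters.
class SimpleInput(open: () => InputStream) {
  // Opens the stream, reads it completely as UTF-8, then closes it.
  def string: String = {
    val in = open()
    try Source.fromInputStream(in)(Codec.UTF8).mkString
    finally in.close()
  }
}

object SimpleConverters {
  // Enrichment: an implicit class adds asSimpleInput to any InputStream.
  // The receiver is an already-created stream, so the result is single-use.
  implicit class InputStreamOps(in: InputStream) {
    def asSimpleInput: SimpleInput = new SimpleInput(() => in)
  }
}

import SimpleConverters._

val input = new ByteArrayInputStream("converted".getBytes("UTF-8")).asSimpleInput
val firstRead = input.string
val secondRead = input.string // the shared stream is already exhausted
```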

There is one warning. When JavaConverters is used instead of Resource.from*** to create Input/Output/Seekable/etc. objects, you are more likely to create non-reusable resources or cause a resource leak. See the post "Scala-IO Core: Reusable Resources" for more details.

Wednesday, September 19, 2012

Scala-IO Core: Reusable Resources


One aspect of resources in Scala-IO that can cause problems is the construction of resource objects. Each factory method in the Resource object takes a lazy parameter for opening the underlying resource. However, a common error is to pass an already-open resource to the method, which causes several problems.

Consider a stream that is assigned to a val and then passed to one of these factory methods. The stream is created and opened when stream is defined, because it is a val. This has two effects:

  1. the stream is open and if the resource object is not closed you will have a resource leak
  2. since the stream is opened the resource can only be used once since it will be closed after each use.
The correct way to create the resource is to change val to def. The stream is then created only on demand, so there is no chance of a resource leak.
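The val/def difference can be reproduced with a small self-contained sketch. SketchResource, fromInputStream and StrictStream are hypothetical stand-ins for a Resource factory with a by-name opener, not Scala-IO code. StrictStream refuses reads after close, as real file streams do.

```scala
import java.io.{ByteArrayInputStream, IOException, InputStream}
import scala.io.{Codec, Source}
import scala.util.Try

// Stream that refuses to be read after close(), like real file streams.
class StrictStream(data: String) extends ByteArrayInputStream(data.getBytes("UTF-8")) {
  private var closed = false
  override def close(): Unit = { closed = true }
  override def read(): Int = {
    if (closed) throw new IOException("Stream closed")
    super.read()
  }
  override def read(b: Array[Byte], off: Int, len: Int): Int = {
    if (closed) throw new IOException("Stream closed")
    super.read(b, off, len)
  }
}

// Hypothetical resource: opens the stream on each use and closes it afterwards.
class SketchResource(open: () => InputStream) {
  def string: String = {
    val in = open()
    try Source.fromInputStream(in)(Codec.UTF8).mkString
    finally in.close()
  }
}

// The by-name parameter is evaluated only inside `string`, once per use.
def fromInputStream(opener: => InputStream): SketchResource = new SketchResource(() => opener)

// Anti-pattern: the val opens the stream once, up front.
val stream = new StrictStream("data")
val fromVal = fromInputStream(stream)
val firstUse = fromVal.string
val secondUse = Try(fromVal.string) // fails: the shared stream was closed by the first use

// Correct: the def creates a new stream each time the resource needs one.
def freshStream = new StrictStream("data")
val fromDef = fromInputStream(freshStream)
```

Because the parameter is by-name, passing `freshStream` hands the factory a recipe rather than a stream. Each use therefore gets its own stream.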

This anti-pattern is also a risk when using the converter methods in the JavaConverters object. (A future post will look into this in more detail.) At the time of this writing, the asOutput method can only be applied to an already-constructed object, so the resulting resource has all of the negative characteristics mentioned above. It is therefore recommended that asOutput, asInput, etc. be used only on one-time-use resources (like an InputStream) within a single scope. Do not pass the result out to an external method, so that the whole operation is easy to see in one place.

Thursday, September 13, 2012

Scala-IO Core: ReadChars and WriteChars

The Input and Output objects of Scala-IO assume that the underlying data is composed of bytes. However, another common pattern is for the underlying data to be composed of characters instead, as with java.io.Reader and java.io.Writer. You could convert the characters back into bytes and construct an Input object from the result. ReadChars and WriteChars reduce the work needed to interact with such resources.

ReadChars and WriteChars are traits that contain the character and string methods of Input and Output.  The primary difference is that the Charset is defined by the underlying resource rather than supplied at the method invocation site.  

Compare two methods:

Input:
def chars(implicit codec: Codec = Codec.default): LongTraversable[Char]
ReadChars:
def chars: LongTraversable[Char]
You will notice that the ReadChars method has no codec parameter because no translation is required. Input, by contrast, must create the characters from raw bytes.

Not many examples are needed to explain these concepts beyond showing how ReadChars and WriteChars objects are created.
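The codec difference between the two chars signatures can be sketched with plain Java I/O. BytesInput, CharsInput and CharsOutput are hypothetical classes, not Scala-IO's Input, ReadChars or WriteChars. They also return a String rather than a LongTraversable[Char] to keep the sketch short. The byte-based reader needs a Codec to decode. The Reader- and Writer-based ones do not, because the underlying object already handles the encoding.

```scala
import java.io.{ByteArrayInputStream, InputStream, Reader, StringReader, StringWriter, Writer}
import scala.io.{Codec, Source}

// Byte-based: characters must be decoded from raw bytes, so a Codec is required.
class BytesInput(open: () => InputStream) {
  def chars(implicit codec: Codec): String = {
    val in = open()
    try Source.fromInputStream(in)(codec).mkString
    finally in.close()
  }
}

// Character-based: the Reader already decodes, so no Codec parameter is needed.
class CharsInput(open: () => Reader) {
  def chars: String = {
    val reader = open()
    try {
      val sb = new StringBuilder
      var c = reader.read()
      while (c != -1) { sb += c.toChar; c = reader.read() }
      sb.toString
    } finally reader.close()
  }
}

// Character-based output: the Writer owns the encoding, so write takes no Codec.
class CharsOutput(open: () => Writer) {
  def write(s: String): Unit = {
    val writer = open()
    try writer.write(s)
    finally writer.close()
  }
}

val text = "h\u00e9llo"
val fromBytes = new BytesInput(() => new ByteArrayInputStream(text.getBytes("UTF-8"))).chars(Codec.UTF8)
val fromReader = new CharsInput(() => new StringReader(text)).chars
val sink = new StringWriter
new CharsOutput(() => sink).write("written as chars")
```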

Thursday, August 30, 2012

On Vacation

I am getting a lot of emails about Scala-IO and my posts.  Just want to let everyone know I am on vacation until September 10th or so.  I have some posts in the works but they won't be done here where I have virtually no internet.

Back soon.

Sunday, August 19, 2012

Scala-IO Core: Seekable

At the same level of abstraction as Input and Output is the fine trait called Seekable.  As the name implies it provides random access style methods for interacting with a resource.  The example that comes immediately to mind is a random access file.

The design of Seekable largely mimics the scala.collection.Seq patch and insert methods. Beyond that, there is not much more to say.
IMPORTANT: Each time truncate, patch or insert is called, a new connection to the file is opened and closed. Use the Processor API to perform multiple operations within one connection.
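To show what Seq-style patching means for a file, here is a sketch on top of java.io.RandomAccessFile. The functions withFile, patch, insert, truncate and contents are hypothetical and are not Scala-IO's Seekable methods. patch follows the semantics of Seq.patch(from, other, replaced). Each call opens and closes the file, mirroring the per-call connection behaviour described above.

```scala
import java.io.{File, RandomAccessFile}
import java.nio.charset.StandardCharsets.UTF_8

// Opens the file for one operation and always closes it again.
def withFile[A](f: File)(body: RandomAccessFile => A): A = {
  val raf = new RandomAccessFile(f, "rw")
  try body(raf) finally raf.close()
}

// Replace `replaced` bytes at `position` with `data`, shifting the tail as needed
// (same semantics as Seq.patch(position, data, replaced)).
def patch(f: File, position: Long, data: Array[Byte], replaced: Int): Unit =
  withFile(f) { raf =>
    val tailStart = position + replaced
    val tail = new Array[Byte]((raf.length() - tailStart).toInt)
    raf.seek(tailStart)
    raf.readFully(tail)
    raf.seek(position)
    raf.write(data)
    raf.write(tail)
    raf.setLength(position + data.length + tail.length)
  }

// Insert is a patch that replaces nothing.
def insert(f: File, position: Long, data: Array[Byte]): Unit = patch(f, position, data, 0)

def truncate(f: File, length: Long): Unit = withFile(f)(_.setLength(length))

def contents(f: File): String = withFile(f) { raf =>
  val buf = new Array[Byte](raf.length().toInt)
  raf.readFully(buf)
  new String(buf, UTF_8)
}

val file = File.createTempFile("seekable", ".txt")
file.deleteOnExit()
withFile(file)(_.write("hello world".getBytes(UTF_8)))
patch(file, 6, "scala".getBytes(UTF_8), 5) // like "hello world".patch(6, "scala", 5)
insert(file, 5, ",".getBytes(UTF_8))
val afterEdits = contents(file)
truncate(file, 5)
val afterTruncate = contents(file)
```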

Tuesday, August 14, 2012

Scala-IO Core: Output - OutputConverter

As mentioned in the last post on Output, it is possible to write arbitrary objects to an Output object and have them serialized to disk.

Scala-IO handles this with OutputConverters. If you are familiar with the type-class pattern, how this works should be clear. For a very quick introduction you can read: http://www.sidewayscoding.com/2011/01/introduction-to-type-classes-in-scala.html.

The clue is in the signature of write:
def write[T](data: T)(implicit writer: OutputConverter[T]): Unit

The last parameter is the object that defines how the data is serialized. The OutputConverter trait essentially converts an object into bytes. Its companion object provides a few built-in implementations for types like Int, Float, Byte, Char, etc.

Since the parameter is implicit the compiler will search for an implementation that satisfies the requirements (that the OutputConverter has the type parameter T).  This allows:
import scalax.io._

val output:Output = Resource.fromFile("scala-io.out")

output write 3

// and

output write Seq(1,2,3)

// one can be more explicit and declare the OutputConverter
output.write(3)(OutputConverter.IntConverter)
The last line in the example shows the explicit declaration of the OutputConverter to use when writing the data. This indicates how one can provide their own converter.

Since the parameter is implicit there are two ways that custom OutputConverters can be used.
  • Define an implicit converter for the type to be written. Any of the usual ways of defining implicits works, for example an implicit value in scope or an implicit in the companion object of the type being written (serialized).
  • Explicitly declare the converter to use at the method call site.

There are two cases to consider. In the first, the type comes from a different library, so we cannot create a companion object for it. In the second, you are implementing the class yourself and can therefore add a companion object.
For the companion-object case to work, the class and its companion must be compiled together. Paste them into a file and run that, or use the REPL's paste mode (type :paste in the REPL and press Enter).
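The three ways of supplying a converter can be sketched with a hypothetical type class named Converter, which has the same shape as OutputConverter but is not Scala-IO's definition. BufferOutput is a hypothetical stand-in for Output that collects bytes in memory. java.util.UUID plays the role of a type from another library, and Point is a class we control.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8
import java.util.UUID
import scala.collection.mutable.ArrayBuffer

// Hypothetical type class: knows how to turn a T into bytes.
trait Converter[T] {
  def toBytes(data: T): Array[Byte]
}

object Converter {
  // Instances in the type class's companion are always found by implicit search.
  implicit object IntConverter extends Converter[Int] {
    def toBytes(i: Int): Array[Byte] = ByteBuffer.allocate(4).putInt(i).array()
  }
}

// Stand-in for Output: write takes the converter as an implicit parameter.
class BufferOutput {
  val written = ArrayBuffer.empty[Byte]
  def write[T](data: T)(implicit writer: Converter[T]): Unit = written ++= writer.toBytes(data)
}

// Case 1: UUID belongs to another library, so use an implicit value in scope.
implicit val uuidConverter: Converter[UUID] = new Converter[UUID] {
  def toBytes(u: UUID): Array[Byte] = u.toString.getBytes(UTF_8)
}

// Case 2: our own class, so the converter can live in its companion object.
case class Point(x: Int, y: Int)
object Point {
  implicit val pointConverter: Converter[Point] = new Converter[Point] {
    def toBytes(p: Point): Array[Byte] = s"${p.x},${p.y}".getBytes(UTF_8)
  }
}

val out = new BufferOutput
out.write(42)          // IntConverter, from Converter's companion
out.write(Point(1, 2)) // pointConverter, from Point's companion
// Case 3: pass a converter explicitly at the call site.
out.write(Point(3, 4))(new Converter[Point] {
  def toBytes(p: Point): Array[Byte] = Array(p.x.toByte, p.y.toByte)
})

val id = UUID.fromString("00000000-0000-0000-0000-000000000001")
val idOut = new BufferOutput
idOut.write(id)        // uuidConverter, from local scope
```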

Wednesday, August 8, 2012

Scala-IO Core: Output

The Output object is the primary trait for writing data to a resource. The basic usage is very simple but can get more complex when one wishes to serialize objects.

Let's start with the basic usage. A common need is to write several times to a single Output without overwriting the data. To do this, one can use the processing API, which future posts will look at in more detail.
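Why separate writes overwrite each other, while writes inside one connection accumulate, can be shown with plain java.io. writeOnce and withOpenOutput are hypothetical helpers, not Scala-IO's processing API. withOpenOutput only illustrates the "one connection, many writes" idea that the processing API provides.

```scala
import java.io.{File, FileOutputStream, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files

// Each call opens a new FileOutputStream, which truncates the file on open.
def writeOnce(f: File, s: String): Unit = {
  val out = new FileOutputStream(f)
  try out.write(s.getBytes(UTF_8)) finally out.close()
}

// One connection: every write inside `body` goes to the same open stream.
def withOpenOutput[A](f: File)(body: OutputStream => A): A = {
  val out = new FileOutputStream(f)
  try body(out) finally out.close()
}

val file = File.createTempFile("output", ".txt")
file.deleteOnExit()

writeOnce(file, "first ")
writeOnce(file, "second")
val afterSeparateWrites = new String(Files.readAllBytes(file.toPath), UTF_8)

withOpenOutput(file) { out =>
  out.write("first ".getBytes(UTF_8))
  out.write("second".getBytes(UTF_8))
}
val afterOneConnection = new String(Files.readAllBytes(file.toPath), UTF_8)
```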

Monday, August 6, 2012

Scala-IO Core: Long Traversable

The LongTraversable trait is one of the most important objects in Scala-IO. Input provides a uniform way of creating views on the data: as a string, as a byte array, or as a LongTraversable of something like bytes.

LongTraversable is a scala.collection.Traversable with some extra capabilities. A few of the salient points of LongTraversable are:
  • It is a lazy/non-strict collection similar to Stream. In other words, you can perform operations like map, flatMap, filter, collect, etc. without accessing the resource
  • Methods like slice and drop will (if possible for the resource) skip the dropped bytes without reading them
  • Each usage of the LongTraversable will typically open and close the underlying resource.
  • Has methods that one typically finds in Seq.  For example: zip, apply, containsSlice
  • Has methods that take or return Longs instead of Ints like ldrop, lslice, ltake, lsize
  • Has a limitFold method that allows fold-like behaviour with extra features like skipping and early termination
  • Can be converted to an AsyncLongTraversable which has methods that return Futures instead and won't block the program
  • Can be converted to a Process object for advanced data processing pipelines
Example usage:
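Here is a minimal sketch of two behaviours from the list above: transformations that do not touch the resource, and each use opening and closing it again. LazyBytes is a hypothetical class, not LongTraversable's real implementation. The counter `opens` records how often the underlying stream is opened.

```scala
import java.io.{ByteArrayInputStream, InputStream}

// Hypothetical: map and filter only record work; toList opens, reads and closes.
class LazyBytes(open: () => InputStream, transform: Iterator[Int] => Iterator[Int]) {
  def map(f: Int => Int): LazyBytes = new LazyBytes(open, transform.andThen(_.map(f)))
  def filter(p: Int => Boolean): LazyBytes = new LazyBytes(open, transform.andThen(_.filter(p)))

  // Terminal operation: the only place the resource is actually accessed.
  def toList: List[Int] = {
    val in = open()
    try transform(Iterator.continually(in.read()).takeWhile(_ != -1)).toList
    finally in.close()
  }
}

var opens = 0
def openData(): InputStream = {
  opens += 1
  new ByteArrayInputStream(Array[Byte](1, 2, 3, 4, 5))
}

val bytes = new LazyBytes(() => openData(), it => it)
val multiplesOf20 = bytes.map(_ * 10).filter(_ % 20 == 0) // nothing opened yet
val opensBeforeUse = opens
val firstRun = multiplesOf20.toList  // opens the resource once
val secondRun = multiplesOf20.toList // opens it again
```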

The limitFold method is useful for processing only a portion of a file when you don't know the indices of that portion ahead of time.
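Here is a sketch of a fold with early termination in the spirit of limitFold. The names FoldStep, Continue, Stop and limitedFold are hypothetical, and skipping is not modelled. The example collects the lines between BEGIN and END markers without knowing their positions. It stops pulling input as soon as END is seen, which the `consumed` counter confirms.

```scala
// Hypothetical step type: keep folding, or stop with the current result.
sealed trait FoldStep[+A]
final case class Continue[A](acc: A) extends FoldStep[A]
final case class Stop[A](acc: A) extends FoldStep[A]

def limitedFold[T, A](items: Iterator[T])(init: A)(op: (A, T) => FoldStep[A]): A = {
  var acc = init
  var done = false
  while (!done && items.hasNext) {
    op(acc, items.next()) match {
      case Continue(a) => acc = a
      case Stop(a)     => acc = a; done = true
    }
  }
  acc
}

// Count how many lines are actually pulled from the source.
var consumed = 0
val lines = Iterator("header", "BEGIN", "a", "b", "END", "footer", "more").map { line => consumed += 1; line }

// State: (inside the section?, lines collected so far).
val (_, section) = limitedFold(lines)((false, Vector.empty[String])) {
  case ((_, acc), "BEGIN")  => Continue((true, acc))
  case ((true, acc), "END") => Stop((true, acc))
  case ((true, acc), line)  => Continue((true, acc :+ line))
  case (state, _)           => Continue(state)
}
```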

Thursday, August 2, 2012

Scala-IO Core: Resource, Input

Just a note: all these examples have been tested in the REPL, so go ahead and fire up the sbt console in the example project and try them out.

Resource


Resource is the fundamental component of Scala-IO. A Resource is essentially anything that has a simple open/close lifecycle. The Resource trait handles the lifecycle for developers, allowing them to focus on the IO logic.

In the typical use-case one of the Resource subclasses will be used. They are generally more useful because they have one of the higher-level traits, like Input or Output, mixed in.

The most typical way to create a Resource is with the Resource object, which provides factory methods for creating Resource objects from various types of Java objects.

While Resource is the foundation trait, Input and Output are the traits most commonly used: the user-facing traits, if you will.

There are advanced usages of Resource that we will get into in later posts. At the moment I want to focus on the Input, Output and Seekable traits. In later posts we will look at how to integrate with legacy Java APIs and how to access the underlying resource using the loan pattern.
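The open/close lifecycle that a Resource takes care of can be sketched in a few lines. SimpleResource, its `from` factory and its `use` method are hypothetical names, not Scala-IO's API. The sketch combines a lazy (by-name) opener with the loan pattern, so close() runs even when the code using the resource throws.

```scala
import java.io.{BufferedReader, StringReader}
import scala.util.Try

// Hypothetical: a lazy opener plus the loan pattern.
final class SimpleResource[R <: AutoCloseable](opener: () => R) {
  // Lend the freshly opened resource to `body`, then always close it.
  def use[B](body: R => B): B = {
    val r = opener()
    try body(r)
    finally r.close()
  }
}

object SimpleResource {
  // By-name parameter: nothing is opened until `use` is called.
  def from[R <: AutoCloseable](open: => R): SimpleResource[R] = new SimpleResource(() => open)
}

var closes = 0
class TrackedReader(text: String) extends BufferedReader(new StringReader(text)) {
  override def close(): Unit = { closes += 1; super.close() }
}

val resource = SimpleResource.from(new TrackedReader("line one\nline two"))
val firstLine = resource.use(_.readLine())
val failed = Try(resource.use(_ => throw new IllegalStateException("boom")))
```

Both uses close their reader, including the one that fails, which is the guarantee the Resource trait is meant to give.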

Input


The Input trait provides methods for accessing the data of the underlying resource in various ways: as bytes, strings, lines, etc.

There are two basic types of methods: methods that return LongTraversable objects, and methods that load the entire resource into memory. For example, string and byteArray load the entire resource into memory, while bytes and chars return a LongTraversable.
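The strict/lazy split has a close analogue in the standard library's scala.io.Source. This is only an analogy, not Scala-IO's Input. mkString builds the whole content as one in-memory String, much as string does. getLines() returns an Iterator that pulls data on demand, closer in spirit to the LongTraversable-returning methods. openSource is a hypothetical helper.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.{BufferedSource, Codec, Source}

// Standard-library analogy only (scala.io.Source, not Scala-IO's Input).
def openSource(): BufferedSource =
  Source.fromInputStream(new ByteArrayInputStream("alpha\nbeta\ngamma".getBytes(UTF_8)))(Codec.UTF8)

// Strict: mkString reads the whole resource into a single String.
val whole: String = {
  val src = openSource()
  try src.mkString finally src.close()
}

// Lazy: getLines() is an Iterator, so only the lines requested are produced.
val firstTwo: List[String] = {
  val src = openSource()
  try src.getLines().take(2).toList finally src.close()
}
```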

What is a LongTraversable? That will be the next post :-). In short, it is a specialized lazy (non-strict) Traversable.

Friday, July 27, 2012

Scala-IO Getting Started

For the next several posts you will need to have Scala-IO installed, and you should probably have an sbt project as well.

There are currently 2 Scala-IO 0.4 releases.

  • Scala-IO 0.4-seq - A version of Scala-IO 0.4 without the Akka dependency and therefore without async support
  • Scala-IO 0.4 - The full version, which contains an Akka dependency
The Scala 2.10 versions will have no Akka dependency but can optionally use Akka.

So getting started:

Download the example project from the docs website (http://jesseeichar.github.com/scala-io-doc/latest):
  • Go to Getting Started and follow the instructions for downloading and running the example project. The following steps are based on the 0.4.1 instructions.



The last line (Right(770)) is not a command to enter; it is the result of the asynchronous call.

Thursday, July 26, 2012

Introducing Scala-IO


This is the start of a series of posts on Scala-IO. Scala-IO is, as the name implies, a library for performing input and output operations in Scala. There are four main facets to the library:


  • Basic IO - Reading and writing to some underlying resource.  The current implementation is Java based and thus allows reading and writing to resources like java.io.Readers, Writers, Channels, Streams, etc...
  • File API - A library loosely modelled on the Java 7 java.nio.file API, with an additional simple Unix-like DSL for traversing and searching the filesystem. It has a pluggable architecture, so plugins for systems like WebDAV or Zip filesystems can be addressed in the same way as the local filesystem. The included implementation is for the local filesystem and is built on the java.io APIs
  • Asynchronous Access - Throughout the APIs there are both synchronous and asynchronous options, so both programming models are easy to use.
    • In the 2.10.x+ versions the Future implementations are pluggable, but no additional libraries are required if you don't want them
    • In the 2.9.x versions there are two different dependencies: one with asynchronous APIs implemented on Akka and one without any asynchronous APIs
  • Processor API - An API for defining complex IO processes declaratively.
Each post in this series will normally look at one small, simple IO operation, roughly once a day, rather than covering a few topics in depth. This format is necessary because of my limited available time.

With the introduction done, let's look at two small examples:

Read File with Core API (not File API):
Same thing but with File API: