Thursday, August 30, 2012

On Vacation

I am getting a lot of emails about Scala-IO and my posts. Just want to let everyone know I am on vacation until September 10th or so. I have some posts in the works but they won't be done here where I have virtually no internet.

Back soon.

Sunday, August 19, 2012

Scala-IO Core: Seekable

At the same level of abstraction as Input and Output is the fine trait called Seekable. As the name implies it provides random access style methods for interacting with a resource. The example that comes immediately to mind is a random access file.

The design of Seekable largely mimics the scala.collection.Seq patch and insert methods. Not much more to say beyond getting into some examples:
IMPORTANT: Each time truncate() or patch or insert is called a new connection to the file is opened and closed. The Processor API is to be used to perform multiple operations within one connection.

Tuesday, August 14, 2012

Scala-IO Core: Output - OutputConverter

As mentioned in the last post on Output, it is possible to write arbitrary objects to an output object and have it serialized to disk.

The way this is handled in Scala-IO is via OutputConverters. If you are familiar with the type-class pattern then this should be very clear to you how this works. For a very quick introduction you can read: http://www.sidewayscoding.com/2011/01/introduction-to-type-classes-in-scala.html.

The clue is in the signature of write:

def write[T](data: T)(implicit writer: OutputConverter[T]): Unit

the last parameter is the object that defines how the object is serialized. The OutputConverter trait essentially converts and object into bytes and has a few built-in implementations in its companion object for objects like Int, Float, Byte, Char, etc...

Since the parameter is implicit the compiler will search for an implementation that satisfies the requirements (that the OutputConverter has the type parameter T). This allows:

import scalax.io._

val output:Output = Resource.fromFile("scala-io.out")

output write 3

// and

output write Seq(1,2,3)

// one can be more explicit and declare the OutputConverter
output.write(3)(OutputConverter.IntConverter)

The last line in the example shows the explicit declaration of the OutputConverter to use when writing the data. This indicates how one can provide their own converter.

Since the parameter is implicit there are two ways that custom OutputConverters can be used.

defining an implicit object for the object to be written. In this case all the possible ways implicits can be defined can be used. For example as an implicit value or in the companion object of the object to be written (serialized)
Explicitly declare the converter to use at the method call site

First let's examine the use-case where the object is from a different library and therefore we cannot create a companion object for the object. The second case is where you are implementing the class and therefore can add a companion object:
For this next bit to work you need to paste it into a file and run that or use the paste mechanism of the REPL (type :paste into repl and press enter)

Wednesday, August 8, 2012

Scala-IO Core: Output

The Output object is the primary trait for writing data to a resource. The basic usage is very simple but can get more complex when one wishes to serialize objects.

Lets start with the basic usage: A common need is to write several times to a single Output without overwriting the data. To do this one can use the processing API. A future post(s) will look at the processing API in more detail but for now a simple example:

Monday, August 6, 2012

Scala-IO Core: Long Traversable

The LongTraversable trait is one of the most important objects in Scala IO. Input provides a uniform way of creating views on the data (as a string or byte array or LongTraversable of something like bytes.)

LongTraversable is a scala.collection.Traversable with some extra capabilities. A few of the salient points of LongTraversable are:

It is a lazy/non-strict collection similar to Stream. In other words, you can perform operations like map, flatmap, filter, collect, etc... without accessing the resource
Methods like slice and drop will (if possible for the resource) skip the dropped bytes without reading them
Each usage of the LongTraversable will typically open and close the underlying resource.
Has methods that one typically finds in Seq. For example: zip, apply, containsSlice
Has methods that take or return Longs instead of Ints like ldrop, lslice, ltake, lsize
Has limitFold method that allows fold like behaviour with extra features like skip and early termination
Can be converted to an AsyncLongTraversable which has methods that return Futures instead and won't block the program
Can be converted to a Process object for advanced data processing pipelines

Example usage:

The limitFold method can be quite useful to process only a portion of the file if you don't know ahead of time what the indices of the portion are:

Thursday, August 2, 2012

Scala-IO Core: Resource, Input

Just a note: all these examples have been tested in REPL so go ahead and fire up the sbt console in the example project and try these out.

Resource

Resource is the fundamental component of Scala-IO. A Resource is essentially anything that has a simple open/close lifecycle. The Resource trait handles the lifecycle for the developer allowing him to focus on the IO logic.

In the typical use-case one of the Resource subclasses will be used. They are more useful in general because they will have one of higher level traits mixed in like Input or Output.

The most typical way to create a Resource is with the Resource object which is a factory method for creating Resource objects from various types of Java objects.

While Resource is the foundation Trait, Input and Output are the Traits most commonly used, The user-facing traits if you will.

Here are a few examples of creating Resources: There are advanced usages of Resource that we will get into in later posts. At the moment I want to focus on Input, Output and Seekable Traits. In later posts we will look at how to integrate with legacy Java APIs and how to access the underlying resource using the loan pattern.

Input

The Input Trait provides methods for accessing the data of the underlying resource in various different way. As bytes, strings, lines, etc...

There are two basic types of methods. Methods that return LongTraversable objects and methods that load the entire Resource into memory. For example: string and byteArray load the entire resource into memory while bytes and chars return a LongTraversable.

What is a LongTraversable? That will be the next post :-). Summarized, it is a specialized Lazy/non-strict Traversable.