I am getting a lot of emails about Scala-IO and my posts. I just want to let everyone know that I am on vacation until September 10th or so. I have some posts in the works, but I won't be able to finish them here, where I have virtually no internet.
Back soon.
Thursday, August 30, 2012
Sunday, August 19, 2012
Scala-IO Core: Seekable
At the same level of abstraction as Input and Output is the fine trait called Seekable. As the name implies, it provides random-access-style methods for interacting with a resource. The example that comes immediately to mind is a random access file.
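To make "random access" concrete, here is a minimal sketch of the same idea using only the JDK's java.io.RandomAccessFile rather than Scala-IO: seek to an offset, then overwrite bytes from there. The object RandomAccessSketch and its methods are hypothetical names for this illustration, and the sketch assumes ASCII text so that character and byte offsets coincide.

```scala
import java.io.RandomAccessFile
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper object, not part of Scala-IO: a plain-JDK
// illustration of random access writes.
object RandomAccessSketch {
  // Seek to `offset` and overwrite bytes from there on.
  // Writing past the current end of the file extends it.
  def overwriteAt(path: Path, offset: Long, data: String): Unit = {
    val raf = new RandomAccessFile(path.toFile, "rw")
    try {
      raf.seek(offset)
      raf.write(data.getBytes(StandardCharsets.UTF_8))
    } finally raf.close()
  }

  // Read the whole file back as a UTF-8 string (for checking results).
  def readAll(path: Path): String =
    new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
}
```

Seeding a file with "Entering some seed data into the file" and calling overwriteAt(path, 9, "first") leaves "Entering firstseed data into the file", the same result the Seekable example reports for an OverwriteAll patch at offset 9.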
IMPORTANT: Each call to truncate, patch or insert opens and closes a new connection to the file. Use the Processor API to perform multiple operations within a single connection.
The design of Seekable largely mimics the patch method of scala.collection.Seq (and the insert method of mutable buffers). Not much more to say beyond getting into some examples:
```scala
import scalax.io._
import java.io.File

// For this example let's explicitly specify a codec
implicit val codec = scalax.io.Codec.UTF8

val file: Seekable = Resource.fromFile(new File("scala-io.out"))

// delete all data from the file
// or more specifically only keep the data from the beginning of the
// file up to (but not including) the first byte
file truncate 0

val seedData = "Entering some seed data into the file"
file write seedData

// verify seed data was correctly written (in REPL)
file.string

// write "first" after character 9
// if the file is < 9 characters an underflow exception is thrown
// if the patch extends past the end of the file then the file is extended
// Note: The offset always depends on the type of data being written,
// for example if the data written is a string it will be 9 characters,
// if the data is bytes it will be 9 bytes
file patch (9, "first", OverwriteAll)

// dumping to REPL will show: Entering firstseed data into the file
file.string

// patch at position 0 and replace 100 bytes with new data
file.patch(0, seedData, OverwriteSome(100))

// dumping to REPL will show the unchanged seedData once again
file.string

// Overwrite only 4 bytes starting at byte 9.
// The extra bytes will be inserted.
// In other words the word "some" of the seed data will
// be replaced with "second"
// Warning: This is an overwrite and an insert.
// Inserts are expensive since they require copying the data from
// the index to the end of the file. If small enough it is done in
// memory, but a temporary file is required for big files.
file.patch(9, "second".getBytes(), OverwriteSome(4))

// dumping to REPL will show: Entering second seed data into the file
file.string

// reset file
file.patch(0, seedData, OverwriteSome(100))

// Replace 9 bytes with the 5 bytes that are provided
// In other words: replace "some seed" with "third"
file.patch(9, "third".getBytes(), OverwriteSome(9))

// dumping to REPL will show: Entering third data into the file
file.string

// reset file
file.patch(0, seedData, OverwriteSome(100))

// Insert a string at the start of the file
file.insert(0, "newInsertedData ")

// dumping to REPL will show: newInsertedData Entering some seed data into the file
file.string

// reset file
file.patch(0, seedData, OverwriteSome(100))

// add !! to the end of the file
file.append("!!")

// dumping to REPL will show: Entering some seed data into the file!!
file.string
```
To perform several operations within a single connection, use the Processor API:
```scala
import scalax.io._
import java.io.File

val file: Seekable = Resource.fromFile(new File("scala-io.out"))

for {
  p <- file.seekableProcessor
  seekable = p.asSeekable
} {
  seekable truncate 0
  seekable write "hi"
  seekable append " world"

  // one can do patch, insert etc...
  // or move the cursor to the correct position and write
  // this is essentially a patch(0, "Hi", OverwriteAll)
  seekable.position = 0
  seekable write "Hi"
  seekable patch (3, "W", OverwriteAll)

  // dumping to REPL will show: Hi World
  println(seekable.string)
}
```
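Because patch follows the scala.collection.Seq convention (start index, replacement, number of elements replaced), the offsets from the first Seekable example can be tried out on an in-memory String with the standard library's patch method. This is an analogy rather than Scala-IO code: SeqPatchSketch is a hypothetical name, and reading OverwriteAll as "replace as many elements as are written" and OverwriteSome(n) as "replace n elements" is inferred from the REPL output quoted in the example's comments.

```scala
// Hypothetical object: in-memory analogy of Seekable's patch/insert
// using the standard String patch(from, replacement, replacedCount).
object SeqPatchSketch {
  val seed = "Entering some seed data into the file"

  // Overwrite exactly as many characters as are written (like OverwriteAll)
  def overwriteAll(at: Int, data: String): String = seed.patch(at, data, data.length)

  // Replace `n` characters with `data`, so the text may grow or shrink (like OverwriteSome(n))
  def overwriteSome(at: Int, data: String, n: Int): String = seed.patch(at, data, n)

  // Replace nothing: a pure insert
  def insert(at: Int, data: String): String = seed.patch(at, data, 0)
}
```

For example, overwriteSome(9, "third", 9) yields "Entering third data into the file", matching the file contents shown after the corresponding Seekable patch.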
Labels:
append,
daily-scala,
insert,
patch,
Scala,
scala-io,
scala-io core,
seekable,
truncate
Tuesday, August 14, 2012
Scala-IO Core: Output - OutputConverter
As mentioned in the last post on Output, it is possible to write arbitrary objects to an Output object and have them serialized to disk.
The way this is handled in Scala-IO is via OutputConverters. If you are familiar with the type-class pattern, it should be clear how this works. For a very quick introduction you can read: http://www.sidewayscoding.com/2011/01/introduction-to-type-classes-in-scala.html.
The clue is in the signature of write:
```scala
def write[T](data: T)(implicit writer: OutputConverter[T]): Unit
```
The last parameter is the object that defines how the data is serialized. The OutputConverter trait essentially converts an object into bytes, and its companion object has a few built-in implementations for types like Int, Float, Byte, Char, etc.
Since the parameter is implicit, the compiler will search for an implementation that satisfies the requirement (an OutputConverter whose type parameter is T). This allows:
```scala
import scalax.io._

val output: Output = Resource.fromFile("scala-io.out")

output write 3
// and
output write Seq(1, 2, 3)

// one can be more explicit and declare the OutputConverter
output.write(3)(OutputConverter.IntConverter)
```
The last line in the example shows the explicit declaration of the OutputConverter to use when writing the data. This indicates how one can provide their own converter.
Since the parameter is implicit, there are two ways that custom OutputConverters can be used:
- Define an implicit object for the type to be written. Any of the usual ways of defining implicits can be used, for example an implicit value in scope or an implicit object in the companion object of the type being written (serialized).
- Explicitly declare the converter to use at the method call site.
First let's examine the use-case where the type comes from a different library, so we cannot create a companion object for it.
```scala
import java.util.Date
import scalax.io._
import Resource._
import OutputConverter._

val file = fromFile("scala-io.out")

// Simplest design pattern is to create a new implicit object in scope
implicit object DateConverter extends OutputConverter[Date] {
  def sizeInBytes = 8
  def toBytes(data: Date) = LongConverter.toBytes(data.getTime)
}

file.write(java.util.Calendar.getInstance().getTime())

// display result in REPL
file.byteArray

// naturally the write method can have the converter
// explicitly declared if you don't want to make the
// object implicit
file.write(java.util.Calendar.getInstance().getTime())(DateConverter)
```
The second case is where you are implementing the class yourself and therefore can add a companion object:
For this next bit to work, you need to paste it into a file and run that, or use the REPL's paste mode (type :paste into the REPL and press Enter), because a class and its companion object must be compiled together.
```scala
import scalax.io._
import Resource._
import OutputConverter._

class MyData(val name: String)

object MyData {
  implicit object Converter extends OutputConverter[MyData] {
    def sizeInBytes = 1
    def toBytes(data: MyData) = data.name.getBytes("UTF8")
  }
}

val file = fromFile("scala-io.out")

// let's quickly truncate the file to make sure we are dealing with
// an empty file (this is a method on Seekable)
file.truncate(0)

file write (new MyData("jesse"))

// display result in REPL
file.string
```
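Since the mechanism is the ordinary type-class pattern, it can be reproduced without Scala-IO. The sketch below uses a hypothetical ToBytes trait in place of OutputConverter and a hypothetical FakeOutput.write that returns the bytes instead of writing them; only the implicit resolution is meant to mirror the examples above. It shows both cases: an instance for a class we do not own (java.util.Date) brought in with an import, and an instance in the companion object of a class we own, which the compiler finds without any import.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.util.Date

// Hypothetical type class, standing in for OutputConverter:
// "how do I turn a T into bytes?"
trait ToBytes[T] {
  def toBytes(data: T): Array[Byte]
}

object ToBytes {
  // Built-in instance, found automatically via ToBytes' own companion object
  implicit val longToBytes: ToBytes[Long] = new ToBytes[Long] {
    def toBytes(data: Long): Array[Byte] = ByteBuffer.allocate(8).putLong(data).array()
  }
}

// Case 1: a class we do not own (java.util.Date), so the instance lives
// in an object of our own and must be imported where it is needed.
object DateInstances {
  implicit val dateToBytes: ToBytes[Date] = new ToBytes[Date] {
    def toBytes(data: Date): Array[Byte] = implicitly[ToBytes[Long]].toBytes(data.getTime)
  }
}

// Case 2: a class we own, so the instance lives in its companion object
// and is found without any import.
final class Person(val name: String)
object Person {
  implicit val personToBytes: ToBytes[Person] = new ToBytes[Person] {
    def toBytes(data: Person): Array[Byte] = data.name.getBytes(StandardCharsets.UTF_8)
  }
}

// Hypothetical stand-in for Output.write: returns the bytes instead of writing them
object FakeOutput {
  def write[T](data: T)(implicit writer: ToBytes[T]): Array[Byte] = writer.toBytes(data)
}
```

Calling FakeOutput.write(new java.util.Date()) without import DateInstances._ fails to compile with a missing-implicit error, while passing DateInstances.dateToBytes as the second argument list corresponds to declaring the converter at the call site.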
Labels:
daily-scala,
Output,
outputconverter,
Scala,
scala-io,
scala-io core
Wednesday, August 8, 2012
Scala-IO Core: Output
The Output trait is the primary trait for writing data to a resource. The basic usage is very simple, but it can get more complex when one wishes to serialize objects.
Let's start with the basic usage:
```scala
import scalax.io._
import scalax.io.Resource

val out: Output = Resource.fromOutputStream(new java.io.FileOutputStream("daily-scala.out"))
val in: Input = Resource.fromFile("daily-scala.out")

// Write some bytes to the output object.
// Each write will typically overwrite
// the previous data; the processing API
// can be used if you do not want this behaviour
out write "data".getBytes()
out write Array[Byte](1, 2, 3)

// print out file in REPL
in.byteArray

// Writing strings needs a Codec
// for encoding the strings. The default is UTF8
// but the default is easily overridden.
out write "howdy"

// print out file in REPL
in.string

out.write("howdy")(Codec.UTF8)

// print out file in REPL
in.string

implicit val defaultCodec: Codec = Codec.UTF8

// The implicit codec will be used instead of the
// default codec
out write "hi there"

// print out file in REPL
in.string

// write all strings in a collection with the default separator ("")
out writeStrings Seq("it", "was", "a", "dark", "and", "stormy", "night")

// print out file in REPL
in.string

// write all strings in a sequence with a space as the separator
out.writeStrings(Seq("it", "was", "a", "dark", "and", "stormy", "night"), " ")
```
A common need is to write several times to a single Output without overwriting the data. To do this one can use the processing API. Future posts will look at the processing API in more detail, but for now here is a simple example:
```scala
import scalax.io._

val output: Output = Resource.fromOutputStream(new java.io.FileOutputStream("daily-scala.out"))
val in: Input = Resource.fromFile("daily-scala.out")

// Output processors are used when one
// needs to perform batch writes on an
// output object. When a processor object
// is used a "processing" pipeline is
// created and the operations are performed
// in batch form.
// The following example will write 2 lines
// to the output object (a file in this case).

// There are a few ways to use outputProcessors.
// This example is the pattern most developers will
// likely be most comfortable with:
for {
  // create a processor (signalling the start of a batch process)
  processor <- output.outputProcessor
  // create an output object from it
  out = processor.asOutput
} {
  // all writes to out will be on the same open output stream/channel
  out.write("first write\n")
  out.write("second write")
}

// show data in REPL
in.string

// As will be shown in the future, a processor is typically lazy
// if created with map and flatMap calls.
// The next example is another way to do the multiple writes.

// first create processor
val processor = for {
  // create the processor
  out <- output.outputProcessor
  // perform write calls
  _ <- out.write("second time first write\n")
  _ <- out.write("second time second write")
} yield {}

// at this point the writes have not occurred because
// processor contains the processing pipeline

// show data in REPL
in.string

processor.execute // execute processor

// show data in REPL
in.string
```
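The overwrite behaviour described in these examples can be reproduced with plain java.io streams, which makes it easier to see why batching helps. This is a minimal sketch under the assumption that, as the comments above suggest, each unbatched write goes through a freshly opened FileOutputStream; OverwriteVsBatch is a hypothetical name and none of this is Scala-IO code.

```scala
import java.io.FileOutputStream
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical object contrasting "new stream per write" with "one stream per batch".
object OverwriteVsBatch {
  private def utf8(s: String): Array[Byte] = s.getBytes(StandardCharsets.UTF_8)

  // A fresh FileOutputStream per write: opening the stream truncates
  // the file, so only the last chunk survives.
  def writeWithNewStreamEachTime(path: Path, chunks: Seq[String]): Unit =
    chunks.foreach { chunk =>
      val out = new FileOutputStream(path.toFile)
      try out.write(utf8(chunk)) finally out.close()
    }

  // One stream for the whole batch: every chunk is kept.
  def writeBatched(path: Path, chunks: Seq[String]): Unit = {
    val out = new FileOutputStream(path.toFile)
    try chunks.foreach(chunk => out.write(utf8(chunk)))
    finally out.close()
  }

  def read(path: Path): String =
    new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
}
```

With the chunks "first write\n" and "second write", the first method leaves only "second write" in the file, while the batched method leaves both lines, mirroring the difference between the plain writes and the outputProcessor examples.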
Labels:
daily-scala,
Output,
Scala,
scala-io,
scala-io core
Monday, August 6, 2012
Scala-IO Core: Long Traversable
The LongTraversable trait is one of the most important objects in Scala-IO. Input provides a uniform way of creating views on the data (as a String, a byte array, or a LongTraversable of something like bytes).
LongTraversable is a scala.collection.Traversable with some extra capabilities. A few of the salient points of LongTraversable are:
- It is a lazy/non-strict collection similar to Stream. In other words, you can perform operations like map, flatMap, filter, collect, etc. without accessing the resource
- Methods like slice and drop will (if possible for the resource) skip the dropped bytes without reading them
- Each usage of the LongTraversable will typically open and close the underlying resource.
- Has methods that one typically finds in Seq, for example zip, apply and containsSlice
- Has methods that take or return Longs instead of Ints, like ldrop, lslice, ltake and lsize
- Has a limitFold method that allows fold-like behaviour with extra features like skipping and early termination
- Can be converted to an AsyncLongTraversable, whose methods return Futures instead and won't block the program
- Can be converted to a Process object for advanced data processing pipelines
Example usage:
```scala
import scalax.io._
import java.net.URL

// scala-io versions > 0.4.1 will have method
// Resource.fromURLString("http://xyz.com")
// but earlier versions use an overloaded method:
// Resource.fromURL("http://www.scala-lang.org")

// Assumption (not part of the original example): file1 and file2
// are two Input resources; the file names here are hypothetical.
val file1 = Resource.fromFile("file1.txt")
val file2 = Resource.fromFile("file2.txt")

// A simple example of comparing all bytes in
// one file with those of another.
// Combining zip with sliding is a good way to perform operations
// on sections of two (or more) files
val zipped = file1.bytes.zip(file2.bytes) map {
  case (file1Byte, file2Byte) => file2Byte < file1Byte
}

// take the first 5 results and load them into memory
val fiveBytes = zipped.take(5).force

// for debugging in the REPL let's print them out
fiveBytes mkString ","

// Add a line number to each line in a file
//
// Note: Since methods in an Input object return LongTraversableView objects
// none of the zip examples open the file. To do that you must call
// force or some other method that forces a read to take place.
val addedLineNumbers = file1.lines().zipWithIndex.map {
  case (line, idx) => idx + " " + line
}

// print out the second group of 5 lines
addedLineNumbers.drop(5).take(5) foreach println

// check if file1 startsWith file2
file1.bytes.startsWith(file2.bytes)

// The number of consecutive lines starting at 0 containing <
file1.lines().segmentLength(_ contains "<", 0)

// check if all lines in file1 are the same as in file2 ignoring case
file1.lines().corresponds(file2.lines())(_ equalsIgnoreCase _)

// Check if file1 has the same bytes as file2
file1.bytes.sameElements(file2.bytes)

// silly example but shows that the value
// being compared can be any traversable
file1.bytes.sameElements(1 to 30)

// use sliding to visit each 1008 bytes.
// map splits the window into two parts, block and checksum
val blocks = file1.bytes.sliding(1008, 1008).map { _ splitAt 1000 }

// grouped is sliding(size, size) so the following is equivalent
val blocks2 = file1.bytes.grouped(1008).map { _ splitAt 1000 }

blocks2 foreach { case (block, checksum) =>
  // verify checksum and process
  println(block take 5)
}
```
The limitFold method can be quite useful for processing only a portion of a file when you don't know ahead of time what the indices of that portion are:
```scala
import scalax.io._

// Assumption (not in the original): `in` is any Input; the file name is hypothetical
val in = Resource.fromFile("file1.txt")

/**
 * Skip the first 10 bytes and sum a random number of bytes,
 * up to 20 bytes
 */
in.bytes.drop(10).take(20).limitFold(10) {
  case (acc, next) if util.Random.nextBoolean => End(acc + next)
  case (acc, next) => Continue(acc + next)
}
```
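The combination of a fold with early termination can be sketched over a plain scala.collection.Iterator. The Step, Go and Stop types and the foldUntil method below are hypothetical stand-ins for the End/Continue results used with limitFold, not Scala-IO's implementation; the point is that stopping early also means the remaining elements are never read, which matters when the source is a large file.

```scala
// Hypothetical result type, loosely modelled on End/Continue.
sealed trait Step[+B]
final case class Go[B](acc: B) extends Step[B]
final case class Stop[B](acc: B) extends Step[B]

object EarlyFold {
  // Fold over an Iterator, stopping as soon as `f` returns Stop.
  // Elements after that point are never pulled from the source.
  def foldUntil[A, B](source: Iterator[A], zero: B)(f: (B, A) => Step[B]): B = {
    @annotation.tailrec
    def loop(acc: B): B =
      if (!source.hasNext) acc
      else f(acc, source.next()) match {
        case Stop(result) => result
        case Go(next)     => loop(next)
      }
    loop(zero)
  }

  // Sums 1, 2, 3, ... until the total exceeds `limit`.
  // Returns (total, number of elements actually read from the source).
  def sumUntilOver(limit: Int): (Int, Int) = {
    var reads = 0
    val source = Iterator.from(1).map { n => reads += 1; n }
    val total = foldUntil(source, 0) { (acc, n) =>
      val next = acc + n
      if (next > limit) Stop(next) else Go(next)
    }
    (total, reads)
  }
}
```

EarlyFold.sumUntilOver(10) returns (15, 5): the running sum passes 10 at the fifth element, so only five elements are pulled from the otherwise infinite Iterator.from(1).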
Labels:
daily-scala,
longtraversable,
Scala,
scala-io,
scala-io core
Thursday, August 2, 2012
Scala-IO Core: Resource, Input
Just a note: all these examples have been tested in the REPL, so go ahead and fire up the sbt console in the example project and try them out.
Resource
Resource is the fundamental component of Scala-IO. A Resource is essentially anything that has a simple open/close lifecycle. The Resource trait handles the lifecycle for the developer, leaving them free to focus on the IO logic.
In the typical use-case one of the Resource subclasses will be used. They are more useful in general because they have one of the higher-level traits, like Input or Output, mixed in.
The most typical way to create a Resource is with the Resource object, which provides factory methods for creating Resource objects from various types of Java objects.
While Resource is the foundation trait, Input and Output are the traits most commonly used: the user-facing traits, if you will.
There are advanced usages of Resource that we will get into in later posts. At the moment I want to focus on the Input, Output and Seekable traits. In later posts we will look at how to integrate with legacy Java APIs and how to access the underlying resource using the loan pattern.
Here are a few examples of creating Resources:
```scala
import scalax.io.Resource

Resource.fromFile("file")
Resource.fromFile(new java.io.File("file"))
Resource.fromRandomAccessFile(new java.io.RandomAccessFile("file", "rw"))
// a ByteChannel is read/write, so open the file in "rw" mode
Resource.fromByteChannel(new java.io.RandomAccessFile("file", "rw").getChannel)
Resource.fromReadableByteChannel(new java.io.FileInputStream("file").getChannel)
// a writable channel has to come from an output stream
Resource.fromWritableByteChannel(new java.io.FileOutputStream("file").getChannel)
Resource.fromInputStream(new java.io.FileInputStream("file"))
Resource.fromOutputStream(new java.io.FileOutputStream("file"))
Resource.fromReader(new java.io.FileReader("file"))
Resource.fromWriter(new java.io.FileWriter("file"))
Resource.fromClasspath("scalax/io/Resource.class")
Resource.fromClasspath("scalax/io/Resource.class", classOf[Resource[_]])
```
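The loan pattern mentioned above is what lets a Resource own the open/close lifecycle on the developer's behalf. Here is a minimal, library-independent sketch of it, assuming only java.lang.AutoCloseable; LoanSketch.withResource is a hypothetical name, not part of Scala-IO.

```scala
// Hypothetical helper illustrating the loan pattern: the caller describes
// what to do with an open resource, and the helper owns open/close.
object LoanSketch {
  // `open` is by-name, so the resource is created inside withResource
  // and closed in the finally block even if `use` throws.
  def withResource[R <: AutoCloseable, A](open: => R)(use: R => A): A = {
    val resource = open
    try use(resource)
    finally resource.close()
  }
}
```

For example, LoanSketch.withResource(new java.io.FileInputStream("file"))(_.read()) reads one byte and closes the stream afterwards, even if read throws. On Scala 2.13 and later, scala.util.Using.resource offers the same behaviour in the standard library.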
Input
The Input trait provides methods for accessing the data of the underlying resource in various ways: as bytes, strings, lines, etc.
There are two basic types of methods: methods that return LongTraversable objects and methods that load the entire Resource into memory. For example, string and byteArray load the entire resource into memory, while bytes and chars return a LongTraversable.
What is a LongTraversable? That will be the next post :-). In short, it is a specialized lazy/non-strict Traversable.
```scala
import scalax.io._

// Assumption (not in the original): `input` is any Input; the file name is hypothetical
val input: Input = Resource.fromFile("input.txt")

// The simplest way to read data is to get the bytes from an Input object
val bytes: LongTraversable[Byte] = input.bytes

// you can also get the characters and strings from an Input object,
// but you need a codec for decoding the bytes
val chars: LongTraversable[Char] = input.chars(Codec.UTF8)

// The default encoding is UTF-8 so one can leave off the codec if desired
val defaultChars = input.chars

// Notice that the () are left off input.chars. That is because the codec
// parameter is implicit. This allows the codec to be defined once
// and be used by all chars calls
implicit val codec = Codec.ISO8859
val iso8859chars = input.chars

// The read method in the Java InputStream API returns the bytes
// as integers. For similar behaviour you can call bytesAsInts
val bytesAsInts = input.bytesAsInts

// There are two useful methods for loading all data into memory
val string = input.string
val array = input.byteArray

// We will revisit it in more detail later, but there is an efficient copy method
// for copying the data from the Input object to any Output.
// The copy method detects the type of the Input object and the Output
// object and intelligently chooses a method for copying that is
// as efficient as possible.
// For example, if the Input is based on a FileInputStream and the
// Output is based on a FileOutputStream, FileChannel's transferTo
// method is used so that the OS will perform the copy
// efficiently
input copyDataTo Resource.fromFile("file")

// Lines Example 1
// Another useful method is the lines() method, which more or less
// does what it implies. The default behaviour is to auto-detect
// the line ending. For efficiency it finds the end of the first
// line and then assumes the rest of the file has the same ending.
// The terminator is removed by default.
input.lines()

// Lines Example 2
// This example explicitly declares the EOL terminator and
// requests that the terminator is not stripped from the line
import Line.Terminators._
input.lines(terminator = NewLine, includeTerminator = true)

// Lines Example 3
// Lastly, lines is not limited to just the standard line endings;
// arbitrary separators can be used. This last example
// demonstrates parsing the input into lines using %% as the
// EOL terminator. Additionally it shows the explicit use
// of the Codec to override the in-scope implicit Codec
input.lines(terminator = Custom("%%"))(Codec.UTF8)
```
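The two ideas in these examples, decoding bytes with a chosen codec and splitting on an arbitrary terminator, can be sketched with the plain JDK and Scala collections. LinesSketch, decode and splitLines are hypothetical names, and this is not how Scala-IO implements lines(); for instance it does not auto-detect the line ending.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical helpers, not Scala-IO's implementation.
object LinesSketch {
  // Bytes only become characters once a charset is chosen; this is
  // the job the Codec parameter does in the examples above.
  def decode(bytes: Array[Byte], charset: Charset = StandardCharsets.UTF_8): String =
    new String(bytes, charset)

  // Split text on an arbitrary terminator, optionally keeping it on each line.
  def splitLines(text: String, terminator: String, includeTerminator: Boolean): Vector[String] = {
    require(terminator.nonEmpty, "terminator must not be empty")
    val lines = Vector.newBuilder[String]
    var start = 0
    var idx = text.indexOf(terminator, start)
    while (idx >= 0) {
      val end = if (includeTerminator) idx + terminator.length else idx
      lines += text.substring(start, end)
      start = idx + terminator.length
      idx = text.indexOf(terminator, start)
    }
    // Whatever follows the last terminator is the final line.
    if (start < text.length) lines += text.substring(start)
    lines.result()
  }
}
```

Decoding the UTF-8 bytes of "é" as ISO-8859-1 yields the two characters "Ã©", which is why the codec matters. splitLines("a%%b%%c", "%%", includeTerminator = true) gives Vector("a%%", "b%%", "c"), analogous to Lines Example 3 combined with the includeTerminator flag from Example 2.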
Labels:
daily-scala,
input,
longtraversable,
resource,
Scala,
scala-io,
scala-io core