Showing posts with label xml. Show all posts

Thursday, December 10, 2009

XML transformations 2

This topic is a continuation of XML Transformations. The previous topic showed a method of creating transformation rules and then combining the rules to create a transformation that could be applied to an XML data structure. This topic takes a different approach: using a match statement and a recursive method to iterate through the tree. The mislabelledBooks value used by the method is the one defined in the previous topic.
    scala> val xml = <library>
         | <videos>
         | <video type="dvd">Seven</video>
         | <video type="blue-ray">The fifth element</video>
         | <video type="hardcover">Gardens of the moon</video>
         | </videos>
         | <books>
         | <book type="softcover">Memories of Ice</book>
         | </books>
         | </library>
    xml: scala.xml.Elem = 
    <library>
           <videos>
           <video type="dvd">Seven</video>
           <video type="blue-ray">The fifth element</video>
           <video type="hardcover">Gardens of the moon</video>
           </videos>
           <books>
           <book type="softcover">Memories of Ice</book>
           </books>
           </library>
    scala> import scala.xml._
    import scala.xml._
    // the same selection as in the previous topic, repeated so this session runs on its own
    scala> val mislabelledBooks = xml \\ "video" filter {e => (e \\ "@type").text == "hardcover"}
    mislabelledBooks: scala.xml.NodeSeq = <video type="hardcover">Gardens of the moon</video>
    scala> def moveElements (node:Node) : Node = node match {
           // videos: drop the mislabelled elements
         | case n:Elem if (n.label == "videos") => 
         |   n.copy( child = n.child diff mislabelledBooks)
           // books: append the mislabelled elements, relabelled as book
         | case n:Elem if (n.label == "books") =>
         |   val newBooks = mislabelledBooks map { e => e.asInstanceOf[Elem].copy(label="book") }
         |   n.copy( child = n.child ++ newBooks)
           // any other element: recurse into the children
         | case n:Elem => 
         |   val children = n.child map {moveElements _}
         |   n.copy(child = children)
           // text and other non-element nodes are returned unchanged
         | case n => n
         | }
    moveElements: (node: scala.xml.Node)scala.xml.Node
    scala> moveElements(xml)
    res1: scala.xml.Node = 
    <library>
                  <videos>
                  <video type="dvd">Seven</video>
                  <video type="blue-ray">The fifth element</video>
                  
                  </videos>
                  <books>
                  <book type="softcover">Memories of Ice</book>
                  <book type="hardcover">Gardens of the moon</book></books>
                  </library>
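
For anyone who wants to try the recursive approach outside the REPL, here is a minimal self-contained sketch. It assumes Scala 2.12 or 2.13 with the scala-xml module on the classpath. The names MoveBooksSketch and isMislabelled are mine and only for illustration, and I swapped the diff call for a filterNot with a predicate so the method does not depend on a value captured elsewhere.

```scala
import scala.xml._

object MoveBooksSketch {
  val library: Elem =
    <library>
      <videos>
        <video type="dvd">Seven</video>
        <video type="hardcover">Gardens of the moon</video>
      </videos>
      <books>
        <book type="softcover">Memories of Ice</book>
      </books>
    </library>

  // Assumption for this sketch: a video of type "hardcover" is really a book
  def isMislabelled(n: Node): Boolean =
    n.label == "video" && (n \ "@type").text == "hardcover"

  val mislabelledBooks: Seq[Node] = (library \\ "video").filter(isMislabelled)

  def moveElements(node: Node): Node = node match {
    // videos: keep every child except the mislabelled ones
    case e: Elem if e.label == "videos" =>
      e.copy(child = e.child.filterNot(isMislabelled))
    // books: append the mislabelled elements with their label changed to book
    case e: Elem if e.label == "books" =>
      val newBooks = mislabelledBooks.collect { case b: Elem => b.copy(label = "book") }
      e.copy(child = e.child ++ newBooks)
    // any other element: rebuild it from its transformed children
    case e: Elem => e.copy(child = e.child.map(moveElements))
    // text and other non-element nodes pass through unchanged
    case other => other
  }

  val fixed: Node = moveElements(library)
}
```

Note that library itself is untouched afterwards; fixed is a new tree.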

XML transformations 1

Unlike most Java XML APIs, the Scala XML object model consists of immutable objects. This has two major consequences:
  • A node has no reference to its parent, because with immutable nodes a parent reference would force the whole tree to be rebuilt on every change, making transformations very expensive
  • Transforming the XML requires creating new nodes rather than changing the existing nodes

Both points make non-functional programmers feel a little uneasy, but in practice only the first restriction causes any real discomfort.
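
To make the two consequences concrete, here is a small sketch, assuming Scala 2.12 or 2.13 with scala-xml on the classpath. ImmutableSketch and findParent are hypothetical names I made up for illustration; findParent is not part of scala.xml. It shows that copy returns a new element and leaves the original alone, and that finding a parent means searching down from a root you still hold.

```scala
import scala.xml._

object ImmutableSketch {
  val original: Elem = <video type="hardcover">Gardens of the moon</video>

  // copy builds a new Elem; only the named field differs from the original
  val relabelled: Elem = original.copy(label = "book")

  val library: Elem = <library><videos>{original}</videos></library>

  // There is no parent pointer, so search downwards from a known root
  // for the node whose children contain the target instance.
  def findParent(root: Node, target: Node): Option[Node] =
    if (root.child.exists(_ eq target)) Some(root)
    else root.child.iterator
      .map(c => findParent(c, target))
      .collectFirst { case Some(parent) => parent }
}
```

The search is linear in the size of the tree, which is the real discomfort mentioned above.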

Two methods for XML transformation will be demonstrated in this and the next topic.
    scala> val xml = <library>
         | <videos>
         | <video type="dvd">Seven</video>
         | <video type="blue-ray">The fifth element</video>
         | <video type="hardcover">Gardens of the moon</video>
         | </videos>
         | <books>
         | <book type="softcover">Memories of Ice</book>
         | </books>
         | </library>
    xml: scala.xml.Elem = 
    <library>
           <videos>
           <video type="dvd">Seven</video>
           <video type="blue-ray">The fifth element</video>
           <video type="hardcover">Gardens of the moon</video>
           </videos>
           <books>
           <book type="softcover">Memories of Ice</book>
           </books>
           </library>
    scala> import scala.xml._
    import scala.xml._
    scala> import scala.xml.transform._
    import scala.xml.transform._
    // Some of the books are labelled as videos
    // not books, so let's select those elements
    scala> val mislabelledBooks = xml \\ "video" filter {e => (e \\ "@type").text == "hardcover"}
    mislabelledBooks: scala.xml.NodeSeq = <video type="hardcover">Gardens of the moon</video>
    // we can create a rule that will remove all the
    // selected elements
    scala> object RemoveMislabelledBooks extends RewriteRule {
         | override def transform(n: Node): Seq[Node] ={ 
         | if (mislabelledBooks contains n) Array[Node]()
         | else n
         | }
         | }
    defined module RemoveMislabelledBooks
    // a quick test to make sure the elements are removed
    scala> new RuleTransformer(RemoveMislabelledBooks)(xml)
    res1: scala.xml.Node = 
    <library>
           <videos>
           <video type="dvd">Seven</video>
           <video type="blue-ray">The fifth element</video>
           
           </videos>
           <books>
           <book type="softcover">Memories of Ice</book>
           </books>
           </library>
    // Now another rule to add them back
    scala> object AddToBooks extends RewriteRule {
         | override def transform(n: Node): Seq[Node] = n match {
         | case e:Elem if(e.label == "books") =>
         |   val newBooks = mislabelledBooks map { case e:Elem => e.copy(label="book") }
         |   e.copy(child = e.child ++ newBooks)
         | case _ => n
         | }
         | }
    defined module AddToBooks
    // voila, done
    scala> new RuleTransformer(RemoveMislabelledBooks, AddToBooks)(xml)
    res4: scala.xml.Node = 
    <library>
           <videos>
           <video type="dvd">Seven</video>
           <video type="blue-ray">The fifth element</video>
           </videos>
           <books>
           <book type="softcover">Memories of Ice</book>
           <book type="hardcover">Gardens of the moon</book></books>
           </library>
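
The rules above close over the mislabelledBooks value from the REPL session. As a variation, here is a sketch of a rule built from a predicate instead, assuming Scala 2.12 or 2.13 with scala-xml on the classpath. RuleSketch, removeWhere and isMislabelled are hypothetical names of my own; only RewriteRule and RuleTransformer come from scala.xml.transform.

```scala
import scala.xml._
import scala.xml.transform.{RewriteRule, RuleTransformer}

object RuleSketch {
  val library: Elem =
    <library>
      <videos>
        <video type="dvd">Seven</video>
        <video type="hardcover">Gardens of the moon</video>
      </videos>
    </library>

  // Assumption for this sketch: a video of type "hardcover" is mislabelled
  def isMislabelled(n: Node): Boolean =
    n.label == "video" && (n \ "@type").text == "hardcover"

  // Hypothetical helper: a rule that drops every node matching the predicate
  def removeWhere(p: Node => Boolean): RewriteRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] =
      if (p(n)) NodeSeq.Empty else n
  }

  // The transformer visits the whole tree and applies the rule to each node
  val cleaned: Node = new RuleTransformer(removeWhere(isMislabelled))(library)
}
```

Returning an empty sequence removes the node, returning the node keeps it, and returning several nodes would replace one node with many.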

Tuesday, November 17, 2009

XML matching

The Scala XML support includes the ability to match on XML elements. Here are several examples. The most important thing to remember is not to leave out the '{' and '}' around the sub-patterns.

    scala> <document>
         |  <child1/>
         |  <child2/>
         | </document>
    res0: scala.xml.Elem =
    <document>
            <child1></child1>
            <child2></child2>
           </document>
    scala> res0 match {
           // match the document tag
           // the {_*} is critical
         | case <document>{_*}</document> => println("found document element")
         | case _ => println("Found another element")
         | }
    found document element
    scala> res0 match {
           // assign the document element to e
         | case e @ <document>{_*}</document> => println(e)
         | case _ => println("Found another element")
         | }
    <document>
            <child1></child1>
            <child2></child2>
           </document>
    scala> res0 match {
           // assign the children of document to children
           // notice that there are Text elements that are part of children
         | case <document>{children @ _*}</document> => println(children)
         | case _ => println("Found another element")
         | }
    ArrayBuffer(
            , <child1></child1>,
            , <child2></child2>,
           )
    // the '\' is XPath-like but only returns elements and attributes
    // in this case the \ "_" returns all element children of res0.  It
    // will not return the Text elements.
    scala> res0 \ "_" foreach {
         | case <child1>{_*}</child1> => println("child1 found")
         | case <child2>{_*}</child2> => println("child2 found")
         | case e => println("found another element")
         | }
    child1 found
    child2 found
    // another example of how \ does not return any text elements.  This returns
    // no elements
    scala> <doc>Hello</doc> \ "_" foreach { case scala.xml.Text(t) => println("a text element found: "+t) }
    // the .child returns all children of an Elem
    scala> <doc>Hello</doc>.child foreach { case scala.xml.Text(t) => println("a text element found: "+t) }
    a text element found: Hello
    // This example throws a match error because there are whitespace text elements
    // that cause the match to fail.
    scala> res0 match {
         | case <document><child1/><child2/></document> => println("found the fragment")
         | }
    scala.MatchError: <document>
            <child1></child1>
            <child2></child2>
           </document>
           at .<init>(<console>:6)
           at .<clinit>(<console>)
           at RequestResult$.<init>(<console>:3)
           at RequestResult$.<clinit>(<console>)
           at RequestResult$result(<console>)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMetho...
    // The trim method removes the whitespace nodes so now the 
    // match will work
    scala> scala.xml.Utility.trim(res0) match {
         | case <document><child1/><child2/></document> => println("found the fragment")
         | }
    found the fragment
    // you can select part of the tree using matching
    // child2 is assigned to e in this example.
    scala> scala.xml.Utility.trim(res0) match {
         | case <document><child1/>{e @ _*}</document> => println("found the fragment:"+e)
         | }
    found the fragment:RandomAccessSeq(<child2></child2>)
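
One thing the examples above do not show is that XML patterns only look at the label and the children, never the attributes. Here is a small sketch of how a guard can fill that gap, assuming Scala 2.12 or 2.13 with scala-xml on the classpath. MatchSketch and describe are hypothetical names used only for illustration.

```scala
import scala.xml._

object MatchSketch {
  def describe(n: Node): String = n match {
    // patterns cannot match attributes, so a guard inspects them instead
    case e @ <video>{_*}</video> if (e \ "@type").text == "dvd" => "dvd: " + e.text
    // matches a book with exactly one child, which must be a Text node
    case <book>{Text(title)}</book> => "book: " + title
    // whitespace-only text nodes, like the ones that broke the match above
    case Text(t) if t.trim.isEmpty => "whitespace"
    case other => "other: " + other.label
  }

  val samples: Seq[Node] = Seq(
    <video type="dvd">Seven</video>,
    <book type="softcover">Memories of Ice</book>,
    Text("\n  "),
    <library/>
  )

  val described: Seq[String] = samples.map(describe)
}
```

The book case matches even though the element carries a type attribute, because the pattern ignores attributes entirely.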

Thursday, August 27, 2009

XPath Style XML Selection

The XML API in Scala allows XPath-like (although not true XPath) queries. Combined with matching, this makes it very easy to process XML documents. I am only going to discuss XPath-style selection now. The code section is very long, but primarily because the results are often quite lengthy.
    scala> val address = <address>
         | <CI_Address>
         | <deliveryPoint>
         | <CharacterString>Viale delle Terme di Caracalla
         | </CharacterString>
         | </deliveryPoint>
         | <city>
         | <CharacterString>Rome</CharacterString>
         | </city>
         | <administrativeArea>
         | <CharacterString />
         | </administrativeArea>
         | <postalCode>
         | <CharacterString>00153</CharacterString>
         | </postalCode>
         | <country>
         | <CharacterString>Italy</CharacterString>
         | </country>
         | <electronicMailAddress>
         | <CharacterString>jippe.hoogeveen@fao.org
         | </CharacterString>
         | </electronicMailAddress>
         | </CI_Address>
         | </address>
    address: scala.xml.Elem =
    <address>
           <CI_Address>
          ...
    // create a pretty printer for writing out the document nicely
    scala> val pp = new scala.xml.PrettyPrinter(80, 5);
    pp: scala.xml.PrettyPrinter = scala.xml.PrettyPrinter@6d87c12a
    // select the city
    scala> println( pp.formatNodes( address \ "CI_Address" \ "city" ) )
    <city>
         <CharacterString>Rome</CharacterString>
    </city>
    // a second way to select city
    scala> println( pp.formatNodes( address \\ "city" ) )
    <city>
         <CharacterString>Rome</CharacterString>
    </city>
    // select all CharacterString elements and print them one per line (unless there is a \n in the text)
    scala> (address \\ "CharacterString").mkString( "\n" )
    res2: String =
    <CharacterString>Viale delle Terme di Caracalla
           </CharacterString>
    <CharacterString>Rome</CharacterString>
    <CharacterString></CharacterString>
    <CharacterString>00153</CharacterString>
    <CharacterString>Italy</CharacterString>
    <CharacterString>jippe.hoogeveen@fao.org
           </CharacterString>
    // iterate over the city node and all of its child nodes.
    scala> println( pp.formatNodes( address \\ "city" \\ "_"))
    <city>
         <CharacterString>Rome</CharacterString>
    </city><CharacterString>Rome</CharacterString>
    // similar to the above, but iterate over all CI_Address nodes and each of their descendants
    scala> println( pp.formatNodes( address \\ "CI_Address" \\ "_"))
    <CI_Address>
         <deliveryPoint>
              <CharacterString>Viale delle Terme di Caracalla </CharacterString>
         </deliveryPoint>
         <city>
              <CharacterString>Rome</CharacterString>
         </city>
         <administrativeArea>
              <CharacterString></CharacterString>
         </administrativeArea>
         <postalCode>
              <CharacterString>00153</CharacterString>
         </postalCode>
         <country>
              <CharacterString>Italy</CharacterString>
         </country>
         <electronicMailAddress>
              <CharacterString>jippe.hoogeveen@fao.org </CharacterString>
         </electronicMailAddress>
    </CI_Address><deliveryPoint>
         <CharacterString>Viale delle Terme di Caracalla </CharacterString>
    </deliveryPoint><CharacterString>Viale delle Terme di Caracalla </CharacterString><city>
         <CharacterString>Rome</CharacterString>
    </city><CharacterString>Rome</CharacterString><administrativeArea>
         <CharacterString></CharacterString>
    </administrativeArea><CharacterString></CharacterString><postalCode>
         <CharacterString>00153</CharacterString>
    </postalCode><CharacterString>00153</CharacterString><country>
         <CharacterString>Italy</CharacterString>
    </country><CharacterString>Italy</CharacterString><electronicMailAddress>
         <CharacterString>jippe.hoogeveen@fao.org </CharacterString>
    </electronicMailAddress><CharacterString>jippe.hoogeveen@fao.org </CharacterString>
    // print all text
    scala> address.text
    res4: String =
          
          
           Viale delle Terme di Caracalla
          
          
          
           Rome
          
          
          
          
          
           00153
          
          
           Italy
          
          
           jippe.hoogeveen@fao.org
          
    // print all character string text
    scala> (address \\ "CharacterString").text
    res3: String =
    Viale delle Terme di Caracalla
           Rome00153Italyjippe.hoogeveen@fao.org
          
    // print all character string text one per line
    scala> (address \\ "CharacterString").map( _.text ).mkString("\n")
    res6: String =
    Viale delle Terme di Caracalla
          
    Rome
    00153
    Italy
    jippe.hoogeveen@fao.org
    // find the longest character string
    scala> (address \\ "CharacterString").reduceRight(
         | (elem, longest) => {
         | if( elem.text.length > longest.text.length ) elem
         | else longest
         | })
    res8: scala.xml.Node =
    <CharacterString>Viale delle Terme di Caracalla
           </CharacterString>
    // find the alphabetically last character string
    // (String comparison, so lowercase letters sort after uppercase ones)
    scala> (address \\ "CharacterString").reduceRight( (elem, last) => {
         | if( elem.text > last.text ) elem
         | else last
         | })
    res9: scala.xml.Node =
    <CharacterString>jippe.hoogeveen@fao.org
           </CharacterString>
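
The difference between \ and \\ is easy to miss in the long output above, so here is a compact sketch on a cut-down version of the address document. It assumes Scala 2.12 or 2.13 with scala-xml on the classpath; SelectSketch and its field names are my own and purely illustrative. It also trims the text and drops the empty values, which the session above leaves in.

```scala
import scala.xml._

object SelectSketch {
  val address: Elem =
    <address>
      <CI_Address>
        <city><CharacterString>Rome</CharacterString></city>
        <administrativeArea><CharacterString/></administrativeArea>
        <country><CharacterString>Italy
        </CharacterString></country>
      </CI_Address>
    </address>

  // \ only looks at direct children, so this is empty: city is a grandchild
  val directCities: NodeSeq = address \ "city"

  // \\ searches all descendants, so this finds the city element
  val anyCities: NodeSeq = address \\ "city"

  // text of every CharacterString, trimmed, with empty values dropped
  val values: Seq[String] =
    (address \\ "CharacterString").map(_.text.trim).filter(_.nonEmpty)

  // maxBy is a shorter alternative to the reduceRight used above
  val longest: String = values.maxBy(_.length)
}
```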

Friday, August 7, 2009

Creating XML

Scala allows you to embed XML directly into a program and provides several ways to manipulate it. Today we will look at writing XML.
    scala> val xml = <root>
         | <child>text</child>
         | </root>
    xml: scala.xml.Elem =
    <root>
           <child>text</child>
           </root>
    scala> val data = "a string"
    data: java.lang.String = a string
    // you can embed logic and variables in the xml by surrounding with {}
    scala> val xml = <root>{data}</root>
    xml: scala.xml.Elem = <root>a string</root>
    scala> val xml = <root>{ for( i <- 1 to 3 ) yield {<xml i={i.toString}/>} } </root>
    xml: scala.xml.Elem = <root><xml i="1"></xml><xml i="2"></xml><xml i="3"></xml></root>
    scala> val xml = <root>{ for( i <- 1 to 3 ) yield <child>{i}</child> } </root>
    xml: scala.xml.Elem = <root><child>1</child><child>2</child><child>3</child></root>
    // save xml to file.  Note Scala 2.8 is changing the save API
    // and will require the encoding and DocType information
    scala> scala.xml.XML.save( "/tmp/doc.xml", xml)
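
As a follow-up to the note about the save API, here is a sketch of saving with an explicit encoding and XML declaration and then loading the file back. It assumes Scala 2.12 or 2.13 with a recent scala-xml on the classpath, where, as far as I know, the encoding, declaration and DocType arguments of XML.save all have defaults. SaveSketch is a hypothetical name, and a temporary file stands in for /tmp/doc.xml.

```scala
import java.io.File
import scala.xml._

object SaveSketch {
  val doc: Elem = <root>{ for (i <- 1 to 3) yield <child>{i}</child> }</root>

  // a temporary file is used so the sketch does not depend on /tmp existing
  val file: File = File.createTempFile("doc", ".xml")

  // enc sets the output encoding; xmlDecl = true writes the <?xml ...?> header
  XML.save(file.getPath, doc, enc = "UTF-8", xmlDecl = true)

  // load the file back and read the text of each child element
  val loaded: Elem = XML.loadFile(file)
  val childTexts: Seq[String] = (loaded \ "child").map(_.text)
}
```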