Tuesday, November 17, 2009

XML matching

The Scala XML support includes the ability to pattern match on XML elements using XML literals. Here are several examples. The most important thing to remember is not to leave out the '{' and '}' around embedded patterns such as {_*}.

    scala> <document>
         |  <child1/>
         |  <child2/>
         | </document>
    res0: scala.xml.Elem =
    <document>
            <child1></child1>
            <child2></child2>
           </document>
    scala> res0 match {
           // match the document tag
           // the {_*} is critical: without it only a <document> with no children would match
         | case <document>{_*}</document> => println("found document element")
         | case _ => println("Found another element")
         | }
    found document element
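
A minimal standalone sketch of why {_*} matters, assuming a current Scala version with the scala-xml module on the classpath (XML support has been a separate library since Scala 2.11; the first line is a scala-cli directive and can be dropped if your build already provides the dependency). WildcardMatch and its methods are hypothetical names. A literal pattern without an embedded {...} only matches an element whose children are exactly the ones written, so <document/> used as a pattern accepts only an empty document:

```scala
//> using dep org.scala-lang.modules::scala-xml:2.2.0
import scala.xml.Node

// Hypothetical helper object, for illustration only.
object WildcardMatch {
  // No {_*}: matches only a <document> element that has no children at all.
  def isEmptyDocument(n: Node): Boolean = n match {
    case <document/> => true
    case _           => false
  }

  // With {_*}: matches a <document> element with any children, whitespace included.
  def isDocument(n: Node): Boolean = n match {
    case <document>{_*}</document> => true
    case _                         => false
  }

  def main(args: Array[String]): Unit = {
    val doc = <document><child1/><child2/></document>
    println(isEmptyDocument(doc)) // false
    println(isDocument(doc))      // true
  }
}
```
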
    scala> res0 match {
           // assign the document element to e
         | case e @ <document>{_*}</document> => println(e)
         | case _ => println("Found another element")
         | }
    <document>
            <child1></child1>
            <child2></child2>
           </document>
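
Binding with e @ is also the way to reach attributes: the XML pattern grammar in the Scala specification has no place for attributes, so a literal pattern matches an element whatever attributes it carries. A minimal sketch, assuming scala-xml on the classpath (the first line is a scala-cli directive) and using the hypothetical names BindElement and versionOf, reads an attribute from the bound element with the \ "@version" selector:

```scala
//> using dep org.scala-lang.modules::scala-xml:2.2.0
import scala.xml.Node

// Hypothetical helper object, for illustration only.
object BindElement {
  // The pattern checks only the label and ignores attributes, so bind the
  // whole element with e @ and read the attribute from it afterwards.
  def versionOf(n: Node): Option[String] = n match {
    case e @ <document>{_*}</document> =>
      val v = (e \ "@version").text // "" when the attribute is absent
      if (v.isEmpty) None else Some(v)
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    println(versionOf(<document version="1.0"><child1/></document>)) // Some(1.0)
    println(versionOf(<document/>))                                  // None
  }
}
```
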
    scala> res0 match {
           // assign the children of document to children
           // notice that children also contains the whitespace Text nodes between the elements
         | case <document>{children @ _*}</document> => println(children)
         | case _ => println("Found another element")
         | }
    ArrayBuffer(
            , <child1></child1>,
            , <child2></child2>,
           )
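
Since children holds the whitespace Text nodes as well as the elements, you usually filter them out before doing anything else. A minimal sketch, assuming scala-xml on the classpath (the first line is a scala-cli directive) and the hypothetical names ChildElements and childLabels, keeps only the Elem nodes:

```scala
//> using dep org.scala-lang.modules::scala-xml:2.2.0
import scala.xml.{Elem, Node}

// Hypothetical helper object, for illustration only.
object ChildElements {
  // Bind every child of <document>, then keep only the Elem nodes, which drops
  // the whitespace Text nodes produced by the literal's line breaks and indentation.
  def childLabels(n: Node): List[String] = n match {
    case <document>{children @ _*}</document> =>
      children.collect { case e: Elem => e.label }.toList
    case _ => Nil
  }

  def main(args: Array[String]): Unit = {
    val doc = <document>
      <child1/>
      <child2/>
    </document>
    println(doc.child.length) // includes the whitespace Text nodes
    println(childLabels(doc)) // List(child1, child2)
  }
}
```
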
    // '\' is XPath-like, but it only returns elements and attributes.
    // Here \ "_" returns all element children of res0; it does not
    // return the Text nodes.
    scala> res0 \ "_" foreach {
         | case <child1>{_*}</child1> => println("child1 found")
         | case <child2>{_*}</child2> => println("child2 found")
         | case e => println("found another element")
         | }
    child1 found
    child2 found
    // Another example of how \ does not return Text nodes: this returns
    // no nodes, so nothing is printed.
    scala> <doc>Hello</doc> \ "_" foreach { case scala.xml.Text(t) => println("a text element found: "+t) }
    // .child returns all children of an Elem, including Text nodes
    scala> <doc>Hello</doc>.child foreach { case scala.xml.Text(t) => println("a text element found: "+t) }
    a text element found: Hello
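
The contrast between \ "_" and .child can be captured in two small helpers. This is a sketch assuming scala-xml on the classpath (the first line is a scala-cli directive); SelectChildren, elementLabels and texts are hypothetical names. In scala-xml, \ "_" keeps only children that are not atoms, and Text is an atom, while .child returns every child node:

```scala
//> using dep org.scala-lang.modules::scala-xml:2.2.0
import scala.xml.{Node, Text}

// Hypothetical helper object, for illustration only.
object SelectChildren {
  // \ "_" returns element children only; Text nodes are skipped.
  def elementLabels(n: Node): List[String] = (n \ "_").map(_.label).toList

  // .child returns every child, so Text nodes can still be matched.
  def texts(n: Node): List[String] = n.child.collect { case Text(t) => t }.toList

  def main(args: Array[String]): Unit = {
    val doc = <doc>Hello<a/>world</doc>
    println(elementLabels(doc)) // List(a)
    println(texts(doc))         // List(Hello, world)
  }
}
```
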
    // This example throws a MatchError because the whitespace Text nodes
    // between the elements cause the match to fail.
    scala> res0 match {
         | case <document><child1/><child2/></document> => println("found the fragment")
         | }
    scala.MatchError: <document>
            <child1></child1>
            <child2></child2>
           </document>
           at .<init>(<console>:6)
           at .<clinit>(<console>)
           at RequestResult$.<init>(<console>:3)
           at RequestResult$.<clinit>(<console>)
           at RequestResult$result(<console>)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMetho...
    // Utility.trim removes the whitespace-only Text nodes, so now the
    // match works
    scala> scala.xml.Utility.trim(res0) match {
         | case <document><child1/><child2/></document> => println("found the fragment")
         | }
    found the fragment
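
Putting the last two examples together: a catch-all case turns the MatchError into an ordinary false, and Utility.trim lets the whitespace-free literal pattern succeed. A minimal sketch, assuming scala-xml on the classpath (the first line is a scala-cli directive) and the hypothetical names TrimMatch and isFragment:

```scala
//> using dep org.scala-lang.modules::scala-xml:2.2.0
import scala.xml.{Node, Utility}

// Hypothetical helper object, for illustration only.
object TrimMatch {
  // An exact literal pattern: no whitespace between the tags.
  def isFragment(n: Node): Boolean = n match {
    case <document><child1/><child2/></document> => true
    case _                                       => false // instead of a MatchError
  }

  def main(args: Array[String]): Unit = {
    val doc = <document>
      <child1/>
      <child2/>
    </document>
    println(isFragment(doc))               // false: whitespace Text nodes get in the way
    println(isFragment(Utility.trim(doc))) // true
  }
}
```
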
    // You can also select part of the tree with a match.
    // In this example e is bound to the remaining children, here just child2.
    scala> scala.xml.Utility.trim(res0) match {
         | case <document><child1/>{e @ _*}</document> => println("found the fragment:"+e)
         | }
    found the fragment:RandomAccessSeq(<child2></child2>)
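
Besides {e @ _*}, which binds a sequence of nodes, a plain {t} binds exactly one node, which is handy for pulling the text out of a leaf element. A minimal sketch, assuming scala-xml on the classpath (the first line is a scala-cli directive); SelectPart, its methods and the <child2>hello</child2> input are hypothetical:

```scala
//> using dep org.scala-lang.modules::scala-xml:2.2.0
import scala.xml.{Node, Utility}

// Hypothetical helper object, for illustration only.
object SelectPart {
  // {rest @ _*} binds every child that follows <child1/> as a sequence.
  def afterFirst(n: Node): List[String] = Utility.trim(n) match {
    case <document><child1/>{rest @ _*}</document> => rest.map(_.label).toList
    case _                                          => Nil
  }

  // {t} binds exactly one node: here the single Text child of <child2>.
  def child2Text(n: Node): Option[String] = Utility.trim(n) match {
    case <document><child1/><child2>{t}</child2></document> => Some(t.text)
    case _                                                   => None
  }

  def main(args: Array[String]): Unit = {
    val doc = <document>
      <child1/>
      <child2>hello</child2>
    </document>
    println(afterFirst(doc)) // List(child2)
    println(child2Text(doc)) // Some(hello)
  }
}
```

A sequence binding such as {rest @ _*} has to come last inside its element, because Scala only allows _* at the end of a sequence pattern.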

2 comments:

  1. Be careful when matching with XML literals, as whitespace is significant. For this reason, Lift uses only the XML case classes for matching.

  2. This is absolutely correct and a good warning. IF (and it is a big if) you use matching with literals, I highly recommend using the scala.xml.Utility.trim() method to help. But matching with case classes is safer (and uglier :) )
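
For comparison, this is roughly what matching with the case classes looks like. It is a minimal sketch, not Lift's actual code: it assumes scala-xml on the classpath (the first line is a scala-cli directive), and CaseClassMatch, isBlank and isFragment are hypothetical names. The Elem extractor yields the prefix, label, attributes, scope and children, so whitespace Text nodes can be skipped explicitly instead of relying on trim:

```scala
//> using dep org.scala-lang.modules::scala-xml:2.2.0
import scala.xml.{Elem, Node, Text}

// Hypothetical helper object, for illustration only.
object CaseClassMatch {
  // True for a Text node that contains only whitespace.
  private def isBlank(n: Node): Boolean = n match {
    case Text(t) => t.trim.isEmpty
    case _       => false
  }

  // Accept a <document> whose non-blank children are a <child1> followed by
  // a <child2>, whatever those two elements contain.
  def isFragment(n: Node): Boolean = n match {
    case Elem(_, "document", _, _, children @ _*) =>
      children.filterNot(isBlank).toList match {
        case List(Elem(_, "child1", _, _, _*), Elem(_, "child2", _, _, _*)) => true
        case _                                                             => false
      }
    case _ => false
  }

  def main(args: Array[String]): Unit = {
    val doc = <document>
      <child1/>
      <child2/>
    </document>
    println(isFragment(doc)) // true, without calling Utility.trim
  }
}
```

Unlike the literal <child1/> pattern, Elem(_, "child1", _, _, _*) accepts a child1 with any content; Elem(_, "child1", _, _) would require it to be empty.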
