Putting XML into a hash table

Date: 2013-06-22 22:50:24

Tags: haskell xml-parsing hashtable

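I'm trying to put the information from an XML file into a lookup table. So far I've been reading up on the available libraries and how to use them, and I chose hxt and hashtables. Here is the file: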

<?xml version="1.0" encoding="UTF-8" ?>

<tables>

  <table name="nametest1">
    test1
  </table>

  <table name="nametest2">
    test2
  </table>

</tables>

I would like to end up with the following pairs:
nametest1,test1
nametest2,test2
etc.
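To make the goal concrete, here is a minimal sketch of the lookup table those pairs should produce, filled in by hand. It assumes that "hashtables" above means the hashtables package and its `Data.HashTable.IO` module (imported as `H`, matching the question's qualifier); the name `targetTable` is hypothetical.

```haskell
import qualified Data.HashTable.IO as H

-- | Hypothetical example: the table the XML should produce, built by hand
-- with the hashtables IO interface (create an empty table, then insert each pair).
targetTable :: IO (H.BasicHashTable String String)
targetTable = do
  table <- H.new
  H.insert table "nametest1" "test1"
  H.insert table "nametest2" "test2"
  return table
```

Lookups also run in `IO`: `H.lookup table "nametest1"` returns `Just "test1"`, and a missing key returns `Nothing`.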

-- | We get the xml into a hash
getTables :: IO (H.HashTable String String)
getTables = do
  confPath <- getEnv "ENCODINGS_XML_PATH"
  doc      <- runX $ readDocument [withValidate no] confPath
  -- this is the part I don't have:
  -- I understand how to create the hashtable and insert into it;
  -- it's getting the info out of the XML that is blocking me
  where -- I think I might use the following, which I shamelessly took from the net
    atTag tag = deep (isElem >>> hasName tag)
    text      = getChildren >>> getText

I've seen many examples of how to do similar things, but I can't figure out how to get the name attribute of each node.

Cheers, rakwatt

1 answer

Answer (score: 1)

Here is an example that reads a file named test.xml and prints out the (name, text) pairs:

import           Text.XML.HXT.Core

-- | Gets the name attribute and the text content of the selected items as a pair
getAttrAndText :: (ArrowXml a) => a XmlTree (String, String)
getAttrAndText =
      getAttrValue "name"             -- Get the value of the 'name' attribute...
  &&& deep getText                    -- ...and pair it with the text inside the node


-- | Gets all "table" items under a root tables item
getTableItem :: (ArrowXml a) => a XmlTree XmlTree
getTableItem =
      deep (hasName "tables")          -- Find a tag <tables> anywhere in the document
  >>> getChildren                      -- Get all children of that tag
  >>> hasName "table"                  -- Filter those that have the tag <table>
  >>> hasAttr "name"                   -- Filter those that have an attribute name

-- | The main function: prints the list of (name, text) pairs
main :: IO ()
main = (print =<<) $ runX $                       -- Print the result
      readDocument [withValidate no] "test.xml"   -- Read the document
  >>> getTableItem                                -- Get all table items
  >>> getAttrAndText                              -- Get the attribute 'name' and the text of those nodes

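The pairs are built in getAttrAndText. The rest of the code just opens the file and selects the direct children of the <tables> tag. You may still want to strip the leading and trailing whitespace from the text.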
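A sketch of how these arrows could fill in the missing part of the question's getTables, trimming the text and loading the pairs into a hash table. It assumes the hashtables package's `Data.HashTable.IO` module and its `BasicHashTable` type; the names `LookupTable`, `trimText`, `tablePairs` and `loadTables` are hypothetical. Unlike the answer's `deep getText`, it collects all text under a table with `listA` and joins it before trimming, so each table yields exactly one pair; trimming uses a small base-only helper rather than an HXT function.

```haskell
import           Data.Char          (isSpace)
import           Data.List          (dropWhileEnd)
import qualified Data.HashTable.IO  as H
import           System.Environment (getEnv)
import           Text.XML.HXT.Core

-- | Hypothetical alias for the lookup table.
type LookupTable = H.BasicHashTable String String

-- | Hypothetical helper: drop leading and trailing whitespace.
trimText :: String -> String
trimText = dropWhileEnd isSpace . dropWhile isSpace

-- | One (name, text) pair per <table name="..."> child of <tables>.
-- All text nodes under a table are concatenated before trimming.
tablePairs :: ArrowXml a => a XmlTree (String, String)
tablePairs =
      deep (hasName "tables")
  >>> getChildren
  >>> hasName "table"
  >>> hasAttr "name"
  >>> (getAttrValue "name" &&& (listA (deep getText) >>> arr (trimText . concat)))

-- | Read the XML file at the given path into a hash table.
loadTables :: FilePath -> IO LookupTable
loadTables path = do
  pairs <- runX (readDocument [withValidate no] path >>> tablePairs)
  H.fromList pairs

-- | The question's getTables, with the path taken from ENCODINGS_XML_PATH.
getTables :: IO LookupTable
getTables = getEnv "ENCODINGS_XML_PATH" >>= loadTables
```

Taking the path as a parameter in `loadTables` keeps the environment lookup separate, so the parsing can be tried on any file. For the sample file above, `H.lookup` on the resulting table with `"nametest1"` should give `Just "test1"`.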