Processing an XML document multiple times

Time: 2013-02-24 17:21:24

Tags: haskell hxt

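Using the Haskell package hxt still feels a bit strange to me. The arrow notation and the result types in particular seem like magic.

So far I have not managed the following: I want to process an XML file that consists mainly of two parts. The first holds the definitions of objects, the second holds the usage/purpose of those objects. First I want to write some hxt processing that produces a Haskell data structure for part 1, then do the same for part 2, and finally combine the two data structures in the actual logic of the program.

In general, processing the file works fine now, thanks to the arrows tutorial. But what I want now is three steps: read the document (lazily), process the resulting structure once with the first processor, and then process the same structure again with the second processor. What I do not want is to call "readDocument" twice, as in the following example.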

import Text.XML.HXT.Core
import Data.Char(toUpper)
import Data.Tree.NTree.TypeDefs

play filename = do 
                  results <- runX (getAllAddresses filename) 
                  results2 <- runX (getAllAddressesUsages filename) 
                  print results 
                  print results2 



getAllAddresses :: FilePath -> IOSArrow XmlTree [(String,NTree XNode)]
getAllAddresses filename =
    readDocument [withValidate no] filename >>>
    getChildren >>>
    isElem >>> hasName "main" >>>
    getChildren >>>
    isElem >>> hasName "part1" >>>
    getChildren >>>
    isElem >>> hasName "address" >>>
    listA(getAddress)                 -- create a list for each variable, so use listA



getAddress :: IOSArrow XmlTree (String,NTree XNode)
getAddress =
    getChildren >>>
    isElem >>>
         (
          neg ( hasName "location") >>>   -- all elements being no "location"
          getName &&& (getChildren)       -- get the name and the value for each element
         ) 
    <+>     
    ( 
      hasName "location" >>>              -- work on all nodes within the  "location" subcontainer
      getChildren >>> 
      isElem >>>
      ( getName &&& (getChildren) )       -- get the name and the value for each element
     )




getAllAddressesUsages :: FilePath -> IOSArrow XmlTree [(String,NTree XNode)]
getAllAddressesUsages filename =
    readDocument [withValidate no] filename >>>
    getChildren >>>
    isElem >>> hasName "main" >>>
    getChildren >>>
    isElem >>> hasName "part2" >>>
    getChildren >>>
    listA(getAddressUsagePurpose2)                 -- create a list for each variable, so use listA

getAddressUsagePurpose2 :: IOSArrow XmlTree (String,NTree XNode)
getAddressUsagePurpose2 =
    hasName "use_obj-names_for_purpose_2" >>>            -- work on all nodes with usage 2
    ( getName &&& (getChildren) )                        -- get the name and the value for each element

Sample data:

<main>
 <part1>
  <address>
    <obj-name>one</obj-name>
    <name>peter 1</name>
    <street>streetname 1</street>
    <location>
      <country>Germany</country>
      <state>Baden Wuerttemberg</state>
   </location>
   </address>
  <address>
    <obj-name>two</obj-name>
    <name>peter 2</name>
    <street>streetname 2</street>
    <location>
      <country>Germany</country>
      <state>Nordrhein Westfalen</state>
      </location>
   </address>
 </part1>
 <part2>
   <use_obj-names_for_purpose_1>
     <obj-name>two</obj-name>
   </use_obj-names_for_purpose_1>
   <use_obj-names_for_purpose_2>
     <obj-name>two</obj-name>
   </use_obj-names_for_purpose_2>
 </part2>
</main>

So the formal question is:

What would the monadic part of the function play look like in order to get something like this:

readXmlDocument :: String -> IOSArrow XmlTree (NTree XNode)
readXmlDocument filename = readDocument [withValidate no] filename

play filename = do 
             document <- readXmlDocument filename
             allAddresses <- getAllAddresses document
             allPurposes <- getAllAddressesUsages document
             result <- processLogics allAddresses allPurposes 
             print result

How do I get from monads to arrows, back to monads, then on to plain data and back to monads again?

Why am I doing this?

1 answer:

Answer 0 (score: 1)

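One solution to the problem is as follows:

Use the Arrows language extension and a "proc" expression so that one function sends the document that was read down two processor paths. The results are combined in a tuple. This tuple still contains two arrows that need to be run, which is done by two applications of the runX function.

I still do not know exactly whether this construct loads the file once or twice once both results are merged in a subsequent computation.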

{-# LANGUAGE Arrows #-}

import Text.XML.HXT.Core
import Data.Char(toUpper)
import Data.Tree.NTree.TypeDefs


play filename = (runX addresses, runX usages)
    where (addresses,usages)=(analyseXml (readXmlDocument filename))

analyseXml :: IOSArrow XmlTree (NTree XNode) -> (IOSArrow XmlTree [(String,NTree XNode)],IOSArrow XmlTree String)
analyseXml = proc document -> do 
               allAddresses <- getAllAddresses -< document
               allUsages <- getAllAddressesUsages -< document
               returnA -< (allAddresses,allUsages)

readXmlDocument :: String -> IOSArrow XmlTree (NTree XNode)
readXmlDocument filename = readDocument [withValidate no] filename



getAllAddresses :: IOSArrow XmlTree (NTree XNode) -> IOSArrow XmlTree [(String,NTree XNode)]
getAllAddresses document =
    document >>>
    getChildren >>>
    isElem >>> hasName "main" >>>
    getChildren >>>
    isElem >>> hasName "part1" >>>
    getChildren >>>
    isElem >>> hasName "address" >>>
    listA(getAddress)                 -- create a list for each variable, so use listA



getAddress :: IOSArrow XmlTree (String,NTree XNode)
getAddress =
    getChildren >>>
    isElem >>>
         (
          neg ( hasName "location") >>>   -- all elements being no "location"
          getName &&& (getChildren)       -- get the name and the value for each element
         ) 
    <+>     
    ( 
      hasName "location" >>>              -- work on all nodes within the  "location" subcontainer
      getChildren >>> 
      isElem >>>
      ( getName &&& (getChildren) )       -- get the name and the value for each element
     )




getAllAddressesUsages :: IOSArrow XmlTree (NTree XNode) -> IOSArrow XmlTree String
getAllAddressesUsages document =
    document >>>
    getChildren >>>
    isElem >>> hasName "main" >>>
    getChildren >>>
    isElem >>> hasName "part2" >>>
    getChildren >>>
    isElem >>> hasName "use_obj-names_for_purpose_2" >>>
    getChildren >>>
    isElem >>> hasName "obj-name" >>>
    getChildren >>>
    getText                 -- extract the text of each obj-name; runX collects the results into a list
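
A side note on analyseXml above: its signature describes a plain function from one arrow to a pair of arrows, so the proc block appears to be type-checked against the function arrow (->) rather than against IOSArrow. If that reading is right, the proc block only builds the pair, and each later runX still runs its own copy of readDocument. A minimal sketch of that behaviour, using hypothetical names (viaProc, viaTuple) that are not part of the answer:

```haskell
{-# LANGUAGE Arrows #-}

import Control.Arrow (returnA)

-- A proc block whose arrow type is the ordinary function arrow (->).
-- Each "-<" line is just a function application on the same input.
viaProc :: Int -> (Int, String)
viaProc = proc x -> do
    a <- (+ 1) -< x
    b <- show  -< x
    returnA -< (a, b)

-- The same function written without arrow notation.
viaTuple :: Int -> (Int, String)
viaTuple x = (x + 1, show x)
```

Both definitions give the same result for every input, for example (42,"41") for 41. By analogy, analyseXml document would amount to (getAllAddresses document, getAllAddressesUsages document).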

Execution can be done as follows:

*Main>  snd ( play  "../tmp/haskell/test.xml")
["two"]

*Main>  fst ( play  "../tmp/haskell/test.xml")
[[("obj-name",NTree (XText "one") []),("name",NTree (XText "peter 1") []),("street",NTree (XText "streetname 1") []),("country",NTree (XText "Germany") []),("state",NTree (XText "Baden Wuerttemberg") [])],[("obj-name",NTree (XText "two") []),("name",NTree (XText "peter 2") []),("street",NTree (XText "streetname 2") []),("country",NTree (XText "Germany") []),("state",NTree (XText "Nordrhein Westfalen") [])]]
*Main>
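
Regarding the open question of whether the file is loaded once or twice: one way to sidestep the doubt might be to keep the reading arrow out of the processors and fan a single document out to both of them with &&&, so that only one runX and one read occur. The following is only a sketch under assumptions: the names sampleXml, part, definedNames, usedNames, analyse and analyseString are made up for illustration, the processors are simplified to return only obj-name texts, and readString stands in for readDocument so that the example needs no file on disk.

```haskell
import Text.XML.HXT.Core

-- Hypothetical, shortened version of the sample data.
sampleXml :: String
sampleXml = concat
    [ "<main>"
    , "<part1>"
    , "<address><obj-name>one</obj-name><name>peter 1</name></address>"
    , "<address><obj-name>two</obj-name><name>peter 2</name></address>"
    , "</part1>"
    , "<part2>"
    , "<use_obj-names_for_purpose_2><obj-name>two</obj-name></use_obj-names_for_purpose_2>"
    , "</part2>"
    , "</main>"
    ]

-- Select <main>/<name> starting from the document root node.
part :: String -> IOSArrow XmlTree XmlTree
part name = getChildren >>> hasName "main" /> hasName name

-- Processor 1: object names defined in part1. It does no reading itself.
definedNames :: IOSArrow XmlTree String
definedNames = part "part1" /> hasName "address" /> hasName "obj-name" /> getText

-- Processor 2: object names used for purpose 2 in part2.
usedNames :: IOSArrow XmlTree String
usedNames =
    part "part2" /> hasName "use_obj-names_for_purpose_2" /> hasName "obj-name" /> getText

-- Feed the same input tree to both processors and pair up the two lists.
analyse :: IOSArrow XmlTree ([String], [String])
analyse = listA definedNames &&& listA usedNames

-- The document is parsed once, then handed to analyse.
analyseString :: String -> IO [([String], [String])]
analyseString xml = runX (readString [withValidate no] xml >>> analyse)
```

Calling analyseString sampleXml should yield a one-element list holding the pair of name lists, which ordinary monadic code can then pass on to the program logic. For a file, readDocument [withValidate no] filename would take the place of readString.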