HXT: Can an input change with the arrow syntax?

Asked: 2014-03-27 23:19:51

Tags: xml haskell xml-parsing arrows hxt

With the following code

{-# LANGUAGE Arrows #-}
{-# LANGUAGE NoMonomorphismRestriction #-}
import Text.XML.HXT.Core

parseXml :: IOSArrow XmlTree XmlTree
parseXml = getChildren >>> getChildren >>>
  proc x -> do
    y <- x >- hasName "item"
    returnA -< x

main :: IO ()
main = do
    person <- runX (readString [withValidate no]
                    "<xml><item>John</item><item2>Smith</item2></xml>"
                    >>> parseXml)
    putStrLn $ show person
    return ()

I get the output

[NTree (XTag "item" []) [NTree (XText "John") []]]

So it seems that hasName "item" has been applied to x, which I did not expect. Running the code through arrowp, I get the following for parseXml:

parseXml
   = getChildren >>> getChildren >>>
      (arr (\ x -> (x, x)) >>>
         (first (hasName "item") >>> arr (\ (y, x) -> x)))

So I have this arrow diagram:

                                                       y
                                   /-- hasName "item" ---
                               x  /                       
-- getChildren -- getChildren ---\x->(x,x)              \(y,x)->x --- final result
                                  \                       / 
                                   \---------------------/  

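Why is hasName "item" also applied to the second component of the tuple? I thought there is no state in Haskell, and that hasName "item" x returns a new object rather than changing the internal state of x.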

Related question: Is factoring an arrow out of arrow do notation a valid transformation?

My original question

I have the following code:

{-# LANGUAGE Arrows #-}
import Text.XML.HXT.Core

data Person = Person { forname :: String, surname :: String } deriving (Show)

parseXml :: IOSArrow XmlTree Person
parseXml = proc x -> do
    forname <- x >- this /> this /> hasName "fn" /> getText
    surname <- x >- this /> this /> hasName "sn" /> getText
    returnA -< Person forname surname

main :: IO ()
main = do
    person <- runX (readString [withValidate no]
                               "<p><fn>John</fn><sn>Smith</sn></p>"
                    >>> parseXml)
    putStrLn $ show person
    return ()

If I run it, everything works and I get the output

[Person {forname = "John", surname = "Smith"}]

However, if I change parseXml to avoid the this statements

parseXml :: IOSArrow XmlTree Person
parseXml = (getChildren >>> getChildren) >>> proc x -> do
    forname <- x >- hasName "fn" /> getText
    surname <- x >- hasName "sn" /> getText
    returnA -< Person forname surname

no person can be parsed any more (the output is []).

Investigating the problem with

parseXml :: IOSArrow XmlTree Person
parseXml = (getChildren >>> getChildren) >>>
  proc x -> do
    forname <- x >- withTraceLevel 5 traceTree >>> hasName "fn" /> getText
    surname <- x >- hasName "sn" /> getText
    returnA -< Person forname surname

I get the output

content of: 
============

---XTag "fn"
   |
   +---XText "John"



content of: 
============

---XTag "sn"
   |
   +---XText "Smith"


[]

So everything seems fine, but with the code

parseXml :: IOSArrow XmlTree Person
parseXml = (getChildren >>> getChildren) >>>
  proc x -> do
    forname <- x >- hasName "fn" /> getText
    surname <- x >- withTraceLevel 5 traceTree >>> hasName "sn" /> getText
    returnA -< Person forname surname

I get

content of: 
============

---XTag "fn"
   |
   +---XText "John"


[]

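So it seems to me that the value of the input x changed between the two statements. It looks as if hasName "fn" had already been applied to x before x was passed to the surname arrow. But shouldn't the input stay the same between the two lines?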

2 answers:

Answer 0 (score: 2)

No, the input cannot change, and it does not change.

What you wrote in the lines

proc x -> do
  y <- x >- hasName "item"
  returnA -< x

is just a filter that removes every node not named item. It is equivalent to the arrow

hasName "item" `guards` this

You can test this with

{-# LANGUAGE Arrows #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

module Main where

import Text.XML.HXT.Core

parseXml0 :: IOSArrow XmlTree XmlTree
parseXml0 = getChildren >>> getChildren >>>
  proc x -> do
    _ <- hasName "item" -< x
    returnA -< x

parseXml1 :: IOSArrow XmlTree XmlTree
parseXml1 = getChildren >>> getChildren >>>
            (hasName "item" `guards` this)

main1 :: Show c => IOSArrow XmlTree c -> IO ()
main1 parseXml = do
    person <- runX (readString [withValidate no]
                    "<xml><item>John</item><item2>Smith</item2></xml>"
                    >>> parseXml)
    putStrLn $ show person
    return ()

main :: IO ()
main = main1 parseXml0 >> main1 parseXml1
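
The same filtering effect can be reproduced without HXT. The following is a minimal sketch that assumes HXT's list arrows behave like `Kleisli []` for this purpose (they additionally carry state and IO, which is ignored here). `hasNameL`, `desugared`, and `runL` are hypothetical names for illustration, and plain strings stand in for XML nodes. The arrow has the same shape as the arrowp output in the question: when the filter in `first` produces no result for an input, the whole pair is dropped, so the untouched second component never reaches the final `arr`.

```haskell
import Control.Arrow (Kleisli (..), arr, first, (>>>))

-- Hypothetical stand-in for hasName: keeps a string only if it equals the name.
hasNameL :: String -> Kleisli [] String String
hasNameL n = Kleisli (\s -> [s | s == n])

-- Same shape as the arrowp desugaring: duplicate, filter the first copy,
-- then return the second copy.
desugared :: Kleisli [] String String
desugared =
  arr (\x -> (x, x)) >>> first (hasNameL "item") >>> arr (\(_, x) -> x)

-- Feed several inputs through a list arrow and collect all results.
runL :: Kleisli [] a b -> [a] -> [b]
runL k = concatMap (runKleisli k)

-- runL desugared ["item", "item2"] == ["item"]
```

The value x is never modified. For "item2" the filter yields an empty list, so there is no pair left from which x could be taken.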

Answer 1 (score: 1)

Edit: OK, now you have completely changed your question!

The working example should be interpreted as follows:

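For the top-level tag x

  • go through all the text (getText) of the grandchildren (this /> this) with name "fn" (hasName "fn"), using forname to hold these values
  • go through all the text (getText) of the grandchildren (this /> this) with name "sn" (hasName "sn"), using surname to hold these values
  • for each such pair, produce Person forname surname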

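This appears to work, but it is probably not doing what you think it is doing. For example, try running the code on the input "<p><fn>John</fn><sn>Smith</sn><fn>Anne</fn><sn>Jones</sn></p>". It prints four names, because every forename is paired with every surname.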
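
The pairing can be seen in isolation with plain lists. This is only a sketch in the list monad: `fornames` and `surnames` are hypothetical values standing in for what the two getText arrows would yield on that input.

```haskell
-- Hypothetical stand-ins for the results of the two getText arrows.
fornames, surnames :: [String]
fornames = ["John", "Anne"]
surnames = ["Smith", "Jones"]

-- Two independent binds produce every combination, not matched pairs.
pairs :: [(String, String)]
pairs = do
  forname <- fornames
  surname <- surnames
  return (forname, surname)

-- pairs == [("John","Smith"),("John","Jones"),("Anne","Smith"),("Anne","Jones")]
```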

The broken example should be interpreted as follows:

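For each grandchild x

  • if x has the name "fn", store its text in forname (otherwise skip to the next x)
  • if x has the name "sn", store its text in surname (otherwise skip to the next x)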

A tag named "fn" cannot also be named "sn"! Therefore every tag is skipped.

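Your investigation simply shows the point in the computation at which tags are skipped. In the first case, both tags are present because nothing has been filtered out yet. In the second case, only the "fn" tag is present, because the first command has already filtered out everything else.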

Edit: you may find this example (done in the list monad) instructive.

import Control.Monad ((>=>))

data XML = Text String | Tag String [XML] deriving Show

this :: a -> [a]
this = return

(/>) :: (a -> [XML]) -> (XML -> [c]) -> a -> [c]
f /> g = f >=> getChildren >=> g

(>--) :: a -> (a -> b) -> b
x >-- f = f x

getChildren :: XML -> [XML]
getChildren (Text _) = []
getChildren (Tag _ c) = c

hasName :: String -> XML -> [XML]
hasName _ (Text _) = []
hasName n i@(Tag n' _) = if n == n' then [i] else []

getText :: XML -> [String]
getText (Text t) = [t]
getText (Tag _ _) = []

parseXML :: XML -> [(String, String)]
parseXML = \x -> do
  forname <- x >-- (this /> this /> hasName "fn" /> getText)
  surname <- x >-- (this /> this /> hasName "sn" /> getText)
  return (forname, surname)

parseXMLBroken :: XML -> [(String, String)]
parseXMLBroken = getChildren >=> getChildren >=> \x -> do
  forname <- x >-- (hasName "fn" /> getText)
  surname <- x >-- (hasName "sn" /> getText)
  return (forname, surname)

runX :: (XML -> a) -> XML -> a
runX f xml = f (Tag "/" [xml])

xml :: XML
xml = (Tag "p" [ Tag "fn" [Text "John"]
               , Tag "sn" [Text "Smith"] ])

example1 = runX parseXML xml

example2 = runX parseXMLBroken xml

*Main> example1
[("John","Smith")]
*Main> example2
[]
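
If the intent is one pair per <p> element, one possible restructuring in the same list-monad model is to bind the parent element and look inside it twice, rather than binding each grandchild. This is a sketch only: `parseXMLPerParent` and `example3` are hypothetical names, the helper definitions are repeated so the block stands alone, and it has not been checked against HXT itself.

```haskell
import Control.Monad ((>=>))

data XML = Text String | Tag String [XML] deriving Show

getChildren :: XML -> [XML]
getChildren (Text _)  = []
getChildren (Tag _ c) = c

-- Keep a node only if it is a tag with the given name.
hasName :: String -> XML -> [XML]
hasName n i@(Tag n' _) | n == n' = [i]
hasName _ _ = []

getText :: XML -> [String]
getText (Text t) = [t]
getText _        = []

-- Hypothetical variant: p is the parent element, so both lookups
-- start from the same node that contains "fn" and "sn".
parseXMLPerParent :: XML -> [(String, String)]
parseXMLPerParent = getChildren >=> \p -> do
  forname <- (getChildren >=> hasName "fn" >=> getChildren >=> getText) p
  surname <- (getChildren >=> hasName "sn" >=> getChildren >=> getText) p
  return (forname, surname)

example3 :: [(String, String)]
example3 = parseXMLPerParent
  (Tag "/" [Tag "p" [Tag "fn" [Text "John"], Tag "sn" [Text "Smith"]]])

-- example3 == [("John","Smith")]
```

Within a single parent that holds several "fn" and "sn" tags, this still pairs every forename with every surname; it only prevents the filter on one tag name from discarding the other.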