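I'm trying to get to grips with HXT, a Haskell library for parsing XML that uses arrows. For my specific use case I'd rather not use deep, because there are cases where <outer_tag><payload_tag>value</payload_tag></outer_tag> is distinct from <outer_tag><inner_tag><payload_tag>value</payload_tag></inner_tag></outer_tag>. But I've run into some weirdness: something that feels like it should work doesn't.

I've managed to come up with a test case based on this example from the docs: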
{-# LANGUAGE Arrows, NoMonomorphismRestriction #-}
module Main where

import Text.XML.HXT.Core

data Guest = Guest { firstName, lastName :: String }
  deriving (Show, Eq)

getGuest = deep (isElem >>> hasName "guest") >>>
  proc x -> do
    fname <- getText <<< getChildren <<< deep (hasName "fname") -< x
    lname <- getText <<< getChildren <<< deep (hasName "lname") -< x
    returnA -< Guest { firstName = fname, lastName = lname }

getGuest' = deep (isElem >>> hasName "guest") >>>
  proc x -> do
    fname <- getText <<< getChildren <<< (hasName "fname") <<< getChildren -< x
    lname <- getText <<< getChildren <<< (hasName "lname") <<< getChildren -< x
    returnA -< Guest { firstName = fname, lastName = lname }

getGuest'' = deep (isElem >>> hasName "guest") >>> getChildren >>>
  proc x -> do
    fname <- getText <<< getChildren <<< (hasName "fname") -< x
    lname <- getText <<< getChildren <<< (hasName "lname") -< x
    returnA -< Guest { firstName = fname, lastName = lname }

driver finalArrow = runX (readDocument [withValidate no] "guestbook.xml" >>> finalArrow)

main = do
  guests <- driver getGuest
  print "getGuest"
  print guests
  guests' <- driver getGuest'
  print "getGuest'"
  print guests'
  guests'' <- driver getGuest''
  print "getGuest''"
  print guests''
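Between getGuest and getGuest', I expand deep into the right number of getChildren calls. The resulting function still works. I then move one getChildren outside the do block, and that causes the resulting function to fail. The output is: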
"getGuest"
[Guest {firstName = "John", lastName = "Steinbeck"},Guest {firstName = "Henry", lastName = "Ford"},Guest {firstName = "Andrew", lastName = "Carnegie"},Guest {firstName = "Anton", lastName = "Chekhov"},Guest {firstName = "George", lastName = "Washington"},Guest {firstName = "William", lastName = "Shakespeare"},Guest {firstName = "Nathaniel", lastName = "Hawthorne"}]
"getGuest'"
[Guest {firstName = "John", lastName = "Steinbeck"},Guest {firstName = "Henry", lastName = "Ford"},Guest {firstName = "Andrew", lastName = "Carnegie"},Guest {firstName = "Anton", lastName = "Chekhov"},Guest {firstName = "George", lastName = "Washington"},Guest {firstName = "William", lastName = "Shakespeare"},Guest {firstName = "Nathaniel", lastName = "Hawthorne"}]
"getGuest''"
[]
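I feel this should be a valid transformation, but my understanding of arrows is a bit shaky. Am I doing something wrong? Is this a bug I should report?

I'm using HXT version 9.3.1.3 (the latest at the time of writing). ghc --version prints "The Glorious Glasgow Haskell Compilation System, version 7.4.1". I've also tested on a box with ghc 7.6.3 and got the same result.

The XML file has the following repeating structure (the full file can be found here):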
<guestbook>
  <guest>
    <fname>John</fname>
    <lname>Steinbeck</lname>
  </guest>
  <guest>
    <fname>Henry</fname>
    <lname>Ford</lname>
  </guest>
  <guest>
    <fname>Andrew</fname>
    <lname>Carnegie</lname>
  </guest>
</guestbook>
Answer 0 (score: 3)
In getGuest'' you have

... (hasName "fname") -< x
... (hasName "lname") -< x

That is, you restrict yourself to the case where x is "fname" and x is "lname" at the same time, which no x satisfies!
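The point can be reproduced without HXT. The following is a minimal sketch that models list arrows with Kleisli [] from base; the Node type and the helpers children, named and text are hypothetical stand-ins for HXT's XmlTree, getChildren, hasName and getText, not HXT's actual definitions. It compares stepping to the children inside the proc block with stepping to them before it.

```haskell
{-# LANGUAGE Arrows #-}
module Main where

import Control.Arrow

-- Toy stand-in for an XML tree (an assumption for illustration, NOT HXT's XmlTree).
data Node = Elem String [Node] | Text String
  deriving (Show, Eq)

-- A list arrow: one input, zero or more outputs.
type Filter a b = Kleisli [] a b

-- Hypothetical analogue of getChildren.
children :: Filter Node Node
children = Kleisli go
  where go (Elem _ cs) = cs
        go _           = []

-- Hypothetical analogue of hasName: keep only elements with the given name.
named :: String -> Filter Node Node
named n = Kleisli go
  where go e@(Elem m _) | m == n = [e]
        go _                     = []

-- Hypothetical analogue of getText.
text :: Filter Node String
text = Kleisli go
  where go (Text s) = [s]
        go _        = []

guest :: Node
guest = Elem "guest" [Elem "fname" [Text "John"], Elem "lname" [Text "Steinbeck"]]

-- Like getGuest': both bindings start from the <guest> node.
inside :: Filter Node (String, String)
inside = proc x -> do
  fn <- text <<< children <<< named "fname" <<< children -< x
  ln <- text <<< children <<< named "lname" <<< children -< x
  returnA -< (fn, ln)

-- Like getGuest'': x is a single child, which must be both fname and lname.
outside :: Filter Node (String, String)
outside = children >>> proc x -> do
  fn <- text <<< children <<< named "fname" -< x
  ln <- text <<< children <<< named "lname" -< x
  returnA -< (fn, ln)

main :: IO ()
main = do
  print (runKleisli inside guest)   -- [("John","Steinbeck")]
  print (runKleisli outside guest)  -- []
```

In outside, every x that reaches the proc block is one child of the guest element, so the second binding always fails for the only x on which the first one succeeds.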
Answer 1 (score: 2)
I've managed to work out the specific reason the construct is interpreted the way it is. The following arrow translation, found here, provides a working basis:

addA :: Arrow a => a b Int -> a b Int -> a b Int
addA f g = proc x -> do
  y <- f -< x
  z <- g -< x
  returnA -< y + z

becomes:

addA :: Arrow a => a b Int -> a b Int -> a b Int
addA f g = arr (\ x -> (x, x)) >>>
           first f >>> arr (\ (y, x) -> (x, y)) >>>
           first g >>> arr (\ (z, y) -> y + z)

By analogy, we can derive:
getGuest''' = preproc >>>
              arr (\ x -> (x, x)) >>>
              first f >>> arr (\ (y, x) -> (x, y)) >>>
              -- y is the result of f (fname), z is the result of g (lname)
              first g >>> arr (\ (z, y) -> Guest {firstName = y, lastName = z})
  where preproc = deep (isElem >>> hasName "guest") >>> getChildren
        f = getText <<< getChildren <<< (hasName "fname")
        g = getText <<< getChildren <<< (hasName "lname")
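In HXT, arrows can be thought of as streams of values flowing through filters. Contrary to what I had hoped, arr (\x->(x,x)) does not "split the stream". Instead, it creates a stream of tuples that is filtered by f, and the survivors are then filtered by g. Since f and g are mutually exclusive, there are no survivors.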
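The example with getChildren on the inside works, seemingly by miracle, because there the stream of tuples carries values from higher up in the XML document, such as

<guest>
  <fname>John</fname>
  <lname>Steinbeck</lname>
</guest>

so f and g are not mutually exclusive.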
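The tuple stream described above can be inspected stage by stage. This is a rough sketch using Kleisli [] from base as a stand-in for HXT's list arrows; Node, children, named, text, dup, swapA and run are hypothetical helpers introduced only for illustration, and the real HXT types differ.

```haskell
module Main where

import Control.Arrow

-- Toy stand-in for an XML tree (an assumption, NOT HXT's XmlTree).
data Node = Elem String [Node] | Text String
  deriving (Show, Eq)

type Filter a b = Kleisli [] a b

-- Hypothetical analogues of getChildren, hasName and getText.
children :: Filter Node Node
children = Kleisli go
  where go (Elem _ cs) = cs
        go _           = []

named :: String -> Filter Node Node
named n = Kleisli go
  where go e@(Elem m _) | m == n = [e]
        go _                     = []

text :: Filter Node String
text = Kleisli go
  where go (Text s) = [s]
        go _        = []

-- The two plumbing steps of the desugared proc block.
dup :: Filter a (a, a)
dup = arr (\x -> (x, x))

swapA :: Filter (a, b) (b, a)
swapA = arr (\(y, x) -> (x, y))

-- f and g as in getGuest''' (getChildren applied beforehand).
f, g :: Filter Node String
f = named "fname" >>> children >>> text
g = named "lname" >>> children >>> text

-- Variants that step to the children themselves, as in getGuest'.
fIn, gIn :: Filter Node String
fIn = children >>> f
gIn = children >>> g

guest :: Node
guest = Elem "guest" [Elem "fname" [Text "John"], Elem "lname" [Text "Steinbeck"]]

run :: Filter Node b -> [b]
run a = runKleisli a guest

main :: IO ()
main = do
  -- getChildren outside: each tuple pairs one child with itself.
  -- Only the fname child survives f:
  print (run (children >>> dup >>> first f))
  -- [("John",Elem "fname" [Text "John"])]

  -- g then sees that same fname element, so nothing survives:
  print (run (children >>> dup >>> first f >>> swapA >>> first g))
  -- []

  -- getChildren inside: the tuple still carries the whole guest node,
  -- so gIn can find lname after fIn has found fname. Pairs are (z, y).
  print (run (dup >>> first fIn >>> swapA >>> first gIn))
  -- [("Steinbeck","John")]
```

The last stream holds (z, y) pairs, that is (lname, fname), which is why the final arr step has to put y into firstName and z into lastName.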