我正在从http://hackage.haskell.org/package/xml-conduit-1.1.0.9/docs/Text-XML-Stream-Parse.html
解析修改后的XML这是它的样子:
<?xml version="1.0" encoding="utf-8"?>
<population xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://example.com">
<success>true</success>
<row_count>2</row_count>
<summary>
<bananas>0</bananas>
</summary>
<people>
<person>
<firstname>Michael</firstname>
<age>25</age>
</person>
<person>
<firstname>Eliezer</firstname>
<age>2</age>
</person>
</people>
</population>
如何为每个人提供firstname
和age
的列表?
我的目标是使用http-conduit下载这个xml然后解析它,但我正在寻找一个解决方案,解决在没有属性时如何解析(使用tagNoAttrs?)
这是我尝试过的,我在Haskell评论中添加了我的问题:
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource
import Data.Conduit (($$))
import Data.Text (Text, unpack)
import Text.XML.Stream.Parse
import Control.Applicative ((<*))
data Person = Person Int Text
deriving Show
-- Do I need to change the lambda function \age to something else to get both name and age?
parsePerson = tagNoAttr "person" $ \age -> do
name <- content -- How do I get age from the content? "unpack" is for attributes
return $ Person age name
parsePeople = tagNoAttr "people" $ many parsePerson
-- This doesn't ignore the xmlns attributes
parsePopulation = tagName "population" (optionalAttr "xmlns" <* ignoreAttrs) $ parsePeople
main = do
people <- runResourceT $
parseFile def "people2.xml" $$ parsePopulation
print people
答案 0 :(得分:9)
首先:解析xml-conduit中的组合器并在很长一段时间内更新,并显示它们的年龄。我建议大多数人使用DOM或游标界面。那就是说,让我们看看你的例子。您的代码存在两个问题:
http://example.com
命名空间中,您的代码需要反映出来。所以这是使用流式API获得所需结果的实现:
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource (runResourceT)
import Data.Conduit (Consumer, ($$))
import Data.Text (Text)
import Data.Text.Read (decimal)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse
data Person = Person Int Text
deriving Show
-- Do I need to change the lambda function \age to something else to get both name and age?
parsePerson :: MonadThrow m => Consumer Event m (Maybe Person)
parsePerson = tagNoAttr "{http://example.com}person" $ do
name <- force "firstname tag missing" $ tagNoAttr "{http://example.com}firstname" content
ageText <- force "age tag missing" $ tagNoAttr "{http://example.com}age" content
case decimal ageText of
Right (age, "") -> return $ Person age name
_ -> force "invalid age value" $ return Nothing
parsePeople :: MonadThrow m => Consumer Event m [Person]
parsePeople = force "no people tag" $ do
_ <- tagNoAttr "{http://example.com}success" content
_ <- tagNoAttr "{http://example.com}row_count" content
_ <- tagNoAttr "{http://example.com}summary" $
tagNoAttr "{http://example.com}bananas" content
tagNoAttr "{http://example.com}people" $ many parsePerson
-- This doesn't ignore the xmlns attributes
parsePopulation :: MonadThrow m => Consumer Event m [Person]
parsePopulation = force "population tag missing" $
tagName "{http://example.com}population" ignoreAttrs $ \() -> parsePeople
main :: IO ()
main = do
people <- runResourceT $
parseFile def "people2.xml" $$ parsePopulation
print people
这是使用游标API的示例。请注意,它具有不同的错误处理特性,但应该为格式良好的输入生成相同的结果。
{-# LANGUAGE OverloadedStrings #-}
import Text.XML
import Text.XML.Cursor
import Data.Text (Text)
import Data.Text.Read (decimal)
import Data.Monoid (mconcat)
main :: IO ()
main = do
doc <- Text.XML.readFile def "people2.xml"
let cursor = fromDocument doc
print $ cursor $// element "{http://example.com}person" >=> parsePerson
data Person = Person Int Text
deriving Show
parsePerson :: Cursor -> [Person]
parsePerson c = do
let name = c $/ element "{http://example.com}firstname" &/ content
ageText = c $/ element "{http://example.com}age" &/ content
case decimal $ mconcat ageText of
Right (age, "") -> [Person age $ mconcat name]
_ -> []