Question

I'm parsing a modified version of the XML from http://hackage.haskell.org/package/xml-conduit-1.1.0.9/docs/Text-XML-Stream-Parse.html

This is what it looks like:

<?xml version="1.0" encoding="utf-8"?>
<population xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://example.com">
  <success>true</success>
  <row_count>2</row_count>
  <summary>
    <bananas>0</bananas>
  </summary>
  <people>
      <person>
          <firstname>Michael</firstname>
          <age>25</age>
      </person>
      <person>
          <firstname>Eliezer</firstname>
          <age>2</age>
      </person>
  </people>
</population>

How do I get a list of the firstname and age of each person?

My goal is to download this XML with http-conduit and then parse it, but for now I'm looking for a way to parse elements that have no attributes (using tagNoAttr?).

Here is what I've tried; I've added my questions as Haskell comments:

{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource
import Data.Conduit (($$))
import Data.Text (Text, unpack)
import Text.XML.Stream.Parse
import Control.Applicative ((<*))

data Person = Person Int Text
        deriving Show

-- Do I need to change the lambda function \age to something else to get both name and age?
parsePerson = tagNoAttr "person" $ \age -> do
        name <- content  -- How do I get age from the content?  "unpack" is for attributes
        return $ Person age name

parsePeople = tagNoAttr "people" $ many parsePerson

-- This doesn't ignore the xmlns attributes
parsePopulation  = tagName "population" (optionalAttr "xmlns" <* ignoreAttrs) $ parsePeople

main = do
        people <- runResourceT $
             parseFile def "people2.xml" $$ parsePopulation
        print people

Answer 1

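First: the parsing combinators in xml-conduit haven't been updated in quite a while, and they show their age. I recommend that most people use the DOM or cursor interface instead. That said, let's look at your example. There are two problems with your code: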

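It doesn't handle XML namespaces properly. All of the element names are in the http://example.com namespace, and your code needs to reflect that.
The parsing combinators require you to account for every element. They won't automatically skip over elements for you.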
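
To illustrate the first point, here is a minimal sketch of how a string literal turns into an element name. It assumes the OverloadedStrings extension and the IsString instance for Name from Data.XML.Types (the xml-types package); the bindings qualified and bare are made up for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.XML.Types (Name (..))

-- "{namespace}local" in a literal yields a namespaced Name.
qualified :: Name
qualified = "{http://example.com}person"

-- A plain literal yields a Name with no namespace at all.
bare :: Name
bare = "person"

main :: IO ()
main = do
    print (nameLocalName qualified, nameNamespace qualified)
    print (nameLocalName bare, nameNamespace bare)
    -- Name equality takes the namespace into account, so these differ.
    print (qualified == bare)
```

Because the document declares xmlns="http://example.com" on its root element, every element in it carries that namespace, and a parser looking for the bare name would never match.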

So here's an implementation that uses the streaming API to get the desired result:

{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad.Trans.Resource (runResourceT)
import           Data.Conduit                 (Consumer, ($$))
import           Data.Text                    (Text)
import           Data.Text.Read               (decimal)
import           Data.XML.Types               (Event)
import           Text.XML.Stream.Parse

data Person = Person Int Text
        deriving Show

-- Parse one person element: read its firstname and age children, in that order.
parsePerson :: MonadThrow m => Consumer Event m (Maybe Person)
parsePerson = tagNoAttr "{http://example.com}person" $ do
        name <- force "firstname tag missing" $ tagNoAttr "{http://example.com}firstname" content
        ageText <- force "age tag missing" $ tagNoAttr "{http://example.com}age" content
        case decimal ageText of
            Right (age, "") -> return $ Person age name
            _ -> force "invalid age value" $ return Nothing

parsePeople :: MonadThrow m => Consumer Event m [Person]
parsePeople = force "no people tag" $ do
    _ <- tagNoAttr "{http://example.com}success" content
    _ <- tagNoAttr "{http://example.com}row_count" content
    _ <- tagNoAttr "{http://example.com}summary" $
        tagNoAttr "{http://example.com}bananas" content
    tagNoAttr "{http://example.com}people" $ many parsePerson

-- ignoreAttrs accepts the root element regardless of any attributes it carries.
parsePopulation :: MonadThrow m => Consumer Event m [Person]
parsePopulation = force "population tag missing" $
    tagName "{http://example.com}population" ignoreAttrs $ \() -> parsePeople

main :: IO ()
main = do
        people <- runResourceT $
             parseFile def "people2.xml" $$ parsePopulation
        print people

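Here's an example using the cursor API. Note that it has different error handling characteristics, but it should produce the same result for well-formed input.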

{-# LANGUAGE OverloadedStrings #-}
import Text.XML
import Text.XML.Cursor
import Data.Text (Text)
import Data.Text.Read (decimal)
import Data.Monoid (mconcat)

main :: IO ()
main = do
    doc <- Text.XML.readFile def "people2.xml"
    let cursor = fromDocument doc
    print $ cursor $// element "{http://example.com}person" >=> parsePerson

data Person = Person Int Text
        deriving Show

parsePerson :: Cursor -> [Person]
parsePerson c = do
    let name = c $/ element "{http://example.com}firstname" &/ content
        ageText = c $/ element "{http://example.com}age" &/ content
    case decimal $ mconcat ageText of
        Right (age, "") -> [Person age $ mconcat name]
        _ -> []
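
As a note on the error handling difference: the cursor version above silently drops any person whose age does not parse, whereas the streaming version fails. The following is only a sketch of one way to surface such failures with the cursor API. The name parsePersonE and the inline sample document are assumptions made for this illustration and are not part of the original answer; the sample stands in for people2.xml and deliberately contains one bad age. It parses from a lazy ByteString with parseLBS_, which would also suit bytes obtained some other way than from a file.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy.Char8 as L8
import           Data.Text                  (Text)
import qualified Data.Text                  as T
import           Data.Text.Read             (decimal)
import           Text.XML                   (def, parseLBS_)
import           Text.XML.Cursor

data Person = Person Int Text
    deriving (Show, Eq)

-- Hypothetical variant of parsePerson: returns Left with a message
-- instead of an empty list when the age is not a plain decimal number.
parsePersonE :: Cursor -> Either String Person
parsePersonE c =
    case decimal ageText of
        Right (age, rest) | T.null rest -> Right (Person age name)
        _ -> Left ("invalid age for " ++ show name ++ ": " ++ show ageText)
  where
    name    = T.concat $ c $/ element "{http://example.com}firstname" &/ content
    ageText = T.concat $ c $/ element "{http://example.com}age" &/ content

-- Inline stand-in for people2.xml; the second age is deliberately invalid.
sample :: L8.ByteString
sample = L8.pack $ concat
    [ "<population xmlns=\"http://example.com\"><people>"
    , "<person><firstname>Michael</firstname><age>25</age></person>"
    , "<person><firstname>Eliezer</firstname><age>two</age></person>"
    , "</people></population>"
    ]

main :: IO ()
main = do
    let cursor = fromDocument (parseLBS_ def sample)
    -- Prints one Either per person element, in document order.
    mapM_ (print . parsePersonE) (cursor $// element "{http://example.com}person")
```

Whether to drop, report, or abort on a bad record is a design choice; Either keeps the decision with the caller.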
