Question

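I'm trying to use HXT to parse an ods (LibreOffice spreadsheet) file and am running into trouble. In the spreadsheet, a row contains many cells (all named "cell"), and the spreadsheet contains many rows (all named "row"). When I try to get the text of the cells, my code mixes them all together, and I end up with one big list of cells that is not separated by row.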

When trying to parse the following:

<spreadsheet>
    <row>
       <cell> <p>ABC</p> </cell>
       <cell> <p>DEF</p> </cell>
       <cell> <p>GHI</p> </cell>
    </row>
    <row>
       <cell> <p>abc</p> </cell>
       <cell> <p>def</p> </cell>
       <cell> <p>ghi</p> </cell>
    </row>
    <row>
       <cell> <p>123</p> </cell>
       <cell> <p>456</p> </cell>
       <cell> <p>789</p> </cell>
    </row>
</spreadsheet>

with this code:

import Text.XML.HXT.Core

play arg = do { results <- runX (processor arg) ; print results }
atTag x = getChildren >>> isElem >>> hasName x

processor filename =
    readDocument [withValidate no] filename >>>
    atTag "spreadsheet" >>>
    atTag "row" >>>
    atTag "cell" >>>
    atTag "p" >>>
    getChildren >>> getText

it gives [ABC, DEF, GHI, abc, def, ghi, 123, 456, 789], whereas what I want is [[ABC, DEF, GHI], [abc, def, ghi], [123, 456, 789]].

What am I doing wrong?

Answer 1

You can use listA to collect the results into a list at the appropriate place:

import System.Environment (getArgs)
import Text.XML.HXT.Core

processor filename =
  readDocument [withValidate no] filename
    />  hasName "spreadsheet"
    />  hasName "row"
    >>> listA (getChildren >>> hasName "cell" /> hasName "p" /> getText)

main = fmap head getArgs >>= runX . processor >>= print

This prints the result you want.
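To see why the original pipeline flattens everything, it can help to model each arrow as a plain function `a -> [b]`. The sketch below is only a simplified model that uses base, not HXT itself: `Sheet`, `Row`, `rowsOf`, `cellsOf`, and `collect` are hypothetical names invented for illustration. Composing two such functions concatenates the results of every branch, while a `listA`-like wrapper returns all results of one step as a single list.

```haskell
import Control.Monad ((>=>))

-- Toy stand-ins for the XML tree (hypothetical, not HXT types).
newtype Row   = Row [String]
newtype Sheet = Sheet [Row]

-- Each "arrow" is modeled as a function returning a list of results.
rowsOf :: Sheet -> [Row]
rowsOf (Sheet rs) = rs

cellsOf :: Row -> [String]
cellsOf (Row cs) = cs

-- Plain composition: every cell of every row lands in one flat list.
flat :: Sheet -> [String]
flat = rowsOf >=> cellsOf

-- Toy analogue of listA: run a step once and return all of its
-- results wrapped up as a single result.
collect :: (a -> [b]) -> a -> [[b]]
collect f x = [f x]

-- Collecting per row keeps the row boundaries.
nested :: Sheet -> [[String]]
nested = rowsOf >=> collect cellsOf

sheet :: Sheet
sheet = Sheet
  [ Row ["ABC", "DEF", "GHI"]
  , Row ["abc", "def", "ghi"]
  , Row ["123", "456", "789"]
  ]

main :: IO ()
main = do
  print (flat sheet)    -- one flat list of nine strings
  print (nested sheet)  -- three inner lists, one per row
```

In this model, where the collecting wrapper is placed decides the grouping: wrapping only the per-row step gives one inner list per row, which is the role `listA` plays in the answer's pipeline.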

Note that I'm using the provided /> and hasName rather than your atTag, but it's easy to translate back if you want to stick with atTag.
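For reference, here is one way the translation back to atTag might look. This is a sketch rather than the answerer's own code: it keeps the asker's `atTag` helper and wraps only the per-row part of the pipeline in `listA`. To stay self-contained it reads from an inline string with `readString` instead of a file, and `sampleXml` and `rows` are names assumed for illustration.

```haskell
import Text.XML.HXT.Core

-- The asker's helper: element children with the given name.
atTag :: ArrowXml a => String -> a XmlTree XmlTree
atTag x = getChildren >>> isElem >>> hasName x

-- Cut-down sample input (an assumption standing in for the real file).
sampleXml :: String
sampleXml =
  "<spreadsheet>\
  \<row><cell> <p>ABC</p> </cell><cell> <p>DEF</p> </cell></row>\
  \<row><cell> <p>123</p> </cell><cell> <p>456</p> </cell></row>\
  \</spreadsheet>"

-- Everything up to "row" stays as in the question; listA then gathers
-- the cell texts found under each row into one list per row.
rows :: String -> IO [[String]]
rows xml = runX $
  readString [withValidate no] xml >>>
  atTag "spreadsheet" >>>
  atTag "row" >>>
  listA (atTag "cell" >>> atTag "p" >>> getChildren >>> getText)

main :: IO ()
main = rows sampleXml >>= print
```

To run it against a file instead, swapping `readString [withValidate no] xml` for `readDocument [withValidate no] filename` should be enough, since both produce a root node in the same way.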

Answer 2

It's not HXT, but you can solve it with xml-conduit using the following:

{-# LANGUAGE OverloadedStrings #-}
import Text.XML
import Text.XML.Cursor
import qualified Data.Text as T

main = do
    c <- fmap fromDocument $ Text.XML.readFile def "foo.xml"
    print $ c $// element "row" >=> perRow
  where
    perRow row = [row $/ element "cell" >=> perCell]
    perCell cell = [T.strip $ T.concat $ cell $// content]

Haskell HXT: parsing rows and columns and getting [[String]] instead of [String]