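Parsing an element that may or may not exist with the HXT library

Date: 2014-04-22 17:25:06

Tags: haskell hxt

I have a problem with HXT. I want to parse an OWL file, and my arrow has a problem: it refuses to parse one kind of tree. Here is what I see. First, the code: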

{-# LANGUAGE Arrows #-}
import System.Environment  -- for getArgs

import Data.List.Split (splitOn)
import Text.XML.HXT.Core


data Class = Class {
                    name ::String,
                    subClassOf ::String
               } deriving (Show,Eq)


main = do
   [src]<- getArgs
   parser <- runX(readDocument [ withValidate no] src  >>> getClass)
   print parser


parseClass = ifA (hasAttr "rdf:about")  (getAttrValue "rdf:about")  (getAttrValue "rdf:ID")

parseSubClass = getAttrValue "rdf:resource"



split l = if(length (splitOn "#" l) >1) then (splitOn "#" l !! 1) else l


atTag tag = deep (isElem >>> hasName tag)

getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    s <- atTag "rdfs:subClassOf" -< l
    subClass <- parseSubClass -< s
    returnA -< Class { name = (split className), subClassOf = (split subClass) }

It parses every node in the OWL file where the subclass exists, as in this example:

<owl:Class rdf:about="Damien">
    <rdfs:subClassOf rdf:resource="PurchaseableItem"/>
</owl:Class>

But when I want to parse a tree like this one, it is not processed at all and simply gets thrown away!

<owl:Class rdf:about="&camera;BodyWithNonAdjustableShutterSpeed">
    <owl:equivalentClass>
        <owl:Class>
            <owl:intersectionOf rdf:parseType="Collection">
                <rdf:Description rdf:about="&camera;Body"/>
                <owl:Restriction>
                    <owl:onProperty rdf:resource="&camera;shutter-speed"/>
                    <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">0</owl:cardinality>
                </owl:Restriction>
            </owl:intersectionOf>
        </owl:Class>
    </owl:equivalentClass>
</owl:Class>

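Why? Because the SubClass node does not exist! But I want the Class to be available and stored in my data even when the subclass is missing. So how can this be done?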


My latest version:

{-# LANGUAGE Arrows #-}
import System.Environment  -- for getArgs
import Data.List.Split (splitOn)
import Text.XML.HXT.Core

data Class = Class {
                    name ::String,
                    subClassOf :: String
               } deriving (Show,Eq)

main = do
   [src]<- getArgs
   parser <- runX(readDocument [ withValidate no] src  >>> getClass)
   print parser

parseClass = ifA (hasAttr "rdf:about")  (getAttrValue "rdf:about")  (getAttrValue "rdf:ID")
parseSubClass = (getAttrValue "rdf:resource") `orElse` arr (const "" )

-- Test (this definition needs review): it fails if the name contains "#"
split l = if(length (splitOn "#" l) >1) then (splitOn "#" l !! 1) else l

atTag tag = deep (isElem >>> hasName tag)
getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    s <- atTag "rdfs:subClassOf" -< l
    subClass <- parseSubClass -< s
    returnA -< Class { name = (split className), subClassOf = split subClass }

1 Answer:

Answer 0 (score: 1)

When the SubClass node does not exist, you need to decide what you want to happen. As I see it, you have two options:

  • A missing SubClass node means subClass is the empty string. In that case, simply change the parser to fall back to the empty string when the arrow around atTag "rdfs:subClassOf" fails:

    getClass = atTag "owl:Class" >>>
        proc l -> do
        className <- parseClass -< l
        subClass <- getSubClass -< l
        returnA -< Class { name = split className, subClassOf = split subClass }
        where
          getSubClass =
            (atTag "rdfs:subClassOf" >>> parseSubClass) `orElse` arr (const "")
    
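  • A missing SubClass node means subClass is Nothing. This requires changing your data definition so that subClassOf has type Maybe String, but otherwise it is very similar to the previous option: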

    getClass = atTag "owl:Class" >>>
        proc l -> do
        className <- parseClass -< l
        subClass <- getSubClass -< l
        returnA -< Class { name = split className, subClassOf = fmap split subClass }
        where
          getSubClass =
            (atTag "rdfs:subClassOf" >>> parseSubClass >>> arr Just)
            `orElse` arr (const Nothing)
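
To see how the second option fits together end to end, here is a minimal sketch. It assumes the data definition has already been changed so that subClassOf is a Maybe String. Instead of reading a file, it runs the arrow over a small made-up in-memory document with runLA and xread. The sample string, its <root> wrapper, and the type signatures are illustrative additions, not part of the original program, and split is left out to keep the sketch short.

```haskell
{-# LANGUAGE Arrows #-}
module Main where

import Text.XML.HXT.Core

-- Assumed variant of the data type: the subclass is optional.
data Class = Class
  { name       :: String
  , subClassOf :: Maybe String
  } deriving (Show, Eq)

atTag :: ArrowXml a => String -> a XmlTree XmlTree
atTag tag = deep (isElem >>> hasName tag)

getClass :: ArrowXml a => a XmlTree Class
getClass = atTag "owl:Class" >>>
  proc l -> do
    className <- getAttrValue "rdf:about" -< l
    subClass  <- getSubClass -< l
    returnA -< Class { name = className, subClassOf = subClass }
  where
    -- The fallback wraps the whole lookup, so a missing element
    -- produces Nothing instead of making the arrow fail.
    getSubClass =
      (atTag "rdfs:subClassOf" >>> getAttrValue "rdf:resource" >>> arr Just)
      `orElse` constA Nothing

-- Made-up sample input: one class with a subclass, one without.
sample :: String
sample = concat
  [ "<root>"
  , "<owl:Class rdf:about=\"Damien\">"
  , "<rdfs:subClassOf rdf:resource=\"PurchaseableItem\"/>"
  , "</owl:Class>"
  , "<owl:Class rdf:about=\"Body\"/>"
  , "</root>"
  ]

main :: IO ()
main = mapM_ print (runLA (xread >>> getClass) sample)
```

With this input, both classes should be reported, and the one without a rdfs:subClassOf child should carry Nothing rather than being dropped.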
    

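Just so we're clear, since you said in the comments that this isn't working, here is the exact complete program I'm running, which works fine for me: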

{-# LANGUAGE Arrows #-}
import System.Environment  -- for getArgs
import Data.List.Split (splitOn)
import Text.XML.HXT.Core

data Class = Class {
                    name ::String,
                    subClassOf ::String
               } deriving (Show,Eq)

main = do
   [src]<- getArgs
   parser <- runX(readDocument [ withValidate no] src  >>> getClass)
   print parser

parseClass = ifA (hasAttr "rdf:about")
             (getAttrValue "rdf:about")
             (getAttrValue "rdf:ID")

parseSubClass = getAttrValue "rdf:resource"

split l = if(length (splitOn "#" l) >1) then (splitOn "#" l !! 1) else l

atTag tag = deep (isElem >>> hasName tag)

getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    subClass <- getSubClass -< l
    returnA -< Class { name = split className, subClassOf = split subClass }
    where
      getSubClass =
        (atTag "rdfs:subClassOf" >>> parseSubClass)
        `orElse` arr (const "")
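
It may help to see why the placement of orElse matters. In the question's latest version the fallback is wrapped around getAttrValue only, but getAttrValue is documented as always succeeding (with the empty string as its default), so that fallback can never fire. The arrow that actually fails is atTag "rdfs:subClassOf", and a step that yields no results makes the whole arrow yield none. The sketch below compares the two placements on a made-up element without a subclass; the names fallbackInside, fallbackOutside, and noSubClass are hypothetical.

```haskell
module Main where

import Text.XML.HXT.Core

atTag :: ArrowXml a => String -> a XmlTree XmlTree
atTag tag = deep (isElem >>> hasName tag)

-- Fallback around the attribute lookup only, as in the question's
-- latest version. If atTag finds nothing, nothing reaches the fallback.
fallbackInside :: ArrowXml a => a XmlTree String
fallbackInside =
  atTag "rdfs:subClassOf" >>> (getAttrValue "rdf:resource" `orElse` constA "")

-- Fallback around the whole lookup, as in the answer.
fallbackOutside :: ArrowXml a => a XmlTree String
fallbackOutside =
  (atTag "rdfs:subClassOf" >>> getAttrValue "rdf:resource") `orElse` constA ""

-- Made-up input: a class element with no rdfs:subClassOf child.
noSubClass :: String
noSubClass = "<owl:Class rdf:about=\"Body\"/>"

main :: IO ()
main = do
  -- Expected to be empty: the missing element makes the arrow fail.
  print (runLA (xread >>> fallbackInside) noSubClass)
  -- Expected to hold a single empty string: the fallback takes over.
  print (runLA (xread >>> fallbackOutside) noSubClass)
```

If a missing attribute (rather than a missing element) should also count as a failure, HXT offers getAttrValue0, which fails when the attribute does not exist.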

Note that if you really don't want to combine multiple arrow steps with >>> or <<<, another possibility is to use an inner proc:

getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    subClass <- (proc l' -> do
      s <- atTag "rdfs:subClassOf" -< l'
      parseSubClass -< s)
      `orElse` constA "" -< l
    returnA -< Class { name = split className, subClassOf = split subClass}
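
Separately from the missing-node problem, the comment in the question flags split as needing review. One possible reading of its intent is "keep whatever follows the first '#', or the whole string when there is no '#'". Under that assumption it can be sketched without splitOn. The name afterHash is hypothetical, and its behaviour differs from the original for names with more than one '#': for "a#b#c" the original returns "b", while this version returns "b#c".

```haskell
module Main where

-- Hypothetical alternative to split: return the part after the first '#',
-- or the input unchanged when it contains no '#'.
afterHash :: String -> String
afterHash s = case break (== '#') s of
  (_, '#' : rest) -> rest  -- drop everything up to and including the first '#'
  _               -> s     -- no '#' present

main :: IO ()
main = mapM_ (putStrLn . afterHash)
  [ "http://example.org/camera#Body"  -- made-up URI
  , "Damien"
  ]
```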