我想解决我目前维护的rdf4h库中的错误。它支持将XML / RDF文档解析为XmlParser模块中的RDF图,但不能成功解析包含XML规范头的XML / RDF文档,例如。
<?xml version="1.0" encoding="ISO-8859-1"?>
解析器使用HXT箭头界面,即Text.XML.HXT.Core
module。我已经将问题归结为函数testSuccess
和testFailure
中的两次解析尝试。两者都使用runSLA。 hxt的作者告诉我,问题在于xread
的使用,我首先要在xread
之前从字符串中提取XML文档。 (不幸的是,他没有对我提出的GitHub issue做出回应。)
下面有两个字符串,两个字符串都包含相同的XML文档。 xmlDoc1
字符串包含一个规范标题,该标题会抬起xread
中的testFailure
箭头。
module HXTProblem where
import Text.XML.HXT.Core
data GParseState = GParseState { stateGenId :: Int } deriving(Show)
-- this document has an XML specification included
xmlDoc1 :: String
xmlDoc1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" ++
"<shiporder orderid=\"889923\" " ++
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " ++
"xsi:noNamespaceSchemaLocation=\"shiporder.xsd\">" ++
"<orderperson>John Smith</orderperson>" ++
"<shipto>" ++
"<name>Ola Nordmann</name>" ++
"</shipto>" ++
"</shiporder>"
-- this document does not include the XML specification
xmlDoc2 :: String
xmlDoc2 = "<shiporder orderid=\"889923\" " ++
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " ++
"xsi:noNamespaceSchemaLocation=\"shiporder.xsd\">" ++
"<orderperson>John Smith</orderperson>" ++
"<shipto>" ++
"<name>Ola Nordmann</name>" ++
"</shipto>" ++
"</shiporder>"
initState :: GParseState
initState = GParseState { stateGenId = 0 }
-- | Works
testSuccess :: (GParseState,[XmlTree])
testSuccess = runSLA xread initState xmlDoc2
{- output of runnnig testSuccess
(GParseState {stateGenId = 0},[NTree (XTag "shiporder" [NTree (XAttr "orderid") [NTree (XText "889923") []],NTree (XAttr "xmlns:xsi") [NTree (XText "http://www.w3.org/2001/XMLSchema-instance") []],NTree (XAttr "xsi:noNamespaceSchemaLocation") [NTree (XText "shiporder.xsd") []]]) [NTree (XTag "orderperson" []) [NTree (XText "John Smith") []],NTree (XTag "shipto" []) [NTree (XTag "name" []) [NTree (XText "Ola Nordmann") []]]]]
-}
-- | Does not work
testFailure :: (GParseState,[XmlTree])
testFailure = runSLA xread initState xmlDoc1
{- ERROR running testFailure
(GParseState {stateGenId = 0},[NTree (XError 2 "\"string: \"<?xml version=\\\"1.0\\\" encoding=\\\"ISO-8859-1...\"\" (line 1, column 6):\nunexpected xml\nexpecting legal XML name character\n") []])
-}
我应该补充一点,我正在寻找使用runSLA
的解决方案,该解决方案在解析XMLTree
或xmlDoc1
时会生成相同的xmlDoc2
。
答案 0 :(得分:1)
华友世纪,这已经解决了。 HXT库的作者已经解决GitHub issue在this commit中添加了新的解析器xreadDoc
。我已经使用this commit中的这个新解析器修复了rdf4h库版本1.2.2及更高版本,因此现在可以使用XmlParser
解析XML / RDF文档(带有规范和编码标题)。 / p>
请注意testFailure
中的新箭头组成,为(xreadDoc >>> isElem)
。
module HXTProblem where
import Text.XML.HXT.Core
data GParseState = GParseState { stateGenId :: Int } deriving(Show)
-- this document has an XML specification included
xmlDoc1 :: String
xmlDoc1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" ++
"<shiporder orderid=\"889923\" " ++
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " ++
"xsi:noNamespaceSchemaLocation=\"shiporder.xsd\">" ++
"<orderperson>John Smith</orderperson>" ++
"<shipto>" ++
"<name>Ola Nordmann</name>" ++
"</shipto>" ++
"</shiporder>"
-- this document does not include the XML specification
xmlDoc2 :: String
xmlDoc2 = "<shiporder orderid=\"889923\" " ++
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " ++
"xsi:noNamespaceSchemaLocation=\"shiporder.xsd\">" ++
"<orderperson>John Smith</orderperson>" ++
"<shipto>" ++
"<name>Ola Nordmann</name>" ++
"</shipto>" ++
"</shiporder>"
initState :: GParseState
initState = GParseState { stateGenId = 0 }
-- | Works
testSuccess :: (GParseState,[XmlTree])
testSuccess = runSLA xread initState xmlDoc2
-- | Does also now work!
testFailure :: (GParseState,[XmlTree])
testFailure = runSLA (xreadDoc >>> isElem) initState xmlDoc1
testEquality :: Bool
testEquality =
let (_,x) = testSuccess
(_,y) = testFailure
in x == y