My task is to convert the results of a RESTful web service into an XML document with a new format.
Sample of the html / xhtml to be converted:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>OvidWS Result Set Resource</title>
</head>
<body>
<table id="results">
<tr>
<td class="_index">
<a class="uri" href="REDACTED">1</a>
</td>
<td class="au">
<span>GILLESPIE JB</span>
<span>KUKES RE</span>
</td>
<td class="so">A.M.A. American Journal of Diseases of Children</td>
<td class="ti">Acetylsalicylic acid poisoning with recovery.</td>
<td class="ui">20267726</td>
<td class="yr">1947</td>
</tr>
<tr>
<td class="_index">
<a class="uri" href="REDACTED">2</a>
</td>
<td class="au">BASS MH</td>
<td class="so">Journal of the Mount Sinai Hospital, New York</td>
<td class="ti">Aspirin poisoning in infants.</td>
<td class="ui">20265054</td>
<td class="yr">1947</td>
</tr>
</table>
</body>
</html>
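Ideally, I want to take whatever is listed as the class attribute and use it as the element name; if there is no "class" attribute, I just want to label the element as item.
This is the conversion I'm looking for: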
<results>
<citation>
<_index>
<uri href="REDACTED">1</uri>
</_index>
<au>
<item>GILLESPIE JB</item>
<item>KUKES RE</item>
</au>
<so>A.M.A. American Journal of Diseases of Children</so>
<ti>Acetylsalicylic acid poisoning with recovery.</ti>
<ui>20267726</ui>
<yr>1947</yr>
</citation>
<citation>
<_index>
<uri href="REDACTED">2</uri>
</_index>
<au>BASS MH</au>
<so>Journal of the Mount Sinai Hospital, New York</so>
<ti>Aspirin poisoning in infants.</ti>
<ui>20265054</ui>
<yr>1947</yr>
</citation>
</results>
I found a small piece of code here that lets me rename a node:
Public Shared Function RenameNode(ByVal e As XmlNode, newName As String) As XmlNode
    Dim doc As XmlDocument = e.OwnerDocument
    Dim newNode As XmlNode = doc.CreateNode(e.NodeType, newName, Nothing)
    While (e.HasChildNodes)
        newNode.AppendChild(e.FirstChild)
    End While
    Dim ac As XmlAttributeCollection = e.Attributes
    While (ac.Count > 0)
        newNode.Attributes.Append(ac(0))
    End While
    Dim parent As XmlNode = e.ParentNode
    parent.ReplaceChild(newNode, e)
    Return newNode
End Function
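But a problem arises when iterating over the XmlAttributeCollection. For some reason, when looking at one of the td nodes, two attributes that do not appear in the source magically show up: rowspan and colspan. These attributes seem to break the loop because, when they are consumed, they do not disappear from the attribute list the way the 'class' attribute does. Instead, the attribute's value is consumed (it changes from "1" to ""). This causes an infinite loop.
I noticed that they are of type 'XmlUnspecifiedAttribute', but when I modified the loop to detect that: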
While (ac.Count > 0) And Not TypeOf (ac(0)) Is System.Xml.XmlUnspecifiedAttribute
newNode.Attributes.Append(ac(0))
End While
I get the following error:
System.Xml.XmlUnspecifiedAttribute is not accessible in this context because it is 'friend'
Why is this happening, and how can I work around it?
Answer 0 (score: 2)
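I think the problem you're running into really comes from your document type declaration. The XHTML 1.0 Strict DTD declares default values of "1" for rowspan and colspan on td, so when the DTD is resolved those defaults are added to every td node.
Since you're translating the nodes into something else entirely, I'd say you don't even need it and can safely ignore it.
I didn't include it in my tests at first, and when I did include it the XmlResolver got confused, so I assume you definitely don't need it.
You can set the resolver to Nothing:
{xml document object}.XmlResolver = Nothing
Then select your nodes and process them. I even kept the doctype in the source file and still had no problems.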
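Another way to keep the DTD out of the picture is to load the document through an XmlReader that is told to skip the DOCTYPE. The following is only a sketch, not what I tested with: LoadIgnoringDtd is a hypothetical helper name, and it assumes .NET 4.0 or later, where XmlReaderSettings.DtdProcessing is available.

```vbnet
Imports System.Xml

Public Module LoadSketch
    ' Hypothetical helper (not part of the tested code): loads a file while
    ' skipping the DOCTYPE, so no DTD default attributes are materialized.
    Public Function LoadIgnoringDtd(ByVal path As String) As XmlDocument
        Dim settings As New XmlReaderSettings()
        ' Ignore the DOCTYPE declaration instead of processing it.
        settings.DtdProcessing = DtdProcessing.Ignore
        ' Do not resolve any external resources either.
        settings.XmlResolver = Nothing

        Dim doc As New XmlDocument()
        Using reader As XmlReader = XmlReader.Create(path, settings)
            doc.Load(reader)
        End Using
        Return doc
    End Function
End Module
```

One caveat with this approach: because the DTD is skipped, entity references that only the DTD defines (for example &nbsp;) would probably no longer resolve, so it suits input like the sample, which uses none.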
Here is the code I used to test:
Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
    Dim USEDoc As New XmlDocument
    ' The source elements live in the XHTML namespace, so XPath needs a prefix for it.
    Dim theNameManager As System.Xml.XmlNamespaceManager = New System.Xml.XmlNamespaceManager(USEDoc.NameTable)
    theNameManager.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml")
    ' Do not resolve the external DTD.
    USEDoc.XmlResolver = Nothing
    USEDoc.Load("RestServ.txt")
    renameNodes(USEDoc.SelectSingleNode("descendant::xhtml:table", theNameManager))
    ' Copy the renamed subtree into a fresh document and save it.
    Dim SaveDoc As New XmlDocument
    SaveDoc.AppendChild(SaveDoc.ImportNode(USEDoc.SelectSingleNode("//results", theNameManager), True))
    SaveDoc.Save("RestServConv.xml")
End Sub
Public Function renameNodes(ByVal TopNode As XmlNode) As Boolean
    Dim UseNode As XmlNode = Nothing
    If TopNode.Name <> "#text" Then
        If TopNode.Name = "tr" Then
            UseNode = RenameNode(TopNode, "citation")
        ElseIf TopNode.Name = "table" Then
            UseNode = RenameNode(TopNode, "results")
            UseNode.Attributes.RemoveNamedItem("id")
        ElseIf TopNode.Attributes.Count > 0 Then
            ' Use the value of the class attribute, if present, as the new element name.
            For Each oAttribute As XmlAttribute In TopNode.Attributes
                If oAttribute.Name = "class" Then
                    UseNode = RenameNode(TopNode, oAttribute.Value)
                    UseNode.Attributes.RemoveNamedItem("class")
                    Exit For
                End If
            Next oAttribute
        End If
        ' Recurse only into nodes that were actually renamed.
        If UseNode IsNot Nothing Then
            If UseNode.ChildNodes.Count > 0 Then
                Dim x As Integer
                For x = 0 To UseNode.ChildNodes.Count - 1
                    renameNodes(UseNode.ChildNodes(x))
                Next x
            End If
        End If
    End If
    Return True
End Function
Public Shared Function RenameNode(ByVal e As XmlNode, ByVal newName As String) As XmlNode
    Dim doc As XmlDocument = e.OwnerDocument
    Dim newNode As XmlNode = doc.CreateNode(e.NodeType, newName, Nothing)
    While (e.HasChildNodes)
        newNode.AppendChild(e.FirstChild)
    End While
    Dim ac As XmlAttributeCollection = e.Attributes
    While (ac.Count > 0)
        newNode.Attributes.Append(ac(0))
    End While
    Dim parent As XmlNode = e.ParentNode
    parent.ReplaceChild(newNode, e)
    Return newNode
End Function
I passed in your sample document, and the result I got was:
<results>
<citation>
<_index>
<uri href="REDACTED">1</uri>
</_index>
<au>
<span xmlns="http://www.w3.org/1999/xhtml">GILLESPIE JB</span>
<span xmlns="http://www.w3.org/1999/xhtml">KUKES RE</span>
</au>
<so rowspan="1" colspan="1">A.M.A. American Journal of Diseases of Children</so>
<ti>Acetylsalicylic acid poisoning with recovery.</ti>
<ui>20267726</ui>
<yr>1947</yr>
</citation>
<citation>
<_index>
<uri href="REDACTED">2</uri>
</_index>
<au>BASS MH</au>
<so>Journal of the Mount Sinai Hospital, New York</so>
<ti>Aspirin poisoning in infants.</ti>
<ui>20265054</ui>
<yr>1947</yr>
</citation>
</results>
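If you would rather keep the DTD and still avoid the infinite loop, one option may be the public XmlAttribute.Specified property instead of a type test against the internal XmlUnspecifiedAttribute class. Specified is False for an attribute that was filled in from a DTD default and was not written in the source. The sketch below is an assumption-laden variant of RenameNode, not the code I tested: RenameElementSpecifiedOnly is a hypothetical name, it takes an XmlElement rather than an XmlNode, and it copies attributes by name and value instead of moving them.

```vbnet
Imports System.Collections.Generic
Imports System.Xml

Public Module RenameSketch
    ' Hypothetical variant of RenameNode: copies only attributes that were
    ' explicitly written in the source, skipping DTD-supplied defaults.
    Public Function RenameElementSpecifiedOnly(ByVal e As XmlElement, ByVal newName As String) As XmlElement
        Dim doc As XmlDocument = e.OwnerDocument
        ' CreateElement with a bare name creates the element in no namespace.
        Dim newNode As XmlElement = doc.CreateElement(newName)

        ' Move the children across; AppendChild detaches each one from e.
        While e.HasChildNodes
            newNode.AppendChild(e.FirstChild)
        End While

        ' Snapshot the attributes so the loop never depends on the live
        ' collection shrinking, which is what caused the infinite loop.
        Dim attrs As New List(Of XmlAttribute)
        For Each a As XmlAttribute In e.Attributes
            attrs.Add(a)
        Next

        For Each a As XmlAttribute In attrs
            ' Specified is False for attributes that came from DTD defaults.
            If a.Specified Then
                newNode.SetAttribute(a.Name, a.Value)
            End If
        Next

        e.ParentNode.ReplaceChild(newNode, e)
        Return newNode
    End Function
End Module
```

Because the loop walks a fixed snapshot, it terminates even if a default attribute refuses to leave the original collection.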