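I have a VB.Net application that reads a zip file containing an XML file. I need to parse the XML file into row segments, pull out one node value as the application ID, and send it to an MS SQL database. The XML file looks like this: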
<?xml version="1.0" encoding="UTF-8"?>
<PROJECTS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<APPLICATION_ID>9243987</APPLICATION_ID>
<ACTIVITY>P30</ACTIVITY>
<ADMINISTERING_IC>AR</ADMINISTERING_IC>
<APPLICATION_TYPE>5</APPLICATION_TYPE>
<ARRA_FUNDED>N</ARRA_FUNDED>
<AWARD_NOTICE_DATE>05/22/2017</AWARD_NOTICE_DATE>
<BUDGET_START>04/01/2017</BUDGET_START>
</row>
<row>
<APPLICATION_ID>9243988</APPLICATION_ID>
<ACTIVITY>P30</ACTIVITY>
<ADMINISTERING_IC>AR</ADMINISTERING_IC>
<APPLICATION_TYPE>5</APPLICATION_TYPE>
<ARRA_FUNDED>N</ARRA_FUNDED>
<AWARD_NOTICE_DATE>05/22/2017</AWARD_NOTICE_DATE>
<BUDGET_START>04/01/2017</BUDGET_START>
</row>
<row>
<APPLICATION_ID>9243989</APPLICATION_ID>
<ACTIVITY>P30</ACTIVITY>
<ADMINISTERING_IC>AR</ADMINISTERING_IC>
<APPLICATION_TYPE>5</APPLICATION_TYPE>
<ARRA_FUNDED>N</ARRA_FUNDED>
<AWARD_NOTICE_DATE>05/22/2017</AWARD_NOTICE_DATE>
<BUDGET_START>04/01/2017</BUDGET_START>
</row>
</PROJECTS>
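The file can contain a million records and be close to 100 MB in size. With my current code, shown below, a million records can take 8 hours to process.
My VB code for parsing the file is: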
If ofdXML.ShowDialog <> Windows.Forms.DialogResult.Cancel Then
    stopWatch.Start()
    Dim result As String
    Dim fName As String = ofdXML.FileName
    If fName.EndsWith("zip") Then
        Dim ePath As String = "E:\Downloads\WEEKLY"
        fileName = ExtractArchive(fName, ePath)
        fName = Path.Combine(ePath, fileName)
    End If
    result = Path.GetFileNameWithoutExtension(fName)
    Dim rdr As New StreamReader(fName)
    While (rdr.Peek >= 0)
        varLine = rdr.ReadLine
        sTag = varLine.Contains("<row>")
        eTag = varLine.Contains("</row>")
        If sTag And eTag Then
            appLine = varLine
            If appLine.Contains("<row><APPLICATION_ID>") Then
                appID = appLine.Substring(Len("<row><APPLICATION_ID>"), appLine.IndexOf("/APPLICATION_ID") - Len("<row><APPLICATION_ID>") - 1)
            End If
        ElseIf sTag Then
            v1 = True
            appLine = varLine
            If appLine.Contains("<row><APPLICATION_ID>") Then
                appID = appLine.Substring(Len("<row><APPLICATION_ID>"), appLine.IndexOf("/APPLICATION_ID") - Len("<row><APPLICATION_ID>") - 1)
            End If
        ElseIf eTag Then
            appLine = appLine & varLine
            v1 = False
        ElseIf v1 Then
            appLine = appLine & varLine
            If appLine.Contains("<APPLICATION_ID>") Then
                Dim xi As Integer = appLine.IndexOf("_ID>") + 4
                appID = appLine.Substring(xi, appLine.IndexOf("/APPLICATION_ID") - (xi + 1))
            End If
        End If
        If Trim(Len(varLine)) > 0 And appLine.Contains("<row>") And appLine.Contains("</row") And Not varLine.Contains("</PROJECTS>") Then
            TextBox2.Text = i.ToString
            TextBox3.Text = appID
            sb.Append(appID + ",")
            Application.DoEvents()
            i += 1
            ADMIN_Save_To_Database(appLine, appID, result)
        End If
    End While
End If
Any help is greatly appreciated.
Answer 0 (score: 0)
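I suggest you look into actual XML parsing: either a DOM that you can query, or SAX that you can "listen" to. You are only interested in one specific tag, so it should be quite easy to set up a SAX listener for that tag and ignore everything else.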
This should get you started:
https://www.tutorialspoint.com/vb.net/vb.net_xml_processing.htm
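In .NET, the closest counterpart to a SAX listener is the forward-only System.Xml.XmlReader, which pulls nodes one at a time instead of pushing events. A minimal sketch, assuming the row layout shown in the question (the module and function names are hypothetical), that collects every APPLICATION_ID without building a DOM:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module StreamingIdReader

    ' Reads every <APPLICATION_ID> value with a forward-only XmlReader.
    ' Only the current node is held in memory, so file size barely matters.
    Public Function ReadApplicationIds(input As TextReader) As List(Of String)
        Dim ids As New List(Of String)()
        Dim settings As New XmlReaderSettings() With {.IgnoreWhitespace = True}
        Using reader As XmlReader = XmlReader.Create(input, settings)
            ' ReadToFollowing advances to the next element with this name.
            While reader.ReadToFollowing("APPLICATION_ID")
                ' Reads the text content and moves past the end tag.
                ids.Add(reader.ReadElementContentAsString())
            End While
        End Using
        Return ids
    End Function

    Sub Main()
        Dim sample As String =
            "<PROJECTS><row><APPLICATION_ID>9243987</APPLICATION_ID><ACTIVITY>P30</ACTIVITY></row>" &
            "<row><APPLICATION_ID>9243988</APPLICATION_ID><ACTIVITY>P30</ACTIVITY></row></PROJECTS>"
        For Each id As String In ReadApplicationIds(New StringReader(sample))
            Console.WriteLine(id)
        Next
    End Sub

End Module
```

ReadToFollowing works here because APPLICATION_ID elements are never direct neighbors in this layout. It calls Read before checking the name, so if two matching elements could sit back to back, calling it right after ReadElementContentAsString would skip the second one.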
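If you insist on parsing strings, look for optimizations. The loop is the killer! If you can get away with it, you don't want to do expensive things inside a loop.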
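For example, you compute the length of "<row><APPLICATION_ID>" twice per line (depending on the format). That is wasted work, because the result is a constant! Set or compute it once, outside the loop.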
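A minimal sketch of computing the constant parts once, assuming each APPLICATION_ID sits on a single line as in the sample (the module and function names are hypothetical). The tag strings are constants and their lengths are computed once when the module is initialized, so each line costs at most two IndexOf searches. StringComparison.Ordinal is passed because String.IndexOf(String) without it performs a culture-sensitive comparison, which is slower.

```vbnet
Imports System

Module HoistedIdParser

    ' Tag strings and the opening tag's length are set up once, not on every line.
    Private Const OpenTag As String = "<APPLICATION_ID>"
    Private Const CloseTag As String = "</APPLICATION_ID>"
    Private ReadOnly OpenTagLength As Integer = OpenTag.Length

    ' Returns the APPLICATION_ID value on this line, or Nothing if the line has none.
    Public Function TryGetAppId(line As String) As String
        Dim start As Integer = line.IndexOf(OpenTag, StringComparison.Ordinal)
        If start < 0 Then Return Nothing
        start += OpenTagLength
        Dim finish As Integer = line.IndexOf(CloseTag, start, StringComparison.Ordinal)
        If finish < 0 Then Return Nothing
        Return line.Substring(start, finish - start)
    End Function

    Sub Main()
        Console.WriteLine(TryGetAppId("<row><APPLICATION_ID>9243987</APPLICATION_ID>"))
        Console.WriteLine(TryGetAppId("<ACTIVITY>P30</ACTIVITY>") Is Nothing)
    End Sub

End Module
```

Because TryGetAppId returns Nothing for lines without a complete ID, the caller can skip those lines with a single check.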
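All those .Contains() calls are expensive, because each one scans the whole string, and many of them are redundant. For example, you check for "<row>" and "</row>" near the top of the loop and then do it again near the bottom.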
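One way to remove the repeated checks is to test each incoming line only once and accumulate the current row in a StringBuilder, so the growing row text is never searched again. A minimal sketch, assuming the <row> and </row> markers appear in the lines as in the sample (the module and function names are hypothetical):

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module RowSplitter

    ' Groups lines into complete <row>...</row> segments.
    ' Each line is tested once for the start and end markers; the accumulated
    ' row is never re-scanned, and StringBuilder avoids repeated string copies.
    Public Iterator Function ReadRows(input As TextReader) As IEnumerable(Of String)
        Dim row As New StringBuilder()
        Dim inRow As Boolean = False
        Dim line As String = input.ReadLine()
        While line IsNot Nothing
            Dim hasStart As Boolean = line.Contains("<row>")
            Dim hasEnd As Boolean = line.Contains("</row>")
            If hasStart Then
                row.Clear()
                inRow = True
            End If
            If inRow Then row.Append(line)
            If hasEnd AndAlso inRow Then
                Yield row.ToString()
                inRow = False
            End If
            line = input.ReadLine()
        End While
    End Function

    Sub Main()
        Dim sample As String = String.Join(Environment.NewLine,
            "<PROJECTS>",
            "<row>",
            "<APPLICATION_ID>9243987</APPLICATION_ID>",
            "</row>",
            "<row><APPLICATION_ID>9243988</APPLICATION_ID></row>",
            "</PROJECTS>")
        For Each r As String In ReadRows(New StringReader(sample))
            Console.WriteLine(r)
        Next
    End Sub

End Module
```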
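In short, your best option is an XML parsing tool. If you don't want to go that route, go through your code carefully for expensive operations that you can either move out of the loop entirely or perform only once per iteration.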
Answer 1 (score: 0)
I have changed my code to:
' Using disposes the reader so the extracted file is not left open.
Using rdr As New StreamReader(fName)
    Dim xml As New XmlDocument()
    xml.Load(rdr)
    Dim DocumentNodes As XmlNodeList = xml.GetElementsByTagName("row")
    For Each xn As XmlNode In DocumentNodes
        ' Each <row> is expected to hold one APPLICATION_ID child.
        Dim example As XmlNode = xn.SelectSingleNode("APPLICATION_ID")
        If example IsNot Nothing Then
            Dim applicationID As String = example.InnerText
            ADMIN_Save_AuthoringNames(xn.InnerXml, applicationID, result)
        End If
    Next
End Using
I'll let you know how it runs.
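XmlDocument.Load builds the entire document in memory before the loop starts, which for a file near 100 MB typically takes several times the file size. A common streaming alternative is to walk the file with XmlReader and materialize one row at a time with XNode.ReadFrom from System.Xml.Linq. A minimal sketch, assuming the layout in the question; the names are hypothetical, and the per-row save routine is only referenced in a comment:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml
Imports System.Xml.Linq

Module StreamingRowReader

    ' Streams one <row> at a time instead of loading the whole document.
    ' Only the current row is materialized as an XElement, so memory use
    ' stays roughly constant regardless of file size.
    Public Iterator Function ReadRows(input As TextReader) As IEnumerable(Of XElement)
        Using reader As XmlReader = XmlReader.Create(input)
            reader.MoveToContent()
            While Not reader.EOF
                If reader.NodeType = XmlNodeType.Element AndAlso reader.Name = "row" Then
                    ' ReadFrom consumes the whole element and leaves the reader on the
                    ' following node, so no extra Read() is needed in this branch.
                    Yield CType(XNode.ReadFrom(reader), XElement)
                Else
                    reader.Read()
                End If
            End While
        End Using
    End Function

    Sub Main()
        Dim sample As String =
            "<PROJECTS><row><APPLICATION_ID>9243987</APPLICATION_ID><ACTIVITY>P30</ACTIVITY></row>" &
            "<row><APPLICATION_ID>9243988</APPLICATION_ID><ACTIVITY>P30</ACTIVITY></row></PROJECTS>"
        For Each row As XElement In ReadRows(New StringReader(sample))
            Dim idElement As XElement = row.Element("APPLICATION_ID")
            If idElement IsNot Nothing Then
                Console.WriteLine(idElement.Value)
                ' The question's code would call its own save routine here; note that
                ' row.ToString(SaveOptions.DisableFormatting) gives the row's outer XML.
            End If
        Next
    End Sub

End Module
```

This only changes how the file is read; one database call per row is still made, as in the code above.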