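I have a 35 GB XML file. I tried to parse it with SQL Server XQuery and with Python lxml. SQL Server's xml data type is limited to 2 GB, and Python raises a memory error.
So I decided to parse the file in code with VB.NET.
I read the XML file into a DataSet to avoid writing complex XPath queries. However, this also threw an out-of-memory error.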
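' Requires: Imports System.Data and Imports System.Xml at the top of the file.
Try
    ' DataSet.ReadXml materialises the whole document in memory,
    ' which is why this fails with an out-of-memory error on a 35 GB file.
    Using xmlFile As XmlReader = XmlReader.Create("D:\wcproduction.xml", New XmlReaderSettings())
        Dim ds As New DataSet()
        ds.ReadXml(xmlFile)
        For i As Integer = 0 To ds.Tables(0).Rows.Count - 1
            MsgBox(ds.Tables(0).Rows(i).Item(0).ToString())
        Next
    End Using
Catch ex As Exception
    MsgBox(ex.Message)
End Try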
Here is sample XML data from the actual file:
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xsd:schema targetNamespace="urn:schemas-microsoft-com:sql:SqlRowSet1" xmlns:schema="urn:schemas-microsoft-com:sql:SqlRowSet1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" elementFormDefault="qualified">
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd"/>
<xsd:element name="wcproduction">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="api_st_cde" type="sqltypes:smallint" nillable="1"/>
<xsd:element name="api_cnty_cde" type="sqltypes:smallint" nillable="1"/>
<xsd:element name="api_well_idn" type="sqltypes:int" nillable="1"/>
<xsd:element name="pool_idn" type="sqltypes:int" nillable="1"/>
<xsd:element name="prodn_mth" type="sqltypes:smallint" nillable="1"/>
<xsd:element name="prodn_yr" type="sqltypes:int" nillable="1"/>
<xsd:element name="ogrid_cde" type="sqltypes:int" nillable="1"/>
<xsd:element name="prd_knd_cde" nillable="1">
<xsd:simpleType>
<xsd:restriction base="sqltypes:char" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="2"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="eff_dte" type="sqltypes:datetime" nillable="1"/>
<xsd:element name="amend_ind" nillable="1">
<xsd:simpleType>
<xsd:restriction base="sqltypes:char" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="1"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="c115_wc_stat_cde" nillable="1">
<xsd:simpleType>
<xsd:restriction base="sqltypes:char" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="1"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="prod_amt" type="sqltypes:int" nillable="1"/>
<xsd:element name="prodn_day_num" type="sqltypes:smallint" nillable="1"/>
<xsd:element name="mod_dte" type="sqltypes:datetime" nillable="1"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
<wcproduction xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
<api_st_cde>30</api_st_cde>
<api_cnty_cde>5</api_cnty_cde>
<api_well_idn>20178</api_well_idn>
<pool_idn>10540</pool_idn>
<prodn_mth>7</prodn_mth>
<prodn_yr>1973</prodn_yr>
<ogrid_cde>12437</ogrid_cde>
<prd_knd_cde>G </prd_knd_cde>
<eff_dte>1973-07-31T00:00:00</eff_dte>
<amend_ind>N</amend_ind>
<c115_wc_stat_cde>F</c115_wc_stat_cde>
<prod_amt>53612</prod_amt>
<prodn_day_num>99</prodn_day_num>
<mod_dte>2015-04-07T07:31:00.173</mod_dte>
</wcproduction>
</root>
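To make the column mapping concrete, here is a minimal sketch in Python (one of the tools tried above) that turns one `wcproduction` element into a typed row using the types declared in the inline `xsd:schema`. The `COLUMNS` table and the `to_row` helper are hypothetical names for illustration. Keeping `char` values unstripped (e.g. `"G "`) is an assumption, and treating a missing element or `xsi:nil` as NULL assumes the export marks NULLs in one of those two ways.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace of the data rows, taken from the sample's xmlns attribute.
NS = "{urn:schemas-microsoft-com:sql:SqlRowSet1}"
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

# Hypothetical mapping: column name -> converter, in the order of the inline xsd:sequence.
# smallint/int -> int, char -> str (padding kept), datetime -> datetime.
COLUMNS = [
    ("api_st_cde", int),
    ("api_cnty_cde", int),
    ("api_well_idn", int),
    ("pool_idn", int),
    ("prodn_mth", int),
    ("prodn_yr", int),
    ("ogrid_cde", int),
    ("prd_knd_cde", str),
    ("eff_dte", datetime.fromisoformat),
    ("amend_ind", str),
    ("c115_wc_stat_cde", str),
    ("prod_amt", int),
    ("prodn_day_num", int),
    ("mod_dte", datetime.fromisoformat),
]


def to_row(record):
    """Convert one <wcproduction> element into a tuple in schema column order."""
    values = []
    for name, convert in COLUMNS:
        child = record.find(NS + name)
        # Assumption: an omitted element or xsi:nil="true" means SQL NULL.
        if child is None or child.get(XSI_NIL) in ("true", "1"):
            values.append(None)
        else:
            values.append(convert(child.text or ""))
    return tuple(values)
```

Applied to the sample record, `to_row` would return `30` for `api_st_cde`, `"G "` for `prd_knd_cde` and a `datetime` with 173 000 microseconds for `mod_dte`.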
I need a solution that can read data from XML files of 35 GB or larger and transfer it to a SQL Server database.
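One common way to move parsed rows into a database without holding them all at once is to insert them in fixed-size batches. The sketch below assumes a hypothetical three-column `wcproduction` table and uses Python's built-in `sqlite3` purely as a self-contained stand-in for SQL Server. Against SQL Server, the same batching could feed, for example, pyodbc's `executemany` (optionally with `fast_executemany`) or .NET's `SqlBulkCopy`.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Group an iterable of row tuples into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_rows(conn, rows, batch_size=10_000):
    """Insert rows batch by batch, committing after each batch; return the row count."""
    total = 0
    for batch in batched(rows, batch_size):
        # Hypothetical simplified table; a real load would list all 14 columns.
        conn.executemany(
            "INSERT INTO wcproduction (api_st_cde, api_well_idn, prod_amt) VALUES (?, ?, ?)",
            batch,
        )
        conn.commit()
        total += len(batch)
    return total


# Usage with an in-memory SQLite database standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wcproduction (api_st_cde INTEGER, api_well_idn INTEGER, prod_amt INTEGER)")
loaded = load_rows(conn, ((30, i, i * 10) for i in range(25)), batch_size=10)
```

Because `rows` is consumed lazily, it can be a generator fed directly by a streaming parser. Only one batch is in memory at a time. The batch size of 10,000 is an arbitrary starting point, not a tuned value.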
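Answer: DataSet.ReadXml loads everything it reads into memory, so the DataSet becomes the bottleneck. Instead, try the streaming approach in Reading large XML file using XMLReader in VB.net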
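The key idea is to read the file forward-only and discard each record after processing it, rather than building the whole document in memory. Because Python's lxml attempt above hit a memory error, here is a minimal sketch of the same pull-parsing idea with the standard library's `xml.etree.ElementTree.iterparse`. This is a Python analogue of XmlReader, not the linked answer's code, and `iter_records` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the data rows, taken from the sample's xmlns attribute.
NS = "{urn:schemas-microsoft-com:sql:SqlRowSet1}"


def iter_records(source):
    """Yield each completed <wcproduction> element while keeping memory bounded.

    `source` may be a file path or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of <root>; keep it so finished rows can be detached.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == NS + "wcproduction":
            yield elem
            # Detach processed children (including the inline xsd:schema) from <root>;
            # otherwise <root> would keep every parsed row alive until the end.
            root.clear()


# Small synthetic document shaped like the real file.
doc = (
    b'<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    + b"".join(
        b'<wcproduction xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">'
        b"<prod_amt>%d</prod_amt></wcproduction>" % i
        for i in range(5)
    )
    + b"</root>"
)
amounts = [int(r.find(NS + "prod_amt").text) for r in iter_records(io.BytesIO(doc))]
```

On the real file, `sum(1 for _ in iter_records(r"D:\wcproduction.xml"))` would count the rows in a single pass. lxml also provides an `iterparse` with a compatible interface.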