Parsing a large 35 GB XML file to transfer its data to SQL Server without out-of-memory errors

Time: 2019-02-01 02:43:55

Tags: sql-server vb.net xml-parsing

I have an XML file that is 35 GB in size. I tried parsing it with SQL Server XQuery and with Python lxml. SQL Server has a 2 GB limit on the xml data type, and Python raises a memory error.

So I decided to parse the file in code using VB.NET.

I used a DataSet to read the XML file in order to avoid complex XPath queries. However, this also threw an out-of-memory error.

    Try
        Dim xmlFile As XmlReader
        xmlFile = XmlReader.Create("D:\wcproduction.xml", New XmlReaderSettings())
        Dim ds As New DataSet
        ds.ReadXml(xmlFile)
        Dim i As Integer
        For i = 0 To ds.Tables(0).Rows.Count - 1
            MsgBox(ds.Tables(0).Rows(i).Item(0).ToString)
        Next
    Catch ex As Exception
        MsgBox(ex.Message)
    End Try

Here is sample XML data from the actual file.

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xsd:schema targetNamespace="urn:schemas-microsoft-com:sql:SqlRowSet1" xmlns:schema="urn:schemas-microsoft-com:sql:SqlRowSet1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" elementFormDefault="qualified">
    <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd"/>
    <xsd:element name="wcproduction">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="api_st_cde" type="sqltypes:smallint" nillable="1"/>
                <xsd:element name="api_cnty_cde" type="sqltypes:smallint" nillable="1"/>
                <xsd:element name="api_well_idn" type="sqltypes:int" nillable="1"/>
                <xsd:element name="pool_idn" type="sqltypes:int" nillable="1"/>
                <xsd:element name="prodn_mth" type="sqltypes:smallint" nillable="1"/>
                <xsd:element name="prodn_yr" type="sqltypes:int" nillable="1"/>
                <xsd:element name="ogrid_cde" type="sqltypes:int" nillable="1"/>
                <xsd:element name="prd_knd_cde" nillable="1">
                    <xsd:simpleType>
                        <xsd:restriction base="sqltypes:char" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
                            <xsd:maxLength value="2"/>
                        </xsd:restriction>
                    </xsd:simpleType>
                </xsd:element>
                <xsd:element name="eff_dte" type="sqltypes:datetime" nillable="1"/>
                <xsd:element name="amend_ind" nillable="1">
                    <xsd:simpleType>
                        <xsd:restriction base="sqltypes:char" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
                            <xsd:maxLength value="1"/>
                        </xsd:restriction>
                    </xsd:simpleType>
                </xsd:element>
                <xsd:element name="c115_wc_stat_cde" nillable="1">
                    <xsd:simpleType>
                        <xsd:restriction base="sqltypes:char" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
                            <xsd:maxLength value="1"/>
                        </xsd:restriction>
                    </xsd:simpleType>
                </xsd:element>
                <xsd:element name="prod_amt" type="sqltypes:int" nillable="1"/>
                <xsd:element name="prodn_day_num" type="sqltypes:smallint" nillable="1"/>
                <xsd:element name="mod_dte" type="sqltypes:datetime" nillable="1"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
<wcproduction xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
    <api_st_cde>30</api_st_cde>
    <api_cnty_cde>5</api_cnty_cde>
    <api_well_idn>20178</api_well_idn>
    <pool_idn>10540</pool_idn>
    <prodn_mth>7</prodn_mth>
    <prodn_yr>1973</prodn_yr>
    <ogrid_cde>12437</ogrid_cde>
    <prd_knd_cde>G </prd_knd_cde>
    <eff_dte>1973-07-31T00:00:00</eff_dte>
    <amend_ind>N</amend_ind>
    <c115_wc_stat_cde>F</c115_wc_stat_cde>
    <prod_amt>53612</prod_amt>
    <prodn_day_num>99</prodn_day_num>
    <mod_dte>2015-04-07T07:31:00.173</mod_dte>
</wcproduction>
</root>

I need a solution that can read data from an XML file of 35 GB or more and transfer it to a SQL Server database.

Answer: The DataSet object holds all of the data in memory, so it becomes the bottleneck. Try this solution instead: Reading large XML file using XMLReader in VB.net
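
For illustration, here is a minimal sketch of that streaming idea, not the code from the linked answer. It assumes the row elements look like the sample above, that a table named dbo.wcproduction with matching column names already exists, and that System.Data.SqlClient is available (built into .NET Framework, a NuGet package on newer runtimes). The connection string, table name, and batch size are hypothetical placeholders. Only one row element and one batch of rows are held in memory at a time; each batch is pushed to the server with SqlBulkCopy.

```vbnet
Imports System.Data
Imports System.Data.SqlClient
Imports System.Globalization
Imports System.Xml
Imports System.Xml.Linq

Module WcProductionLoader
    ' Namespace of the <wcproduction> row elements in the sample file.
    Private ReadOnly RowNs As XNamespace = "urn:schemas-microsoft-com:sql:SqlRowSet1"
    ' Hypothetical batch size; tune it for your server.
    Private Const BatchSize As Integer = 50000

    ' Builds an empty buffer whose columns mirror the xsd:sequence in the sample.
    Private Function NewBuffer() As DataTable
        Dim t As New DataTable()
        For Each n In {"api_st_cde", "api_cnty_cde", "prodn_mth", "prodn_day_num"}
            t.Columns.Add(n, GetType(Short))
        Next
        For Each n In {"api_well_idn", "pool_idn", "prodn_yr", "ogrid_cde", "prod_amt"}
            t.Columns.Add(n, GetType(Integer))
        Next
        For Each n In {"prd_knd_cde", "amend_ind", "c115_wc_stat_cde"}
            t.Columns.Add(n, GetType(String))
        Next
        For Each n In {"eff_dte", "mod_dte"}
            t.Columns.Add(n, GetType(DateTime))
        Next
        Return t
    End Function

    ' Copies one <wcproduction> element into a new row of the buffer.
    Private Sub AddRow(buffer As DataTable, el As XElement)
        Dim row = buffer.NewRow()
        For Each col As DataColumn In buffer.Columns
            Dim child = el.Element(RowNs + col.ColumnName)
            If child Is Nothing OrElse child.Value.Length = 0 Then
                ' Assumption: an absent or empty element means NULL.
                row(col) = DBNull.Value
            Else
                row(col) = Convert.ChangeType(child.Value, col.DataType,
                                              CultureInfo.InvariantCulture)
            End If
        Next
        buffer.Rows.Add(row)
    End Sub

    Sub Main()
        ' Assumed connection string and destination table; replace with your own.
        Dim connStr = "Server=.;Database=MyDb;Integrated Security=True"
        Dim buffer = NewBuffer()

        Using reader As XmlReader = XmlReader.Create("D:\wcproduction.xml"),
              bulk As New SqlBulkCopy(connStr)
            bulk.DestinationTableName = "dbo.wcproduction"
            bulk.BulkCopyTimeout = 0 ' no timeout for a long-running load
            For Each col As DataColumn In buffer.Columns
                bulk.ColumnMappings.Add(col.ColumnName, col.ColumnName)
            Next

            reader.MoveToContent()
            While Not reader.EOF
                If reader.NodeType = XmlNodeType.Element AndAlso
                   reader.LocalName = "wcproduction" AndAlso
                   reader.NamespaceURI = RowNs.NamespaceName Then
                    ' ReadFrom materializes only this one row and leaves the
                    ' reader positioned just past its end tag, so no extra
                    ' Read() call is made in this branch.
                    AddRow(buffer, DirectCast(XNode.ReadFrom(reader), XElement))
                    If buffer.Rows.Count >= BatchSize Then
                        bulk.WriteToServer(buffer)
                        buffer.Clear()
                    End If
                Else
                    reader.Read()
                End If
            End While

            ' Flush the final partial batch.
            If buffer.Rows.Count > 0 Then bulk.WriteToServer(buffer)
        End Using
    End Sub
End Module
```

The inline xsd:schema block is skipped naturally, because its elements live in the XSD namespace and never match the row namespace check. If the file can contain empty strings that must stay distinct from NULL, the empty-element rule in AddRow would need to inspect the xsi:nil attribute instead.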

0 answers:

No answers