
Python - Large XML - What's the best way to request, parse, and convert items to dicts?

Time: 2016-10-28 01:56:07

Tags: python xml parsing dictionary request

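I found a few topics here about requesting and parsing large XML files, but I could not match them to what I need.

I need to fetch a large XML file using requests. It contains a list of <product> elements, and I need to convert each product to a dict and send it to the database with dataset's table.insert(dict).

The XML file: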

<?xml version="1.0"?><crossDocking customerId="00000000000" company="xyz" database="08/12/2014 14:16:56" numberResults="118">
    <product>
        <prod_id>18108</prod_id>
        <brand><![CDATA[PHILIPS]]></brand>
        <prod_name><![CDATA[Fone de Ouvido SHP2500/00 para TV com Controle de Volume PHILIPS]]></prod_name>
        <seg_name><![CDATA[Eletrônicos##Fones de Ouvidos##Com Fio]]></seg_name>
        <image><![CDATA[http://static.hayamax.com.br/imgProd/18108_500_001.jpg]]></image>
        <link><![CDATA[http://www.hayamax.com.br/fone-de-ouvido-shp2500-00-para-tv-com-controle-de-volume-philips]]></link>
        <NBM><![CDATA[8518.30.00]]></NBM>
        <saleUnit><![CDATA[PC]]></saleUnit>
        <saleQuant>1</saleQuant>
        <weightValue>0.471</weightValue>
        <weightUnit><![CDATA[KG]]></weightUnit>
        <shortname><![CDATA[FONE PHILIPS SHP2500/00 6MT PTA]]></shortname>
        <EAN>8710895945875</EAN>
        <width>19.900</width>
        <height>24.000</height>
        <depth>10.900</depth>
        <information>
            <description><![CDATA[Este fone de ouvido tem um refletor acústico que melhora o reforço dinâmico de graves para seus momentos de lazer com aparelho de som ou TV.   Toda a orelha é coberta, privilegiando a qualidade de som. Proporciona conforto mesmo no uso prolongado. Possui prático cabo de 6 m, permitindo que você fique onde preferir em sua sala, e controle em linha que simplifica o ajuste de volume.]]></description>
            <characteristics><![CDATA[Tipo de Imã: Ímã em Ferrite Bobina de Voz: Cobre Resposta frequência: 15Hz a 22KHz Impedância: 32Ohms Potência: 500mW Sensibilidade: 100dB Diâmetro falante: 40mm Conector: Conectores P2 3,5 e 6,3mm estéreo cromados Cor: Prata Controle volume: Possui controle de volume no cabo]]></characteristics>
            <technical><![CDATA[Comprimento do cabo: Cabo destacável de 6 metros]]></technical>
            <included><![CDATA[]]></included>
        </information>
        <PPB>0</PPB>
        <warrantyDays>06</warrantyDays>
        <price>42.22</price>
        <stock>0</stock>
        <IPI>0.00</IPI>
        <sourceFat>PR</sourceFat>
    </product>

Should I use ElementTree?

Update

import requests
import xmltodict
import xml.etree.ElementTree as etree
import lxml
from lxml import etree

url = "http://xxxxxxxxx"
response = requests.get(url, stream=True)
print response
xml = etree.parse(response.content)

for product in xml:
    print product ## <Product>

Output:

<Response [200]>
Traceback (most recent call last):
  File "/home/ubuntu/workspace/ex50/bin/hayamax/hayamax.py", line 10, in <module>
    xml = etree.parse(response.content)
  File "src/lxml/lxml.etree.pyx", line 3427, in lxml.etree.parse (src/lxml/lxml.etree.c:79841)
  File "src/lxml/parser.pxi", line 1793, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:116175)
  File "src/lxml/parser.pxi", line 1819, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:116525)
  File "src/lxml/parser.pxi", line 1723, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:115413)
  File "src/lxml/parser.pxi", line 1126, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:110110)
  File "src/lxml/parser.pxi", line 584, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:103584)
  File "src/lxml/parser.pxi", line 694, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:105238)
  File "src/lxml/parser.pxi", line 622, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:104104)
IOError

1 Answer:

Answer 0: (score: 0)

I'm not sure I fully understand you, but try this. This is how I read this kind of data:

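import urllib
from lxml import etree

# Python 2; on Python 3 use urllib.request.urlopen instead
response = urllib.urlopen(url)  # url as defined in the question
data = response.read()  # the whole document as a byte string
tree = etree.fromstring(data)  # parse from a string, not from a file name
# the path is relative to the root element; adjust it to the real nesting
xml = tree.findall('products/product')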

This assumes the document is one long list in which every <product> many nested XML things </product> sits inside <products> many products </products>.
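To make that assumption concrete, here is a small sketch of how the findall path depends on the nesting. It uses the standard library's xml.etree.ElementTree rather than lxml, and both inline documents are trimmed, made-up stand-ins (the `root` wrapper in particular is hypothetical). In the sample posted in the question, the <product> elements appear directly under the <crossDocking> root, so the shorter path would be the one that matches there.

```python
import xml.etree.ElementTree as ET

# Layout assumed by the answer: products wrapped in a <products> element.
# (Hypothetical document, for illustration only.)
wrapped = ET.fromstring(
    "<root><products><product><prod_id>1</prod_id></product></products></root>"
)

# Layout of the sample in the question: products directly under the root.
flat = ET.fromstring(
    "<crossDocking><product><prod_id>18108</prod_id></product></crossDocking>"
)

# Paths are resolved relative to the element findall is called on (the root).
wrapped_hits = wrapped.findall("products/product")
flat_miss = flat.findall("products/product")  # no <products> wrapper here
flat_hits = flat.findall("product")           # direct children of the root

print(len(wrapped_hits), len(flat_miss), len(flat_hits))
```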

I think it will do what you want. You can then loop over the inner elements in the same way and do whatever you need with them.
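As a rough sketch of that loop, the snippet below turns each <product> into a dict and parses incrementally, which may help when the file is large. It is one possible approach and not necessarily what the asker ended up using. The helper names `element_to_dict` and `iter_product_dicts` are invented for this example, the inline sample is a shortened version of the posted XML, and its second product is made up. Note also that etree.parse expects a file name or file-like object, which is why passing the raw bytes of response.content in the question fails; wrapping the bytes in io.BytesIO, as done here, or calling etree.fromstring avoids that.

```python
import io
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Map child tags to text; children with their own children become nested dicts."""
    result = {}
    for child in elem:
        if len(child):  # has sub-elements, e.g. <information>
            result[child.tag] = element_to_dict(child)
        else:
            result[child.tag] = (child.text or "").strip()
    return result


def iter_product_dicts(source, tag="product"):
    """Yield one dict per matching element, reading a file-like object incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield element_to_dict(elem)
            elem.clear()  # discard the element's children to limit memory use


# Shortened sample; the second product is invented for illustration.
SAMPLE = b"""<?xml version="1.0"?>
<crossDocking numberResults="2">
    <product>
        <prod_id>18108</prod_id>
        <brand><![CDATA[PHILIPS]]></brand>
        <information>
            <technical><![CDATA[Cabo de 6 metros]]></technical>
        </information>
        <price>42.22</price>
    </product>
    <product>
        <prod_id>18109</prod_id>
        <brand><![CDATA[SONY]]></brand>
        <information><technical></technical></information>
        <price>10.00</price>
    </product>
</crossDocking>"""

products = list(iter_product_dicts(io.BytesIO(SAMPLE)))
for product in products:
    print(product["prod_id"], product["brand"])
```

With requests, the same generator could probably be fed response.raw from a requests.get(url, stream=True) call instead of the BytesIO object, and each yielded dict passed to table.insert. The nested information dict would likely need to be flattened into plain columns first.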