I have found some threads here about parsing large XML documents fetched with requests, but I couldn't adapt them to what I need.
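I need to fetch a large XML document with requests. It contains <products>, and I need to convert each product into a dict and send it to the database with dataset's table.insert(dict).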
<?xml version="1.0"?><crossDocking customerId="00000000000" company="xyz" database="08/12/2014 14:16:56" numberResults="118">
<product>
<prod_id>18108</prod_id>
<brand><![CDATA[PHILIPS]]></brand>
<prod_name><![CDATA[Fone de Ouvido SHP2500/00 para TV com Controle de Volume PHILIPS]]></prod_name>
<seg_name><![CDATA[Eletrônicos##Fones de Ouvidos##Com Fio]]></seg_name>
<image><![CDATA[http://static.hayamax.com.br/imgProd/18108_500_001.jpg]]></image>
<link><![CDATA[http://www.hayamax.com.br/fone-de-ouvido-shp2500-00-para-tv-com-controle-de-volume-philips]]></link>
<NBM><![CDATA[8518.30.00]]></NBM>
<saleUnit><![CDATA[PC]]></saleUnit>
<saleQuant>1</saleQuant>
<weightValue>0.471</weightValue>
<weightUnit><![CDATA[KG]]></weightUnit>
<shortname><![CDATA[FONE PHILIPS SHP2500/00 6MT PTA]]></shortname>
<EAN>8710895945875</EAN>
<width>19.900</width>
<height>24.000</height>
<depth>10.900</depth>
<information>
<description><![CDATA[Este fone de ouvido tem um refletor acústico que melhora o reforço dinâmico de graves para seus momentos de lazer com aparelho de som ou TV. Toda a orelha é coberta, privilegiando a qualidade de som. Proporciona conforto mesmo no uso prolongado. Possui prático cabo de 6 m, permitindo que você fique onde preferir em sua sala, e controle em linha que simplifica o ajuste de volume.]]></description>
<characteristics><![CDATA[Tipo de Imã: Ímã em Ferrite Bobina de Voz: Cobre Resposta frequência: 15Hz a 22KHz Impedância: 32Ohms Potência: 500mW Sensibilidade: 100dB Diâmetro falante: 40mm Conector: Conectores P2 3,5 e 6,3mm estéreo cromados Cor: Prata Controle volume: Possui controle de volume no cabo]]></characteristics>
<technical><![CDATA[Comprimento do cabo: Cabo destacável de 6 metros]]></technical>
<included><![CDATA[]]></included>
</information>
<PPB>0</PPB>
<warrantyDays>06</warrantyDays>
<price>42.22</price>
<stock>0</stock>
<IPI>0.00</IPI>
<sourceFat>PR</sourceFat>
</product>
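Converting one such <product> into the dict that table.insert() expects can be sketched with the standard library's xml.etree.ElementTree (Python 3). This assumes a flat row is wanted: the children of the nested <information> element are merged into the same dict, and every value stays a string (or None/"" for an empty element such as <included>). The function name product_to_dict is hypothetical, and the sample is a shortened copy of the product above.

```python
import xml.etree.ElementTree as ET


def product_to_dict(product):
    """Flatten one <product> element into a {tag: text} dict."""
    row = {}
    for child in product:
        if len(child):
            # Container element such as <information>: merge its children in.
            for sub in child:
                row[sub.tag] = sub.text
        else:
            # CDATA sections arrive as ordinary text.
            row[child.tag] = child.text
    return row


# Shortened copy of the product shown above.
sample = """<?xml version="1.0"?>
<crossDocking customerId="00000000000" numberResults="1">
<product>
<prod_id>18108</prod_id>
<brand><![CDATA[PHILIPS]]></brand>
<price>42.22</price>
<information>
<technical><![CDATA[Comprimento do cabo: Cabo destacável de 6 metros]]></technical>
<included><![CDATA[]]></included>
</information>
</product>
</crossDocking>"""

root = ET.fromstring(sample)
rows = [product_to_dict(p) for p in root.findall("product")]
```

Keeping every value as text leaves type conversion (price, stock, EAN) to a later, explicit step, so nothing is silently mangled on the way into the database.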
Should I use ElementTree?
import requests
import xmltodict
import xml.etree.ElementTree as etree
import lxml
from lxml import etree
url = "http://xxxxxxxxx"
response = requests.get(url, stream=True)
print response
xml = etree.parse(response.content)
for product in xml:
    print product  ## <Product>
Output:
<Response [200]>
Traceback (most recent call last):
File "/home/ubuntu/workspace/ex50/bin/hayamax/hayamax.py", line 10, in <module>
xml = etree.parse(response.content)
File "src/lxml/lxml.etree.pyx", line 3427, in lxml.etree.parse (src/lxml/lxml.etree.c:79841)
File "src/lxml/parser.pxi", line 1793, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:116175)
File "src/lxml/parser.pxi", line 1819, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:116525)
File "src/lxml/parser.pxi", line 1723, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:115413)
File "src/lxml/parser.pxi", line 1126, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:110110)
File "src/lxml/parser.pxi", line 584, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:103584)
File "src/lxml/parser.pxi", line 694, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:105238)
File "src/lxml/parser.pxi", line 622, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:104104)
IOError
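The IOError comes from lxml.etree.parse(), which expects a file name, URL or file-like object; given response.content (the body as a string), it tries to open that string as a file name, which is why the traceback passes through _parseDocumentFromURL. etree.fromstring(response.content) would parse the body once it is fully in memory. For a large document, a sketch that assumes products should be handled one at a time lets ElementTree.iterparse() read a stream and convert each <product> as soon as its closing tag is seen. The name iter_products is hypothetical; elem.clear() frees each product's children, although the emptied <product> shells stay attached to the root.

```python
import io
import xml.etree.ElementTree as ET


def iter_products(source):
    """Yield one flat dict per <product>, parsing `source` incrementally.

    `source` is any binary file-like object: io.BytesIO in the demo below,
    or the raw stream of a requests response opened with stream=True.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "product":
            continue  # inner elements are handled together with their <product>
        row = {}
        for child in elem:
            if len(child):  # container such as <information>
                row.update((sub.tag, sub.text) for sub in child)
            else:
                row[child.tag] = child.text
        yield row
        elem.clear()  # free this product's children once the row is built


demo = io.BytesIO(
    b'<?xml version="1.0"?><crossDocking numberResults="2">'
    b"<product><prod_id>1</prod_id>"
    b"<information><technical><![CDATA[6 m]]></technical></information>"
    b"</product>"
    b"<product><prod_id>2</prod_id></product>"
    b"</crossDocking>"
)
rows = list(iter_products(demo))
```

With requests, the source would be response.raw from requests.get(url, stream=True); setting response.raw.decode_content = True first makes the stream undo gzip or deflate compression if the server applied it. Each yielded row can then go straight to table.insert(row).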
Answer 0 (score: 0)
I'm not sure I fully understand you, but try this. This is how I read this kind of data:
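import urllib  # Python 2; in Python 3 use urllib.request.urlopen
from lxml import etree  # xml.etree.ElementTree works the same way here

response = urllib.urlopen(url)  # url as in the question
data = response.read()  # the whole response body as a string
tree = etree.fromstring(data)  # fromstring() parses a string; parse() wants a file name or file object
products = tree.findall('products/product')  # path is relative to the root element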
This assumes the data is one long list of <product> many xml nested things </product> elements,
nested inside <products> many products</products>.
I think this will do what you want. You can then loop over the inner elements in the same way to do whatever you need.
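One caveat about the path: fromstring() returns the root element itself, and in the question's sample that root is <crossDocking> with the <product> elements as its direct children, so 'products/product' would match nothing there. Below is a sketch that uses './/product' (any depth) instead and hands each row to an object with an insert(dict) method, as a dataset table has. insert_products and ListTable are hypothetical names; ListTable only stands in for the database in the demo.

```python
import xml.etree.ElementTree as ET


def insert_products(data, table):
    """Parse a whole XML document and insert one flat row per <product>.

    `table` is any object with an insert(dict) method, such as a dataset table.
    Returns the number of rows inserted.
    """
    root = ET.fromstring(data)  # the root element itself, here <crossDocking>
    count = 0
    # './/product' matches <product> at any depth, so it covers both
    # <crossDocking><product> and <products><product> layouts.
    for product in root.findall(".//product"):
        row = {}
        for child in product:
            if len(child):  # container such as <information>
                row.update((sub.tag, sub.text) for sub in child)
            else:
                row[child.tag] = child.text
        table.insert(row)
        count += 1
    return count


class ListTable:
    """Demo stand-in for a database table: it only collects the rows."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)


doc = (b"<crossDocking><product><prod_id>18108</prod_id>"
       b"<price>42.22</price></product></crossDocking>")
demo_table = ListTable()
inserted = insert_products(doc, demo_table)
```

With dataset itself, the table could come from dataset.connect("sqlite:///products.db")["products"]; the connection URL here is only a placeholder.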