<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
<S I="50" N="Football">
<C I="65" N="Russia">
<L I="167" N="Premier League">
<E I="1049367" DT="2015-04-05T15:00:00" ISH="0" BKS="20" T1="Ufa" T2="Terek Groznyi" T1I="79698" T2I="44081">
<M K="1x2">
<B I="81" BTDT="2015-03-23T23:04:00,825">
<O N="1" V="3"/>
<O N="X" V="3.1"/>
<O N="2" V="2.25"/>
</B>
</M>
</E>
</L>
</C>
</S>
</markets>
I'm trying to parse this XML with etree in Python. I've parsed XML before, but the documents were always formatted like this:
<tag> value </tag>
I'm not sure how to isolate the "D" attribute of "markets", or any of the other attribute values.
This is how I open and parse the XML document:
z = gzip.open("code2.zip", "r")
tree = etree.parse(z)
print(etree.tostring(tree, pretty_print=True))
I tried:
for markets in tree.findall('markets'):
    print "found"
However, this doesn't work. I'd appreciate any hints or suggestions. Hopefully, once I can extract the first "D", I'll be able to get the rest.
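Answer 0 (score: 2)
This is a common mistake when working with XML that has a default namespace. Your XML has a default namespace, that is, a namespace declared without a prefix, here:
xmlns="http://www.eoddsmaker.net/schemas/markets/1.0"
So in your case, every element is implicitly in that namespace. One possible way to query elements in a namespace, using xpath():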
.......
#creating prefix-to-namespace_uri mapping
ns = {'d' : 'http://www.eoddsmaker.net/schemas/markets/1.0'}
#use registered prefix along with the element name to query, and pass the mapping as 2nd argument
markets = tree.xpath('//d:markets', namespaces=ns)[0]
#get and print value of D attribute from <markets> :
print markets.get('D')
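For comparison, here is a minimal Python 3 sketch of the same idea using only the standard library's xml.etree.ElementTree instead of lxml's xpath(). It assumes a trimmed copy of the XML from the question; the prefix 'd' is arbitrary and only has to match the key in the mapping. It also shows why the unqualified lookup in the question finds nothing: the parser stores the namespace as part of each tag name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (some attributes omitted).
XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
<S I="50" N="Football"><C I="65" N="Russia"><L I="167" N="Premier League">
<E I="1049367" T1="Ufa" T2="Terek Groznyi">
<M K="1x2"><B I="81">
<O N="1" V="3"/><O N="X" V="3.1"/><O N="2" V="2.25"/>
</B></M></E></L></C></S>
</markets>"""

root = ET.fromstring(XML)

# The namespace URI is embedded in the tag name as '{uri}localname'.
root_tag = root.tag

# Prefix-to-URI mapping; the prefix 'd' is an arbitrary choice.
ns = {"d": "http://www.eoddsmaker.net/schemas/markets/1.0"}

unqualified = root.findall(".//O")      # ignores the namespace: no matches
qualified = root.findall(".//d:O", ns)  # namespace-aware: matches every <O>

print(root_tag)
print(len(unqualified), len(qualified))
print(root.get("D"))  # attributes carry no namespace here, so a plain name works
```

Note that `<markets>` is the root element itself, so it is reached through `root` (or `tree.getroot()`) rather than by searching for it among the root's children.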
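Answer 1 (score: 0)
I'm answering this without any prior knowledge of etree. I simply opened the following page: https://docs.python.org/2/library/xml.etree.elementtree.html#parsing-xml
What you are looking for are attributes, and the page shows quite clearly how to retrieve them: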
tree = etree.parse(z)
root = tree.getroot()
print root.attrib
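This prints all attributes of the <markets> element, such as D and CNT.
You should be able to figure out the rest yourself. You just loop over each element's children and read .attrib from each of them.
Considering how easily I found this answer, please do a bit more research before posting a question :)
P.S. This answer was written for Python 2.7. For Python 3, it would be
print(root.attrib)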
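The loop over child elements that this answer describes could be sketched as follows in Python 3. This is only an illustration on a trimmed copy of the question's XML; the helper names `local_name` and `walk` are made up for this example and are not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (some attributes omitted).
XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
<S I="50" N="Football"><C I="65" N="Russia"><L I="167" N="Premier League">
<E I="1049367" T1="Ufa" T2="Terek Groznyi">
<M K="1x2"><B I="81">
<O N="1" V="3"/><O N="X" V="3.1"/><O N="2" V="2.25"/>
</B></M></E></L></C></S>
</markets>"""


def local_name(tag):
    """Strip the '{namespace}' part that the parser prepends to tag names."""
    return tag.rsplit("}", 1)[-1]


def walk(element, depth=0):
    """Yield (depth, tag, attributes) for an element and all its descendants."""
    yield depth, local_name(element.tag), dict(element.attrib)
    for child in element:  # iterating an element yields its direct children
        yield from walk(child, depth + 1)


rows = list(walk(ET.fromstring(XML)))
for depth, tag, attrs in rows:
    print("  " * depth + tag, attrs)
```

Each row pairs an element's nesting depth with its attribute dictionary, so the values of D, CNT, I, N and so on can all be read in a single pass.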
Answer 2 (score: 0)
Using xml.etree:
import xml.etree.ElementTree as ET
root = ET.fromstring("""<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
<S I="50" N="Football">
<C I="65" N="Russia">
<L I="167" N="Premier League">
<E I="1049367" DT="2015-04-05T15:00:00" ISH="0" BKS="20" T1="Ufa" T2="Terek Groznyi" T1I="79698" T2I="44081">
<M K="1x2">
<B I="81" BTDT="2015-03-23T23:04:00,825">
<O N="1" V="3"/>
<O N="X" V="3.1"/>
<O N="2" V="2.25"/>
</B>
</M>
</E>
</L>
</C>
</S>
</markets>""")
>>>print root.attrib
{'CNT': '1521', 'D': '2015-03-23T23:12:34'}
>>>print root[0].attrib
{'I': '50', 'N': 'Football'}
# and so on for the nested elements
If you need to parse from an XML file instead:
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
For details, see https://docs.python.org/2/library/xml.etree.elementtree.html
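The question reads the XML through gzip rather than from a plain file. ET.parse accepts an open file object as well as a file name, so the two can be combined. The Python 3 sketch below assumes gzip-compressed data and uses an in-memory buffer as a stand-in for the file on disk. If "code2.zip" is really a ZIP archive rather than gzip data, the zipfile module would be needed instead.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Root element only, to keep the sketch short.
xml_bytes = (
    b'<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" '
    b'D="2015-03-23T23:12:34" CNT="1521"/>'
)

# In-memory stand-in for a gzip-compressed file on disk (an assumption
# made so that the example needs no real file).
compressed = io.BytesIO(gzip.compress(xml_bytes))

# GzipFile decompresses transparently while ET.parse reads from it.
with gzip.GzipFile(fileobj=compressed) as fh:
    tree = ET.parse(fh)

root = tree.getroot()
print(root.attrib)
```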
Answer 3 (score: 0)
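print markets.get('D')
prints the 'D' attribute of markets (the root).
for element in tree.iterfind(".//{*}<Tag to search for>"):
    print element.get("<Attribute to look for>")
iterates over the matching elements beneath the current node and prints the specified attribute of each element that iterfind() returns.
For example: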
for element in tree.iterfind(".//{*}O"):
    print element.get("N")
will print
1
X
2
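Also note that if the XML document contains more than one namespace, you must put the namespace you want to search in between the curly braces in the string passed to iterfind():
for element in tree.iterfind(".//{http://www.eoddsmaker.net/schemas/markets/1.0}<Tag to search for>"):
    print element.get("<Attribute to look for>")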
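Putting the fully qualified tag names to work, the Python 3 sketch below flattens the odds from the question's XML into a list of dictionaries. The dictionary keys (`team1`, `team2`, `market`, `outcome`, `value`) are names chosen for this illustration. The feed itself only defines the short attribute names, and their meaning is inferred from the sample data.

```python
import xml.etree.ElementTree as ET

# Namespace in the '{uri}' form that ElementTree uses inside tag names.
NS = "{http://www.eoddsmaker.net/schemas/markets/1.0}"

# Trimmed copy of the XML from the question (some attributes omitted).
XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
<S I="50" N="Football"><C I="65" N="Russia"><L I="167" N="Premier League">
<E I="1049367" T1="Ufa" T2="Terek Groznyi">
<M K="1x2"><B I="81">
<O N="1" V="3"/><O N="X" V="3.1"/><O N="2" V="2.25"/>
</B></M></E></L></C></S>
</markets>"""

root = ET.fromstring(XML)

odds = []
for event in root.iter(NS + "E"):            # every <E> at any depth
    for market in event.iter(NS + "M"):      # every <M> inside that <E>
        for outcome in market.iter(NS + "O"):
            odds.append({
                "team1": event.get("T1"),
                "team2": event.get("T2"),
                "market": market.get("K"),
                "outcome": outcome.get("N"),
                "value": float(outcome.get("V")),  # attribute values are strings
            })

for row in odds:
    print(row)
```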