Suppose I have the following XML:
<time from="2017-07-29T08:00:00" to="2017-07-29T09:00:00">
<!-- Valid from 2017-07-29T08:00:00 to 2017-07-29T09:00:00 -->
<symbol number="4" numberEx="4" name="Cloudy" var="04"/>
<precipitation value="0"/>
<!-- Valid at 2017-07-29T08:00:00 -->
<windDirection deg="300.9" code="WNW" name="West-northwest"/>
<windSpeed mps="1.3" name="Light air"/>
<temperature unit="celsius" value="15"/>
<pressure unit="hPa" value="1002.4"/>
</time>
<time from="2017-07-29T09:00:00" to="2017-07-29T10:00:00">
<!-- Valid from 2017-07-29T09:00:00 to 2017-07-29T10:00:00 -->
<symbol number="4" numberEx="4" name="Partly cloudy" var="04"/>
<precipitation value="0"/>
<!-- Valid at 2017-07-29T09:00:00 -->
<windDirection deg="293.2" code="WNW" name="West-northwest"/>
<windSpeed mps="0.8" name="Light air"/>
<temperature unit="celsius" value="17"/>
<pressure unit="hPa" value="1002.6"/>
</time>
I want to collect the time from, symbol name and temperature value attributes from it and print them as time from: symbol name, temperature value, like this: 2017-07-29, 08:00:00: Cloudy, 15°.
(As you can see, this XML contains several name and value attributes.)
So far my approach is quite simple:
#!/usr/bin/env python
# coding: utf-8
import re
from BeautifulSoup import BeautifulSoup
# data is set to the above XML
soup = BeautifulSoup(data)
# collect the attributes of interest into lists. can this be done more cleverly?
time_l = []
symb_l = []
temp_l = []
for i in soup.findAll('time'):
    i_time = str(i.get('from'))
    time_l.append(i_time)
for i in soup.findAll('symbol'):
    i_symb = str(i.get('name'))
    symb_l.append(i_symb)
for i in soup.findAll('temperature'):
    i_temp = str(i.get('value'))
    temp_l.append(i_temp)
# join the forecast lists into a dict keyed by time
forc_l = []
for i, j in zip(symb_l, temp_l):
    forc_l.append([i, j])
rez = dict(zip(time_l, forc_l))
# combine and format the result. can this dict be printed more simply?
wew = ''
for key in sorted(rez):
    wew += re.sub("T", ", ", key) + str(rez[key])
    wew = re.sub("'", "", wew)
    wew = re.sub(r"\[", ": ", wew)
    wew = re.sub(r"\]", "°\n", wew)
# print the result
print wew
But I suppose there must be a better, smarter way? Mostly I am interested in collecting attributes from XML, and my way of doing it seems rather dumb to me. Also, is there a simpler way to print a dict nicely?
Any hints or suggestions are appreciated.
Answer 0 (score: 4)
from bs4 import BeautifulSoup
with open("sample.xml", "r") as f:  # open the XML file
    content = f.read()  # XML content is stored in this variable
soup = BeautifulSoup(content, "lxml")
# each <time> element carries the timestamp; its children carry the rest
for values in soup.find_all("time"):
    print("{} : {}, {}°".format(values["from"], values.find("symbol")["name"], values.find("temperature")["value"]))
Output:
2017-07-29T08:00:00 : Cloudy, 15°
2017-07-29T09:00:00 : Partly cloudy, 17°
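The output above keeps the ISO-style T separator, while the question asked for the form 2017-07-29, 08:00:00. One possible way to get there is to reformat the timestamp before printing. This is only a sketch: format_forecast is a hypothetical helper name, and it assumes every from attribute has the exact YYYY-MM-DDTHH:MM:SS shape shown in the sample.

```python
from datetime import datetime


def format_forecast(time_from, symbol_name, temp_value):
    """Build one line like '2017-07-29, 08:00:00: Cloudy, 15°'.

    Hypothetical helper; assumes `time_from` looks like the sample's
    `from` attribute, e.g. '2017-07-29T08:00:00'.
    """
    # Parse the timestamp, then re-emit it with ', ' between date and time.
    stamp = datetime.strptime(time_from, "%Y-%m-%dT%H:%M:%S")
    pretty = stamp.strftime("%Y-%m-%d, %H:%M:%S")
    return "{}: {}, {}°".format(pretty, symbol_name, temp_value)


print(format_forecast("2017-07-29T08:00:00", "Cloudy", "15"))
```

Inside the loop above, the three values passed to format() could be handed to this helper instead.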
Answer 1 (score: 2)
You can also get the XML data with the xml.dom.minidom module.
This gives you the data you want:
from xml.dom.minidom import parse
doc = parse("path/to/xmlfile.xml")  # parse an XML file by name
itemlist = doc.getElementsByTagName('time')
for items in itemlist:
    from_tag = items.getAttribute('from')
    # take the first <symbol> and <temperature> inside this <time>
    symbol_list = items.getElementsByTagName('symbol')
    symbol_name = symbol_list[0].getAttribute('name')
    temperature_list = items.getElementsByTagName('temperature')
    temp_value = temperature_list[0].getAttribute('value')
    print("{} : {}, {}°".format(from_tag, symbol_name, temp_value))
The output looks like this:
2017-07-29T08:00:00 : Cloudy, 15°
2017-07-29T09:00:00 : Partly cloudy, 17°
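On the question's side point about printing a dict: if the values really need to be kept in a dict, iterating over its sorted items and formatting each one avoids the regex clean-up of the dict's string form. The following is a rough sketch with minidom. The collect_forecast name and the forecast root element are assumptions added for illustration (minidom needs a single top-level element), and the sample is trimmed to the elements that are actually read.

```python
from xml.dom.minidom import parseString

# Trimmed sample data from the question, wrapped in a hypothetical
# <forecast> root because minidom needs exactly one top-level element.
XML = """<forecast>
<time from="2017-07-29T08:00:00" to="2017-07-29T09:00:00">
<symbol number="4" numberEx="4" name="Cloudy" var="04"/>
<temperature unit="celsius" value="15"/>
</time>
<time from="2017-07-29T09:00:00" to="2017-07-29T10:00:00">
<symbol number="4" numberEx="4" name="Partly cloudy" var="04"/>
<temperature unit="celsius" value="17"/>
</time>
</forecast>"""


def collect_forecast(xml_text):
    """Return {time_from: (symbol_name, temperature_value)}."""
    doc = parseString(xml_text)
    result = {}
    for time_el in doc.getElementsByTagName("time"):
        # Only the first <symbol>/<temperature> under each <time> is read.
        symbol = time_el.getElementsByTagName("symbol")[0]
        temperature = time_el.getElementsByTagName("temperature")[0]
        result[time_el.getAttribute("from")] = (
            symbol.getAttribute("name"),
            temperature.getAttribute("value"),
        )
    return result


forecast = collect_forecast(XML)
# Unpack each (key, value) pair directly instead of post-processing str(dict).
for time_from, (name, value) in sorted(forecast.items()):
    print("{}: {}, {}°".format(time_from.replace("T", ", "), name, value))
```

Sorting by key works here because the timestamps are ISO-style strings, which sort chronologically as plain text.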
Hope it helps.
Answer 2 (score: 1)
Here is another way, using a built-in module (I am using Python 3.6.2):
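One way the built-in-module approach could look is with xml.etree.ElementTree from the standard library. This is a sketch and not necessarily the code this answer originally showed. The forecast_lines name and the forecast root element are assumptions (ElementTree needs a single root), and the sample is trimmed to the elements that are read.

```python
import xml.etree.ElementTree as ET

# Trimmed sample data from the question, wrapped in a hypothetical
# <forecast> root so that it parses as one well-formed document.
SAMPLE = """<forecast>
<time from="2017-07-29T08:00:00" to="2017-07-29T09:00:00">
<symbol number="4" numberEx="4" name="Cloudy" var="04"/>
<temperature unit="celsius" value="15"/>
</time>
<time from="2017-07-29T09:00:00" to="2017-07-29T10:00:00">
<symbol number="4" numberEx="4" name="Partly cloudy" var="04"/>
<temperature unit="celsius" value="17"/>
</time>
</forecast>"""


def forecast_lines(xml_text):
    """Return one formatted line per <time> element."""
    root = ET.fromstring(xml_text)
    lines = []
    for time_el in root.iter("time"):
        # find() returns the first matching direct child of this <time>.
        symbol = time_el.find("symbol")
        temperature = time_el.find("temperature")
        lines.append("{} : {}, {}°".format(
            time_el.get("from"), symbol.get("name"), temperature.get("value")))
    return lines


for line in forecast_lines(SAMPLE):
    print(line)
```

To read from a file instead of a string, ET.parse(path).getroot() would take the place of ET.fromstring.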