Extracting a value from XML with Python bs4 and lxml

Posted: 2017-10-09 10:20:29

Tags: python beautifulsoup lxml

How do I extract the number of listeners, <listeners>10</listeners>, from the XML file below? My code doesn't work.

import bs4
import urllib2
import lxml

SERVER = 'http://192.168.0.31:8382/admin/'
# Register HTTP Basic Auth credentials for the admin URL
authinfo = urllib2.HTTPPasswordMgrWithDefaultRealm()
authinfo.add_password(None, SERVER, 'admin', 'mypassword')
page = 'http://192.168.0.31:8382/admin/'
handler = urllib2.HTTPBasicAuthHandler(authinfo)
myopener = urllib2.build_opener(handler)
# install_opener() returns None; it makes urlopen() use this opener globally
urllib2.install_opener(myopener)
output = urllib2.urlopen(page)
# read() consumes the response body, so the second read() below gets ''
print output.read()
soup = bs4.BeautifulSoup(output.read(), 'lxml')
print soup.find('listeners')
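A minimal Python 3 sketch of the same flow that reads the response body exactly once and keeps the bytes, so they can be both printed and parsed. It swaps BeautifulSoup for the standard-library xml.etree.ElementTree; the URL and credentials are the placeholders from the question, and the function names fetch_admin_xml and server_listeners are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_admin_xml(url, user, password):
    """Fetch the admin XML using HTTP Basic Auth and return the raw bytes."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, user, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    opener = urllib.request.build_opener(handler)
    with opener.open(url) as response:
        # Read the body exactly once; a second read() would return b''
        return response.read()


def server_listeners(xml_bytes):
    """Return the server-wide <listeners> count (direct child of <icestats>)."""
    root = ET.fromstring(xml_bytes)
    # findtext() with a bare tag name only looks at direct children,
    # so the <listeners> nested inside <source> is not picked up
    return int(root.findtext('listeners'))
```

Usage: body = fetch_admin_xml('http://192.168.0.31:8382/admin/', 'admin', 'mypassword'), then print the saved body if needed and call server_listeners(body).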

The XML is as follows:

<icestats>
<admin>icemaster@localhost</admin>
<banned_IPs>0</banned_IPs>
<build>20140902200316</build>
<client_connections>289</client_connections>
<clients>2</clients>
<connections>291</connections>
<file_connections>13</file_connections>
<host>localhost</host>
<listener_connections>0</listener_connections>
<listeners>10</listeners>
<location>Earth</location>
<outgoing_kbitrate>0</outgoing_kbitrate>
<server_id>Icecast 2.3.3-kh11</server_id>
<server_start>08/Oct/2017:08:43:08 +1100</server_start>
<source_client_connections>1</source_client_connections>
<source_relay_connections>0</source_relay_connections>
<source_total_connections>1</source_total_connections>
<sources>1</sources>
<stats>0</stats>
<stats_connections>0</stats_connections>
<stream_kbytes_read>185119</stream_kbytes_read>
<stream_kbytes_sent>0</stream_kbytes_sent>
<source mount="/listen.mp3">
<audio_codecid>2</audio_codecid>
<audio_info>bitrate=60</audio_info>
<bitrate>60</bitrate>
<connected>42056</connected>
<genre>Islam</genre>
<incoming_bitrate>35976</incoming_bitrate>
<listener_connections>0</listener_connections>
<listener_peak>0</listener_peak>
<listeners>0</listeners>
<listenurl>http://localhost:8382/listen.mp3</listenurl>
<max_listeners>unlimited</max_listeners>
<mpeg_channels>2</mpeg_channels>
<mpeg_samplerate>22050</mpeg_samplerate>
<outgoing_kbitrate>0</outgoing_kbitrate>
<public>1</public>
<queue_size>64523</queue_size>
<server_description>Quran Kareem Radio</server_description>
<server_name>Quran Kareem Radio</server_name>
<server_type>audio/mpeg</server_type>
<server_url>http://qkradio.com.au</server_url>
<slow_listeners>0</slow_listeners>
<source_ip>139.218.241.112</source_ip>
<stream_start>08/Oct/2017:08:43:16 +1100</stream_start>
<total_bytes_read>189563392</total_bytes_read>
<total_bytes_sent>0</total_bytes_sent>
<total_mbytes_sent>0</total_mbytes_sent>
<user_agent>instreamer</user_agent>
</source>
</icestats>

2 answers:

Answer 0 (score: 1)

Try this (the 'xml' parser requires lxml to be installed):

from bs4 import BeautifulSoup

soup = BeautifulSoup(output.read(), 'xml')
# find_all() matches every <listeners> element, at any depth
for value in soup.find_all('listeners'):
    print(value.get_text())

Answer 1 (score: 0)

Use this:

from bs4 import BeautifulSoup

soup = BeautifulSoup(output.read(), 'lxml')
print(soup.select('listeners'))
# [<listeners>10</listeners>, <listeners>0</listeners>]
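Both answers match every <listeners> element, so they return the server-wide count (10) and the per-mount count inside <source> (0). A minimal sketch of separating the two, using the standard-library xml.etree.ElementTree rather than BeautifulSoup and a trimmed copy of the XML from the question (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the icestats document from the question
ICESTATS = b"""<icestats>
<listeners>10</listeners>
<source mount="/listen.mp3">
<listener_peak>0</listener_peak>
<listeners>0</listeners>
</source>
</icestats>"""

root = ET.fromstring(ICESTATS)

# Every <listeners> at any depth: the same two matches as find_all / select
all_listeners = [int(el.text) for el in root.iter('listeners')]

# Only the direct child of <icestats>: the server-wide count
server_listeners = int(root.findtext('listeners'))

# One count per mount point, keyed by the mount attribute
per_mount = {src.get('mount'): int(src.findtext('listeners'))
             for src in root.findall('source')}

print(all_listeners)     # [10, 0]
print(server_listeners)  # 10
print(per_mount)         # {'/listen.mp3': 0}
```

With BeautifulSoup and the 'xml' parser, the analogous restriction to the top level should be soup.icestats.find_all('listeners', recursive=False).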