I want to use BeautifulSoup to extract the impact factor (0.806) from this HTML (a Springer journal description page):
<div id="quick-facts-container" class="SideBox">
<ul class="ListStack ListStack--float">
<li>
<span>Impact Factor</span>
<span>0.806</span>
</li>
<li>
<span>Available</span>
<span>1996 - 2017</span>
</li>
<li>
<span>Volumes</span>
<span>22</span>
</li>
<li>
<span>Issues</span>
<span>265</span>
</li>
</ul>
</div>
Because the value is nested, I want to get the contents of the second <span>, but I don't know how to do that.
My Python script is simple:
from bs4 import BeautifulSoup
import urllib.request
r = urllib.request.urlopen('file:///197.html').read()
soup = BeautifulSoup(r, 'html.parser')
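Answer 0: (score: 0)
If you only want the text part of a document or tag, you can use the get_text() method. It returns all the text in a document or beneath a tag as a single Unicode string: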
In [6]: for li in soup.find('div', id='quick-facts-container').find_all('li'):
   ...:     print(li.get_text(strip=True))
   ...:
Impact Factor0.806
Available1996 - 2017
Volumes22
Issues265
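Note that the get_text() loop above joins the label and the value into one string ("Impact Factor0.806"), so the number still has to be separated out. One possible way to get only the value is to find the label <span> by its text and then step to the next <span> sibling. The sketch below is an illustration, not part of the original answer; it inlines a shortened copy of the question's HTML (an assumption made so it runs without the local file) and assumes the label text is exactly "Impact Factor".

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question, inlined so the sketch
# runs on its own instead of reading file:///197.html.
html = """
<div id="quick-facts-container" class="SideBox">
<ul class="ListStack ListStack--float">
<li>
<span>Impact Factor</span>
<span>0.806</span>
</li>
<li>
<span>Available</span>
<span>1996 - 2017</span>
</li>
</ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", id="quick-facts-container")

# Find the label <span> by its exact text (assumed to be "Impact Factor"),
# then move to the next <span> sibling, which holds the value.
label = container.find("span", string="Impact Factor")
impact_factor = label.find_next_sibling("span").get_text(strip=True)

print(impact_factor)
```

Looking the value up by its label rather than by position should keep working if the order of the list items changes, although it will fail if the label text itself changes.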
Answer 1: (score: 0)
The following should work:
from bs4 import BeautifulSoup
import urllib.request

r = urllib.request.urlopen('file:///197.html').read()
soup = BeautifulSoup(r, 'html.parser')
# .li is the first <li> in the container; collect the text of its two <span> tags
data = [i.text for i in soup.find(id='quick-facts-container').li.find_all('span')]
print("{} ({})".format(data[0], data[1]))
This will display:
Impact Factor (0.806)
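The approach above relies on the impact factor being in the first <li>. If several of the quick facts are needed, a minimal sketch is to pair the two <span> tags of every <li> into a dictionary and look values up by label. The name `facts` and the inlined HTML are assumptions for illustration; in the original script the soup would come from the local file instead.

```python
from bs4 import BeautifulSoup

# Copy of the HTML from the question, inlined so the sketch is self-contained.
html = """
<div id="quick-facts-container" class="SideBox">
<ul class="ListStack ListStack--float">
<li><span>Impact Factor</span><span>0.806</span></li>
<li><span>Available</span><span>1996 - 2017</span></li>
<li><span>Volumes</span><span>22</span></li>
<li><span>Issues</span><span>265</span></li>
</ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Map each label (first <span>) to its value (second <span>).
facts = {}
for li in soup.find(id="quick-facts-container").find_all("li"):
    spans = li.find_all("span")
    if len(spans) >= 2:  # skip any <li> that does not follow the label/value layout
        facts[spans[0].get_text(strip=True)] = spans[1].get_text(strip=True)

print(facts["Impact Factor"])
```

The values stay as strings here; converting "0.806" with float() is left to the caller, since other entries such as "1996 - 2017" are not numeric.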