I have an element of type bs4.element.Tag:
<div class="table_v_nr">
1003 : 11400
<span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
I need to get "1003 : 11400" from this element. How can I do that, please?
Thanks
Edit:
And if I have multiple divs, how do I select the individual values ("1003 : 11400", ...):
<div class="table_v_nr">
1003 : 11400
<span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>,
<div class="table_v_nr">
1003 : 11400
<span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>,
<div class="table_v_nr">
1007 : 11550
<span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>,
...
Answer 0 (score: 1)
This should help you:
div = soup.find('div', class_ = "table_v_nr")
print(div.find_next(text=True).strip())
Full code:
from bs4 import BeautifulSoup
html = '''
<div class="table_v_nr">
1003 : 11400
<span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
'''
soup = BeautifulSoup(html,'html5lib')
div = soup.find('div', class_ = "table_v_nr")
print(div.find_next(text=True).strip())
Output:
1003 : 11400
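A note on why this works: find_next(text=True) returns the first text node that follows the opening div tag in document order, which here happens to be the text sitting directly inside the div. If a div started with a child tag instead, the same call would return that child's text. The sketch below is one possible way to guard against that by keeping only the strings that are direct children of the tag. The helper name own_text is hypothetical (it is not part of Beautiful Soup), and the built-in html.parser is used only to avoid the html5lib dependency.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def own_text(tag):
    """Hypothetical helper: join the non-empty strings that are direct children of `tag`."""
    parts = [
        child.strip()
        for child in tag.children
        # Child tags such as <span> are skipped; only bare strings are kept.
        if isinstance(child, NavigableString) and child.strip()
    ]
    return " ".join(parts)


html = '''
<div class="table_v_nr">
1003 : 11400
<span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
'''
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="table_v_nr")
result = own_text(div)
print(result)
```

For the markup in the question both approaches give the same string; the helper only matters when the layout of the div is less predictable.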
Edit:
If you want to extract the text from multiple div tags, you can try something like this:
from bs4 import BeautifulSoup
html = """
<div class="table_v_nr">
1003 : 11400
<span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>,
<div class="table_v_nr">
1003 : 11400
<span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>,
<div class="table_v_nr">
1007 : 11550
<span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>,
"""
soup = BeautifulSoup(html,'html5lib')
for div in soup.find_all('div', class_ = "table_v_nr"):
    print(div.find_next(text=True).strip())
Output:
1003 : 11400
1003 : 11400
1007 : 11550
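Since the question asks how to pick out a single value, it may help to collect the strings into a list rather than print them, so that any one of them can be selected by position. The following is a minimal sketch under a few assumptions: the variable names divs and values are illustrative, html.parser stands in for html5lib, and string=True is used as the newer spelling of text=True.

```python
from bs4 import BeautifulSoup

html = """
<div class="table_v_nr">
1003 : 11400
<span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
<div class="table_v_nr">
1003 : 11400
<span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>
<div class="table_v_nr">
1007 : 11550
<span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# One entry per matching div, in document order.
divs = soup.find_all("div", class_="table_v_nr")
values = [div.find_next(string=True).strip() for div in divs]

print(values)     # the whole list
print(values[2])  # a single value, selected by index
```

Indexing into values (or into divs itself) is then ordinary list access, so slices and negative indices work as usual.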
Answer 1 (score: 1)
Use .contents:
from bs4 import BeautifulSoup
html = """<div class="table_v_nr">
1003 : 11400
<span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
"""
soup = BeautifulSoup(html,'html.parser')
div = soup.find('div', class_ = "table_v_nr").contents[0]
print(div.strip())
Output:
1003 : 11400
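Edit: you can also use a CSS selector:
from bs4 import BeautifulSoup
# `html` is the multi-div string from the question
soup = BeautifulSoup(html,'html.parser')
# Note: newer soupsieve releases deprecate :contains() in favour of :-soup-contains()
for tag in soup.select(".table_v_nr:contains('1003')"):
    print(tag.next.strip())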
Output:
1003 : 11400
1003 : 11400
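If the text-matching pseudo-class is not available in the installed selector engine, the same filtering can be approximated in plain Python. This is only a sketch: it assumes every matching div begins with a text node (as in the question's markup), the name matches is illustrative, and unlike the selector it checks only that leading text rather than all text inside the div.

```python
from bs4 import BeautifulSoup

html = """
<div class="table_v_nr">
1003 : 11400
<span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
<div class="table_v_nr">
1003 : 11400
<span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>
<div class="table_v_nr">
1007 : 11550
<span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Select by class with a plain CSS selector, then filter on the leading text in Python.
matches = [
    div.contents[0].strip()
    for div in soup.select("div.table_v_nr")
    if "1003" in div.contents[0]  # assumes contents[0] is the text node
]

for value in matches:
    print(value)
```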