From bs4.element

Time: 2020-11-01 12:36:12

Tags: python web-scraping beautifulsoup

I have an element of type bs4.element.Tag:

<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>

I need to get "1003 : 11400" from this element. How can I do that?

Thanks

Edit:

And if I have multiple divs, how do I select the individual values ("1003 : 11400", ...):

    <div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>,
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>,
<div class="table_v_nr">
    1007 : 11550

    <span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>,

...

2 answers:

Answer 0 (score: 1)

This should help:

div = soup.find('div', class_ = "table_v_nr")
print(div.find_next(text=True).strip())

Full code:

from bs4 import BeautifulSoup

html = '''
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
'''
soup = BeautifulSoup(html,'html5lib')

div = soup.find('div', class_ = "table_v_nr")
print(div.find_next(text=True).strip())

Output:

1003 : 11400

Edit:

If you want to extract the text from several div tags, you can try something like this:

from bs4 import BeautifulSoup

html = """
    <div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>,
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>,
<div class="table_v_nr">
    1007 : 11550

    <span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>,
"""
soup = BeautifulSoup(html,'html5lib')

for div in soup.find_all('div', class_ = "table_v_nr"):
    print(div.find_next(text=True).strip())

Output:

1003 : 11400
1003 : 11400
1007 : 11550
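
If you would rather keep the values than print them, the same idea can be wrapped in a small function. This is only a sketch: `leading_texts` is a made-up helper name, and it assumes the value is always the first text node directly inside each div. It uses `find(string=True, recursive=False)` so that only the div's own text is considered, not the text inside the nested span.

```python
from bs4 import BeautifulSoup

html = """
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
<div class="table_v_nr">
    1007 : 11550

    <span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>
"""


def leading_texts(soup, css_class="table_v_nr"):
    """Hypothetical helper: first direct text node of each matching div."""
    values = []
    for div in soup.find_all("div", class_=css_class):
        # recursive=False restricts the search to the div's direct children,
        # so the text inside the nested <span> is never picked up.
        text = div.find(string=True, recursive=False)
        if text is not None:
            values.append(text.strip())
    return values


soup = BeautifulSoup(html, "html.parser")
values = leading_texts(soup)
print(values)
```

Returning a list instead of printing makes it easier to post-process the values later, for example splitting each one on " : ".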

Answer 1 (score: 1)

Use .contents:

from bs4 import BeautifulSoup

html = """<div class="table_v_nr">
    1003 : 11400

   <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
"""
soup = BeautifulSoup(html,'html.parser')

div = soup.find('div', class_ = "table_v_nr").contents[0]
print(div.strip())

Output:

1003 : 11400
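
To see why index 0 is the right one here, it can help to look at what `.contents` actually holds. A minimal sketch, assuming the same markup and the `html.parser` parser (a different parser or different markup could produce a different child list); the variable names `first_text` and `span_text` are mine:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = """<div class="table_v_nr">
    1003 : 11400

   <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
"""
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="table_v_nr")

# .contents is a plain list of the div's direct children, in document order.
for child in div.contents:
    kind = "text" if isinstance(child, NavigableString) else "tag"
    print(kind, repr(str(child).strip()))

# For this markup the first child is the text node and the second is the span.
first_text = div.contents[0].strip()
span_text = div.contents[1].get_text()
```

If the div ever started with a tag instead of text, `contents[0]` would be that tag, so checking the child type first is the safer habit.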

Edit: you can also use a CSS selector:

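from bs4 import BeautifulSoup

# html is the multi-div markup from the question
soup = BeautifulSoup(html,'html.parser')

# Newer Soup Sieve releases deprecate :contains in favor of :-soup-contains
for tag in soup.select(".table_v_nr:contains('1003')"):
    print(tag.next_element.strip())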

Output:

1003 : 11400
1003 : 11400
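
The text-matching pseudo-class has gone by different names across Soup Sieve releases, so here is a sketch that sidesteps it: select on the class only and do the text check in Python. The variable names are mine, and there is one difference in behavior: this version tests only the div's leading text, whereas the selector version looks at all text inside the div.

```python
from bs4 import BeautifulSoup

html = """
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 35id</span></div>
<div class="table_v_nr">
    1003 : 11400

    <span class="table_v_time" title="12. min. 2. hr. 6. day.">Y 36id</span></div>
<div class="table_v_nr">
    1007 : 11550

    <span class="table_v_time" title="13. min. 2. hr. 6. day.">Y 37id</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

matches = []
for tag in soup.select("div.table_v_nr"):
    # next_element is the node parsed right after the opening <div>,
    # which for this markup is the leading text node.
    leading = tag.next_element.strip()
    if "1003" in leading:
        matches.append(leading)

print(matches)
```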