Unable to get the desired text in BeautifulSoup

Time: 2018-12-05 21:06:55

Tags: python web-scraping beautifulsoup

Apologies if the formatting below is off. I am trying to scrape only the "Jane Doe" part of the HTML below:

<div class="col1 client">
   <a name="12345"></a>
   "Jane Doe"
   <div class="request"><i>insurance claim</i></div>        
</div>

My code at the bottom outputs both "Jane Doe" and "insurance claim". How can I get only the "Jane Doe" text? Thanks in advance for your help.

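from bs4 import BeautifulSoup

# `page` is assumed to be an HTTP response fetched earlier (not shown in the question)
soup = BeautifulSoup(page.content, 'html.parser')
listings = soup.find(id="listings")
listing_items = listings.find_all(class_="col1 client")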

2 Answers:

Answer 0 (score: 2)

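You want to use next_sibling. The name is a bare text node that directly follows the <a> tag, so select the <a> and read the node after it: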

from bs4 import BeautifulSoup

html = '''
<div class="col1 client">
   <a name="12345"></a>
   "Jane Doe"
   <div class="request"><i>insurance claim</i></div>        
</div>
'''

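# 'lxml' requires the lxml package; the built-in 'html.parser' also works here
soup = BeautifulSoup(html, 'lxml')

# The name is the text node right after each <a>, so read its next_sibling
for item in soup.select(".col1.client a"):
    print(item.next_sibling)

# strip() removes the surrounding whitespace and newlines
print([item.next_sibling.strip() for item in soup.select(".col1.client a")])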
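
If the position of the <a> tag is not guaranteed, a possible alternative is to collect only the text nodes that are direct children of each "col1 client" div, which skips the nested "insurance claim" text. This is a minimal sketch rather than part of the original answer: the helper name direct_text and the wrapping id="listings" div are assumptions for illustration, and it uses the built-in 'html.parser' so no extra package is needed.

```python
from bs4 import BeautifulSoup

html = '''
<div id="listings">
  <div class="col1 client">
     <a name="12345"></a>
     "Jane Doe"
     <div class="request"><i>insurance claim</i></div>
  </div>
</div>
'''


def direct_text(tag):
    """Join only the text nodes that are direct children of `tag` (hypothetical helper)."""
    # recursive=False limits the search to direct children, so text nested
    # inside child tags such as <div class="request"> is ignored.
    parts = (s.strip() for s in tag.find_all(string=True, recursive=False))
    return " ".join(p for p in parts if p)


soup = BeautifulSoup(html, "html.parser")
names = [direct_text(div) for div in soup.select("div.col1.client")]
print(names)
```

Note that the double quotes are part of the text in this HTML, so they are kept in the result; call .strip('"') on each name if they are not wanted.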

Answer 1 (score: 0)

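Another approach is to start from the .request div and read the text node just before it with previous_sibling: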

from bs4 import BeautifulSoup

htmldocs = """
<div class="col1 client">
   <a name="12345"></a>
   "Jane Doe"
   <div class="request"><i>insurance claim</i></div>        
</div>
"""
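# 'html5lib' requires the html5lib package
soup = BeautifulSoup(htmldocs, 'html5lib')
# The name is the text node right before each .request div
for item in soup.select(".request"):
    print(item.previous_sibling.strip())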
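
One caveat with sibling navigation: previous_sibling can be a tag or None instead of a text node, in which case calling .strip() on it fails. The sketch below guards against that. It is an illustration only: the helper name text_before and the second listing (name="67890", with no name text) are made up to show the fallback, and 'html.parser' is used to avoid extra dependencies.

```python
from bs4 import BeautifulSoup, NavigableString

html = '''
<div class="col1 client"><a name="12345"></a>
   "Jane Doe"
   <div class="request"><i>insurance claim</i></div>
</div>
<div class="col1 client"><a name="67890"></a><div class="request"><i>no name here</i></div></div>
'''


def text_before(tag):
    """Return the stripped text node immediately before `tag`, or None (hypothetical helper)."""
    sibling = tag.previous_sibling
    # Only text nodes have a meaningful .strip(); a Tag or None falls through.
    if isinstance(sibling, NavigableString):
        return sibling.strip() or None
    return None


soup = BeautifulSoup(html, "html.parser")
results = [text_before(req) for req in soup.select(".request")]
print(results)
```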