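Whenever I search for a word that appears next to an href link, I want to get that href together with the matching text. In this example, if I search for the word "over" in the div below, I want it to print "over + href".
Sample of the HTML I used: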
html '''
<div class="ez" style="" data-ft="{"tn":"*s"}">
<span><p>This is the text here</p> <a href=" my link 3 ">More</a>
<div class="bl" style="" data-ft="{"tn":"*s"}">
<span><p>Hello everybody over there</p><a href="my link 1></div><div
class="ol"...><div class="bq qr"><a> class "gh" href="my link 2"</a>
'''html
My code so far:
for text_href in soup.findAll('div'):
    word = text_href.text
    link = text_href['href']
    print(word '+' link)
    for list in word:
        pattern = re.compile(r'over', re.I|re.UNICODE)
        matches = pattern.finditer(c)
        for match in matches:
            print(match) + print(link)
So the result I expect is the "over" match (in my case) together with the link (href) that the match sits next to. Result: over + "the link I want to obtain" (i.e. the href)
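Answer 0 (score: 2)
I think you are looking for something like this:
for text_href in soup.findAll('div'):
    # .text is all the text inside this div, including nested tags
    word = text_href.text
    if 'over' in word:
        # .a is the first <a> tag inside this div
        print(text_href.a['href'])
Output: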
the link i want to obtain
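One caveat with the loop above: `.text` also includes text from nested divs, and `.a` is `None` when a div has no link, so an outer div can report the wrong href or raise a `TypeError`. The following is only a sketch of one way to guard against both. The sample HTML, the `link-1`/`link-2` values, and the `found` list are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a nested div plus a div without any link
html_doc = '''
<div class="ez">
  <p>This is the text here</p> <a href="link-1">More</a>
  <div class="bl">
    <p>Hello everybody over there</p> <a href="link-2">More</a>
  </div>
</div>
<div class="no-link"><p>Nothing over here has a link</p></div>
'''

soup = BeautifulSoup(html_doc, 'html.parser')
found = []
for div in soup.find_all('div'):
    # Keep only strings whose nearest enclosing div is this one,
    # so text from nested divs is not counted twice
    own_text = ' '.join(
        s for s in div.find_all(string=True)
        if s.find_parent('div') is div
    )
    # First <a> inside the div that actually has an href (None if absent)
    link = div.find('a', href=True)
    if 'over' in own_text and link is not None:
        found.append(link['href'])

print(found)  # ['link-2']
```

The outer div is skipped because "over" only occurs in its nested div, and the last div is skipped because it has no link.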
Answer 1 (score: 1)
If the link always appears after the searched text, you can use the find_next method.
Something like this:
html_doc ='''
<div class="ez" style="" data-ft="{"tn":"*s"}">
<span><p>This is the text over here</p> <a href="the link i want to obtain
">More</a>
<div class="bl" style="" data-ft="{"tn":"*s"}">
<span><p>Hello everybody</p> <a href="www.mylink...">More</a>
'''
from bs4 import BeautifulSoup
import re
soup = BeautifulSoup(html_doc, 'html.parser')
search_string = 'over'
print(search_string, '+', soup.find(string=re.compile(search_string, re.I)).find_next('a')['href']) # over + the link i want to obtain
If you want to match the whole word only, you can update the regular expression accordingly.
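As a rough sketch of that whole-word variant, the pattern can be wrapped in `\b` word boundaries so that "over" does not match inside "overcast". The helper name `find_links_for_word`, the sample HTML, and the `link-a`/`link-b` values are assumptions made up for this example.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample: only the first paragraph contains "over" as a whole word
html_doc = '''
<div class="ez"><span><p>This is the text over here</p> <a href="link-a">More</a></span></div>
<div class="bl"><span><p>Hello everybody, it is overcast</p> <a href="link-b">More</a></span></div>
'''

def find_links_for_word(html, word):
    """Return (word, href) pairs for each text node containing the whole word."""
    soup = BeautifulSoup(html, 'html.parser')
    # \b anchors the match at word boundaries; re.escape guards special characters
    pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.I)
    results = []
    for text_node in soup.find_all(string=pattern):
        # First <a> that follows the matching text in document order
        anchor = text_node.find_next('a')
        if anchor is not None and anchor.has_attr('href'):
            results.append((word, anchor['href'].strip()))
    return results

matches = find_links_for_word(html_doc, 'over')
for word, href in matches:
    print(word, '+', href)  # over + link-a
```

Using `find_all` instead of `find` returns every matching text node, so the same helper also works when the word occurs more than once.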