Question

This is my code for fetching and parsing the relevant information from wordsinasentence.com, a site that provides useful context sentences for a given word:

# First, import requests to fetch the HTML from the target page.
# In this case the website is http://www.wordsinasentence.com

import requests

target = input("The word you want to search : ")

res = requests.get("https://wordsinasentence.com/"+ target+"-in-a-sentence/")

# Also check the response status so that a failed request is flagged
try:
    res.raise_for_status()
except Exception as e:
    print("There was a problem connecting to the wordsinasentence server:", e)

# The raw response is hard to read, so we need to parse it.
# Use BeautifulSoup to make it readable.

import bs4
html_soup = bs4.BeautifulSoup(res.text, 'html.parser')

# Check that it has been parsed correctly.
# Now we'll extract the definition of target.

keywords = html_soup.select('Definition')

If I run the call select('Definition') shown above, it always returns an empty list, even though printing the html_soup variable shows the following:

<p onclick='responsiveVoice.speak("not done for any particular reason; chosen or done at random");' style="font-weight: bold; font-family:Arial; font-size:20px; color:#504A4B;padding-bottom:0px;">Definition of Arbitrary</p>

[]

What could be going wrong?

Answer 1

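The problem is that you are using the wrong method to search for text: select() takes CSS selectors, so select('Definition') looks for tags named Definition rather than for text. Instead, you can pass a function to find_all through the string keyword argument to pick out the tags you are looking for.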

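# Return True for tags whose text starts with 'Definition of'
def has_text_def(s):
    return s and s.startswith('Definition of')

# html_soup is the BeautifulSoup object built in the question
definitions = html_soup.find_all('p', string=has_text_def)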
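
To see why the original call returned an empty list, here is a minimal sketch on an inline HTML fragment. The fragment and the names sample_html, soup, by_tag_name, paragraphs and by_text are made up for illustration and are not taken from the real page.

```python
import bs4

# Hypothetical fragment modeled on the markup shown in the question.
sample_html = (
    '<p style="font-weight: bold;">Definition of Arbitrary</p>\n'
    '<p>not done for any particular reason; chosen or done at random</p>'
)
soup = bs4.BeautifulSoup(sample_html, 'html.parser')

# A bare word is a CSS type selector: it matches tags with that name.
by_tag_name = soup.select('Definition')  # there are no <Definition> tags
paragraphs = soup.select('p')            # matches both <p> tags

# Matching on text needs find_all with the string argument.
def has_text_def(s):
    return s and s.startswith('Definition of')

by_text = soup.find_all('p', string=has_text_def)

print(by_tag_name)
print(len(paragraphs))
print(by_text[0].get_text())
```

The selector version finds nothing because it only ever looks at tag names, while the string filter is applied to each tag's text.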

By the way, the matched tag only holds the heading. To reach the definition itself you need to move to the next element in the tree (with next_sibling):

for p in definitions:
    print(p.next_sibling.next_sibling.text)
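
The double next_sibling can look odd, so here is a small sketch of what each hop returns. It assumes the definition text sits in the p tag right after the heading paragraph. The fragment and the names sample_html, soup, heading, whitespace, via_two_hops and via_find are hypothetical. find_next_sibling is shown as a possible alternative, not as what the answer above uses.

```python
import bs4

# Hypothetical fragment: assumes the definition is in the <p> that
# immediately follows the "Definition of ..." heading paragraph.
sample_html = """
<p>Definition of Arbitrary</p>
<p>not done for any particular reason; chosen or done at random</p>
"""
soup = bs4.BeautifulSoup(sample_html, 'html.parser')

def has_text_def(s):
    return s and s.startswith('Definition of')

heading = soup.find('p', string=has_text_def)

# The first next_sibling is the newline text node between the two tags,
# so a second hop is needed to reach the following <p>.
whitespace = heading.next_sibling
via_two_hops = heading.next_sibling.next_sibling.get_text()

# find_next_sibling('p') skips text nodes and returns the next <p> tag.
via_find = heading.find_next_sibling('p').get_text()

print(repr(whitespace))
print(via_two_hops)
print(via_find)
```

If the page's whitespace between tags ever changes, the two-hop version may land on a different node, whereas find_next_sibling('p') keeps looking until it reaches a p tag.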

BS4 select() method
