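I'm completely new to web scraping. I've been working on a project that needs to fetch the word of the day from Merriam-Webster. I've managed to grab the word itself and now just need the definition, but when I try to get it, this is the result:
Avuncular (the correct word of the day)
Definition:
[]
Here is my code: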
from lxml import html
import requests
page = requests.get('https://www.merriam-webster.com/word-of-the-day')
tree = html.fromstring(page.content)
word = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[1]/div[2]/div[1]/div/h1/text()')
WOTD = str(word)
WOTD = WOTD[2:]
WOTD = WOTD[:-2]
print(WOTD.capitalize())
print("Definition:")
wordDef = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[2]/div[1]/div/div[1]/p[1]/text()')
print(wordDef)
The [] should be the first definition, but for some reason the lookup returns nothing.
Any help would be greatly appreciated.
Answer 0: (score: 1)
Your XPath is slightly off. Here is the correct one:
wordDef = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[3]/div[1]/div/div[1]/p[1]/text()')
Note the div[3] after main/article, instead of div[2]. With that change, wordDef should contain the definition text rather than an empty list.
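To illustrate why a single wrong index yields an empty list rather than an error, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml so it runs without a network request, and the PAGE markup is a made-up, simplified structure, not the real Merriam-Webster page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page (an assumption for illustration only):
# the definition lives in the THIRD <div> under <article>.
PAGE = """
<html><body><main><article>
  <div><h1>avuncular</h1></div>
  <div><span>advertisement</span></div>
  <div><p>suggestive of an uncle</p></div>
</article></main></body></html>
"""

root = ET.fromstring(PAGE)

# Wrong index: div[2] exists but holds no <p>, so nothing matches.
wrong = [p.text for p in root.findall("./body/main/article/div[2]/p")]

# Right index: div[3] holds the paragraph we want.
right = [p.text for p in root.findall("./body/main/article/div[3]/p")]

print(wrong)  # []
print(right)  # ['suggestive of an uncle']
```

An absolute path like this silently stops matching whenever the page layout shifts by one element, which is why an empty list is the usual symptom.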
Answer 1: (score: 1)
If you want to avoid hard-coding indices in the XPath, you can use the following as an alternative to your current attempt:
import requests
from lxml.html import fromstring
page = requests.get('https://www.merriam-webster.com/word-of-the-day')
tree = fromstring(page.text)
word = tree.xpath("//*[@class='word-header']//h1")[0].text
wordDef = tree.xpath("//h2[contains(.,'Definition')]/following-sibling::p/strong")[0].tail.strip()
print(f'{word}\n{wordDef}')
If wordDef does not capture the whole definition, try replacing that line with the following:
wordDef = tree.xpath("//h2[contains(.,'Definition')]/following-sibling::p")[0].text_content()
Output:
avuncular
suggestive of an uncle especially in kindliness or geniality
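The difference between reading .tail and reading the whole paragraph can be seen in a small offline sketch. It uses the standard library's xml.etree.ElementTree as a stand-in, where "".join(p.itertext()) plays roughly the role of lxml's text_content(). The SNIPPET markup is hypothetical and not copied from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (an assumption for illustration): a <strong> marker
# followed by definition text that is interrupted by another inline tag.
SNIPPET = (
    "<p><strong>:</strong> suggestive of an uncle "
    "<em>especially</em> in kindliness</p>"
)

p = ET.fromstring(SNIPPET)
strong = p.find("strong")

# .tail is only the text between </strong> and the next tag.
tail_only = strong.tail.strip()

# Joining every text node gives the full paragraph, marker included.
full_text = "".join(p.itertext()).strip()

print(tail_only)  # suggestive of an uncle
print(full_text)  # : suggestive of an uncle especially in kindliness
```

So .tail is enough when the definition is plain text right after the marker, while the full-text approach is the safer choice when the definition contains nested tags such as links or emphasis.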