Question

In Python 3, I use a script to scrape the first page of Google search results.

The program captures each result's href and the link's descriptive text, i.e. the page title. I would also like to extract the snippet that Google shows below each link.

For example, on this page (https://www.google.com/search?client=ubuntu&channel=fs&ei=DrSNW8r3E4urwgS977WYDA&q=ALDEANNO+CAMPOS+deputado+federal+ditadura&oq=ALDEANNO+CAMPOS+deputado+federal+ditadura&gs_l=psy-ab.12...0.0.0.1933260.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.U9iFnwXwzpk), the texts:

“Aug 24, 2018 - ... Candidato a ... PRP nas Eleições 2018 no Pará.”

“Relacionamos a seguir os senadores e deputados federais brasileiros cassados ... Recife, PE, PTB-PE (1962) ...”

“Francisco Luís da Silva Campos (Dores do Indaiá, November 11, 1891 - Belo Horizonte, ... Em 1921 ... Estado Novo, 1937 ... decretado em novembro ...”

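And so on.

Please, does anyone know how I can capture the text that appears below each link?

Example of how a result is displayed for the name "CORONEL FERES" - printed output (link) - (I was unable to show the HTML code):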

PSL Itapema - Posts | Facebook
https://www.facebook.com/PSLitapema17/posts/1638801189535968
... (Deputado Federal Coronel Feres). Confira: 37 ... a ditadura silenciosa que não podemos ... Bom dia!

Answer 1

You just need to add it inside the loop; see the code below.

from bs4 import BeautifulSoup
from selenium import webdriver

nome = '"ALDEANNO CAMPOS"'
nome = nome.replace(' ', '+')
cargo = 'DEPUTADO FEDERAL'

busca = f'https://www.google.com.br/search?q={nome}+{cargo}+ditadura'

# Launch Firefox with its default profile; newer Selenium releases
# no longer accept a profile as a positional argument
browser = webdriver.Firefox()

browser.get(busca)

html = browser.page_source
soup = BeautifulSoup(html, "html.parser")
browser.close()

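# Each result sits in a div with class "rc" (Google's markup when this was written)
page = soup.find_all("div", {"class": "rc"})

for link in page:
    anchor = link.find("a")
    href = anchor['href']
    texto = anchor.text
    # The snippet below the link is a span with class "st"; some results have none
    snippet = link.find('span', attrs={'class': 'st'})
    body = snippet.text if snippet else ''
    print(href)
    print(texto)
    print(body)
    print("---------------")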
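
The search URL in the code above is assembled by hand, which leaves the space in cargo and the double quotes in nome unencoded. A minimal sketch of an alternative, assuming the standard library's urllib.parse.urlencode is acceptable here; build_search_url is a hypothetical helper name and is not part of the original answer:

```python
from urllib.parse import urlencode


def build_search_url(nome, cargo, extra="ditadura"):
    """Return a Google search URL with the query percent-encoded."""
    # Hypothetical helper: join the terms with spaces, then let urlencode
    # turn spaces into '+' and double quotes into %22.
    query = " ".join([nome, cargo, extra])
    return "https://www.google.com.br/search?" + urlencode({"q": query})


url = build_search_url('"ALDEANNO CAMPOS"', "DEPUTADO FEDERAL")
print(url)
```

The resulting string can be passed to browser.get() in place of busca.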
