Question

I want to scrape the href inside class_="_e4b". Basically, I'm looking to scrape a class within a class using BeautifulSoup.

from bs4 import BeautifulSoup
import selenium.webdriver as webdriver

url = ("https://www.google.com/search?...")

def get_related_search(url):
    driver = webdriver.Chrome("C:\\Users\\John\\bin\\chromedriver.exe")
    driver.get(url)
    soup = BeautifulSoup(driver.page_source)
    relate_result = soup.find_all("p", class_="_e4b")
    return relate_result[0]

relate_url = get_related_search(url)
print(relate_url)

Result: markup_type=markup_type)) {p class="_e4b"}{a href="/search?... a}{/p}

I now want to scrape the href out of that result. I'm not sure what the next step would be. Thanks for your help.

Note: I replaced <> with {} because it was not displaying as HTML markup.

Answer 1

You can actually find this inner a element in one go with a CSS selector:

links = soup.select("p._e4b a[href]")
for link in links:
    print(link['href'])

p._e4b a[href] matches every a element that has an href attribute and sits inside a p element with the _e4b class.
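
To try the selector without launching a browser, here is a minimal sketch that runs it against a small hand-written HTML string. SAMPLE_HTML, BASE_URL and get_related_links are made up for illustration and are not part of the question; in the real script the HTML would come from driver.page_source. Since the href in the question looks relative ("/search?..."), the sketch also resolves it against an assumed base URL with urljoin, which may or may not be what you need.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source.
SAMPLE_HTML = """
<div>
  <p class="_e4b"><a href="/search?q=first">first</a></p>
  <p class="_e4b"><a>no href here</a></p>
  <p class="other"><a href="/search?q=skipped">skipped</a></p>
</div>
"""

# Assumed base URL, used only to resolve the relative hrefs.
BASE_URL = "https://www.google.com/"


def get_related_links(html, base_url=BASE_URL):
    # Name the parser explicitly so BeautifulSoup does not have to guess one.
    soup = BeautifulSoup(html, "html.parser")
    # Only <a> tags that have an href and sit inside <p class="_e4b"> match.
    return [urljoin(base_url, a["href"]) for a in soup.select("p._e4b a[href]")]


print(get_related_links(SAMPLE_HTML))
```

The a without an href and the a inside the p with a different class are both skipped, so only the first link is returned. Passing "html.parser" explicitly should also avoid the parser warning whose tail (markup_type=markup_type))) shows up at the start of the output in the question.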
