Question

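I want to scrape the AMD stock information that Google shows. I have been able to scrape the whole page, but when I try to get a specific div or class I find nothing and the console returns []. I also cannot find those classes when I look through the full scraped page. After searching, I found that the content may be hidden by JavaScript. Can Selenium be used for this somehow? I tried Selenium WebDriver, but it did not help.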

This is what I have so far:

import requests
from bs4 import BeautifulSoup
import urllib3
from selenium import webdriver

requests.headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36"}


url = "https://www.google.com/search?q=amd+stock&oq=amd+stock&aqs=chrome..69i57j35i39j0l5j69i60.1017j0j7&sourceid=chrome&ie=UTF-8"
source_code = requests.get(url, requests.headers)
soup = BeautifulSoup(source_code.text, "html.parser")
amd = soup.find_all('div', attrs = {'class': 'aviV4d'})
print(amd)

When I print soup I get the whole page, but when I print amd I get [].

Answer 1

I believe you need to add amd.response or amd.text:

print(amd.response)
print(amd.text)
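For reference, find_all returns a list-like ResultSet rather than a single tag, so text is normally read from each matched element. A minimal sketch, assuming a small hand-written HTML snippet in place of the real Google page (the markup and values below are made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real Google page is far larger.
html = '<div class="aviV4d"><span>NASDAQ: AMD</span><span>48.16</span></div>'
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet, one entry per matching tag.
matches = soup.find_all("div", attrs={"class": "aviV4d"})
texts = [div.get_text(" ", strip=True) for div in matches]
print(texts)

# When the class is not present in the fetched HTML, the result is empty.
missing = soup.find_all("div", attrs={"class": "no-such-class"})
print(missing)
```

An empty result like the second one usually means the element is not in the HTML that was actually downloaded, which is the situation described in the question.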

Answer 2

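This is a dynamic page: requesting the page source with requests alone will not give you the stock price. You will have to render the page in a real browser, for example with Selenium. Try the following: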

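import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--incognito")
chromedriver_path = './chromedriver'
# Selenium 4 style: the chromedriver path is passed through a Service object.
driver = webdriver.Chrome(service=Service(chromedriver_path), options=options)
driver.get("https://www.google.com/search?q=amd+stock&oq=amd+stock&aqs=chrome..69i57j35i39j0l5j69i60.1017j0j7&sourceid=chrome&ie=UTF-8")

# Crude fixed wait so the JavaScript-rendered finance card has time to load.
time.sleep(2)
# Locate the price element inside the finance summary card by XPath.
x = driver.find_element(By.XPATH, '//*[@id="knowledge-finance-wholepage__entity-summary"]/div/g-card-section/div/g-card-section/span[1]/span/span[1]')

print(x.text)
driver.quit()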

Output:

48.16

Answer 3

Your code is fine, but pass the headers through the headers= parameter in the requests.get() call:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36"}

url = "https://www.google.com/search?q=amd+stock&oq=amd+stock&aqs=chrome..69i57j35i39j0l5j69i60.1017j0j7&sourceid=chrome&ie=UTF-8"
source_code = requests.get(url, headers=headers)
soup = BeautifulSoup(source_code.text, "html.parser")
amd = soup.find('div', attrs = {'class': 'aviV4d'})
print(amd.get_text(strip=True, separator='|').split('|')[:3])

Prints:

['Advanced Micro Devices', 'NASDAQ: AMD', '48,16']
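To illustrate why the keyword matters: the second positional parameter of requests.get is params, so a dict passed there ends up in the query string instead of the request headers. The sketch below makes no network call; it only prepares requests and inspects them. The URL is a placeholder, and Request(...).prepare() is used here as a stand-in for what requests.get would build.

```python
import inspect

import requests

headers = {"User-Agent": "Mozilla/5.0"}
url = "https://example.com/search"  # placeholder URL, never contacted

# The second positional parameter of requests.get is named "params".
param_names = list(inspect.signature(requests.get).parameters)
print(param_names)

# Equivalent of requests.get(url, headers): the dict becomes query parameters.
positional = requests.Request("GET", url, params=headers).prepare()
print(positional.url)

# Equivalent of requests.get(url, headers=headers): the dict becomes headers.
keyword = requests.Request("GET", url, headers=headers).prepare()
print(keyword.url, keyword.headers["User-Agent"])
```

In the first case the server never sees the custom User-Agent, which may explain why the page returned to the original code lacked the expected div.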

How to scrape part of a Google search results page
