Question

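I'm trying to scrape search results from a web page. When I type a complete search (e.g. ABC) into the search bar, the search is not reflected in the URL. So when I scrape that URL with BeautifulSoup4, I get None.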

Is it possible to find or write a URL that includes the search parameters?

I tried using BeautifulSoup with requests and the lxml parser, but the result came back as None.

from bs4 import BeautifulSoup
import requests

source = requests.get('URL').text
soup = BeautifulSoup(source, 'lxml')

article = soup.find('div')
print(article.prettify())

headline = article.div.hs.text
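A search term can fail to appear in the address bar for two common reasons. Either the search form is submitted with method="post", so the term travels in the request body, or JavaScript loads the results. You can check for the first case by reading the form that wraps the search bar.

The sketch below works under that assumption. The helper name describe_search_form, the sample markup, and the field names q, lang and term are hypothetical. It only recognises inputs with an explicit type of text or search.

```python
from urllib.parse import urlencode, urljoin

from bs4 import BeautifulSoup


def describe_search_form(html, page_url, query):
    """Return (method, url, data) for submitting `query` through the first form
    that has a text/search input, or None if no such form exists.

    Hypothetical helper: it assumes a plain HTML <form>; search boxes wired up
    purely with JavaScript will not be found this way.
    """
    soup = BeautifulSoup(html, "html.parser")  # 'lxml' also works if installed
    for form in soup.find_all("form"):
        box = form.find("input", attrs={"type": ["text", "search"]})
        if box is None or not box.get("name"):
            continue
        # Hidden fields are sent along with the query, so keep them
        data = {
            field["name"]: field.get("value", "")
            for field in form.find_all("input", attrs={"type": "hidden"})
            if field.get("name")
        }
        data[box["name"]] = query
        method = form.get("method", "get").lower()
        action = urljoin(page_url, form.get("action", ""))
        if method == "get":
            # GET forms put the fields in the query string: the URL can be written directly
            return "get", action + "?" + urlencode(data), None
        # POST forms send the fields in the body, which is why the URL does not change
        return method, action, data
    return None


sample_post = (
    '<form action="/search" method="post">'
    '<input type="hidden" name="lang" value="en">'
    '<input type="text" name="q"></form>'
)
sample_get = '<form action="/find"><input type="search" name="term"></form>'
```

How to use the result depends on what comes back:

- For a GET result, pass the URL to requests.get.
- For a POST result, requests.post(url, data=data) sends the same fields.
- If it returns None, the search box is probably handled by JavaScript, and a browser tool such as Selenium is the more likely fit.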

Answer 1

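requests plus BeautifulSoup only sees the HTML the server returns, so results that JavaScript adds after the page loads never reach BeautifulSoup. A browser-automation tool such as Selenium can handle that case. Below are some examples of using it. If you haven't installed ChromeDriver yet, you can download it from: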

https://chromedriver.storage.googleapis.com/index.html?path=2.35/

Usage:

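from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = "URL"
driver_path = r'chromedriverpath'  # path to your chromedriver executable
driver = webdriver.Chrome(service=Service(executable_path=driver_path))
driver.get(url)
# All <div> elements on the rendered page
x = driver.find_elements(By.CSS_SELECTOR, "div")
# For more specific matches, filter by class:
x1 = driver.find_elements(By.CSS_SELECTOR, "div[class='classname']")

for element in x:
    print(element.text)

driver.quit()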
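The original problem is typing into the search bar. Here is a hedged sketch of doing that with Selenium 4 and then handing the rendered HTML to BeautifulSoup. The function name and both CSS selectors are placeholders you would replace after inspecting the target page. The sketch also assumes the search runs when Enter is pressed.

```python
def search_with_selenium(driver, query, box_selector, results_selector, timeout=10):
    """Type `query` into the search box, press Enter, wait for results,
    and return the rendered page HTML.

    Hypothetical helper: `box_selector` and `results_selector` must be adapted
    to the real page.
    """
    # Imported here so the sketch can be defined even where Selenium isn't installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    box = driver.find_element(By.CSS_SELECTOR, box_selector)
    box.clear()
    box.send_keys(query + Keys.ENTER)
    # Block until at least one result element exists (raises TimeoutException otherwise)
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, results_selector))
    )
    return driver.page_source
```

Usage might look like this: html = search_with_selenium(driver, "ABC", "input[name='q']", "div.result"), followed by soup = BeautifulSoup(html, 'lxml'). From that point the usual BeautifulSoup code works on the results that JavaScript produced.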

Answer 2

You need to inspect the requests.Response object to see what the URL actually is.

>>> import requests
>>> session = requests.Session()
>>> qresults = session.get("https://www.google.com/search?q=python%20scraping%20module")
>>> qresults
<Response [200]>
>>> qresults.url
'https://www.google.com/search?q=python%20scraping%20module'
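You can also see where requests would put the search term without sending anything: build and prepare a request, then inspect it. This sketch contrasts two cases:

- A GET search, where the term lands in the URL.
- A POST search, where the URL stays the same and the term is sent in the body.

The example.com endpoint and the field name q are hypothetical. Note that requests encodes spaces in params as +, which most servers treat the same as %20 in a query string.

```python
import requests

# Build (but do not send) a GET request: the search term ends up in the URL
get_req = requests.Request(
    "GET", "https://www.google.com/search", params={"q": "python scraping module"}
).prepare()
print(get_req.url)   # https://www.google.com/search?q=python+scraping+module

# Build (but do not send) a POST request: the URL is unchanged, the term is in the body
post_req = requests.Request(
    "POST", "https://example.com/search", data={"q": "ABC"}
).prepare()
print(post_req.url)  # https://example.com/search
print(post_req.body) # q=ABC
```

If the site's search behaves like the POST case, no URL will ever contain the query. Sending the same form data with requests.post is then the way to reproduce the search.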

BeautifulSoup only helps you parse the text attribute of the Response object; it does not fetch pages or run JavaScript.
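Because BeautifulSoup only sees that text, a lookup that matches nothing returns None (or an empty list). That is what the question's chained lookups ran into.

Here is a minimal sketch of guarding against that. It assumes the hypothetical selector div.result h3; in practice, html would be response.text or Selenium's driver.page_source.

```python
from bs4 import BeautifulSoup


def extract_headlines(html, selector):
    """Return the stripped text of every element matching `selector`.

    Returns an empty list instead of failing when the results were not in the
    HTML (for example, because JavaScript inserts them after the page loads).
    """
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


def first_headline(html, selector):
    """Return the first match's text, or None if nothing matched."""
    tag = BeautifulSoup(html, "html.parser").select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


sample = (
    '<div class="result"><h3> First </h3></div>'
    '<div class="result"><h3>Second</h3></div>'
)
```

Checking for None (or an empty list) before calling .text or .prettify() turns a confusing AttributeError into a clear signal. In that case, the results were not in the HTML you downloaded.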

How to scrape a web page using search-bar results when the search query is not in the URL
