I want to get the address for "Spotlight 29 casino address" via a Google search in a Python script. Why doesn't my code work?
from bs4 import BeautifulSoup
# from googlesearch import search
import urllib.request
import datetime
article='spotlight 29 casino address'
url1 ='https://www.google.co.in/#q='+article
content1 = urllib.request.urlopen(url1)
soup1 = BeautifulSoup(content1,'lxml')
#print(soup1.prettify())
div1 = soup1.find('div', {'class':'Z0LcW'}) #get the div where it's located
# print (datetime.datetime.now(), 'street address: ' , div1.text)
print (div1)
Answer 0 (score: 0)
If you want to get Google search results, Selenium with Python is the easier way.
Below is a simple example:
from selenium import webdriver
import urllib.parse
from bs4 import BeautifulSoup

chromedriver = '/xxx/chromedriver'  # replace /xxx/ with the directory where chromedriver is installed
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")  # run Chrome without opening a window
driver = webdriver.Chrome(chromedriver, chrome_options=chrome_options)

article = 'spotlight 29 casino address'
driver.get("https://www.google.co.in/#q=" + urllib.parse.quote(article))

# driver.page_source holds the JavaScript-rendered HTML; parse it with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'lxml')
div = soup.find('div', {'class': 'Z0LcW'})  # the div that holds the answer-box address
print(div.text)
driver.quit()
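If the answer box sometimes hasn't rendered by the time page_source is read, an explicit wait helps. The snippet below is a minimal sketch meant to slot in before the driver.quit() call above, and it assumes the same Z0LcW class used elsewhere in this thread:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the answer-box div to appear, then read its text
# directly from Selenium instead of re-parsing page_source.
wait = WebDriverWait(driver, 10)
address_div = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.Z0LcW")))
print(address_div.text)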
Answer 1 (score: 0)
Google renders this page with JavaScript, which is why you don't receive that div through urllib.request.urlopen.
As a solution you can use Selenium, a Python library for driving a real browser. Install it with the console command "pip install selenium", and then code like this will work:
from bs4 import BeautifulSoup
from selenium import webdriver

article = 'spotlight 29 casino address'
url = 'https://www.google.co.in/#q=' + article

driver = webdriver.Firefox()  # requires geckodriver on your PATH
driver.get(url)

html = BeautifulSoup(driver.page_source, "lxml")  # the JavaScript-rendered HTML
div = html.find('div', {'class': 'Z0LcW'})
print(div.text)
driver.quit()  # close the browser when done
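A small variation (just a sketch, reusing the Z0LcW class from above): Selenium can locate the element itself, so BeautifulSoup is optional here. It still assumes geckodriver is available on your PATH.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('https://www.google.co.in/#q=spotlight 29 casino address')
# Let Selenium find the answer-box div directly instead of parsing page_source.
div = driver.find_element(By.CSS_SELECTOR, 'div.Z0LcW')
print(div.text)
driver.quit()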
Answer 2 (score: 0)
You get an empty div because by default the requests library (or urllib, or anything similar) sends a user-agent that identifies itself as python-requests, and such requests get blocked by Google. By sending a user-agent header you can fake a real browser visit.

If the address is present in the returned HTML (it is in this case), you can get it without Selenium simply by adding a user-agent, for example:
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
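As a quick sanity check (a sketch, not part of the original answer), you can print the user-agent that requests sends when you don't set one; it identifies the script rather than a browser, which is what gets blocked:

import requests

# The default value looks like "python-requests/2.x.x", which Google can
# recognize and block or serve a stripped-down page to.
print(requests.utils.default_headers()["User-Agent"])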
Here is the code and the full example:
from bs4 import BeautifulSoup
import requests
import lxml

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# With a browser-like user-agent, Google returns the full results-page HTML.
response = requests.get(
    'https://www.google.com/search?q=spotlight 29 casino address',
    headers=headers)

html = response.text
soup = BeautifulSoup(html, 'lxml')

# The answer-box address sits in one of these classes, depending on the layout served.
print(soup.select_one(".sXLaOe, .iBp4i").text)
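These class names (Z0LcW, sXLaOe, iBp4i) are generated by Google and change regularly, so a defensive variation may be easier to maintain. The sketch below makes the same request, but lets requests build the query string (adding hl=en for English results) and guards against a missing node:

from bs4 import BeautifulSoup
import requests

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
params = {"q": "spotlight 29 casino address", "hl": "en"}  # requests URL-encodes the query

response = requests.get("https://www.google.com/search", headers=headers, params=params)
soup = BeautifulSoup(response.text, "lxml")

# Try every selector mentioned in this thread and fail gracefully if none match.
node = soup.select_one(".sXLaOe, .iBp4i, .Z0LcW")
print(node.text if node else "address block not found - the class names may have changed")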