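I am writing a Python script that gets the temperature from Google by searching for the keyword "temperature". Using Inspect Element, I found that the temperature value is stored in the span with id="wob_tm":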
<div>
<div class="vk_bk sol-tmp" style="float:left;margin-top:-3px;font-size:64px"><span id="wob_tm" class="wob_t" style="display:inline">
18
</span><span id="wob_ttm" class="wob_t" style="display:none"> … </span>
</div>
As you can see, the temperature 18 is inside the span with id="wob_tm". So my Python script is:
from bs4 import BeautifulSoup
import requests,sys,webbrowser
str="temperature"
res = requests.get('http://google.com/search?q=%s'%str)
res.raise_for_status()
examplesoup= BeautifulSoup(res.text,"lxml")
linkelems=examplesoup.findAll("span",{"id":"wob_tm"})
print linkelems.string.strip()
It gives me this error: AttributeError: 'NoneType' object has no attribute 'string'. How can I fix it? It means that linkelems has no elements.
Answer 0 (score: 2)
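Based on some experimentation, Google appears to send slightly different results depending on which browser it thinks you are using. For example, when I use Firefox I see the span with id 'wob_tm', but by default I do not when running the code. (I do get a span with class wob_t that holds the temperature, but I also get 10 other wob_t spans.) Try setting the user agent to that of a common browser, like this: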
str="temperature"
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'
}
res = requests.get('http://www.google.com/search?q=%s' % str, headers=headers)
res.raise_for_status()
examplesoup=BeautifulSoup(res.text,'lxml')
linkelems=examplesoup.findAll('span', {'id': 'wob_tm'}) # This now has an element in it
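Because the markup Google returns can vary, it may help to guard against the span being absent before reading its text. The following is a minimal sketch, not part of the original answer: the helper name `extract_temperature` and the two sample strings are hypothetical, and the sample markup is modeled on the snippet in the question rather than on a live response.

```python
from bs4 import BeautifulSoup


def extract_temperature(html):
    """Return the text of the span with id "wob_tm", or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", id="wob_tm")  # find() returns None when nothing matches
    if span is None:
        return None
    return span.get_text(strip=True)


# Hypothetical sample markup, modeled on the snippet in the question.
with_span = '<div><span id="wob_tm" class="wob_t">\n18\n</span></div>'
without_span = '<div><span class="wob_t">18</span></div>'

print(extract_temperature(with_span))     # 18
print(extract_temperature(without_span))  # None
```

Returning None instead of raising lets the caller decide whether to retry with different headers or report that the widget was not in the response.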
Answer 1 (score: 0)
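The 0 you are printing is the length of the span tag's contents, not the contents themselves. The string attribute will give you the contents of the tag: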
from bs4 import BeautifulSoup
s = """<div>
<div class="vk_bk sol-tmp" style="float:left;margin-top:-3px;font-size:64px">
<span id="wob_tm" class="wob_t" style="display:inline">
18
</span><span id="wob_ttm" class="wob_t" style="display:none"> … </span>
</div>"""
soup = BeautifulSoup(s, "html.parser")  # name the parser explicitly to avoid the bs4 warning
temperature = soup.find("span", id="wob_tm")
print(temperature.string.strip())
# 18
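One caveat about `.string`, shown here as a small sketch on hypothetical markup (the two `<p>` tags are made up for illustration): it returns the text only when the tag has a single child, and returns None when the tag mixes text with other tags. `get_text()` concatenates all nested text instead.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one tag with a single text child, one with text plus a nested tag.
html = '<p id="a">18</p><p id="b">18 <b>°C</b></p>'
soup = BeautifulSoup(html, "html.parser")

single = soup.find("p", id="a")
mixed = soup.find("p", id="b")

single_string = single.string   # single text child, so this is that text
mixed_string = mixed.string     # text plus a <b> child, so this is None
mixed_text = mixed.get_text()   # joins all text found inside the tag

print(single_string)  # 18
print(mixed_string)   # None
print(mixed_text)     # 18 °C
```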
Answer 2 (score: 0)
I ran this code (using Python 3 and bs4) and got the string of the span tag.
from bs4 import BeautifulSoup
html_snippet = """<div>
<div class="vk_bk sol-tmp" style="float:left;margin-top:-3px;font-size:64px"><span id="wob_tm" class="wob_t" style="display:inline">18</span><span id="wob_ttm" class="wob_t" style="display:none"> ... </span></div>"""
soup = BeautifulSoup(html_snippet, "html.parser")  # name the parser explicitly to avoid the bs4 warning
temp = soup.find("span", id='wob_tm')
print(temp.string)
Answer 3 (score: 0)
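Make sure you are sending a user-agent so that Google does not see your request as coming from python-requests, which is the default requests User-Agent. If you only need to extract the temperature data, you can use the bs4 .select_one() method.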
>>> soup.select_one('#wob_tm').text
'85°F'
Code and an example that extracts more, in the online IDE:
from bs4 import BeautifulSoup
import requests, lxml
headers = {
"User-Agent":
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
params = {
"q": "london weather",
"hl": "en",
}
response = requests.get('https://www.google.com/search', headers=headers, params=params).text
soup = BeautifulSoup(response, 'lxml')
temperature = soup.select_one('#wob_tm').text
print(f'Temperature: {temperature}')
---
# Temperature: 73°F
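Passing the query as a `params` dict, as the code above does, leaves the URL encoding to the library, which matters once the query contains spaces. The sketch below uses only the standard library to show roughly what the resulting URL looks like; it sends no request, and it is an approximation of what requests builds rather than a description of its internals.

```python
from urllib.parse import urlencode

# Same parameters as in the answer's code.
params = {"q": "london weather", "hl": "en"}

# urlencode() escapes each value; a space becomes "+".
query = urlencode(params)
url = "https://www.google.com/search?" + query

print(url)  # https://www.google.com/search?q=london+weather&hl=en
```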
Alternatively, you can use the Google Direct Answer Box API from SerpApi. It is a paid API with a free plan.
Code to integrate:
from serpapi import GoogleSearch
import os
params = {
"engine": "google",
"q": "london weather",
"api_key": os.getenv("API_KEY"),
"hl": "en",
}
search = GoogleSearch(params)
results = search.get_dict()
loc = results['answer_box']['location']
weather_date = results['answer_box']['date']
weather = results['answer_box']['weather']
temp = results['answer_box']['temperature']
unit = results['answer_box']['unit']
precipitation = results['answer_box']['precipitation']
humidity = results['answer_box']['humidity']
wind = results['answer_box']['wind']
forecast = results['answer_box']['forecast']
print(f'{loc}\n{weather_date}\n{weather}\n{temp}\n{unit}\n{precipitation}\n{humidity}\n{wind}\n{forecast}')
---------
'''
London, UK
Wednesday 1:00 PM
Partly cloudy
73°F
0%
55%
7 mph
[{'day': 'Wednesday', 'weather': 'Partly cloudy', 'temperature': {'high': '74', 'low': '59'}, 'thumbnail': 'https://ssl.gstatic.com/onebox/weather/48/partly_cloudy.png'}..]
'''
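The code above indexes `results['answer_box']` directly, which raises KeyError if a response has no such key. Here is a minimal sketch of a more defensive lookup. It works on a plain dict and does not call SerpApi; the helper name `weather_summary` and the `sample` dict are hypothetical, with field names taken from the code above.

```python
def weather_summary(results):
    """Pick a few weather fields from a results dict, tolerating missing keys."""
    answer_box = results.get("answer_box")
    if not answer_box:
        return None  # no answer box in this response
    fields = ("location", "date", "weather", "temperature", "unit")
    # .get() yields None for any field that is absent instead of raising KeyError.
    return {name: answer_box.get(name) for name in fields}


# Hypothetical sample data for illustration only; "unit" is deliberately missing.
sample = {
    "answer_box": {
        "location": "London, UK",
        "date": "Wednesday 1:00 PM",
        "weather": "Partly cloudy",
        "temperature": "73",
    }
}

summary = weather_summary(sample)
print(summary["location"])   # London, UK
print(summary["unit"])       # None
print(weather_summary({}))   # None
```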
Disclaimer: I work for SerpApi.