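Python script to get the temperature from a Google search

Time: 2016-01-29 19:09:02

Tags: python beautifulsoup python-requests lxml

I am writing a Python script that gets the temperature from Google by searching for the keyword "temperature". Using Inspect Element, I found that the temperature value is stored in the span with id="wob_tm":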

<div>
<div class="vk_bk sol-tmp" style="float:left;margin-top:-3px;font-size:64px"><span id="wob_tm" class="wob_t" style="display:inline">
  18
</span><span id="wob_ttm" class="wob_t" style="display:none"> … </span>
</div>

As you can see, the temperature, 18, is inside the span with id="wob_tm". So my Python script is:

from bs4 import BeautifulSoup
import requests,sys,webbrowser

str="temperature"
res = requests.get('http://google.com/search?q=%s'%str)
res.raise_for_status()
examplesoup= BeautifulSoup(res.text,"lxml")    
linkelems=examplesoup.findAll("span",{"id":"wob_tm"})
print linkelems.string.strip()

It gives me this error: AttributeError: 'NoneType' object has no attribute 'string'. How can I fix it? It means linkelems has no elements.

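4 answers:

Answer 0 (score: 2)

Based on some experimentation, Google appears to send slightly different results depending on which browser it thinks you are using. For example, when I use Firefox I see the span with id 'wob_tm', but by default I do not see it when running the code. (I do get a span with class wob_t that contains the temperature, but I also get 10 other wob_t spans.) Try setting the User-Agent to that of a common browser, like this: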

from bs4 import BeautifulSoup
import requests

query = "temperature"  # renamed from str to avoid shadowing the built-in

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'
}

res = requests.get('http://www.google.com/search?q=%s' % query, headers=headers)
res.raise_for_status()
examplesoup=BeautifulSoup(res.text,'lxml')
linkelems=examplesoup.findAll('span', {'id': 'wob_tm'}) # This now has an element in it
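
Even with a browser-like User-Agent, the span may still be missing from the response, in which case find() returns None and calling .string on it raises the AttributeError from the question. The following is a minimal sketch of a defensive lookup, run against static snippets rather than a live request. The helper name extract_temperature is hypothetical and not part of bs4 or requests.

```python
from bs4 import BeautifulSoup


def extract_temperature(html):
    """Return the text of the span with id 'wob_tm', or None if it is absent.

    Hypothetical helper for illustration; not part of bs4 or requests.
    """
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", id="wob_tm")  # find() returns None when nothing matches
    if span is None:
        return None
    return span.get_text(strip=True)


# Static test snippets (assumptions, modelled on the markup in the question).
with_span = '<div><span id="wob_tm" class="wob_t">\n  18\n</span></div>'
without_span = '<div><span class="wob_t">18</span></div>'

print(extract_temperature(with_span))     # 18
print(extract_temperature(without_span))  # None
```

Returning None instead of raising lets the caller decide whether to retry with different headers or report that the weather box was not served.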

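Answer 1 (score: 0)

The 0 you are printing is the length of the span tag's contents, not the contents themselves. The string attribute will give you the contents of the span tag: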

from bs4 import BeautifulSoup
s = """<div>
<div class="vk_bk sol-tmp" style="float:left;margin-top:-3px;font-size:64px">
<span id="wob_tm" class="wob_t" style="display:inline">
18
</span><span id="wob_ttm" class="wob_t" style="display:none"> … </span>
</div>"""
soup = BeautifulSoup(s, "html.parser")  # name the parser explicitly to avoid the bs4 warning
temperature = soup.find("span", id="wob_tm")
print(temperature.string.strip())
# 18
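
One caveat with .string is that it only works when the tag has a single text child. If the span contains text plus another tag, .string is None, while get_text() still returns the combined text. The snippet below is a small sketch of that difference on made-up markup (the ids "a" and "b" and the nested sup tag are assumptions for illustration, not Google's actual markup).

```python
from bs4 import BeautifulSoup

# Made-up markup: one span with plain text, one with text plus a child tag.
html = '<span id="a"> 18 </span><span id="b">18<sup>°C</sup></span>'
soup = BeautifulSoup(html, "html.parser")

single = soup.find("span", id="a")
nested = soup.find("span", id="b")

single_string = single.string    # exactly one text child: a NavigableString
nested_string = nested.string    # text plus a <sup> child: None
nested_text = nested.get_text()  # concatenates all descendant text

print(repr(single_string.strip()))  # '18'
print(nested_string)                # None
print(nested_text)                  # 18°C
```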

Answer 2 (score: 0)

I ran this code (with Python 3 and bs4) and got the string of the span tag.

from bs4 import BeautifulSoup
html_snippet = """<div>
<div class="vk_bk sol-tmp" style="float:left;margin-top:-3px;font-size:64px"><span id="wob_tm" class="wob_t" style="display:inline">18</span><span id="wob_ttm" class="wob_t" style="display:none"> ... </span></div>"""

soup = BeautifulSoup(html_snippet, "html.parser")  # name the parser explicitly to avoid the bs4 warning
temp = soup.find("span", id='wob_tm')

print(temp.string)

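Answer 3 (score: 0)

Make sure you send a user-agent so that Google does not treat your request as coming from python-requests, which is the default requests User-Agent. If you only need to extract the temperature, you can use the bs4 .select_one() method.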

>>> soup.select_one('#wob_tm').text
'85°F'

Code and an example that extracts more, in the online IDE:

from bs4 import BeautifulSoup
import requests, lxml

headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  "q": "london weather",
  "hl": "en",
}

response = requests.get('https://www.google.com/search', headers=headers, params=params).text
soup = BeautifulSoup(response, 'lxml')

temperature = soup.select_one('#wob_tm').text
print(f'Temperature: {temperature}')

---
# Temperature: 73°F
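
The value extracted this way is a string such as '73°F'. If a number is needed, it has to be parsed. Here is a rough sketch, assuming the text is an integer optionally followed by °C or °F; both helper names are hypothetical.

```python
import re


def parse_temperature(text):
    """Split a string such as '73°F' or '18' into (value, unit).

    Hypothetical helper; the unit is None when the text has no °C/°F suffix.
    """
    match = re.fullmatch(r"\s*(-?\d+)\s*(?:°\s*([CF]))?\s*", text)
    if match is None:
        raise ValueError(f"unrecognised temperature: {text!r}")
    return int(match.group(1)), match.group(2)


def fahrenheit_to_celsius(value):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (value - 32) * 5 / 9


value, unit = parse_temperature("73°F")
print(value, unit)  # 73 F
if unit == "F":
    print(round(fahrenheit_to_celsius(value), 1))  # 22.8
```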

Alternatively, you can use the Google Direct Answer Box API from SerpApi. It is a paid API with a free plan.

Code to integrate:

from serpapi import GoogleSearch
import os

params = {
  "engine": "google",
  "q": "london weather",
  "api_key": os.getenv("API_KEY"),
  "hl": "en",
}

search = GoogleSearch(params)
results = search.get_dict()

loc = results['answer_box']['location']
weather_date = results['answer_box']['date']
weather = results['answer_box']['weather']
temp = results['answer_box']['temperature']
unit = results['answer_box']['unit']
precipitation = results['answer_box']['precipitation']
humidity = results['answer_box']['humidity']
wind = results['answer_box']['wind']

forecast = results['answer_box']['forecast']

print(f'{loc}\n{weather_date}\n{weather}\n{temp}\n{unit}\n{precipitation}\n{humidity}\n{wind}\n{forecast}')

---------
'''
London, UK
Wednesday 1:00 PM
Partly cloudy
73°F
0%
55%
7 mph

[{'day': 'Wednesday', 'weather': 'Partly cloudy', 'temperature': {'high': '74', 'low': '59'}, 'thumbnail': 'https://ssl.gstatic.com/onebox/weather/48/partly_cloudy.png'}..]
'''
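
Indexing results['answer_box'] directly raises a KeyError if the response has no answer box. A possible way to guard against that is sketched below on a plain dict. No API call is made, the helper name summarize_weather is hypothetical, and the sample values are assumptions modelled on the output above.

```python
def summarize_weather(results):
    """Build a one-line summary from a results dict shaped like the output above.

    Hypothetical helper; `results` is a plain dict and no API call is made.
    """
    box = results.get("answer_box")
    if not box:
        return "no answer box in results"
    # .get() with a default keeps a missing field from raising KeyError.
    loc = box.get("location", "unknown location")
    temp = box.get("temperature", "?")
    unit = box.get("unit", "")
    weather = box.get("weather", "unknown conditions")
    return f"{loc}: {temp}{unit}, {weather}"


# Sample data (assumed values, for illustration only).
sample = {
    "answer_box": {
        "location": "London, UK",
        "weather": "Partly cloudy",
        "temperature": "73",
        "unit": "°F",
    }
}

print(summarize_weather(sample))  # London, UK: 73°F, Partly cloudy
print(summarize_weather({}))      # no answer box in results
```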

Disclaimer: I work for SerpApi.