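I'm trying to build a terminal application that scrapes a website and returns the current time for a city name entered by the user. Here is my code so far: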
import re
import urllib.request
city = input('Enter city name: ')
url = 'https://time.is/'
rawData = urllib.request.urlopen(url).read()
decodedData = rawData.decode('utf-8')
print(decodedData)
Running it produces this error:
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    rawData = urllib.request.urlopen(url).read()
  File "~/Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "~/Python\Python35-32\lib\urllib\request.py", line 472, in open
    response = meth(req, response)
  File "~/Python\Python35-32\lib\urllib\request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "~/Python\Python35-32\lib\urllib\request.py", line 510, in error
    return self._call_chain(*args)
  File "~/Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "~/Python\Python35-32\lib\urllib\request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
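Why does this error occur? What is going wrong?
[Edit] The cause was that time.is refuses these requests. When scraping the web, always remember to read the site's terms and conditions. A free API can be found that does the same job.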
Answer 0 (score: 1)
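When this happens, I usually open a debugger and try to work out what is actually being requested when the site is visited. It looks like time.is does not like scripts calling its site.
A quick search turns up the following: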
1532027279136 0 161_(UTC,_UTC+00:00) 1532027279104
Time.is is for humans. To use from scripts and apps, please ask about our API. Thank you!
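To see a message like this from Python rather than from a debugger, the HTTPError can be caught and its body read, since urllib's HTTPError also behaves like a response object. The following is only a minimal sketch: the function name fetch_or_explain and its opener parameter are hypothetical names introduced here for illustration, and the exact body time.is returns today may differ from the text quoted above.

```python
import io
import urllib.error
import urllib.request


def fetch_or_explain(url, opener=urllib.request.urlopen):
    """Return (status, body_text) for url.

    On an HTTP error status (such as 403), return the error's status code
    and body instead of raising, so the server's explanation can be read.
    `opener` is a hypothetical hook so the function can be tried offline.
    """
    try:
        with opener(url) as response:
            body = response.read().decode('utf-8', errors='replace')
            return response.status, body
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a file-like response: read() yields the
        # body the server sent along with the error status.
        return err.code, err.read().decode('utf-8', errors='replace')


# Example usage (performs a real network request):
# status, body = fetch_or_explain('https://time.is/')
# print(status)
# print(body[:200])
```

Printing the status and the first part of the body is usually enough to tell whether the server is rejecting scripts on purpose, as appears to be the case here.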
Here are some APIs you could use to build your project: https://www.programmableweb.com/category/time/api