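The error I am getting leads me to believe that my program cannot find a website that I know exists. The website is
https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207
My code looks like this: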
from urllib import request as u_r

def strip_webite():
    with u_r.urlopen("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207") as f:
        pass

if __name__ == "__main__":
    strip_webite()
The error I get is:
  File "database_management.py", line 19, in <module>
    strip_webite()
  File "database_management.py", line 15, in strip_webite
    with u_r.urlopen("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207") as f:
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Answer 0 (score: 2)
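It looks like Transfermarkt uses the default User-Agent string sent by Python's urllib library to block requests from bots, although it does not mention anything about this in the page originally linked here.
That seems to imply that they don't mind being scraped, but would prefer that we announce who we are.
To do this with urllib, do the following:

from urllib import request as u_r

def strip_webite():
    request = u_r.Request("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207")
    # Replace urllib's default User-Agent with one that identifies your client.
    request.add_header('User-Agent', 'my-cool-app')
    with u_r.urlopen(request) as f:
        pass

if __name__ == "__main__":
    strip_webite()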
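If you want to see what urllib announces by default, and get the status code back instead of a traceback, something along these lines may help. It is only a sketch: the helper names (`default_user_agent`, `build_request`, `fetch`), the 10-second timeout, and the `my-cool-app` identifier are my own placeholders, and I can't promise the site still reacts to the User-Agent the same way.

```python
from urllib import error, request

# Hypothetical identifier; choose something that describes your own client.
USER_AGENT = "my-cool-app"


def default_user_agent():
    """Return the User-Agent urllib sends when you do not set one."""
    opener = request.build_opener()
    # OpenerDirector keeps its default headers in the `addheaders` list.
    return dict(opener.addheaders)["User-agent"]


def build_request(url, user_agent=USER_AGENT):
    """Create a Request that carries a custom User-Agent header."""
    return request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT, timeout=10):
    """Return (status, body); on an HTTP error return (error code, None)."""
    try:
        with request.urlopen(build_request(url, user_agent), timeout=timeout) as response:
            return response.status, response.read()
    except error.HTTPError as exc:
        # The server answered, just not with a success status (e.g. 404).
        return exc.code, None
```

Calling `default_user_agent()` gives a string starting with `Python-urllib/`, which is presumably what the site is filtering on. `fetch(url)` then returns the status code and the raw bytes of the page, or the error code and `None` if the server still refuses the request.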