Question

I need to parse a site, but I get a 403 Forbidden error. Here is the code:

import requests

url = 'http://worldagnetwork.com/'
result = requests.get(url)
print(result.content.decode())

Its output:

<html>
<head><title>403 Forbidden</title></head>
<body bgcolor="white">
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx</center>
</body>
</html>

Please tell me what the problem is.

Answer 1

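The page appears to reject GET requests that do not identify a User-Agent. I visited the page with a browser (Chrome) and copied the User-Agent header of the GET request (see the Network tab of the developer tools) to send with the request from Python.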
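
As a minimal sketch of that fix, the snippet below sends a browser-like User-Agent with the request. The User-Agent string is only an illustrative Chrome value, and the `fetch` helper and the 10-second timeout are hypothetical choices; copy the real header from your own browser.

```python
import requests

URL = 'http://worldagnetwork.com/'

# Illustrative Chrome User-Agent (an assumption); replace it with the value
# copied from the Network tab of your own browser's developer tools.
HEADERS = {
    'User-Agent': (
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    )
}


def fetch(url, headers=HEADERS, timeout=10):
    """GET the page with explicit headers and return the decoded body."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 403 and other 4xx/5xx
    return response.text
```

Calling `print(fetch(URL))` should then print the page's HTML rather than the 403 page, provided the User-Agent check is the only thing the server enforces.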

Answer 2

If you are the owner/admin of the server and the accepted solution didn't work for you, try disabling CSRF protection (link to an SO answer).

I am using Spring (Java), so the setup requires you to create a SecurityConfig.java file containing the following:

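import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;

@Configuration
@EnableWebSecurity
public class SecurityConfig extends WebSecurityConfigurerAdapter {
    @Override
    protected void configure(HttpSecurity http) throws Exception {
        // Turn off CSRF protection so requests without a CSRF token are not rejected.
        http.csrf().disable();
    }
    // ...
}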

Answer 3

Just adding to Alberto's answer:

If you still get 403 Forbidden after adding a user-agent, you may need to add more headers, such as referer:

headers = {
    'User-Agent': '...',
    'referer': 'https://...'
}

The headers can be found under Network > Headers > Request Headers in the developer tools. (Press F12 to toggle them.)
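
One possible way to apply several headers to every request is a `requests.Session`. This is only a sketch: the `make_session` helper is hypothetical, and the header values and URLs are placeholders, not values known to work for any particular site.

```python
import requests


def make_session(user_agent, referer):
    """Return a Session that sends the given headers on every request."""
    session = requests.Session()
    session.headers.update({
        'User-Agent': user_agent,
        'Referer': referer,
    })
    return session


# Placeholder values (assumptions): copy the real ones from
# Network > Headers > Request Headers in the developer tools.
session = make_session(
    user_agent='Mozilla/5.0 (X11; Linux x86_64)',
    referer='https://example.com/',
)

# Preparing a request shows the headers that would go out, without sending it.
prepared = session.prepare_request(
    requests.Request('GET', 'https://example.com/page')
)
print(prepared.headers['User-Agent'])
print(prepared.headers['Referer'])
```

With the session in place, `session.get(url)` sends the same headers on each call, so they do not have to be repeated per request.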

Answer 4

Try using:

import requests

requests.get(url, auth=('username','password'))
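
For context, passing a `(username, password)` tuple as `auth` makes requests use HTTP Basic authentication, which adds an `Authorization: Basic ...` header. The sketch below prepares such a request without sending it; the URL and credentials are placeholders. This only helps if the server is in fact rejecting the request for lack of credentials.

```python
import base64

import requests

# Placeholder URL and credentials (assumptions for illustration only).
request = requests.Request(
    'GET', 'https://example.com/protected', auth=('username', 'password')
)
prepared = request.prepare()  # builds the request; nothing is sent

# The tuple becomes a Basic auth header: base64 of "username:password".
expected = 'Basic ' + base64.b64encode(b'username:password').decode('ascii')
print(prepared.headers['Authorization'] == expected)
```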
