Question

Here is a small part of my code:

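import requests
from bs4 import BeautifulSoup

def mancity():
    manclist = []
    f = 'mancity.txt'
    root_url = "http://www.whoscored.com"
    index_url = root_url + "/Teams/167"
    r = requests.get(index_url)
    # Name the parser explicitly; without it bs4 warns and may pick a different one per machine
    soup = BeautifulSoup(r.content, "html.parser")
    playstyle = soup.find_all("div", {"class": "character-card singular"})
    for item in playstyle:
        manclist.append(item.text)  # was "chellist", which is never defined in this snippet
    mstr = ''.join(map(str, manclist))
    # Open (and truncate) the file only after scraping, and let "with" close it
    with open(f, 'w') as fo:
        fo.write(mstr)
    print(mstr)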

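The idea of the code is to scrape the website and save the extracted data to a text file. Right now the code does not work reliably: sometimes it works, and sometimes it returns nothing at all. I don't know why this happens. Is it because my requests are being rejected? It's annoying, because when nothing is returned, the text file gets overwritten with blank content.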

Please run the code yourself and look at the output. (This is for educational purposes.)
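The blank file comes from opening `mancity.txt` in `'w'` mode, which truncates it, and then writing an empty string when the scrape finds nothing. A minimal sketch of one way to avoid that, using a hypothetical helper `save_results` (not part of the original code), is to write only when something was actually scraped:

```python
def save_results(path, items):
    """Write the joined items to path only if they contain non-whitespace text.

    Returns True if the file was written, False if it was left untouched.
    (save_results is a hypothetical helper, not from the original code.)
    """
    text = "".join(items)
    if not text.strip():
        # Nothing scraped: keep the previous file instead of blanking it
        return False
    with open(path, "w", encoding="utf-8") as fo:
        fo.write(text)
    return True
```

In `mancity()`, the last lines would then become `if save_results('mancity.txt', manclist): print(''.join(manclist))`.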

Answer 1

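When using requests, you can check the status_code attribute of the returned response object. A successful request should have a status code between 200 and 299. If it does not, r.content may contain some explanation of the error.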
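A minimal sketch of that check, written as a hypothetical helper `failure_reason` that takes the object returned by `requests.get`. It only reads `status_code` and `content`, returning None for a 2xx response and otherwise a short message with the start of the body, which is where a refusal page would explain itself:

```python
def failure_reason(r, max_chars=200):
    """Return None if r has a 2xx status code, else "HTTP <code>: <start of body>".

    r is expected to be a requests.Response (anything with .status_code and
    .content bytes works). failure_reason is a hypothetical helper name.
    """
    if 200 <= r.status_code <= 299:
        return None
    # Decode leniently: error pages are not guaranteed to be valid UTF-8
    body = r.content[:max_chars].decode("utf-8", errors="replace")
    return f"HTTP {r.status_code}: {body}"
```

In `mancity()` it would go right after `r = requests.get(index_url)`, printing the message and returning early when it is not None, so the output file is never touched. requests also provides `r.raise_for_status()`, which raises `requests.HTTPError` for 4xx and 5xx responses.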

Answer 2

I wrote another similar program that scrapes a different website, and it works perfectly. I suspect this site is rejecting my requests in some way.
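If the site is filtering requests, one thing worth trying is sending a browser-like `User-Agent` header, since requests identifies itself as `python-requests/<version>` by default. Whether whoscored.com actually checks this header is an assumption, not something confirmed here. A minimal sketch using a `requests.Session`; the helper name `make_session` and the User-Agent string are hypothetical examples:

```python
import requests

def make_session(user_agent="Mozilla/5.0 (X11; Linux x86_64)"):
    """Return a Session that sends the given User-Agent on every request.

    make_session and the default UA string are illustrative assumptions.
    """
    s = requests.Session()
    # Session headers are merged into every request made through this session
    s.headers.update({"User-Agent": user_agent})
    return s
```

Usage would be `r = make_session().get(index_url, timeout=10)` in place of `requests.get(index_url)`. If responses still come back empty intermittently, the status-code check above is the next thing to look at.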

Python web scraping problem
