Question

I am trying to get all the flat listing information from this page.

Parsing this page is straightforward: I just call requests.get with my browser's headers as a parameter.

However, when I try to get data from other pages (for example, the third page), the same requests.get call fails to return the real page.

To work around this, I have to update the cookie every time it fails.
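
One way to avoid refreshing the cookie by hand might be a requests.Session, which stores cookies set by each response and sends them on later requests. The following is a minimal sketch, assuming the site sets its cookies through ordinary Set-Cookie response headers (not confirmed for this site; it will not help if the cookie is generated by JavaScript). The name make_session is hypothetical.

```python
import requests


def make_session(headers):
    """Return a Session that sends `headers` on every request.

    The Session keeps a cookie jar: cookies set by a response are
    stored and sent automatically on later requests to the same site.
    """
    session = requests.Session()
    # Default headers are merged into every request made by this session.
    session.headers.update(headers)
    return session
```

With this, session = make_session(headers) would be created once, and session.get(url) would replace each requests.get(url, headers=headers) call.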

The code (main logic) is:

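import time

# parse_to_doc, get_selector_from and get_table are helpers not shown here.
base_url = 'https://www.gumtree.com/property-to-rent/uk/flat'
doc = parse_to_doc(base_url, headers=headers)
# Read the total number of ads from the text node containing "ads in".
number = int(get_selector_from(doc).xpath('//text()[contains(., "ads in")]')[0].split(' ads')[0].replace(',', ''))
print(f'There are {number} records in total.')

all_records = []
i = 1
while len(all_records) < number:
    print(len(all_records))
    i += 1  # incremented before use, so the first URL requested is /page2
    url = f'{base_url}/page{i}'
    all_records.extend(get_table(url))
    time.sleep(10)  # pause between requests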

headers is a dictionary I copied from my browser.
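
The record count in the main logic is extracted by a dense one-liner, so here it is in isolation. This is only a sketch: the function name parse_ad_count and the sample string are hypothetical, since the real text comes from the XPath query on the page.

```python
def parse_ad_count(text):
    """Extract the leading integer from a string such as '28,030 ads in ...'."""
    # Same steps as the one-liner: keep what precedes ' ads',
    # then drop the thousands separators before converting.
    return int(text.split(" ads")[0].replace(",", ""))


# Hypothetical sample text, not copied from the real page.
sample = "28,030 ads in Flats to Rent"
print(parse_ad_count(sample))  # 28030
```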

The output of the main logic is:

There are 28030 records in total.
0
30
30
30
30
...

This means that after 30 records are successfully fetched from the first page, the cookie stops working.
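
The repeated 30 also shows that the loop never terminates once pages stop returning records. Below is a minimal sketch of a loop that gives up after several consecutive empty pages, assuming get_table returns an empty list for a blocked page (which the output suggests but does not prove). The names collect_records, fetch_page, max_failures and fake_fetch are hypothetical.

```python
def collect_records(fetch_page, total, max_failures=3):
    """Collect records page by page until `total` is reached.

    `fetch_page(page_number)` stands in for get_table(url) and is assumed
    to return a list of records, empty when the request was blocked.
    Stops after `max_failures` consecutive empty pages instead of looping forever.
    """
    records = []
    page = 1
    failures = 0
    while len(records) < total and failures < max_failures:
        batch = fetch_page(page)
        if batch:
            records.extend(batch)
            failures = 0
            page += 1
        else:
            # Empty page: possibly a stale cookie, so the same page is retried.
            failures += 1
    return records


# Fake fetcher for illustration: only the first request returns records.
def fake_fetch(page):
    return list(range(30)) if page == 1 else []


print(len(collect_records(fake_fetch, total=28030)))  # 30
```

In a real run, the cookie refresh and the time.sleep pause would go in the branch that handles an empty page.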

How can I solve the cookie problem for this website?

0 answers: