Python requests does not return the complete page data

Asked: 2019-04-23 06:25:05

Tags: python html web-scraping beautifulsoup

I am trying to scrape the 50 best albums of the decade (2000-2009) from https://www.pastemagazine.com/blogs/lists/2009/11/the-best-albums-of-the-decade.html?a=1.

I am using the following code in Python:

from requests import get 
url = 'https://www.pastemagazine.com/blogs/lists/2009/11/the-best-albums-of-the-decade.html?a=2'
response = get(url) 
print(response.text)

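When I inspect the response, the information for all 50 best albums is missing from the output. When I view the page source, I do see this information under <div class="grid-x article-wrapper">. What do I need to do to scrape this part of the page?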

1 answer:

Answer 0 (score: 1)

You need to define a header so that the request looks more like it comes from a real browser. The following should work.

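import requests
from bs4 import BeautifulSoup

url = 'https://www.pastemagazine.com/blogs/lists/2009/11/the-best-albums-of-the-decade.html?a=2'

# Send a browser-like User-Agent instead of the default python-requests one.
res = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
# "lxml" requires the lxml package; "html.parser" is the built-in alternative.
soup = BeautifulSoup(res.text, "lxml")
# Each album heading is a <b> nested directly inside <b class="big">.
for item in soup.select("b.big > b"):
    print(item.text)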

The output looks like:

50. Björk: Vespertine [Elektra] 2001
49. Libertines: Up The Bracket [Rough Trade] (2002)
48. Loretta Lynn: Van Lear Rose [Interscope] (2004)
47. Arctic Monkeys: Whatever People Say I Am, That’s What I’m Not [Domino] (2006)
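
To confirm that the header actually changes what the server sends back, it can help to test whether the fetched HTML contains the container mentioned in the question. The following is a minimal sketch using only the standard library. The names ClassFinder and has_container are hypothetical helpers, and the two sample strings are hand-written stand-ins, not real responses from the site.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Record whether any tag carries all of the wanted CSS classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None.
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.found = True


def has_container(html, wanted=("grid-x", "article-wrapper")):
    """Return True if the HTML has a tag with every class in `wanted`."""
    finder = ClassFinder(wanted)
    finder.feed(html)
    return finder.found


# Hand-written samples (assumptions) standing in for the two kinds of response.
full_page = '<div class="grid-x article-wrapper"><b class="big"><b>50. ...</b></b></div>'
stripped_page = "<html><body><p>placeholder</p></body></html>"

print(has_container(full_page))      # True
print(has_container(stripped_page))  # False
```

With the code from the answer, calling has_container(res.text) once for a request without the User-Agent header and once with it should show whether the header is what makes the album list appear.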