Question

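I am learning BeautifulSoup and trying to load the content of this web page. I am using Inspect Element to drill down through the HTML tags and get at the content.

I tried several code snippets to display the results and check whether the content was retrieved successfully.

The following snippets returned results: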

from bs4 import BeautifulSoup
import requests

root = 'https://www.quora.com/topic/Graduate-Record-Examination-GRE-1'
r = requests.get(root)

soup = BeautifulSoup(r.text,'html.parser')

# The following queries yielded some results:

#1
a = soup.find_all('div',{'class':'feed'})
print(a)

#2
b = soup.find_all('div',{'class':'ContentWrapper'})
print(b)

#3
c = soup.find_all('div',{'class':'ContentWrapper'})
print(c)

#4
d = soup.find_all('div',{'class':'feed'})
print(d)

#5
e = soup.find_all('div',{'class':'TopicFeed'})
print(e)

However, after drilling one level deeper, the following returned nothing:

f = soup.find_all('div',{'class':'paged_list_wrapper'})
print(f)

It prints: []

The content / HTML inside <div class='paged_list_wrapper'> is not printed. Why?

Answer 1

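A site can be configured to send different pages depending on the user agent. I ran into the same problem as you: it returned an empty list. Adding a common browser user agent to the request headers fixed it for me.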

from bs4 import BeautifulSoup
import requests

root = 'https://www.quora.com/topic/Graduate-Record-Examination-GRE-1'
# Send a browser-like User-Agent; some sites serve different HTML to the default requests client.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0'}
r = requests.get(root, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
# find_all is the current name; findAll is a legacy alias with the same behavior.
f = soup.find_all('div', {'class': 'paged_list_wrapper'})
print(f)
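
To check whether the user agent really is what makes the difference, it may help to count the matching divs in both responses side by side. The following is only a rough sketch: the helper names count_divs and compare_responses, the BROWSER_HEADERS value, and the 10-second timeout are my own assumptions, not anything from the question.

```python
from bs4 import BeautifulSoup
import requests

# Assumed browser-like User-Agent (hypothetical value); any common browser string should do.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
}


def count_divs(html, class_names):
    """Return {class_name: number of <div> elements carrying that class}."""
    soup = BeautifulSoup(html, "html.parser")
    return {name: len(soup.find_all("div", class_=name)) for name in class_names}


def compare_responses(url, class_names):
    """Fetch url with the default and a browser-like User-Agent and count matches in each."""
    plain = requests.get(url, timeout=10)
    browser = requests.get(url, headers=BROWSER_HEADERS, timeout=10)
    return {
        "default": count_divs(plain.text, class_names),
        "browser": count_divs(browser.text, class_names),
    }
```

Calling compare_responses(root, ['feed', 'paged_list_wrapper']) would show which classes appear under each user agent. If a class has a count of zero in both responses, the element is probably added by JavaScript after the page loads, and requests alone will not see it.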
