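I decided to look at the website's source code and picked a class named "expanded" (I found it using view-source; prettify() shows different code). I want to print everything in it, using the following code: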
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.quora.com/How-can-I-write-a-bot-using-Python")
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.find_all(class_='expanded'))
But it just prints:
[]
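Please help me find the mistake.
I have already seen this thread and tried to follow what the answer said, but it did not help, because the terminal showed this error:
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?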
Answer 0 (score: 0)
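I had a look at the site in question, and the only similar class is actually named ui_qtext_expanded.
find_all (or its older alias findAll) returns a list of items, so you have to iterate over the result and read .text from each item. That is, if you want the text and not the actual HTML: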
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.quora.com/How-can-I-write-a-bot-using-Python")
soup = BeautifulSoup(page.content, 'html.parser')
res = soup.find_all(class_='ui_qtext_expanded')
for i in res:
    print(i.text)
For that link, the output begins with:
A combination of mechanize, Requests and BeautifulSoup works pretty good for the basic stuff.Learn about mechanize here.Mechanize is sufficient for basic form filling, form submission and that sort of stuff, but for real browser emulation (like dealing with Javascript rendered HTML) you should look into selenium.
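To see why class_='expanded' returned an empty list, and to work around the FeatureNotFound error from the question, here is a small offline sketch. The SAMPLE_HTML markup and the make_soup helper are made up for illustration; only the class name ui_qtext_expanded comes from the page discussed above. It assumes that a plain string passed to class_ has to equal a whole class name, while a compiled regular expression is searched within each class name.

```python
import re

from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical markup standing in for the real page; only the class name
# ui_qtext_expanded is taken from the answer above.
SAMPLE_HTML = """
<div class="ui_qtext_expanded"><span>First answer text.</span></div>
<div class="ui_qtext_expanded"><span>Second answer text.</span></div>
<div class="sidebar">Unrelated content.</div>
"""


def make_soup(markup, preferred="lxml"):
    """Parse with the preferred parser, falling back to the built-in one."""
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # Raised when the requested parser (e.g. lxml) is not installed.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup(SAMPLE_HTML)

# A plain string must equal a whole class name, so this finds nothing.
exact_wrong = soup.find_all(class_="expanded")

# The full class name matches both answer blocks.
exact_right = soup.find_all(class_="ui_qtext_expanded")

# A compiled regex is searched within each class name, so a partial name works.
partial = soup.find_all(class_=re.compile("expanded"))

# Collect the text of each matching element rather than its HTML.
texts = [tag.get_text(strip=True) for tag in exact_right]
print(len(exact_wrong), len(exact_right), len(partial))
print(texts)
```

If you would rather use lxml than fall back, installing it (for example with pip install lxml) should also make the FeatureNotFound error go away.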