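I am using requests-HTML and BeautifulSoup to scrape a website; the code is below. The odd thing is that print(soup.get_text()) sometimes gives me the text of the page, while print(soup) prints what looks like random code (see the attached image).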
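from requests_html import HTMLSession
from bs4 import BeautifulSoup as bs

# url is the address of the page being scraped (not shown in the question)
session = HTMLSession()
r = session.get(url)
soup = bs(r.content, "html.parser")
print(soup.get_text())  # text nodes only
#print(soup)  # the full parsed markup, tags and scripts included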
Answer 0 (score: 0)
I think the site is protected by JavaScript. Try the approach below; it may help.
import requests
from bs4 import BeautifulSoup
r = requests.get(url)
print(r.text)
# To get at the content you want, slice the response text stored in r, or better, parse it with bs4
soup = BeautifulSoup(r.text, "html.parser")
print(soup.text)
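If the "random code" printed by print(soup) is the page's markup together with its inline scripts and styles (an assumption, since the attached image is not available here), one option is to remove those tags before extracting the text. Depending on the bs4 version, get_text() may also include the contents of script and style tags, so removing them explicitly gives the same result everywhere. The sketch below runs on a made-up HTML string rather than the asker's site; SAMPLE_HTML and visible_text are hypothetical names used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page (not the asker's site).
SAMPLE_HTML = """
<html>
  <head>
    <title>Demo</title>
    <style>body { color: red; }</style>
    <script>var tracking = {id: 42};</script>
  </head>
  <body>
    <h1>Hello</h1>
    <p>Visible paragraph.</p>
    <script>console.log("inline script");</script>
  </body>
</html>
"""


def visible_text(html):
    """Return the page text with script, style, and noscript content removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop the tags whose contents are code rather than readable text.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    # stripped_strings yields each remaining text node with whitespace trimmed.
    return "\n".join(soup.stripped_strings)


if __name__ == "__main__":
    print(visible_text(SAMPLE_HTML))
```

With a real page, the same function could be called as visible_text(r.text). It only cleans up what the server actually sent, so text that the site builds in the browser with JavaScript would still be missing from the result.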