Question

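I'm using requests-HTML and BeautifulSoup to scrape a website; the code is below. The odd thing is that print(soup.get_text()) sometimes gives me the text from the page, while print(soup) gives me what looks like random code (see the attached image).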

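from requests_html import HTMLSession
from bs4 import BeautifulSoup as bs

session = HTMLSession()
r = session.get(url)  # url is the address of the page being scraped
soup = bs(r.content, "html.parser")
print(soup.get_text())
#print(soup)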

This is what the program returned when I printed the soup (image attached).

Answer 1

I think the site is protected by JavaScript. Try the following approach; it may help.

import requests
from bs4 import BeautifulSoup

r = requests.get(url)
print(r.text)

# If you only need part of the content, slice the response text stored in r, or parse it with bs4 instead

soup = BeautifulSoup(r.text, "html.parser")
print(soup.text)
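
To check whether the "random code" really is JavaScript, one option is to count the script tags in the response and see how much readable text is left once they are removed. This is only a rough sketch: summarize_html and sample_html are hypothetical names, and the sample markup stands in for whatever r.text actually contains.

```python
from bs4 import BeautifulSoup


def summarize_html(html):
    """Return (number of <script> tags, visible text) for an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    script_count = len(soup.find_all("script"))
    # Drop script and style elements so only human-readable text remains.
    for tag in soup(["script", "style"]):
        tag.decompose()
    visible_text = soup.get_text(separator=" ", strip=True)
    return script_count, visible_text


# Hypothetical response body standing in for r.text:
# an empty container that a script would fill in at runtime.
sample_html = """
<html><head><script>var a = 1;</script></head>
<body><div id="app"></div><script>render("app");</script></body></html>
"""

script_count, visible_text = summarize_html(sample_html)
print(script_count, repr(visible_text))
```

If the visible text comes back nearly empty while the script count is high, the page content is probably built in the browser, and plain requests will not see it. In that case requests-HTML's r.html.render() can run the scripts before parsing (it downloads Chromium the first time it is called).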
