Python web scraping problem

Time: 2020-08-14 04:25:39

Tags: python web screen-scraping

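I'm using requests-html and BeautifulSoup to scrape a website; the code is below. The strange thing is that sometimes print(soup.get_text()) gives me the text of the page, while print(soup) gives me what looks like random code (see the attached image).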

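from bs4 import BeautifulSoup as bs
from requests_html import HTMLSession

url = "https://example.com"  # placeholder; the post does not give the real URL

session = HTMLSession()
r = session.get(url)
soup = bs(r.content, "html.parser")
print(soup.get_text())
# print(soup)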

The program returned this when I tried to print the soup.

1 answer:

Answer 0 (score: 0)

I think the site is protected by JavaScript. Try the approach below; it might help.

import requests
from bs4 import BeautifulSoup

url = "https://example.com"  # placeholder; the post does not give the real URL
r = requests.get(url)
print(r.text)  # raw HTML exactly as the server returned it

# To work with the content, you can slice the string in r.text directly or, better, parse it with bs4.

soup = BeautifulSoup(r.text, "html.parser")
print(soup.text)
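
If the "random code" in print(soup) is inline JavaScript or CSS, one way to check is to count the script tags and strip them before extracting text. This is only a rough sketch: SAMPLE_HTML is a made-up stand-in for r.text, and summarize is a hypothetical helper name, not part of requests or bs4.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for r.text; a real page would come from requests.get(url).
SAMPLE_HTML = """
<html>
  <head>
    <title>Demo</title>
    <script>var a=function(b){return b^0x5f3759df};window.__x=a(1);</script>
    <style>body{margin:0}</style>
  </head>
  <body>
    <div id="app"></div>
    <p>Hello, world</p>
  </body>
</html>
"""


def summarize(html):
    """Return (number of <script> tags, human-readable text) for an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    script_count = len(soup.find_all("script"))
    # Drop tags whose content is code or styling rather than readable text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    visible_text = soup.get_text(separator=" ", strip=True)
    return script_count, visible_text


script_count, visible_text = summarize(SAMPLE_HTML)
print("script tags:", script_count)
print("visible text:", visible_text)
```

If the visible text comes back nearly empty while the script count is high, the page is probably built in the browser by JavaScript. In that case a plain requests.get will not see the final content, and something that actually runs the scripts is needed, for example r.html.render() from requests-html, which downloads Chromium the first time it is used.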