I tried the tutorial from here: https://www.youtube.com/watch?v=XQgXKtPSzUI&list=WL&index=93
This is the script I am using to try to scrape a steemit post:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://steemit.com/test/@bitcoinfree/test-4'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html,'html.parser')
print(page_soup.prettify("utf-8"))
Currently the code prints garbled output.
I don't know how to get the plain HTML source. What am I doing wrong? :(
Answer 0 (score: 0)
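import requests
from bs4 import BeautifulSoup
url = 'https://steemit.com/test/@bitcoinfree/test-4'
# Download the page; r.content holds the raw response bytes
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
# prettify() without an encoding argument returns a str, which prints as readable HTML
print(soup.prettify())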
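
A likely reason the script in the question looks garbled: when prettify() is given an encoding such as "utf-8", it returns a bytes object, and print() shows bytes as a b'...' literal with escaped newlines and non-ASCII characters. Here is a minimal sketch of the difference. The sample_html string is a made-up stand-in for the downloaded page, so nothing here touches the network.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the downloaded steemit page.
sample_html = "<html><body><p>héllo</p></body></html>"
soup = BeautifulSoup(sample_html, "html.parser")

# With an encoding argument, prettify() returns bytes.
as_bytes = soup.prettify("utf-8")
# Without an encoding argument, prettify() returns a str.
as_text = soup.prettify()

print(type(as_bytes).__name__)  # bytes
print(type(as_text).__name__)   # str

# Printing bytes shows the b'...' literal with escape sequences.
print(as_bytes)
# Printing the str shows indented, readable HTML.
print(as_text)
```

If the bytes form is needed for some other reason, decoding it first with as_bytes.decode("utf-8") should also print readable HTML.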