Question

I tried following this tutorial: https://www.youtube.com/watch?v=XQgXKtPSzUI&list=WL&index=93

This is the script I'm using to try to scrape a Steemit post:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


my_url = 'https://steemit.com/test/@bitcoinfree/test-4'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html,'html.parser')
print(page_soup.prettify("utf-8"))

Right now the code outputs gibberish.

I don't know how to get the plain HTML source. What am I doing wrong? :(
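One likely contributor to the garbled output, offered as a hedged diagnosis rather than a confirmed one: BeautifulSoup's `prettify()` returns a `str` when called with no arguments, but passing an encoding such as `"utf-8"` makes it return encoded `bytes`. In Python 3, `print()` on a `bytes` object shows its repr, so the whole document lands on one line with literal `\n` sequences and non-ASCII characters shown as `\xe2...` escapes. This stdlib-only sketch reproduces the effect with a made-up HTML snippet (a hypothetical stand-in for the Steemit page, not its real markup).

```python
# Hypothetical stand-in for prettified page markup containing non-ASCII text.
html_text = "<html>\n <body>\n  <p>Steemit – test ✓</p>\n </body>\n</html>"

# Comparable to what prettify("utf-8") returns: the markup encoded to bytes.
html_bytes = html_text.encode("utf-8")

# Printing bytes shows the repr: one line, literal \n, and \xe2\x80\x93-style escapes.
print(html_bytes)

# Decoding back to str restores readable, multi-line HTML.
print(html_bytes.decode("utf-8"))
```

With BeautifulSoup itself, the corresponding change is to call `prettify()` without an encoding argument, which is what the answer's code does.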

Answer 1

Got it.

import requests
from bs4 import BeautifulSoup

url = 'https://steemit.com/test/@bitcoinfree/test-4'
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

print(soup.prettify())
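If you would rather keep `urllib` instead of switching to `requests`, here is a minimal stdlib-only sketch. It guards against two possible causes of the garbling, neither confirmed by the original post: the body may arrive gzip-compressed (`requests` decodes gzip automatically, `urlopen` does not), and the raw bytes still need decoding to `str`, here using the charset from the `Content-Type` header with UTF-8 as an assumed fallback. The helper names `decode_body` and `fetch_html` and the `User-Agent` value are hypothetical.

```python
import gzip
from typing import Optional
from urllib.request import Request, urlopen


def decode_body(raw: bytes, content_encoding: Optional[str], charset: Optional[str]) -> str:
    """Decompress a gzip body if the server says so, then decode it to text."""
    if (content_encoding or "").strip().lower() == "gzip":
        raw = gzip.decompress(raw)
    # Assumed fallback: UTF-8 when the server declares no charset.
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_html(url: str) -> str:
    """Hypothetical helper: fetch a page with urllib and return its HTML as str."""
    # Hypothetical User-Agent; some sites reject urllib's default one.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        raw = resp.read()
        return decode_body(
            raw,
            resp.headers.get("Content-Encoding"),
            resp.headers.get_content_charset(),
        )
```

Usage would then mirror the question's code: `page_soup = soup(fetch_html(my_url), 'html.parser')` followed by `print(page_soup.prettify())`. Other content encodings such as `deflate` or `br` are not handled in this sketch.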

Trying to scrape this page with Python, but it returns gibberish
