I want to open a website, fetch its content, store it in a variable, and print it.
from urllib.request import urlopen
url = any_website  # placeholder: replace with the URL of the site, as a string
content = urlopen(url).read().decode('utf-8')
print(content)
The expected result is that I get the content written on the page.
Answer 0 (score: 2)
In Python, there are a few libraries you might be interested in. Here is an example that prints the page content, to get you started:
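from bs4 import BeautifulSoup as soup
import requests
url = "https://en.wikipedia.org/wiki/List_of_multinational_corporations"
page = requests.get(url)  # send an HTTP GET request
page_html = page.content  # raw response body, as bytes
page_soup = soup(page_html, "html.parser")  # parse with Python's built-in HTML parser
print(page_soup)  # prints the parsed HTML document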
With urlopen, you can try the following:
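from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3: urlopen lives in urllib.request
url = "https://en.wikipedia.org/wiki/List_of_multinational_corporations"
r = urlopen(url).read()  # raw response body, as bytes
soup = BeautifulSoup(r, "html.parser")  # name the parser explicitly
print(type(soup))
print(soup.prettify()[0:1000])  # first 1000 characters of the indented HTML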
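If the goal is the text written on the page rather than its HTML markup, one possible approach that needs only the standard library is sketched below. This swaps BeautifulSoup for `html.parser.HTMLParser`, and the names `TextExtractor`, `html_to_text`, and `fetch_text` are hypothetical helpers made up for this illustration. It assumes the page is static HTML and falls back to UTF-8 when the server does not declare a charset.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []      # non-empty text chunks, in document order
        self.skip_depth = 0  # greater than 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_text(url):
    """Download a page and return its visible text (hypothetical helper)."""
    with urlopen(url) as response:
        # Use the charset from the Content-Type header; assume UTF-8 otherwise.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

Calling `print(fetch_text(url)[:1000])` with the `url` from the examples above would then print the first 1000 characters of the page text. With BeautifulSoup already installed, `soup.get_text()` is a shorter route to a similar result.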