The following works when I paste it into the browser:
http://www.somesite.com/details.pl?urn=2344
But when I try to read the URL with Python, nothing happens:
import urllib
link = 'http://www.somesite.com/details.pl?urn=2344'
f = urllib.urlopen(link)
myfile = f.readline()
print myfile
Do I need to encode the URL, or is there something I'm not seeing?
Answer 0 (score: 116)
To answer your question:
import urllib
link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.read()
print(myfile)
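You need read(), not readline(). readline() returns only the first line of the response, so if that line is empty, nothing visible gets printed.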
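A minimal Python 3 sketch of the difference. It assumes a page whose first line is empty and uses a data: URL, which urllib.request handles with its built-in DataHandler and no network access, as a stand-in for the real page; the page content is made up for illustration.

```python
from urllib.request import urlopen

# Hypothetical stand-in for the real page: "%0A" is a newline, so the body
# is an empty first line followed by "hello world".
PAGE = "data:text/plain,%0Ahello%20world"


def first_line(url):
    """Return only the first line of the body, as readline() does."""
    with urlopen(url) as f:
        return f.readline()


def whole_body(url):
    """Return the complete body, as read() does."""
    with urlopen(url) as f:
        return f.read()


print(repr(first_line(PAGE)))  # only the empty first line
print(repr(whole_body(PAGE)))  # the full body
```

Swap PAGE for the real URL to see the same effect on the actual page.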
Edit (2018-06-25): Since Python 3, the legacy urllib.urlopen() has been replaced by urllib.request.urlopen() (see the notes at https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen for details).
If you are using Python 3, see the answers by Martin Thoma or i.n.n.m on this question: https://stackoverflow.com/a/28040508/158111 (Python 2/3 compatible) and https://stackoverflow.com/a/45886824/158111 (Python 3).
Alternatively, just get the requests library here: http://docs.python-requests.org/en/latest/ and seriously, use it :)
import requests
link = "http://www.somesite.com/details.pl?urn=2344"
f = requests.get(link)
print(f.text)
Answer 1 (score: 10)
A solution that works with both Python 2.X and Python 3.X, using the Python 2/3 compatibility library six:
from six.moves.urllib.request import urlopen
link = "http://www.somesite.com/details.pl?urn=2344"
response = urlopen(link)
content = response.read()
print(content)
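If you would rather not depend on six, a common alternative is a try/except import. This is a minimal sketch of that pattern (not how six works internally); read_url() is a hypothetical helper name.

```python
try:
    # Python 3
    from urllib.request import urlopen
except ImportError:
    # Python 2: urlopen lives directly in the urllib module
    from urllib import urlopen


def read_url(link):
    """Return the raw response body; the same call works on Python 2 and 3."""
    response = urlopen(link)
    try:
        return response.read()
    finally:
        # Close explicitly, because Python 2's response object
        # does not support the "with" statement
        response.close()
```

Usage: content = read_url("http://www.somesite.com/details.pl?urn=2344").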
Answer 2 (score: 9)
For Python 3 users, to save time, use the following code:
from urllib.request import urlopen
link = "https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html"
f = urlopen(link)
myfile = f.read()
print(myfile)
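I know there is a separate question about the error NameError: name 'urlopen' is not defined, but I thought this might save some time.
Answer 3 (score: 0)
The URL should be a string: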
import urllib
link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.readline()
print myfile
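On the question of whether the URL needs encoding: a simple query such as urn=2344 contains only URL-safe characters, so it can be used as-is. When parameter values come from variables or contain spaces or non-ASCII text, it is safer to build the query string with urllib.parse.urlencode (Python 3). A sketch; build_url() and the second parameter "q" are hypothetical, for illustration only.

```python
from urllib.parse import urlencode

BASE = "http://www.somesite.com/details.pl"


def build_url(base, params):
    """Append a percent-encoded query string built from a dict of parameters."""
    # urlencode() converts non-string values with str() and encodes spaces as "+"
    return base + "?" + urlencode(params)


print(build_url(BASE, {"urn": 2344}))
# http://www.somesite.com/details.pl?urn=2344
print(build_url(BASE, {"urn": 2344, "q": "two words"}))
# http://www.somesite.com/details.pl?urn=2344&q=two+words
```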
Answer 4 (score: 0)
I used the following code (Python 2):
import urllib
def read_text():
    quotes = urllib.urlopen("https://s3.amazonaws.com/udacity-hosted-downloads/ud036/movie_quotes.txt")
    contents_file = quotes.read()
    print contents_file
read_text()
Answer 5 (score: 0)
We can read a website's HTML content as follows:
from urllib.request import urlopen
response = urlopen('http://google.com/')
html = response.read()
print(html)
Answer 6 (score: 0)
None of these answers works for Python 3 (tested on the latest version at the time of posting).
Here's how you do it:
import urllib.request
def print_some_url():
    with urllib.request.urlopen('http://mywebsiteurl') as f:
        print(f.read().decode('utf-8'))
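The above works for content served as UTF-8. Note that removing .decode('utf-8') does not make Python guess the encoding: read() then simply returns the raw bytes. To use the charset the server declares instead, decode with f.headers.get_content_charset().
Documentation: https://docs.python.org/3/library/urllib.request.html#module-urllib.request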
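A minimal Python 3 sketch of decoding with the charset declared in the Content-Type header. Falling back to UTF-8 when no charset is declared is an assumption; read_text() is a hypothetical helper, exercised here with data: URLs so it runs offline.

```python
from urllib.request import urlopen


def read_text(url, default="utf-8"):
    """Fetch url and decode the body using the charset from Content-Type."""
    with urlopen(url) as f:
        # get_content_charset() returns e.g. 'latin-1', or None if not declared
        charset = f.headers.get_content_charset() or default
        return f.read().decode(charset)


print(read_text("data:text/plain;charset=latin-1,caf%E9"))  # café
```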
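Answer 7 (score: 0)
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Works on Python 3 and Python 2,
# as long as the server accepts requests from a plain Python client.
from __future__ import print_function
import sys
from contextlib import closing
if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib import urlopen
# closing() is needed because Python 2's urlopen() result does not support "with"
with closing(urlopen('https://www.facebook.com/')) as url:
    data = url.read()
print(data)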
# When the server rejects requests that don't identify themselves as a browser,
# send a browser User-Agent header. Works on Python 3 only.
import urllib.request
user_agent = \
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
url = 'https://www.facebook.com/'
headers = {'User-Agent': user_agent}
request = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(request)
data = response.read()
print(data)
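The same idea as a small reusable Python 3 sketch. build_request() and fetch() are hypothetical helper names; the header is attached to a urllib.request.Request object, which you can inspect before sending it.

```python
import urllib.request

# Example browser-like User-Agent string (any realistic value will do)
USER_AGENT = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) "
              "Gecko/2009021910 Firefox/3.0.7")


def build_request(url, user_agent=USER_AGENT):
    """Create a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT, timeout=10):
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as response:
        return response.read()


req = build_request("https://www.facebook.com/")
# Request stores header names via str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))
```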
Answer 8 (score: 0)
# Retrieving data from a URL
# Python 3 only
import urllib.request
def main():
    url = "http://docs.python.org"
    # retrieve data from the URL
    webUrl = urllib.request.urlopen(url)
    print("Result code: " + str(webUrl.getcode()))
    # print the data from the URL
    print("Returned data: -----------------")
    data = webUrl.read().decode("utf-8")
    print(data)
if __name__ == "__main__":
    main()
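Note that getcode() only ever sees successful responses: for 4xx and 5xx status codes, urlopen() (and any opener made with build_opener()) raises urllib.error.HTTPError instead of returning. A self-contained Python 3 sketch that starts a throwaway local http.server to show both cases; the Handler class, its /missing path, and get_status() are hypothetical.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    """Hypothetical demo server: 200 for "/", 404 for any other path."""

    def do_GET(self):
        status, body = (200, b"hello") if self.path == "/" else (404, b"not found")
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


# Bypass any proxy configured in the environment, since the server is local
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def get_status(url):
    """Return the HTTP status code for url, including 4xx/5xx responses."""
    try:
        with opener.open(url, timeout=5) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        # The status code of a failed request is carried by the exception
        return err.code


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 lets the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

ok_status = get_status(base + "/")
missing_status = get_status(base + "/missing")
print(ok_status, missing_status)
server.shutdown()
server.server_close()
```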
Answer 9 (score: 0)
from urllib.request import urlopen
# If the page contains Chinese text, call decode() to turn the bytes into a str
html = urlopen("https://blog.csdn.net/qq_39591494/article/details/83934260").read().decode('utf-8')
print(html)
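decode('utf-8') raises UnicodeDecodeError on pages encoded in a legacy Chinese encoding such as GBK. A minimal sketch of a fallback chain; decode_page() is a hypothetical helper, and trying GB18030 (a superset of GBK) second is an assumption that suits Chinese sites, not something urllib does for you.

```python
def decode_page(raw, encodings=("utf-8", "gb18030")):
    """Decode bytes with the first encoding that works; replace bad bytes as a last resort."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing matched cleanly: substitute U+FFFD for undecodable bytes
    return raw.decode(encodings[0], errors="replace")


print(decode_page("中文".encode("utf-8")))
print(decode_page("中文".encode("gbk")))
```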
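Answer 10 (score: 0)
You can use the requests and beautifulsoup (bs4) libraries to read data from a website. Just install both libraries (pip install requests beautifulsoup4) and run the following code: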
import requests
import bs4
help(requests)
help(bs4)
You will get all the information you need about both libraries.
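If you only need something simple from the page and would rather not install BeautifulSoup, the standard library's html.parser can do it. This is a swapped-in stdlib technique, not the bs4 API; the LinkCollector class and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the <title> text and every <a href="..."> in a document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


sample = ('<html><head><title>Example page</title></head>'
          '<body><a href="/a">A</a> <a href="https://example.com/b">B</a></body></html>')
parser = LinkCollector()
parser.feed(sample)
print(parser.title)  # Example page
print(parser.links)  # ['/a', 'https://example.com/b']
```

In practice you would feed it the decoded text returned by urlopen(...).read().decode(...).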