I'm trying to write a Python (version 2.7.5) CGI script on a CentOS 7 server.
My script tries to download data from LibriVox web pages such as
https://librivox.org/selections-from-battle-pieces-and-aspects-of-the-war-by-herman-melville/
and it blows up with this error:
<class 'urllib2.URLError'>: <urlopen error [Errno 13] Permission denied>
args = (error(13, 'Permission denied'),)
errno = None
filename = None
message = ''
reason = error(13, 'Permission denied')
strerror = None
I have turned off iptables.
I can run something like "wget -O- https://librivox.org/selections-from-battle-pieces-and-aspects-of-the-war-by-herman-melville/" without any error. Here is the code where the error occurs:
import urllib2
from bs4 import BeautifulSoup

def output_html(url, appname, doobb):
    print "url is %s<br>" % url
    soup = BeautifulSoup(urllib2.urlopen(url).read())
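Errno 13 is EACCES: it is raised by the local connect() call, before any request reaches the server, which is why the error carries no HTTP status. A minimal sketch for telling that case apart from an ordinary network failure; the helper name is_local_permission_denied is hypothetical, and it assumes (as the dump above shows) that urlopen() stores the underlying socket error in URLError.reason.

```python
import errno

try:  # Python 3
    from urllib.error import URLError
except ImportError:  # Python 2.7, as used in the question
    from urllib2 import URLError


def is_local_permission_denied(exc):
    """Return True if exc is, or wraps, an EACCES (errno 13) socket error.

    urlopen() stores the low-level socket error in URLError.reason;
    for a bare socket/OS error there is no .reason, so the exception
    itself is inspected.
    """
    reason = getattr(exc, "reason", exc)
    return getattr(reason, "errno", None) == errno.EACCES
```

In output_html you could wrap the urlopen() call in `try: ... except URLError as exc:` and print a hint when the helper returns True, instead of letting cgitb show the raw dump.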
Update: Thanks to Paul and alecxe, I have updated my code to:
import requests
from bs4 import BeautifulSoup

def output_html(url, appname, doobb):
    #hdr = {'User-Agent':'Mozilla/5.0'}
    #print "url is %s<br>" % url
    #req = urllib2.Request(url, headers=hdr)
    #soup = BeautifulSoup(urllib2.urlopen(url).read())
    headers = {'User-Agent': 'Mozilla/5.0'}
    #headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.99 Safari/537.36'}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content)
...and now I get a slightly different error when
response = requests.get(url, headers=headers)
...is called:
<class 'requests.exceptions.ConnectionError'>: ('Connection aborted.', error(13, 'Permission denied'))
args = (ProtocolError('Connection aborted.', error(13, 'Permission denied')),)
errno = None
filename = None
message = ProtocolError('Connection aborted.', error(13, 'Permission denied'))
request = <PreparedRequest [GET]>
response = None
strerror = None
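Note that errno = None at the top level of both dumps does not mean no error code was recorded. With requests, the socket error sits two levels down: inside a ProtocolError, which is itself inside the ConnectionError's args. A hedged sketch that walks nested args to recover it; find_errno is a hypothetical helper, and the depth limit of 5 is an arbitrary guard.

```python
import errno


def find_errno(exc, _depth=0):
    """Return the first integer errno found in exc or its nested args, else None.

    requests reports this failure as
    ConnectionError(ProtocolError('Connection aborted.', error(13, ...))),
    so the errno is only visible after descending through .args.
    """
    if _depth > 5:  # arbitrary guard against pathological nesting
        return None
    code = getattr(exc, "errno", None)
    if isinstance(code, int):
        return code
    for arg in getattr(exc, "args", ()):
        if isinstance(arg, BaseException):
            found = find_errno(arg, _depth + 1)
            if found is not None:
                return found
    return None
```

Comparing the result with errno.EACCES gives the same check as for urllib2, independent of which HTTP library raised the error.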
Interestingly, I wrote a command-line version of this script and it runs fine. It looks like this:
def output_html(url):
    soup = BeautifulSoup(urllib2.urlopen(url).read())
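The difference between the two runs is most likely the process context rather than the code. Under Apache on CentOS 7, a CGI script typically runs in a confined SELinux domain (httpd_sys_script_t), while a script started from a login shell usually runs as unconfined_t; the SELinux-based answers below address exactly this. From a shell, id -Z shows the context. From inside the script, a sketch like the following could be used; it assumes Linux exposes the context at /proc/self/attr/current, and both helper names are hypothetical.

```python
import io

PROC_ATTR = "/proc/self/attr/current"


def current_selinux_context(path=PROC_ATTR):
    """Return this process's security context string, or None if unavailable."""
    try:
        with io.open(path, "rb") as f:
            raw = f.read()
    except (IOError, OSError):
        return None
    # The kernel may append a NUL byte and/or newline; strip them.
    ctx = raw.decode("utf-8", "replace").strip("\x00\n ")
    return ctx or None


def selinux_type(context):
    """Return the type field of user:role:type:level, e.g. 'httpd_sys_script_t'."""
    if not context:
        return None
    parts = context.split(":")
    return parts[2] if len(parts) >= 3 else None
```

In the CGI version you might print selinux_type(current_selinux_context()) into the page (after the Content-Type header) and compare it with the command-line run.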
Isn't that strange?
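Update: This question was flagged as possibly already answered by "urllib2.HTTPError: HTTP Error 403: Forbidden" (2 answers).
Those answers do not address my problem.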
Answer 0 (score: 5)
Finally figured it out:
# grep python /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp
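The pipeline above feeds every audit record that mentions python into audit2allow, and -M mypol turns whatever was denied into a loadable policy module, so it may be worth checking what was actually denied before running semodule -i (running audit2allow without -M prints the generated rules instead). A rough sketch that pulls the key fields out of AVC denial records; the regex assumes the usual `avc: denied { perm } for ... comm="..." ... scontext=... tcontext=... tclass=...` layout, and the function name is hypothetical.

```python
import re

# Matches the common layout of an SELinux AVC denial record in audit.log.
AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}'
    r'.*?\bcomm="(?P<comm>[^"]*)"'
    r'.*?\bscontext=(?P<scontext>\S+)'
    r'.*?\btcontext=(?P<tcontext>\S+)'
    r'.*?\btclass=(?P<tclass>\S+)'
)


def parse_avc_denials(lines, comm=None):
    """Yield one dict per AVC denial line, optionally filtered by command name."""
    for line in lines:
        match = AVC_RE.search(line)
        if match is None:
            continue  # not an AVC denial (e.g. a SYSCALL record)
        record = match.groupdict()
        record["perms"] = record["perms"].split()
        if comm is None or record["comm"] == comm:
            yield record
```

As root, something like `with open("/var/log/audit/audit.log") as f: for rec in parse_avc_denials(f, comm="python"): print(rec)` would list the denials; a name_connect on tcp_socket from an httpd script domain would point at the network-connect restriction.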
Answer 1 (score: 1)
Using requests and providing a User-Agent header works for me:
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.99 Safari/537.36'}
response = requests.get("https://librivox.org/selections-from-battle-pieces-and-aspects-of-the-war-by-herman-melville/", headers=headers)
soup = BeautifulSoup(response.content)
print soup.title.text  # prints "LibriVox"
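For comparison, the same header can be sent without requests through urllib2.Request (urllib.request.Request on Python 3), which is what the commented-out Request line in the question's updated code was reaching for. A minimal sketch; build_request and fetch are hypothetical helper names, and "Mozilla/5.0" is simply the short value the question used. A User-Agent only matters once a connection exists, so it cannot help with the local Errno 13 that the SELinux answers address.

```python
try:  # Python 3
    from urllib.request import Request, urlopen
except ImportError:  # Python 2.7
    from urllib2 import Request, urlopen

DEFAULT_UA = "Mozilla/5.0"


def build_request(url, user_agent=DEFAULT_UA):
    """Build a GET request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=30):
    """Download the raw page body; raises URLError/HTTPError on failure."""
    return urlopen(build_request(url), timeout=timeout).read()
```

Note that Request stores header names capitalized, so the value is read back with req.get_header("User-agent").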
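Answer 2 (score: 0)
We hit the same problem on one of our machines. Instead of creating an SELinux module (as in the answer above), we changed the following SELinux boolean, which prevents this kind of error: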
# setsebool httpd_can_network_connect on
As described on the CentOS wiki:
httpd_can_network_connect (HTTPD Service): Allow HTTPD scripts and modules to connect to the network.
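Two practical notes: setsebool without -P only changes the running value, so the setting is lost at reboot (setsebool -P httpd_can_network_connect on makes it persistent), and getsebool httpd_can_network_connect reports the current value as a line like "httpd_can_network_connect --> on". If a deployment or health-check script should verify the flag, a small sketch could parse that output; parse_getsebool and sebool_enabled are hypothetical helper names.

```python
import subprocess


def parse_getsebool(output):
    """Parse 'name --> on|off' lines from getsebool into {name: bool}."""
    result = {}
    for line in output.splitlines():
        name, sep, state = line.partition("-->")
        if sep:
            result[name.strip()] = state.strip() == "on"
    return result


def sebool_enabled(name):
    """Run getsebool for one boolean; returns True/False, or None if not listed."""
    out = subprocess.check_output(["getsebool", name])
    return parse_getsebool(out.decode("ascii", "replace")).get(name)
```

Only parse_getsebool is pure; sebool_enabled needs the getsebool binary from the SELinux utilities and raises CalledProcessError if the boolean name is unknown.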