Question

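I've been working on a program and, because Mac OS X makes updating Python difficult, I've been writing it in both 3.2 and 2.6. Both versions of the script give me IOErrors (different ones, though). Here's the script: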

This is the 3.2 version:

import sys
import os 
import re 
import urllib 
import urllib.request

## opens the URL as a bytes object
urlfilebytes = urllib.request.urlopen('http://www.reddit.com/r/fffffffuuuuuuuuuuuu')
## saves the bytes object to a string
urlfile = urlfilebytes.read().decode('utf-8')
## saves list of matches for pattern
matches = re.findall(r'[http://imgur.com/][\s]+"', open(urlfile).read())

This returns the error: TypeError: invalid file:

The 2.6 version, on the other hand:

import sys
import os
import re
import urllib
urlfilebytes = urllib.urlopen('http://www.reddit.com/r/fffffffuuuuuuuuuuuu')
urlfile = urlfilebytes.read().decode('utf-8')
matches = re.findall(r'[http://imgur.com/][\s]+"', open(urlfile).read())

This returns the error:

IOError: [Errno 63] File name too long: u'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en" ><head><title>FFFFFFFUUUUUUUUUUUU-</title><meta name="keywords" content=" r **ETC ETC ETC**

I'm a bit stumped. Can anyone help me out?

Answer 1

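You're calling open on a string, which tries to open a file whose name is whatever the string contains, in this case <!DOCTYPE.... That is neither a valid file name nor an existing file. If you simply replace open(urlfile).read() with urlfile, it should work.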
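To illustrate the point above, here is a minimal sketch for Python 3. The `html` string is a made-up stand-in for the decoded page contents (`urlfile` in the question) so that the example runs without a network connection, and the pattern is only one guess at what the question intended.

```python
import re

# Hypothetical stand-in for the decoded page (urlfile in the question).
html = '<a href="http://imgur.com/abc123">pic</a>'

# open() treats its argument as a file path, so passing page contents fails.
try:
    open(html).read()
except OSError as exc:
    print("open() failed:", type(exc).__name__)

# The page is already a string, so search it directly.
matches = re.findall(r'http://imgur\.com/\S+"', html)
print(matches)
```

In Python 3, IOError is an alias of OSError, so the except clause covers the same family of errors the question ran into.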

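Also, you probably want to escape the [] in your regex, otherwise it won't do what you want: unescaped square brackets define a character class, which matches a single character from the set rather than the literal text inside.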
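A small sketch of what the brackets do in practice. The `sample` string is invented for illustration, and the second pattern (which drops the brackets instead of escaping them) is an assumption about what the question was after, not something stated in it.

```python
import re

# Made-up text containing one imgur link and one unrelated quoted word.
sample = 'see http://imgur.com/abc123" and m "quoted"'

# Pattern from the question: ONE character out of the bracketed set,
# then one or more whitespace characters, then a double quote.
original = re.findall(r'[http://imgur.com/][\s]+"', sample)
print(original)   # ['m "']

# Assumed intent: the literal prefix, then non-whitespace up to a quote.
literal = re.findall(r'http://imgur\.com/\S+"', sample)
print(literal)    # ['http://imgur.com/abc123"']
```

The original pattern only picks up the stray `m "`, because `m` happens to be one of the characters in the class and is followed by whitespace and a quote.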

Answer 2

Are you sure you don't want to do this instead?

re.findall(r'[http://imgur.com/][\s]+"', urlfile)

And I bet the regex isn't doing what you think it is. You may need to ask a separate question about that.

Maybe something like this

re.findall(r'(http://imgur.com/\S+)"', urlfile)

or

re.findall(r'http://imgur.com/(\S+)"', urlfile)
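The two suggestions differ only in where the capturing group sits, which changes what re.findall returns. A quick sketch, assuming a made-up `html` string in place of the real page:

```python
import re

# Hypothetical page fragment with two imgur links.
html = ('<a href="http://imgur.com/abc123">one</a> '
        '<a href="http://imgur.com/xyz789">two</a>')

# Group around the whole URL: findall returns the full links.
full_urls = re.findall(r'(http://imgur.com/\S+)"', html)
print(full_urls)  # ['http://imgur.com/abc123', 'http://imgur.com/xyz789']

# Group around the tail only: findall returns just the part after the slash.
ids_only = re.findall(r'http://imgur.com/(\S+)"', html)
print(ids_only)   # ['abc123', 'xyz789']
```

When a pattern contains exactly one group, re.findall returns the group's text rather than the whole match, which is why the trailing quote is dropped in both cases. Note that \S+ is greedy, so on real markup with several quotes and no whitespace between them it may capture more than a single link.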

IOErrors with regular expressions in Python
