Question

I want to open a Pages document like this:

directory = "/Path/to/file/"
with open(directory + "test.pages") as f:
    data = f.readlines()
    for line in data:
        words = line.split()
        print(words)

Then I get this error:

IOError: [Errno 21] Is a directory: '/path/to/file/test.pages'

Why is this a directory? And how can I open it?

Answer 1

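'/path/to/file/test.pages' is a directory on the filesystem, so it cannot be opened as a file in Python. Your operating system bundles multiple files inside that directory and presents them as a single package. You could conceivably walk the directory and get its contents: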

import os

for root, dirs, files in os.walk('/path/to/file/test.pages'):
    for name in files:
        # print the full path of every file inside the bundle
        print(os.path.join(root, name))

But opening those files and trying to read their contents will likely prove fruitless.
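Before calling open(), it can help to check whether the path is a regular file or a bundle directory. The following is a minimal sketch, not a definitive implementation: the helper name list_bundle_files is hypothetical, and the path in the usage section is the placeholder from the question.

```python
import os


def list_bundle_files(path):
    """Return the regular files behind `path`.

    A bundle directory yields every file found beneath it, a plain file
    yields itself, and a path that does not exist yields an empty list.
    """
    if os.path.isdir(path):
        found = []
        for root, dirs, files in os.walk(path):
            for name in files:
                found.append(os.path.join(root, name))
        return sorted(found)
    if os.path.isfile(path):
        return [path]
    return []


if __name__ == "__main__":
    # Placeholder path from the question; replace it with a real document.
    for member in list_bundle_files("/path/to/file/test.pages"):
        print(member)
```

Checking os.path.isdir() first avoids the "[Errno 21] Is a directory" error, because open() is only ever called on the individual files inside the package.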

Here is how you could try to find any plain text:

import os
import re

# match any line containing at least one word character
# (letters, digits, or underscore)
pattern = re.compile(r'.*\w+.*')

for root, dirs, files in os.walk('/path/to/file/test.pages'):
    for name in files:
        # open each file with the context manager so it's automatically closed
        # even if there's an error. Python 3 text mode already handles Unix, old
        # Mac, and Windows newlines, so the old 'U' flag is unnecessary;
        # errors='ignore' skips bytes that can't be decoded instead of raising.
        with open(os.path.join(root, name), 'r', errors='ignore') as f:
            for line in f:
                if pattern.match(line):
                    print(line)

Answer 2

I have a MacBook Pro running OS X 10.9.3.

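I used your code and did not run into the problem you describe. Since you are opening a .pages file, though, you will need to decode the file, as this traceback shows: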

File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 10: ordinal not in range(128)
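The traceback shows the default ASCII codec failing on byte 0xe2, which is consistent with UTF-8 text containing non-ASCII characters such as curly quotes. One way to get past it is to name the encoding explicitly when opening the file. The sketch below assumes the content is UTF-8, which the source does not confirm; the helper name read_lines_lenient and the file path are hypothetical.

```python
def read_lines_lenient(path, encoding="utf-8"):
    """Read a text file line by line without raising on undecodable bytes.

    The UTF-8 default is an assumption. Bytes that cannot be decoded are
    replaced with U+FFFD rather than raising UnicodeDecodeError.
    """
    with open(path, "r", encoding=encoding, errors="replace") as f:
        return f.read().splitlines()


if __name__ == "__main__":
    # Hypothetical path; point this at one of the files inside the bundle.
    for line in read_lines_lenient("/path/to/file/test.pages/index.xml"):
        print(line.split())
```

Using errors="replace" keeps the loop running on partly binary files, at the cost of silently altering the bytes that could not be decoded.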

Opening a .pages file on a Mac with Python
