Question

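I am reading lines from an input file and splitting each line into a list. However, I ran into the following situation, which puzzles me.

This is my code: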

import re

with open("filename") as in_file:
    for line in in_file:
        # The capturing group keeps the separators in the result list.
        print(re.split(r'([\s,:()\[\]=|/\\{}\'\"<>]+)', line))

This is a sample of my input file:

PREREQUISITES

    CUDA 7.0 and a GPU of compute capability 3.0 or higher are required.


    Extract the cuDNN archive to a directory of your choice, referred to below as <installpath>.
    Then follow the platform-specific instructions as follows.

This is the output I get:

['PREREQUISITES', '\n', '']
['', '\n', '']
['', '    ', 'CUDA', ' ', '7.0', ' ', 'and', ' ', 'a', ' ', 'GPU', ' ', 'of', ' ', 'compute', ' ', 'capability', ' ', '3.0', ' ', 'or', ' ', 'higher', ' ', 'are', ' ', 'required.', '\n', '']
['', '\n', '']
['', '\n', '']
['', '    ', 'Extract', ' ', 'the', ' ', 'cuDNN', ' ', 'archive', ' ', 'to', ' ', 'a', ' ', 'directory', ' ', 'of', ' ', 'your', ' ', 'choice', ', ', 'referred', ' ', 'to', ' ', 'below', ' ', 'as', ' <', 'installpath', '>', '.', '\n', '']
['', '    ', 'Then', ' ', 'follow', ' ', 'the', ' ', 'platform-specific', ' ', 'instructions', ' ', 'as', ' ', 'follows.', '\n', '']

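My questions are:

Q1: At the end of each line, besides the '\n' character, there is another empty element ''. What is that?

Q2: Except for the first one, every line starts with this empty element ''. Why is that?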

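EDIT

Added question Q3: I want separators such as ' ' and '\n' to be kept in the result, but not the empty ''. Is there a way to do that?

Answer to questions Q1-2: here.

Answer to question Q3: here.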

Answer 1

The empty string indicates that '\n' was matched as the last character of the line and that no more data follows it. That is:

>>> re.split(r'([\s]+)', 'hello world\n')
['hello', ' ', 'world', '\n', '']

should produce a different result from:

>>> re.split(r'([\s]+)', 'hello world')
['hello', ' ', 'world']

You can strip the line before splitting:

>>> re.split(r'([\s]+)', 'hello world\n'.strip())
['hello', ' ', 'world']
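
The same rule accounts for the leading '' in Q2: when the pattern contains a capturing group and matches at the very start of the string, re.split puts an empty string before the first separator. A minimal sketch, assuming the simplified whitespace-only pattern from the examples above rather than the full pattern in the question (the variable names are only illustrative):

```python
import re

PATTERN = r'([\s]+)'  # simplified pattern; the question uses a longer character class

# A separator at the start yields a leading '', one at the end a trailing ''.
indented = re.split(PATTERN, '    hello world\n')

# strip() removes whitespace at both ends, so both empty strings disappear,
# but the indentation and newline tokens are lost along with them.
stripped = re.split(PATTERN, '    hello world\n'.strip())

print(indented)
print(stripped)
```

Note that strip() is therefore only suitable when the leading indentation and the trailing newline do not need to appear in the result.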

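Or invert the regex and use findall instead. findall works differently: it returns only the matched text and does not produce the sequences between matches.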

>>> re.findall(r'([^\s]+)', 'hello world\n')
['hello', 'world']
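
For Q3 (keep separators such as ' ' and '\n' but drop the empty ''), one possible approach is to split as before and then filter out the empty strings. This is a sketch rather than the only way to do it; split_keep_separators is a hypothetical helper name, and the pattern is copied from the question:

```python
import re

# Pattern copied from the question.
PATTERN = r'([\s,:()\[\]=|/\\{}\'\"<>]+)'

def split_keep_separators(line):
    """Split on PATTERN, keep the separator tokens, drop empty strings."""
    # An empty string is falsy, so "if token" removes only the '' entries.
    return [token for token in re.split(PATTERN, line) if token]

print(split_keep_separators('    CUDA 7.0 and a GPU\n'))
```

Because only empty strings are filtered, the indentation and the trailing '\n' stay in the list as ordinary tokens.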
