Question

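I have a piece of code that extracts the string located between two other strings. However, the script only does this for a single line. I want to run it over an entire file and get a list of all the words that lie between those two words.

Note: the two words are fixed. For example, if my code looks like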

'const int variablename=1'

then I want a list of all the words in the file that lie between 'int' and '='. Here is the current script:

s='const int variablename = 1'

k=s[s.find('int')+4:s.find('=')]

print(k)

Answer 1

If the file fits comfortably in memory, you can do this with a single regular expression call:

import re
regex = re.compile(
r"""(?x)
(?<=    # Assert that the text before the current location is:
 \b     # word boundary
 int    # "int"
 \s     # whitespace
)       # End of lookbehind
[^=]*   # Match any number of characters except =
(?<!\s) # Assert that the previous character isn't whitespace.
(?=     # Assert that the following text is:
 \s*    # optional whitespace
 =      # "="
)       # end of lookahead""")
with open(filename) as fn:
    text = fn.read()
    matches = regex.findall(text)

If there is only a single word between int and =, the regular expression becomes a bit simpler:

regex = re.compile(
r"""(?x)
(?<=    # Assert that the text before the current location is:
 \b     # word boundary
 int    # "int"
 \s     # whitespace
)       # End of lookbehind
[^=\s]* # Match any number of characters except = or whitespace
(?=     # Assert that the following text is:
 \s*    # optional whitespace
 =      # "="
)       # end of lookahead""")
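
As a quick check of the single-word pattern, here is a minimal sketch that runs a condensed copy of it on a small in-memory string. The `sample` text and the `names` variable are made up for illustration; in practice `sample` would be the text read from the file.

```python
import re

# Condensed copy of the single-word pattern above.
regex = re.compile(
    r"""(?x)
    (?<=\bint\s)   # preceded by the word "int" and one whitespace character
    [^=\s]*        # the name itself: no "=" and no whitespace
    (?=\s*=)       # followed by optional whitespace and "="
    """)

# Hypothetical sample text standing in for the file's contents.
sample = "const int alpha = 1\nint beta=2\nprint gamma = 3\n"

names = regex.findall(sample)
print(names)  # ['alpha', 'beta']
```

The third line is skipped because of the `\b` in the lookbehind: the "int" inside "print" does not start at a word boundary.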

Answer 2

with open(filename) as fn:
    for row in fn:
        # do something with the row, e.g. apply the slicing from the question
        pass
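
One way to fill in the loop body is to reuse the `find`-based slicing from the question on every row. This is only a sketch: the function name `names_between` and the sample lines are assumptions, and a plain substring search will also hit "int" inside longer words such as "print".

```python
def names_between(lines, start="int", end="="):
    """Collect the text between `start` and `end` on each line that has both."""
    found = []
    for row in lines:
        begin = row.find(start)
        if begin == -1:
            continue  # no start marker on this line
        begin += len(start)
        stop = row.find(end, begin)  # look for `end` only after `start`
        if stop == -1:
            continue  # no end marker after the start marker
        found.append(row[begin:stop].strip())
    return found

# Hypothetical rows standing in for the lines of a file.
sample_lines = ["const int variablename = 1\n", "float ratio = 0.5\n", "int count=0\n"]
print(names_between(sample_lines))  # ['variablename', 'count']
```

Because a file object iterates line by line, passing `fn` from the `with open(filename) as fn:` block in place of `sample_lines` should process the file without loading it all into memory.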

Answer 3

I would use a regular expression over the whole text (you can also use it on a single line). This prints the strings between "int" and "=":

import re

with open('example.txt') as f:
    text = f.read()
print(re.findall(r'(?<=int\s).*?(?=\=)', text))

Answer 4

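If you want a quick and dirty approach and you are on a Unix-like system:

I would use grep on the file, then split the resulting strings to pick out the pattern and the data I want.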
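
The answer gives no actual command, so the following is only a Python sketch of the same filter-then-split idea and does not call grep itself. The function name `grep_then_split` and the sample lines are assumptions made for illustration.

```python
def grep_then_split(lines, start="int ", end="="):
    """Keep lines containing `start` (the grep step), then split out the middle."""
    results = []
    for line in lines:
        if start not in line:
            continue  # grep step: drop lines without the start marker
        tail = line.split(start, 1)[1]  # everything after the first `start`
        if end not in tail:
            continue  # `end` has to appear after `start`
        results.append(tail.split(end, 1)[0].strip())
    return results

# Hypothetical lines standing in for grep's output or a file's contents.
sample_lines = ["const int variablename = 1", "x = 2", "int total=3"]
print(grep_then_split(sample_lines))  # ['variablename', 'total']
```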

File operations in Python
