Question

My file contains a list of lines like this:

-apple
banana tomato
-orange
maracuja
cucumber <hide>
peanut
-apple
apricot
grapefruit </hide> banana
lime
-grape
lemon

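I want to find all lines prefixed with - and write them to a dict like this: {original_line_number: '-apple', ...}, but excluding those inside <hide></hide> segments. This looks simple, but in my real use case I have several very complicated <hide> sequences. For each of them I have a complicated regex pattern written with re.compile, like this: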

re.compile(r'really complicated regex for 1st hide sequence (' + r'|'.join(some_list_of_possibilities) + r') yeah it still continue%s' % not_enough_complicated_yet)

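Is there a way to get the list of - prefixed lines, excluding those inside hide sequences, while still indexing them by their original line numbers?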

Things I have already tried:

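Removing the hide sequences, getting the - prefixed lines, comparing them with the original list and getting the line numbers: this fails when the same - prefixed line appears both inside and outside a hide sequence, as -apple does in the example.
Replacing all characters in the hide sequences with spaces, except \n characters: this failed for me because I could not work out how to preserve the \n characters (rather than replacing them with spaces too).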

Note: I want to get the - prefixed lines "as is", so if there is a line like -apple <hide> banana, I want to get the whole line content :/

Answer 1

Does this work for you?

# `file` is the path to the input file
with open(file) as f:
    content = f.readlines()


res = []
skip = False  # True while we are inside a <hide>...</hide> block
for index, x in enumerate(content):
    val = x.strip()
    if skip:
        if '</hide>' in val:
            skip = False

    if '<hide>' in val:
        skip = True

    if not skip:
        if val.startswith('-'):
            res.append({index+1: val})  # line numbers are 1-based

print(res)

[{1: '-apple'}, {3: '-orange'}, {11: '-grape'}]

Answer 2

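In the end, I solved the problem by replacing all characters in the hide sequences with spaces (or any other substitute character), except for newlines. This preserves the line numbers while disabling the hide sequences.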
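
A minimal sketch of that idea, in case it helps someone else. The HIDE_RE pattern here is only a simple stand-in for the real, more complicated hide patterns, and the names blank_out, find_prefixed_lines and SAMPLE are made up for this illustration. Each match is passed to a replacement function that turns every character except \n into a space, so the masked text has exactly the same line layout as the original. The masked lines are used only to decide which lines count, and the original lines are what gets returned, so a line like -apple <hide> banana comes back "as is".

```python
import re

# Stand-in for the real hide patterns (assumption: a plain <hide>...</hide> pair).
# re.DOTALL lets '.' match newlines so a sequence can span several lines.
HIDE_RE = re.compile(r'<hide>.*?</hide>', re.DOTALL)


def blank_out(match):
    """Replace every character of the match with a space, but keep newlines."""
    return re.sub(r'[^\n]', ' ', match.group(0))


def find_prefixed_lines(text, hide_patterns=(HIDE_RE,), prefix='-'):
    """Return {line_number: original_line} for prefixed lines outside hide sequences."""
    masked = text
    for pattern in hide_patterns:
        masked = pattern.sub(blank_out, masked)

    # Split both texts on '\n' only, so the two lists always line up.
    original_lines = text.split('\n')
    masked_lines = masked.split('\n')

    return {
        number: original.rstrip('\r')
        for number, (original, hidden) in enumerate(
            zip(original_lines, masked_lines), start=1)
        if hidden.startswith(prefix)
    }


SAMPLE = """-apple
banana tomato
-orange
maracuja
cucumber <hide>
peanut
-apple
apricot
grapefruit </hide> banana
lime
-grape
lemon"""

print(find_prefixed_lines(SAMPLE))
# {1: '-apple', 3: '-orange', 11: '-grape'}
```

With several hide patterns, pass them all as hide_patterns; each one is masked in turn before the lines are compared.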

