Question

I want to convert the following text:

some text
% comment line 1
% comment line 2
% comment line 3
some more text

into

some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text

In the same file, when only a single line is commented, I want it to go from

some text
% a single commented line
some more text

to

some text 
# a single commented line
some more text

So, when both cases occur in the same file, I want to go from:

some text
% comment line 1
% comment line 2
% comment line 3
some more text
some text
% a single commented line
some more text

to

some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
some text 
# a single commented line
some more text

What I have tried so far for the second case (a single commented line) is:

re.sub(r'(\A|\r|\n|\r\n|^)% ', r'\1# ',  'some text \n% a single comment line\nsome more text')

But it also replaces % with # when the comment spans more than one line.

For the first case (several consecutive commented lines), I failed with:

re.sub(r'(\A|\r|\n|\r\n|^)(% )(.*)(?:\n^\t.*)*', r'"""\3"""',  'some text \n% comment line1\n% comment line 2\n% comment line 3\nsome more text')

It repeats """ on every line, and it conflicts with the single-line case.

Is there a way to count the consecutive lines the regex matches and change the replacement accordingly?

Thanks in advance for your help!
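A regex can handle both cases in one pass if it matches a whole run of consecutive comment lines at once and passes a function as the replacement: `re.sub` calls the function with each match object, so the function can count the lines in the run and pick `#` or `"""` accordingly. A minimal sketch, assuming comment lines start with `% ` at the very beginning of the line and the text uses `\n` line endings (`convert_comments` and `COMMENT_RUN` are hypothetical names):

```python
import re

# One or more consecutive lines starting with "% ".
# MULTILINE makes ^ match at the start of every line; "." never matches "\n",
# so each ".*" stops at the end of its own line.
COMMENT_RUN = re.compile(r'^% .*(?:\n% .*)*', re.MULTILINE)


def _replace_run(match):
    # Drop the "% " prefix from every line in the matched run.
    lines = [line[2:] for line in match.group(0).split('\n')]
    if len(lines) == 1:
        return '# ' + lines[0]
    return '"""\n' + '\n'.join(lines) + '\n"""'


def convert_comments(text):
    # re.sub accepts a callable as the replacement and calls it once per match.
    return COMMENT_RUN.sub(_replace_run, text)
```

Because the whole run is one match, the multi-line and single-line cases no longer interfere with each other.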

Answer 1

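Although this can probably be done with a regular expression, I think it is much easier without one. You can, for example, use itertools.groupby to detect groups of consecutive comment lines, and simply use str.startswith to check whether a line is a comment.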

text = """some text
% comment line 1
% comment line 2
% comment line 3
some more text
some text
% a single commented line
some more text"""

import itertools
for k, grp in itertools.groupby(text.splitlines(), key=lambda s: s.startswith("%")):
    if not k:
        for s in grp:
            print(s)
    else:
        grp = list(grp)
        if len(grp) == 1:
            print("# " + grp[0].lstrip("% "))
        else:
            print('"""')
            for s in grp:
                print(s.lstrip("% "))
            print('"""')

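This just prints the resulting text, but you could of course also collect it in a string variable and return it. If a comment can also start in the middle of a line, you can check for that in the `if not k` block. There it would make sense to use re.sub, e.g. to distinguish between % and \%.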
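Both suggestions can be sketched in one function: it collects the output in a list and returns a string instead of printing, and in the non-comment branch it uses `re.sub` with a negative lookbehind so that `\%` is left alone. Treating an inline `%` as the start of an inline `#` comment is an assumption about the desired output, and `convert_comments` and `INLINE_COMMENT` are hypothetical names. Like the answer, it uses `lstrip("% ")`, which removes every leading `%` and space character.

```python
import itertools
import re

# A "%" that is NOT preceded by a backslash, plus an optional following space.
INLINE_COMMENT = re.compile(r'(?<!\\)% ?')


def convert_comments(text):
    out = []
    # groupby yields runs of consecutive lines that share the same key,
    # here: whether the line starts with "%".
    for is_comment, grp in itertools.groupby(text.splitlines(),
                                             key=lambda s: s.startswith("%")):
        grp = list(grp)
        if not is_comment:
            # Assumption: an unescaped "%" mid-line starts an inline comment.
            out.extend(INLINE_COMMENT.sub("# ", s, count=1) for s in grp)
        elif len(grp) == 1:
            out.append("# " + grp[0].lstrip("% "))
        else:
            out.append('"""')
            out.extend(s.lstrip("% ") for s in grp)
            out.append('"""')
    return "\n".join(out)
```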

Answer 2

Straightforward:

with open('input.txt') as f:
    comments = []

    def reformat_comments(comments):
        if len(comments) == 1:
            comments_str = '#' + comments[0] + '\n'
        else:
            comments_str = '"""\n{}\n"""\n'.format('\n'.join(comments))
        return comments_str

    for line in f:
        line = line.strip()
        if line.startswith('% '):
            comments.append(line.lstrip('%'))
        elif comments:
            print(reformat_comments(comments) + line)
            comments = []
        else:
            print(line)
    if comments: print(reformat_comments(comments))

Example output:

some text
"""
 comment line 1
 comment line 2
 comment line 3
"""
some more text
some text
# a single commented line
some more text
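In the output above, the lines inside the """ block keep the space that followed the %, because `lstrip('%')` removes only the percent sign. A minimal variant of the same approach that slices off the whole `% ` prefix and returns a list of lines instead of printing is sketched below; `convert_lines` is a hypothetical name. Unlike the answer's `line.strip()`, it keeps leading indentation, so an indented `% ` line is not treated as a comment (an assumption about the input).

```python
def reformat_comments(comments):
    # One comment line becomes "# ...", several become a """ block.
    if len(comments) == 1:
        return ['# ' + comments[0]]
    return ['"""', *comments, '"""']


def convert_lines(lines):
    out, comments = [], []
    for line in lines:
        line = line.rstrip('\r\n')  # drop only the line ending, keep indentation
        if line.startswith('% '):
            comments.append(line[2:])  # remove the whole "% " prefix
        else:
            if comments:
                out.extend(reformat_comments(comments))
                comments = []
            out.append(line)
    if comments:  # the input ended inside a comment run
        out.extend(reformat_comments(comments))
    return out
```

Usage with a file: `with open('input.txt') as f: print('\n'.join(convert_lines(f)))`.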

Replace % comments with # or """ block comments depending on the number of comment lines, using regex in Python
