Replace % comments with line comments or block comments, depending on the number of comment lines, using regex in Python

Time: 2019-06-27 10:53:18

Tags: regex python-3.x

I want to convert the following text:

some text
% comment line 1
% comment line 2
% comment line 3
some more text

into

some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text

In the same file, when only a single line is commented, I want it to go from

some text
% a single commented line
some more text

to

some text
# a single commented line
some more text

So, when both cases occur in the same file, I want to go from:

some text
% comment line 1
% comment line 2
% comment line 3
some more text
some text
% a single commented line
some more text

to

some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
some text 
# a single commented line
some more text

What I have tried so far for the second case (a single comment line) is:

re.sub(r'(\A|\r|\n|\r\n|^)% ', r'\1# ',  'some text \n% a single comment line\nsome more text')

but it also replaces % with # when the comment spans more than one line.

For the first case (several consecutive comment lines), my attempt failed:

re.sub(r'(\A|\r|\n|\r\n|^)(% )(.*)(?:\n^\t.*)*', r'"""\3"""',  'some text \n% comment line1\n% comment line 2\n% comment line 3\nsome more text') 

It repeats """ on every line, and it conflicts with the case where only a single line is commented.

Is there a way to count the consecutive lines matched by a regex and change the replacement accordingly?

Thanks in advance for your help!

2 answers:

Answer 0: (score: 2)

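While this is probably possible with regular expressions, I think it is much easier without one. You could, for example, use itertools.groupby to detect groups of consecutive comment lines, and simply use str.startswith to check whether a line is a comment.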

text = """some text
% comment line 1
% comment line 2
% comment line 3
some more text
some text
% a single commented line
some more text"""

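import itertools

# Group consecutive lines by whether they start with "%"
for k, grp in itertools.groupby(text.splitlines(), key=lambda s: s.startswith("%")):
    if not k:
        # Ordinary lines are printed unchanged
        for s in grp:
            print(s)
    else:
        # Drop only the leading "%" marker and the whitespace after it;
        # lstrip("% ") would also eat any further leading "%" characters
        grp = [s[1:].lstrip() for s in grp]
        if len(grp) == 1:
            print("# " + grp[0])
        else:
            print('"""')
            for s in grp:
                print(s)
            print('"""')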

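This just prints the resulting text, but you can of course also collect it in a string variable and return it. If comments can also start in the middle of a line, you can check for that in the if not k block. There it would make sense to use re.sub, e.g. to distinguish between % and \%.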
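A minimal sketch of both suggestions in the paragraph above: the result is collected in a list and returned as one string, and non-comment lines are checked for a comment that starts mid-line. The names convert_comments and INLINE_COMMENT are made up for this illustration, and treating the first "%" not preceded by a backslash as the start of an inline comment is an assumption, not something the question specifies.

```python
import itertools
import re

# Assumption: an inline comment starts at the first "%" that is not
# escaped as "\%". INLINE_COMMENT is a hypothetical name.
INLINE_COMMENT = re.compile(r"(?<!\\)%\s*")


def convert_comments(text):
    """Return text with %-comments rewritten as Python-style comments."""
    out = []
    for is_comment, grp in itertools.groupby(
        text.splitlines(), key=lambda s: s.startswith("%")
    ):
        if not is_comment:
            # Rewrite at most one unescaped "%" per ordinary line
            out.extend(INLINE_COMMENT.sub("# ", s, count=1) for s in grp)
            continue
        # Drop the "%" marker and the whitespace that follows it
        lines = [s[1:].lstrip() for s in grp]
        if len(lines) == 1:
            out.append("# " + lines[0])
        else:
            out.extend(['"""', *lines, '"""'])
    return "\n".join(out)
```

Calling convert_comments(text) on the sample text gives the same lines the loop above prints, as a single string. An escaped \% is left untouched.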

Answer 1: (score: 1)

Straightforward:

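def reformat_comments(comments):
    # One comment line becomes a "#" comment; several become a """ block
    if len(comments) == 1:
        comments_str = '#' + comments[0] + '\n'
    else:
        comments_str = '"""\n{}\n"""\n'.format('\n'.join(comments))
    return comments_str

with open('input.txt') as f:
    comments = []  # consecutive comment lines collected so far
    for line in f:
        line = line.strip()
        if line.startswith('% '):
            # Remove the "%" marker; the space after it is kept
            comments.append(line.lstrip('%'))
        elif comments:
            # A non-comment line ends the current run of comments
            print(reformat_comments(comments) + line)
            comments = []
        else:
            print(line)
    # Flush a run of comments that ends the file
    if comments:
        print(reformat_comments(comments))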

Sample output:

some text
"""
 comment line 1
 comment line 2
 comment line 3
"""
some more text
some text
# a single commented line
some more text
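For completeness, the counting the question asks about can also be done with re.sub itself by passing a function as the replacement. The sketch below is one possible approach, not taken from either answer: the pattern matches a whole run of consecutive "% " lines, and the function counts the lines in each match to pick the comment style. The names COMMENT_RUN, replace_run and convert_with_regex are hypothetical, and the code assumes "\n" line endings and comment lines that start with "% " in the first column.

```python
import re

# A run of one or more consecutive lines that start with "% ".
# With re.MULTILINE, "^" matches at the start of every line.
COMMENT_RUN = re.compile(r"^% .*(?:\n% .*)*", re.MULTILINE)


def replace_run(match):
    # Strip the two-character "% " prefix from every line in the run
    lines = [line[2:] for line in match.group(0).split("\n")]
    if len(lines) == 1:
        return "# " + lines[0]
    return '"""\n' + "\n".join(lines) + '\n"""'


def convert_with_regex(text):
    # re.sub calls replace_run once per run of comment lines
    return COMMENT_RUN.sub(replace_run, text)
```

Because each run is consumed as a single match, the """ delimiters are emitted once per block instead of once per line, and a single comment line no longer conflicts with the multi-line case.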