I want to convert the following text:
some text
% comment line 1
% comment line 2
% comment line 3
some more text
into
some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
In the same file, when only a single line is commented, I want it to go from
some text
% a single commented line
some more text
to
some text
# a single commented line
some more text
So, when both cases occur in the same file, I want to go from:
some text
% comment line 1
% comment line 2
% comment line 3
some more text
some text
% a single commented line
some more text
to
some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
some text
# a single commented line
some more text
What I have tried so far for the second case (a single commented line) is:
re.sub(r'(\A|\r|\n|\r\n|^)% ', r'\1# ', 'some text \n% a single comment line\nsome more text')
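but when more than one line is commented, it also replaces % with # on every one of those lines.
For the first case (several consecutive comment lines), this attempt fails: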
re.sub(r'(\A|\r|\n|\r\n|^)(% )(.*)(?:\n^\t.*)*', r'"""\3"""', 'some text \n% comment line1\n% comment line 2\n% comment line 3\nsome more text')
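It repeats """ around every line, and it conflicts with the case where only one line is commented.
Is there a way to count the consecutive lines matched by the regex and change the replacement accordingly?
Thanks in advance for your help!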
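Answer 0 (score: 2)
While this might be possible with a regular expression, I think it is much easier without one. You could, for example, use itertools.groupby to detect groups of consecutive comment lines, simply using str.startswith to check whether a line is a comment.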
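import itertools

text = """some text
% comment line 1
% comment line 2
% comment line 3
some more text
some text
% a single commented line
some more text"""

# Group consecutive lines by whether they start with '%' (k is True for a comment run).
for k, grp in itertools.groupby(text.splitlines(), key=lambda s: s.startswith("%")):
    if not k:
        # Ordinary lines are printed unchanged.
        for s in grp:
            print(s)
    else:
        grp = list(grp)
        if len(grp) == 1:
            # A lone comment line becomes a '#' comment.
            print("# " + grp[0].lstrip("% "))
        else:
            # Several consecutive comment lines become a triple-quoted block.
            print('"""')
            for s in grp:
                print(s.lstrip("% "))
            print('"""')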
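This just prints the resulting text, but of course you could also collect it in a string variable and return it. If comments can also start in the middle of a line, you could check for that in the if not k block. There, re.sub could be useful, e.g. to distinguish between % and \%.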
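The two extensions mentioned above (returning the text instead of printing it, and using re.sub to tell % from \% when a comment can start mid-line) could look roughly like this. This is a minimal sketch under stated assumptions: an unescaped % anywhere on an ordinary line is taken to start an inline comment that should become #, and \% is taken to be a literal percent sign. The names convert_text and INLINE_COMMENT are hypothetical.

```python
import itertools
import re

# Assumption: an unescaped '%' (one not preceded by a backslash) starts an
# inline comment, while '\%' is a literal percent sign and is left alone.
INLINE_COMMENT = re.compile(r'(?<!\\)%')


def convert_text(text):
    """Return the converted text as a string instead of printing it."""
    out = []
    for is_comment, grp in itertools.groupby(text.splitlines(),
                                             key=lambda s: s.startswith("%")):
        grp = list(grp)
        if not is_comment:
            # Turn the first unescaped '%' on an ordinary line into '#'.
            out.extend(INLINE_COMMENT.sub("#", s, count=1) for s in grp)
        elif len(grp) == 1:
            # s[1:] removes exactly one '%'; lstrip("% ") would also strip
            # characters that belong to the comment, e.g. "% %d" -> "d".
            out.append("# " + grp[0][1:].lstrip())
        else:
            out.append('"""')
            out.extend(s[1:].lstrip() for s in grp)
            out.append('"""')
    return "\n".join(out)


print(convert_text("x = 1  % set x\nprice: 5\\% off"))
```

With the input above, the first line becomes "x = 1  # set x" and "price: 5\% off" is left unchanged, because its % is escaped. Note that under this assumption a bare percent sign in ordinary text (such as "100%") would also be treated as a comment start.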
Answer 1 (score: 1)
Straightforward approach:
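def reformat_comments(comments):
    # One comment line -> '# ...'; several -> a triple-quoted block.
    if len(comments) == 1:
        return '# ' + comments[0]
    return '"""\n{}\n"""'.format('\n'.join(comments))

with open('input.txt') as f:
    comments = []  # bodies of the current run of consecutive '% ' lines
    for line in f:
        line = line.strip()
        if line.startswith('% '):
            # Drop the '% ' prefix (lstrip('%') would leave a leading space).
            comments.append(line[2:])
        else:
            if comments:
                # The comment run just ended: emit it before this line.
                print(reformat_comments(comments))
                comments = []
            print(line)
    # Flush a comment run that reaches the end of the file.
    if comments:
        print(reformat_comments(comments))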
Example output:
some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
some text
# a single commented line
some more text
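Neither answer uses a regular expression, but the "count the consecutive lines" idea from the question does work with re.sub: it accepts a function as the replacement, so a single pattern can match a whole run of % lines and the function can decide from the number of lines what to emit. This is a sketch assuming every comment line starts with "% " at the beginning of the line and that lines end with "\n". COMMENT_RUN and convert_with_regex are hypothetical names.

```python
import re

# One or more consecutive lines that each start with '% '.
# re.MULTILINE makes '^' match at every line start; '.' never crosses '\n'.
COMMENT_RUN = re.compile(r'^% .*(?:\n% .*)*', re.MULTILINE)


def _replace_run(match):
    # Split the matched run into lines, count them and drop the '% ' prefix.
    bodies = [line[2:] for line in match.group(0).split('\n')]
    if len(bodies) == 1:
        return '# ' + bodies[0]
    return '"""\n' + '\n'.join(bodies) + '\n"""'


def convert_with_regex(text):
    return COMMENT_RUN.sub(_replace_run, text)


print(convert_with_regex("some text\n% a single commented line\nsome more text"))
```

Because the whole run is matched at once, the single-line and multi-line cases no longer conflict the way the two separate substitutions in the question did.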