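I have a list in which a string occasionally does not match the repeating sequence and needs to be replaced. I think sed is best suited to this task, but I am open to other tools.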
CHAPTERS START_TIME END_TIME DURATION OTHER_COMMENTS
Chapter_1 0:00 ..............
Chapter_6
999_8
Chapter_58
Chapter_63
as;li9c
Chapter_121
CHAPTERS START_TIME END_TIME DURATION OTHER_COMMENTS
Chapter_1 0:00 ..............
Chapter_6
Chapter_11
Chapter_58
Chapter_63
Chapter_68
Chapter_121
999_8 Chapter_11
as;li9c Chapter_68
CHAPTERS START_TIME END_TIME DURATION OTHER_COMMENTS
Chapter_00001 0:00 1:16 1:16
Chapter_00006 5:15 6:49 1:34
999_8 9:17 11:17 2:00
Chapter_00058 19:51 20:52 1:01
Chapter_00063 23:01 23:57 0:56
as;li9c 27:42 29:45 2:03
Chapter_00121 64:33 66:33 2:00
CHAPTERS START_TIME END_TIME DURATION OTHER_COMMENTS
Chapter_00001 0:00 1:16 1:16
Chapter_00006 5:15 6:49 1:34
Chapter_00011 9:17 11:17 2:00
Chapter_00058 19:51 20:52 1:01
Chapter_00063 23:01 23:57 0:56
Chapter_00068 27:42 29:45 2:03
Chapter_00121 64:33 66:33 2:00
999_8 Chapter_00011
as;li9c Chapter_00068
Answer 0 (score: 3)
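sed has no arithmetic facilities (although doing this in sed is not impossible). awk is better suited to this task. The following assumes tab delimiters and a header line. It also preserves leading 0s in the chapter numbers.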
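awk '
BEGIN { OFS = "\t" }
# Header line: pass through unchanged.
NR == 1 { print; next }
# Valid chapter: remember its number (the text after "Chapter_") and pass through.
$1 ~ /^Chapter_/ { n = substr($1, 9); print; next }
# Anything else: replace the first field with the previous chapter number + 5.
{
    # Pad to the width of the previous number so leading zeros survive.
    repl = sprintf("Chapter_%0*d", length(n), (n + 5))
    # Log the original and replacement labels.
    print $1, repl > "replaced.txt"
    # Emit the replacement followed by everything after the first tab.
    print repl, substr($0, index($0, "\t") + 1)
}
' input.txt > output.txt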
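For readers less familiar with awk's `%0*d`, the padding step can be sketched in Python. This is only an illustration of the arithmetic, not part of the answer above; the function name `next_label` and the `step` parameter are hypothetical, and the step of 5 is taken from the awk script.

```python
def next_label(prev_digits: str, step: int = 5) -> str:
    """Return the label that follows Chapter_<prev_digits>, keeping its width."""
    # Same idea as awk's sprintf("Chapter_%0*d", length(n), n + 5):
    # the width of the previous number decides how many zeros to pad with.
    return "Chapter_{:0{width}d}".format(int(prev_digits) + step, width=len(prev_digits))


print(next_label("00006"))  # Chapter_00011
print(next_label("63"))     # Chapter_68
```

If the new number needs more digits than the old one, the width is simply exceeded rather than truncated, which matches how `%0*d` behaves.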
Answer 1 (score: 0)
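import re

# A valid first field looks like Chapter_<digits>.
pattern = re.compile(r"^Chapter_(\d+)")

z = []
k = None  # digits of the most recent valid chapter number
with open("input.txt") as f:
    for part in f:
        match = pattern.match(part)
        if not z or match or k is None:
            # Header line, valid chapter, or nothing to count from yet: keep as-is.
            if match:
                k = match.group(1)
            z.append(part)
        else:
            # Bad label: replace the first field with the previous chapter + 5,
            # keeping the zero-padding width and the remaining columns.
            rep = str(int(k) + 5).zfill(len(k))
            z.append(re.sub(r"^\S+", "Chapter_" + rep, part, count=1))

with open("output.txt", "w") as f:
    f.writelines(z)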
This is in Python. I think this kind of problem is better suited to it.
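The question also asks for a list of what was replaced (the `999_8 Chapter_00011` pairs). A minimal sketch of one way to get both results from a single pass follows. The names `fix_chapters`, `CHAPTER`, and `FIRST_FIELD` are hypothetical, and it assumes the first line is a header and that a bad label becomes the previous valid chapter number plus 5.

```python
import re

CHAPTER = re.compile(r"^Chapter_(\d+)")
FIRST_FIELD = re.compile(r"^\S+")


def fix_chapters(lines, step=5):
    """Return (fixed_lines, replaced_pairs) for an iterable of text lines."""
    fixed, replaced = [], []
    prev = None  # digits of the last valid chapter label seen
    for i, line in enumerate(lines):
        m = CHAPTER.match(line)
        if m:
            prev = m.group(1)
        first = FIRST_FIELD.match(line)
        # Keep the header, valid chapters, blank lines, and anything that
        # appears before the first valid chapter.
        if i == 0 or m or prev is None or first is None:
            fixed.append(line)
            continue
        label = "Chapter_" + str(int(prev) + step).zfill(len(prev))
        replaced.append((first.group(0), label))
        # Swap only the first field; the rest of the line is untouched.
        fixed.append(label + line[first.end():])
    return fixed, replaced


sample = [
    "CHAPTERS\tSTART_TIME\tEND_TIME\tDURATION\tOTHER_COMMENTS\n",
    "Chapter_00006\t5:15\t6:49\t1:34\n",
    "999_8\t9:17\t11:17\t2:00\n",
    "Chapter_00063\t23:01\t23:57\t0:56\n",
    "as;li9c\t27:42\t29:45\t2:03\n",
]
fixed, replaced = fix_chapters(sample)
print(replaced)  # [('999_8', 'Chapter_00011'), ('as;li9c', 'Chapter_00068')]
```

Keeping the logic in a function that takes and returns lists makes it easy to check against the sample data before pointing it at real files.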