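I have a dynamic text file to which lines are written automatically. The problem is that it ends up with duplicate and near-duplicate entries.
For example: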
1111 2222 3333 4444 <- I want this line
5555 6666 7777 8888 <- And this line too
1111 2222 3333 4444
5555 6666 7777 9999 <- Note: only one word changed (9999)
Expected result:
1111 2222 3333 4444
5555 6666 7777 8888
Actual data, before processing:
exten => 01272786170,1,Set(CALLERID(num)=821)
same => n,Dial(SIP/port21/01272786170,60,rt)
same => n,Set(thereis=yes01272786170)
same => n,Set(calledid=01272786170)
same => n,GotoIf("calledid" = "01272786170"?ejoin,01272786170,1)
exten => 01272786170,1,Set(CALLERID(num)=826) <- duplicated here with one number change
same => n,Dial(SIP/port26/01272786170,60,rt) <-
exten => 01272786170,1,Set(CALLERID(num)=827) <-
same => n,Dial(SIP/port27/01272786170,60,rt) <-
Expected result:
exten => 01272786170,1,Set(CALLERID(num)=821)
same => n,Dial(SIP/port21/01272786170,60,rt)
same => n,Set(thereis=yes01272786170)
same => n,Set(calledid=01272786170)
same => n,GotoIf("calledid" = "01272786170"?ejoin,01272786170,1)
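Note: I would like to do this with the Linux shell.
Thank you very much.
Answer 0 (score: 0)
Using awk and your first sample data:
If you use a Levenshtein algorithm (for example, here) and settle on a suitable edit-distance threshold (4 below), you can use the following simple approach: print a line only if its edit distance to every previously printed line is above the threshold.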
awk '
function levdist(str1, str2 ...)  # see the above link for working implementation
{
    ...
}
{
    for(i in a) {              # iterate all previous stored strings
        l=levdist($0,a[i])     # compute the edit distance
        if(l<=4)               # if below threshold
            next               # skip to next string
    }
    print $0                   # output where threshold was not met
    a[NR]=$0                   # store
}' file
Output:
1111 2222 3333 4444
5555 6666 7777 8888
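The body of levdist is elided above. As a minimal sketch, assuming the classic two-row dynamic-programming formulation of Levenshtein distance (not necessarily the implementation the answer links to), a self-contained version could look like the following. The shell function name fuzzy_dedupe and the awk variable max are hypothetical names introduced only for this illustration.

```shell
#!/bin/sh
# fuzzy_dedupe MAX [FILE...]   (hypothetical helper, not from the original answer)
# Prints each input line unless its edit distance to an already printed
# line is MAX or less. Reads standard input when no FILE is given.
fuzzy_dedupe() {
    max=$1
    shift
    awk -v max="$max" '
    # Levenshtein distance between s and t using two rows of the DP table.
    # Names after the extra spaces are awk-style local variables.
    function levdist(s, t,    m, n, i, j, cost, v, prev, cur) {
        m = length(s); n = length(t)
        for (j = 0; j <= n; j++) prev[j] = j
        for (i = 1; i <= m; i++) {
            cur[0] = i
            for (j = 1; j <= n; j++) {
                cost = (substr(s, i, 1) == substr(t, j, 1)) ? 0 : 1
                v = prev[j] + 1                                 # deletion
                if (cur[j-1] + 1 < v) v = cur[j-1] + 1          # insertion
                if (prev[j-1] + cost < v) v = prev[j-1] + cost  # substitution
                cur[j] = v
            }
            for (j = 0; j <= n; j++) prev[j] = cur[j]
        }
        return prev[n]
    }
    {
        # Compare against every line kept so far, in input order.
        for (k = 1; k <= kept; k++)
            if (levdist($0, keep[k]) <= max) next
        keep[++kept] = $0
        print
    }' "$@"
}

# Usage with the first sample from the question and a threshold of 4.
fuzzy_dedupe 4 <<'EOF'
1111 2222 3333 4444
5555 6666 7777 8888
1111 2222 3333 4444
5555 6666 7777 9999
EOF
```

With this input the sketch prints the same two lines as the output shown above. Every new line is compared against every kept line, so the run time grows quadratically with the number of distinct lines. The threshold also has to be tuned to the data: in the Asterisk example the near-duplicate lines differ by a single character, so a much smaller threshold than 4 may be appropriate there.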