Input file:
SNO|PRODUCT|SUMMARY | ADDRESS|DATE|
101|111111|This cutomer is good|Adress1|01/01/2012|
101|111111|This cutomer contact is 223 456 7777|Adress1|01/01/2012|
101|111111|This cutomer is details 4450 2214 2254 2133|Adress1|01/01/2012|
101|111111|This cutomer is phone number is 223 124 4411|Adress1|01/01/2012|
101|111111|This cutomer is card 1245-2355-4452-1214-152|Adress1|01/01/2012|
101|111111|This cutomer is credit number 1245 2355 4452 1214 152|Adress1|01/01/2012|
Expected output file:
SNO|PRODUCT|SUMMARY | ADDRESS|DATE|
101|111111|This cutomer is good|Adress1|01/01/2012|
101|111131|This cutomer contact is 223 456 7777|Adress1|01/01/2012|
101|111141|This cutomer is phone number is 223 124 4411|Adress1|01/01/2012|
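Delete the records that contain card details. These can appear anywhere in the SUMMARY column.
The conditions to check are:
I tried this logic:
First, group the digits together:
sed 's/\([0-9]\) \([0-9]\)/\1\2/g' input_file
- removes the spaces inside numbers
sed 's/\([0-9]\)-\([0-9]\)/\1\2/g' input_file
- removes the hyphens between digits
Result achieved:
SNO|PRODUCT|SUMMARY | ADDRESS|DATE|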
101|111111|This cutomer is good|Adress1|01/01/2012|
101|111111|This cutomer contact is 2234567777|Adress1|01/01/2012|
101|111111|This cutomer is details 445022142254 2133|Adress1|01/01/2012|
101|111111|This cutomer is phone number is 2231244411|Adress1|01/01/2012|
101|111111|This cutomer is card 1245235544521214152|Adress1|01/01/2012|
101|111111|This cutomer is credit number 1245235544521214152|Adress1|01/01/2012|
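Now I need to delete the lines that contain a number greater than 9,99,99,99,999, so that only the lines with card numbers are removed.
I cannot work out how to perform the greater-than check on the SUMMARY column.
Any help with this?
Answer 0 (score: 2)
I suggest using awk rather than sed, because it makes it easier to split the process into several steps. Here is an awk script that produces the desired output: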
# set field separator to |
BEGIN { FS = "|" }
{
# save third field
summary = $3
# remove everything not a digit from the start
sub(/^[^0-9]+/, "", summary)
# remove hyphens and spaces from what is left
gsub(/[- ]/, "", summary)
# print the whole record unless the number is too long
if (length(summary) <= 10) print;
}
Testing it out:
$ cat file
SNO|PRODUCT|SUMMARY | ADDRESS|DATE|
101|111111|This cutomer is good|Adress1|01/01/2012|
101|111111|This cutomer contact is 223 456 7777|Adress1|01/01/2012|
101|111111|This cutomer is details 4450 2214 2254 2133|Adress1|01/01/2012|
101|111111|This cutomer is phone number is 223 124 4411|Adress1|01/01/2012|
101|111111|This cutomer is card 1245-2355-4452-1214-152|Adress1|01/01/2012|
101|111111|This cutomer is credit number 1245 2355 4452 1214 152|Adress1|01/01/2012|
$ awk -f script.awk file
SNO|PRODUCT|SUMMARY | ADDRESS|DATE|
101|111111|This cutomer is good|Adress1|01/01/2012|
101|111111|This cutomer contact is 223 456 7777|Adress1|01/01/2012|
101|111111|This cutomer is phone number is 223 124 4411|Adress1|01/01/2012|
By the way, it's "customer", not "cutomer" :)
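One caveat with the script above: it only strips non-digits from the start of the SUMMARY field, so any words that follow the number would count toward the length. Here is a minimal sketch of a variant that measures only the longest digit run. It assumes digit groups are separated by single spaces or hyphens, as in the sample data, and the function name filter_cards is hypothetical:

```shell
# filter_cards is a hypothetical helper name, used here for illustration only.
# It prints every record whose SUMMARY (field 3) has no run of more than 10 digits.
filter_cards() {
  awk -F'|' '
  {
    s = $3
    # join digit groups: drop a single space or hyphen sitting between two digits
    while (match(s, /[0-9][- ][0-9]/))
      s = substr(s, 1, RSTART) substr(s, RSTART + 2)
    # find the longest run of consecutive digits that remains
    longest = 0
    while (match(s, /[0-9]+/)) {
      if (RLENGTH > longest) longest = RLENGTH
      s = substr(s, RSTART + RLENGTH)
    }
    # keep the original, unmodified record unless the run is too long
    if (longest <= 10) print
  }' "$@"
}
```

It can be called as `filter_cards file` or at the end of a pipeline. Note that a phone number immediately followed by another number would be joined into one run and could be dropped as well.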
Answer 1 (score: 1)
$ cat tst.awk
BEGIN { FS="|" }
{
maxLgth = 0
tail = $0
while ( match(tail,/[0-9]([0-9]|[^[:alpha:]])+[0-9]/) ) {
cur = substr(tail,RSTART,RLENGTH)
gsub(/[^0-9]/,"",cur)
curLgth = length(cur)
maxLgth = (curLgth > maxLgth ? curLgth : maxLgth)
tail = substr(tail,RSTART+RLENGTH)
}
}
maxLgth > 10
$ awk -f tst.awk file
101|111111|This cutomer is details 4450 2214 2254 2133|Adress1|01/01/2012|
101|111111|This cutomer is card 1245-2355-4452-1214-152|Adress1|01/01/2012|
101|111111|This cutomer is credit number 1245 2355 4452 1214 152|Adress1|01/01/2012|
It's not clear what you mean by "special characters", so above I assumed you mean non-alphabetic characters.
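The tst.awk script above prints the records it flags, that is, those with more than 10 digits. To get the output the question asks for, the final condition can be inverted. This is a minimal sketch: the wrapper name keep_non_cards is hypothetical, and the scan is meant to be the same as in tst.awk:

```shell
# keep_non_cards is a hypothetical wrapper name, not part of the answer above.
keep_non_cards() {
  awk '
  {
    maxLgth = 0
    tail = $0
    # same scan as tst.awk: digit sequences that may contain non-letter separators
    while ( match(tail, /[0-9]([0-9]|[^[:alpha:]])+[0-9]/) ) {
      cur = substr(tail, RSTART, RLENGTH)
      gsub(/[^0-9]/, "", cur)
      if (length(cur) > maxLgth) maxLgth = length(cur)
      tail = substr(tail, RSTART + RLENGTH)
    }
  }
  # inverted test: print records whose longest digit sequence has at most 10 digits
  maxLgth <= 10
  ' "$@"
}
```

Like the original, this scans the whole record rather than only the SUMMARY field, so digits in neighbouring fields can merge with a number at the very start or end of SUMMARY.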