Delete only the lines that contain card details

Time: 2016-01-14 06:43:23

Tags: linux unix awk sed grep

Input file:

SNO|PRODUCT|SUMMARY              | ADDRESS|DATE|  
101|111111|This cutomer is good|Adress1|01/01/2012|
101|111111|This cutomer contact is 223 456 7777|Adress1|01/01/2012|
101|111111|This cutomer is details 4450 2214 2254 2133|Adress1|01/01/2012|
101|111111|This cutomer is phone number is 223 124 4411|Adress1|01/01/2012|
101|111111|This cutomer is card 1245-2355-4452-1214-152|Adress1|01/01/2012|
101|111111|This cutomer is credit number 1245 2355 4452 1214 152|Adress1|01/01/2012|

Expected output_file:

SNO|PRODUCT|SUMMARY              | ADDRESS|DATE|
101|111111|This cutomer is good|Adress1|01/01/2012|
101|111111|This cutomer contact is 223 456 7777|Adress1|01/01/2012|
101|111111|This cutomer is phone number is 223 124 4411|Adress1|01/01/2012|

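Delete the records that contain card details. These can appear anywhere in the SUMMARY column.

The conditions to check are:

  1. Phone numbers are 10 digits long, so those lines must not be deleted.
  2. Lines with a run of more than 10 consecutive digits must be deleted.
  3. Lines where a run of more than 10 digits is broken up by spaces, hyphens, or other special characters must also be deleted.
  4. I tried this logic:

    First, group the digits together:

    1. sed 's/\([0-9]\) \([0-9]\)/\1\2/g' input_file - removes the spaces between digits
    2. sed 's/\([0-9]\)-\([0-9]\)/\1\2/g' input_file - removes the hyphens between digits
    3. Result achieved so far:
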
      SNO|PRODUCT|SUMMARY              | ADDRESS|DATE|     
      101|111111|This cutomer is good|Adress1|01/01/2012|
      101|111111|This cutomer contact is 2234567777|Adress1|01/01/2012|
      101|111111|This cutomer is details 445022142254 2133|Adress1|01/01/2012|
      101|111111|This cutomer is phone number is 2231244411|Adress1|01/01/2012|
      101|111111|This cutomer is card 1245235544521214152|Adress1|01/01/2012|
      101|111111|This cutomer is credit number 1245235544521214152|Adress1|01/01/2012|
      

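      Now I need to delete the lines where the number is greater than 9,99,99,99,999 (that is, longer than 10 digits), so that only the lines with card numbers are removed.

      I have not been able to perform this greater-than check on the SUMMARY column.

      Any help with this?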
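Conditions 1 to 3 can also be read as a single pattern: a digit followed by ten more digits, each optionally preceded by one space or hyphen. The sketch below illustrates that reading only. The function name `drop_card_lines` is hypothetical, only a single space or hyphen counts as a separator (other "special characters" are not handled), and the pattern is applied to the whole line rather than to the SUMMARY column alone.

```shell
# Hypothetical helper: print every line that does NOT contain a run of
# 11 or more digits, where consecutive digits may be separated by a
# single space or hyphen.
drop_card_lines() {
  # [0-9]             first digit of the run
  # ([- ]?[0-9]){10}  ten more digits, each with an optional separator
  grep -vE '[0-9]([- ]?[0-9]){10}' "$@"
}
```

It could be used as `drop_card_lines input_file > output_file`. A 10-digit phone number such as 223 456 7777 does not match because the pattern needs an eleventh digit, and the `|` and `/` characters in the other columns break any run that would otherwise continue.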

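2 answers:

Answer 0: (score: 2)

I suggest using awk rather than sed, because it makes it easier to break the process into several steps. Here is an awk script that produces the desired output: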

# set field separator to |
BEGIN { FS = "|" }

{
    # save third field
    summary = $3
    # remove everything not a digit from the start
    sub(/^[^0-9]+/, "", summary)
    # remove hyphens and spaces from what is left
    gsub(/[- ]/, "", summary)
    # print the whole record unless the number is too long
    # (note: any text that follows the number in the field also counts toward the length)
    if (length(summary) <= 10) print
}

Testing it:

$ cat file
SNO|PRODUCT|SUMMARY              | ADDRESS|DATE|  
101|111111|This cutomer is good|Adress1|01/01/2012|
101|111111|This cutomer contact is 223 456 7777|Adress1|01/01/2012|
101|111111|This cutomer is details 4450 2214 2254 2133|Adress1|01/01/2012|
101|111111|This cutomer is phone number is 223 124 4411|Adress1|01/01/2012|
101|111111|This cutomer is card 1245-2355-4452-1214-152|Adress1|01/01/2012|
101|111111|This cutomer is credit number 1245 2355 4452 1214 152|Adress1|01/01/2012|
$ awk -f script.awk file
SNO|PRODUCT|SUMMARY              | ADDRESS|DATE|  
101|111111|This cutomer is good|Adress1|01/01/2012|
101|111111|This cutomer contact is 223 456 7777|Adress1|01/01/2012|
101|111111|This cutomer is phone number is 223 124 4411|Adress1|01/01/2012|
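The script above works on the sample because each number is the last thing in the SUMMARY field. If text may follow the number, one possible variation is to trim the non-digits from the end as well. This is only a sketch: the function name `keep_short_numbers` is hypothetical, and it still assumes at most one number per SUMMARY, since several numbers in one field would be merged into a single long string.

```shell
# Hypothetical variant of the script above, wrapped in a shell function.
keep_short_numbers() {
  awk -F'|' '{
      summary = $3
      sub(/^[^0-9]+/, "", summary)   # drop the text before the first digit
      sub(/[^0-9]+$/, "", summary)   # drop the text after the last digit
      gsub(/[- ]/, "", summary)      # drop spaces and hyphens inside the number
      # print the whole record unless what is left is longer than 10 characters
      if (length(summary) <= 10) print
  }' "$@"
}
```

It can be run as `keep_short_numbers file`, or fed from a pipe.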

By the way, it's customer, not cutomer :)

Answer 1: (score: 1)

$ cat tst.awk
BEGIN { FS="|" }
{
    maxLgth = 0
    tail = $0
    while ( match(tail,/[0-9]([0-9]|[^[:alpha:]])+[0-9]/) ) {
        cur = substr(tail,RSTART,RLENGTH)
        gsub(/[^0-9]/,"",cur)
        curLgth = length(cur)
        maxLgth = (curLgth > maxLgth ? curLgth : maxLgth)
        tail = substr(tail,RSTART+RLENGTH)
    }
}
maxLgth > 10

$ awk -f tst.awk file
101|111111|This cutomer is details 4450 2214 2254 2133|Adress1|01/01/2012|
101|111111|This cutomer is card 1245-2355-4452-1214-152|Adress1|01/01/2012|
101|111111|This cutomer is credit number 1245 2355 4452 1214 152|Adress1|01/01/2012|

It's not clear what you mean by "special characters", so above I assumed you meant non-alphabetic characters.
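As written, tst.awk prints the lines it identifies as containing card numbers, and it scans the whole record. A possible adaptation that prints the lines to keep and looks only at the SUMMARY field is sketched below. The function name `keep_non_card_lines` is hypothetical, and the character class is spelled `[^A-Za-z]` here instead of `[^[:alpha:]]`, which is an assumption that the summaries contain only ASCII letters.

```shell
# Hypothetical adaptation of tst.awk: keep lines whose longest digit group
# in field 3 has at most 10 digits.
keep_non_card_lines() {
  awk -F'|' '{
      maxLgth = 0
      tail = $3                                  # scan only the SUMMARY field
      # find each stretch that starts and ends with a digit and has no letters
      while (match(tail, /[0-9]([0-9]|[^A-Za-z])+[0-9]/)) {
          cur = substr(tail, RSTART, RLENGTH)
          gsub(/[^0-9]/, "", cur)                # count digits only
          if (length(cur) > maxLgth) maxLgth = length(cur)
          tail = substr(tail, RSTART + RLENGTH)  # continue after this match
      }
      if (maxLgth <= 10) print
  }' "$@"
}
```

Because any non-letter joins two digit groups, two separate numbers divided only by punctuation or spaces are counted as one, so this errs on the side of deleting a line.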