Replace a string with a new value

Time: 2017-02-13 20:29:46

Tags: unix awk

I am trying to manipulate a huge file (more than 5,000,000 records) so that I can replace the values in column 8. For example:

If $8 = 1 replace it with success
if $8 = 2 replace it with check
if $8 = null replace with undefined

Here is a sample of the data, which is delimited by the , character:

"APPLICATION_ID","ORIGIN_ID","SERVICE_ID","PROVIDER_ID","RATING_ID","ATO","DATE","USER_TYPE","ESTATUS","OPERATION_ID"

"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","1","2004","14804862360104011458"

The field I want to replace is the one in the USER_TYPE position:

$8

I tried this, but it did not replace the values:

awk '{if($8 = 1) print $1, $2, $3, $4, $5, $6, $7, "success", $9, $10}' input_file

How can I get this to work?

3 answers:

Answer 0 (score: 1)

@sandatomo: try this (untested):

awk -F, -vs1="\"" 'NR>1{gsub(/\"/,"",$8);if($8==1){sub(/.*/,s1 "success" s1,$8)};if($8==2){sub(/.*/,s1 "check" s1,$8)};if($8=="null"){sub(/.*/,s1 "undefined" s1,$8)};print}' OFS=, Input_file

EDIT: adding a non-one-liner form of the solution as well.

awk -F, -vs1="\"" 'NR>1{
    gsub(/\"/,"",$8);
    if($8==1){
        sub(/.*/,s1 "success" s1,$8)
    };
    if($8==2){
        sub(/.*/,s1 "check" s1,$8)
    };
    if($8=="null"){
        sub(/.*/,s1 "undefined" s1,$8)
    };
    print
}
' OFS=, Input_file

EDIT2: I tested my previous code and found that it did not set the field separator to ",", so I have edited it now.
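To illustrate why the separator matters, here is a small sketch using the sample row from the question (the `row` variable is only for illustration). Without `-F,` awk splits on whitespace, so `$8` is not the USER_TYPE column at all. Separately, the attempt in the question uses `=` (assignment) where `==` (comparison) was presumably intended.

```shell
# Sample row copied from the question; the variable name is hypothetical.
row='"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","1","2004","14804862360104011458"'

# Default separator: the row is split on whitespace, so it has only 3 fields.
printf '%s\n' "$row" | awk '{ print NF }'

# Comma separator: 10 fields, and $8 is the quoted USER_TYPE value.
printf '%s\n' "$row" | awk -F, '{ print NF, $8 }'
```

Note that with `-F,` the value of `$8` still carries its double quotes, which is why the answers either strip them or include them in the lookup keys.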

EDIT3: explanation of the above.

awk -F, -vs1="\"" 'NR>1{                 ##### Set the field separator to a comma and create a variable s1 holding a double quote ("). Then check whether the current line number is greater than 1;
                                         ##### if it is, run all of the following statements.
    gsub(/\"/,"",$8);                    ##### Remove all double quotes from $8.
    if($8==1){                           ##### If the 8th field is 1, run the following statement.
        sub(/.*/,s1 "success" s1,$8)     ##### Replace everything in $8 with s1 "success" s1.
    };
    if($8==2){                           ##### Likewise, check whether $8 is 2.
        sub(/.*/,s1 "check" s1,$8)       ##### If so, replace $8 with s1 "check" s1.
    };
    if($8=="null"){                      ##### Check whether $8 is the string null.
        sub(/.*/,s1 "undefined" s1,$8)   ##### Replace the whole of $8 with s1 "undefined" s1.
    };
    print                                ##### Print the whole line.
}
' OFS=, Input_file                       ##### Set the output field separator to a comma, then name the input file.

Answer 1 (score: 0)

You can try something like this:

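awk  'BEGIN {OFS=FS=",";r["\"\""] = "\"undefined\""; r["\"1\""]= "\"success\""; r["\"2\""]="\"check\""} {if($8 in r ) $8 = r[$8]} 1' input_file

Explanation

  • The BEGIN section sets up the replacement map in r. For example, r["\"1\""]= "\"success\""; maps the literal token "1" (the 1 with its quotes!) to the literal value "success" (quotes included as well).
  • FS and OFS are also set in the BEGIN section, so the comma is used as both the input and the output separator.
  • The part after the definition of r tests whether the value of field $8 is a key in the map; if it is, field $8 is replaced with the value that r defines for that key.
  • The question does not make it 100% clear whether column $8 can contain unmapped values, so treat this as a starting point for your own experiments.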
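The one-liner above can also be laid out as a small shell function, which may be easier to experiment with. This is only a sketch: the function name `remap_user_type` and the three sample rows are made up for illustration, and it assumes that "null" in the question means an empty quoted field (`""`).

```shell
# Hypothetical helper wrapping the map-based approach; reads files or stdin.
remap_user_type() {
  awk '
    BEGIN {
      FS = OFS = ","
      # Keys and values both keep their surrounding double quotes.
      r["\"\""]  = "\"undefined\""
      r["\"1\""] = "\"success\""
      r["\"2\""] = "\"check\""
    }
    $8 in r { $8 = r[$8] }   # values that are not in the map pass through unchanged
    { print }
  ' "$@"
}

# Illustrative rows: a mapped value, an empty value, and an unmapped value.
printf '%s\n' \
  '"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","1","2004","1"' \
  '"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","","2004","2"' \
  '"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","7","2004","3"' |
  remap_user_type
```

Because the lookup keys include the quotes, the header row and any unmapped value such as "7" are printed exactly as they came in.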

Answer 2 (score: 0)

Here is a shorter one-liner:

$ awk 'BEGIN{FS=OFS=",";a[1]="success";a[2]="check"} {gsub(/"/,"",$8)} $8 in a{$8=a[$8]} 1' input.txt

Broken out, with comments:

BEGIN {
  FS=OFS=","       # set our field separators
  a[1]="success"   # populate an array with replacement values
  a[2]="check"
}

{
  gsub(/"/,"",$8)   # remove quotes in field 8, for easier processing
}

$8 in a {           # check to see if field 8 is a member of our array
  $8=a[$8]          # replace field 8 with the contents of the array at that index
}

1                   # print the line

If it is important to keep the quotes around each field, you can do so by replacing the assignment with an sprintf() that includes them:

  $8=sprintf("\"%s\"",a[$8])
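Putting the pieces together, a possible variation might look like the sketch below. It is not the exact script above: the function name `requote_user_type`, the `a[""]` entry (which assumes "null" means an empty quoted field), and the choice to re-quote field 8 on every row, including the header and unmapped values, are all assumptions added for illustration.

```shell
# Hypothetical helper: strip quotes for the lookup, then restore them on output.
requote_user_type() {
  awk '
    BEGIN {
      FS = OFS = ","
      a[1]  = "success"
      a[2]  = "check"
      a[""] = "undefined"            # assumption: null means an empty field
    }
    NF >= 8 {
      gsub(/"/, "", $8)              # bare value is easier to look up
      v = ($8 in a) ? a[$8] : $8     # unmapped values keep their original text
      $8 = sprintf("\"%s\"", v)      # put the quotes back
    }
    1                                # print every line
  ' "$@"
}

printf '%s\n' '"a","b","c","d","e","f","g","2","i","j"' | requote_user_type
```

Restoring the quotes on every row keeps the output format uniform, at the cost of touching field 8 even when nothing is replaced.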

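Keep in mind that awk only knows about your field separator, not your quotes. If a quoted field contains a comma, awk will treat that comma as a field separator. You can add a guard against this at the top of your awk script: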

NF != 10 { print "ERROR: wrong number of fields in line",NR > "/dev/stderr"; exit(1) }
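As a rough sketch, the same check can be run as a separate pass before doing any replacement. The function name `check_field_count` and the `want` variable are hypothetical; unlike the guard above, this variation reports every offending line rather than stopping at the first one.

```shell
# Hypothetical pre-flight check: exits non-zero if any line is not 10 fields wide.
check_field_count() {
  awk -F, -v want=10 '
    NF != want {
      printf "ERROR: line %d has %d fields, expected %d\n", NR, NF, want > "/dev/stderr"
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$@"
}

# A comma inside a quoted field makes awk count 11 fields for this line.
printf '%s\n' '"a","b,x","c","d","e","f","g","1","i","j"' | check_field_count ||
  echo "field count check failed"
```

If the check fails on real data, a CSV-aware tool is probably a better fit than plain field splitting.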