Question

I am trying to manipulate a huge file (5,000,000+ records) so that I can replace the values in column 8. For example:

If $8 = 1 replace it with success
if $8 = 2 replace it with check
if $8 = null replace with undefined

Here is a sample of the data, delimited by the , character:

"APPLICATION_ID","ORIGIN_ID","SERVICE_ID","PROVIDER_ID","RATING_ID","ATO","DATE","USER_TYPE","ESTATUS","OPERATION_ID"

"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","1","2004","14804862360104011458"

The field I want to replace is $8, which is USER_TYPE.

I tried this, but it does not replace the values:

awk '{if($8 = 1) print $1, $2, $3, $4, $5, $6, $7, "success", $9, $10}' input_file

How can I get this done?

Answer 1

@sandatomo: Try this (untested):

awk -F, -vs1="\"" 'NR>1{gsub(/\"/,"",$8);if($8==1){sub(/.*/,s1 "success" s1,$8)};if($8==2){sub(/.*/,s1 "check" s1,$8)};if($8=="null"){sub(/.*/,s1 "undefined" s1,$8)};print}' OFS=, Input_file

EDIT: Adding a non-one-liner form of the solution as well.

awk -F, -vs1="\"" 'NR>1{
    gsub(/\"/,"",$8);
    if($8==1){
        sub(/.*/,s1 "success" s1,$8)
    };
    if($8==2){
        sub(/.*/,s1 "check" s1,$8)
    };
    if($8=="null"){
        sub(/.*/,s1 "undefined" s1,$8)
    };
    print
}' OFS=, Input_file

EDIT2: I tested my previous code; it did not set the field separator to "," so I have edited it now.

EDIT3: Explanation of the above.

awk -F, -vs1="\"" 'NR>1{                  # -F, sets the field separator to a comma; s1 holds a double quote. Only lines after the first one are processed.
    gsub(/\"/,"",$8);                     # Remove all double quotes from field 8.
    if($8==1){                            # If field 8 is 1,
        sub(/.*/,s1 "success" s1,$8)      # replace the whole field with s1 "success" s1.
    };
    if($8==2){                            # Likewise, if field 8 is 2,
        sub(/.*/,s1 "check" s1,$8)        # replace the whole field with s1 "check" s1.
    };
    if($8=="null"){                       # If field 8 is the string null,
        sub(/.*/,s1 "undefined" s1,$8)    # replace the whole field with s1 "undefined" s1.
    };
    print                                 # Print the whole line.
}' OFS=, Input_file                       # OFS=, sets the output field separator to a comma; Input_file is the input file.
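
For context, the attempt in the question fails for two reasons: `$8 = 1` is an assignment rather than a comparison (so the condition is always true), and without `-F,` awk splits on whitespace instead of commas. Note also that the command above only prints lines after the first, so the header row is dropped. The following is a minimal sketch of a variant that keeps the header and compares the quoted values directly. The function name `replace_user_type` is made up for illustration, and treating "null" as an empty quoted field (`""`) is an assumption that the question does not confirm.

```shell
# replace_user_type: hypothetical helper; reads CSV on stdin, writes to stdout.
replace_user_type() {
  awk -F, -v OFS=, '
    NR > 1 {                                   # leave the header row alone
      if      ($8 == "\"1\"") $8 = "\"success\""
      else if ($8 == "\"2\"") $8 = "\"check\""
      else if ($8 == "\"\"")  $8 = "\"undefined\""   # assumption: null means empty
    }
    { print }                                  # print every line, header included
  '
}

# Usage, with Input_file as in the answer above:
# replace_user_type < Input_file > Output_file
```

Because the field still carries its quotes, the comparisons are plain string comparisons and any other value is printed unchanged.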

Answer 2

You can try something like this:

awk  'BEGIN {OFS=FS=",";r["\"\""] = "\"undefined\""; r["\"1\""]= "\"success\""; r["\"2\""]="\"check\""} {if($8 in r ) $8 = r[$8]} 1' input_file

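Explanation

The BEGIN section sets up the replacement map in r. For example, r["\"1\""]= "\"success\""; maps the literal token "1" (the 1 with its quotes!) to the literal value "success" (also including the quotes!).
FS and OFS are also set in the BEGIN section, so that a comma is used as both the input and the output separator.
The part after the definition of r tests whether the value of field $8 is a key in the map; if it is, field $8 is replaced with the value defined for that key in r.
The question is not 100% clear about what should happen when column $8 contains an unmapped value, so take this as a starting point for your own experiments.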
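
To make the open point about unmapped values concrete, here is a minimal sketch that applies the same map, passes any unmapped value through unchanged, and reports those values on standard error. The function name `map_user_type` and the report format are invented for illustration; whether unmapped values should be kept, rejected, or replaced is a decision the question leaves open.

```shell
# map_user_type: hypothetical helper; reads CSV on stdin, writes to stdout.
map_user_type() {
  awk '
    BEGIN {
      FS = OFS = ","
      r["\"\""]  = "\"undefined\""
      r["\"1\""] = "\"success\""
      r["\"2\""] = "\"check\""
    }
    NR == 1   { print; next }                  # header passes through untouched
    ($8 in r) { $8 = r[$8]; print; next }      # mapped value: replace and print
              { unmapped[$8]++; print }        # unmapped value: keep it, count it
    END {
      for (v in unmapped)
        printf "unmapped USER_TYPE %s: %d line(s)\n", v, unmapped[v] > "/dev/stderr"
    }
  '
}

# Usage:
# map_user_type < input_file > output_file
```

Keeping the report on standard error means the converted data on standard output stays clean and can still be redirected to a file.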

Answer 3

Here is a shorter one-liner:

$ awk 'BEGIN{FS=OFS=",";a[1]="success";a[2]="check"} {gsub(/"/,"",$8)} $8 in a{$8=a[$8]} 1' input.txt

Broken out for commenting:

BEGIN {
  FS=OFS=","       # set our field separators
  a[1]="success"   # populate an array with replacement values
  a[2]="check"
}

{
  gsub(/"/,"",$8)   # remove quotes in field 8, for easier processing
}

$8 in a {           # check to see if field 8 is a member of our array
  $8=a[$8]          # replace field 8 with the contents of the array at that index
}

1                   # print the line

If it is important to keep the quotes around every field, you can do so by replacing the assignment with an sprintf() that includes them:

  $8=sprintf("\"%s\"",a[$8])

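Bear in mind that awk only knows about your field separators, not your quotes. If a quoted field contains a comma, awk will treat that comma as a field separator. You can guard against this by adding the following at the top of your awk script: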

NF != 10 { print "ERROR: wrong number of fields in line",NR > "/dev/stderr"; exit(1) }
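
Putting the guard together with the replacement, a sketch might look like the following. It deviates from the script above in one respect, which is an editorial choice rather than part of the answer: quotes are stripped from a copy of field 8 (`key`) instead of from the field itself, so the header and any unmapped values keep their quotes. `check_and_replace` is a made-up name.

```shell
# check_and_replace: hypothetical helper; reads CSV on stdin, writes to stdout.
check_and_replace() {
  awk '
    BEGIN { FS = OFS = ","; a[1] = "success"; a[2] = "check" }

    # Abort on the first line whose field count is not 10, e.g. a comma inside quotes.
    NF != 10 { print "ERROR: wrong number of fields in line", NR > "/dev/stderr"; exit 1 }

    {
      key = $8
      gsub(/"/, "", key)                       # strip quotes from the copy only
      if (key in a) $8 = sprintf("\"%s\"", a[key])
      print
    }
  '
}

# Usage:
# check_and_replace < input.txt > output.txt || echo "conversion aborted" >&2
```

The non-zero exit status lets a calling script detect that the conversion stopped early instead of silently producing a truncated file.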
