I am trying to process a huge file (more than 5,000,000 records) so that I can replace the values in column 8. For example:
If $8 = 1 replace it with success
if $8 = 2 replace it with check
if $8 = null replace with undefined
Here is a sample of the data, which is comma-separated:
"APPLICATION_ID","ORIGIN_ID","SERVICE_ID","PROVIDER_ID","RATING_ID","ATO","DATE","USER_TYPE","ESTATUS","OPERATION_ID"
"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","1","2004","14804862360104011458"
The field I want to replace is USER_TYPE, i.e. $8.
I tried this, but it does not replace the values:
awk '{if($8 = 1) print $1, $2, $3, $4, $5, $6, $7, "success", $9, $10}' input_file
How can I get this done?
Answer 0 (score: 1)
@sandatomo: Try this (untested):
awk -F, -vs1="\"" 'NR>1{gsub(/\"/,"",$8);if($8==1){sub(/.*/,s1 "success" s1,$8)};if($8==2){sub(/.*/,s1 "check" s1,$8)};if($8=="null"){sub(/.*/,s1 "undefined" s1,$8)};print}' OFS=, Input_file
EDIT: Adding a non-one-liner form of the solution as well.
awk -F, -vs1="\"" 'NR>1{
gsub(/\"/,"",$8);
if($8==1){
sub(/.*/,s1 "success" s1,$8)
};
if($8==2){
sub(/.*/,s1 "check" s1,$8)
};
if($8=="null"){
sub(/.*/,s1 "undefined" s1,$8)
};
print
}
' OFS=, Input_file
EDIT2: I tested my previous code and it did not set the field separator to ",", so I have edited it now.
EDIT3: Explanation of the above.
awk -F, -vs1="\"" 'NR>1{ ##### Setting Field separator as comma(,). Creating a variable named s1 whose value is a quote("). Then Checking here if current line number is greater than 1.
##### If above condition is TRUE then all following statements will be executing.
gsub(/\"/,"",$8); ##### substituting all quotes(") in $8 now.
if($8==1){ ##### Check if 8th field value is 1, if yes then it will execute following statement.
sub(/.*/,s1 "success" s1,$8) ##### substitute everything in $8 with s1 "success" s1
};
if($8==2){ ##### Similarly like above checking if $8's value is 2
sub(/.*/,s1 "check" s1,$8) ##### Then substitute the $8's value with s1 "check" s1
};
if($8=="null"){ ##### checking if $8's value is "null" here
sub(/.*/,s1 "undefined" s1,$8) ##### substituting the complete value of $8 with s1 "undefined" s1.
};
print ##### printing the whole line now.
}
' OFS=, Input_file ##### Setting output field separator as a comma. Then mentioning the Input_file here.
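As written, the script above only prints lines with NR>1, so the header line is dropped, and any row whose $8 is not 1, 2 or null is printed with field 8 unquoted. A minimal sketch of a variant that avoids both, assuming "null" in the question means an empty quoted field (""); the sample rows after the first data row are made up for illustration:

```shell
# Build a small sample file. The header and first data row come from the
# question; the remaining rows are hypothetical, added to cover each case.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"APPLICATION_ID","ORIGIN_ID","SERVICE_ID","PROVIDER_ID","RATING_ID","ATO","DATE","USER_TYPE","ESTATUS","OPERATION_ID"
"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","1","2004","14804862360104011458"
"4","2","424","5020","1058","3017292917","30/11/2016 01:15:02 a.m.","2","2004","14804862360104011459"
"5","2","424","5020","1058","3017292917","30/11/2016 01:16:40 a.m.","","2004","14804862360104011460"
"6","2","424","5020","1058","3017292917","30/11/2016 01:17:11 a.m.","7","2004","14804862360104011461"
EOF

# Compare $8 including its quotes, so nothing has to be stripped and restored.
# Assumption: an empty quoted field ("") is what the question calls null.
out=$(awk -F, -v OFS=, '
NR > 1 {
    if      ($8 == "\"1\"") $8 = "\"success\""
    else if ($8 == "\"2\"") $8 = "\"check\""
    else if ($8 == "\"\"")  $8 = "\"undefined\""
}
{ print }    # print every line, including the untouched header
' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

Rows with any other value in $8 (the hypothetical "7" here) pass through unchanged.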
Answer 1 (score: 0)
You can try something like this:
awk 'BEGIN {OFS=FS=",";r["\"\""] = "\"undefined\""; r["\"1\""]= "\"success\""; r["\"2\""]="\"check\""} {if($8 in r ) $8 = r[$8]} 1' input_file
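Explanation
The BEGIN section sets up the replacement map in r. For example, r["\"1\""]= "\"success\""; maps the literal token "1" (the 1 with the quotes!) to the literal value "success" (also including the quotes!).
FS and OFS are set in the BEGIN section so that a comma is used as both the input and the output separator.
The part after the definition of r tests whether the value of field $8 is a key in the map; if it is, field $8 is replaced with the value defined in map r for that key.
The question is not 100% clear about what should happen when $8 contains an unmapped value, so take this as a starting point for your own experiments.
Answer 2 (score: 0)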
Here is a shorter one-liner:
$ awk 'BEGIN{FS=OFS=",";a[1]="success";a[2]="check"} {gsub(/"/,"",$8)} $8 in a{$8=a[$8]} 1' input.txt
Broken out, with comments:
BEGIN {
FS=OFS="," # set our field separators
a[1]="success" # populate an array with replacement values
a[2]="check"
}
{
gsub(/"/,"",$8) # remove quotes in field 8, for easier processing
}
$8 in a { # check to see if field 8 is a member of our array
$8=a[$8] # replace field 8 with the contents of the array at that index
}
1 # print the line
If keeping the quotes around every field is important, you can do that by replacing the assignment with a sprintf() that includes them:
$8=sprintf("\"%s\"",a[$8])
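A sketch of the one-liner with that substitution applied, run on the sample from the question plus one made-up row. Note that rows whose field 8 is not in the array (the header included) still come out with field 8 unquoted, because gsub() strips the quotes before the lookup:

```shell
# Sample input: header and first row from the question, second row hypothetical.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"APPLICATION_ID","ORIGIN_ID","SERVICE_ID","PROVIDER_ID","RATING_ID","ATO","DATE","USER_TYPE","ESTATUS","OPERATION_ID"
"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","1","2004","14804862360104011458"
"4","2","424","5020","1058","3017292917","30/11/2016 01:15:02 a.m.","2","2004","14804862360104011459"
EOF

# Same program as the one-liner, with the assignment swapped for sprintf()
# so that replaced values are wrapped in double quotes again.
out=$(awk 'BEGIN{FS=OFS=",";a[1]="success";a[2]="check"} {gsub(/"/,"",$8)} $8 in a{$8=sprintf("\"%s\"",a[$8])} 1' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```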
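Keep in mind that awk only knows about your field separator, not about your quotes. If a quoted field contains a comma, awk will treat that comma as a field separator. You can add a guard against this at the top of your awk script: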
NF != 10 { print "ERROR: wrong number of fields in line",NR > "/dev/stderr"; exit(1) }