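How do I remove the value of the third column from the fourth column of a CSV file (if it is present)?

Time: 2019-05-01 12:24:01

Tags: bash awk sed

Ubuntu 16.04, Bash 4.3.48

I want to remove the value of column 3 from column 4 if it is present there, along with the space that follows it.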

I tried using awk as suggested. The input and the desired output look like this:

Before: "Acura","CL","2.2","2.2 2dr Coupe","FWD","Automatic","Gasoline"
After:  "Acura","CL","2.2","2dr Coupe","FWD","Automatic","Gasoline"

Before: "Acura","CL","2.2 Premium","2.2 Premium 2dr Coupe","FWD","Manual","Gasoline"
After:  "Acura","CL","2.2 Premium","2dr Coupe","FWD","Manual","Gasoline"   

Am I redirecting the output correctly, or should I restructure the command?

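2 Answers:

Answer 0: (score: 3)

In your code you use , as the field separator, but your fields are actually separated by ",", so just change the FS and OFS settings to match your data: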

$ awk 'BEGIN{FS=OFS="\",\""} {sub($3,"",$4)} 1' file
"Acura","CL","2.2"," 2dr Coupe","FWD","Automatic","Gasoline"
"Acura","CL","2.2 Premium"," 2dr Coupe","FWD","Manual","Gasoline"

To get rid of the space left at the start of $4, include the space in the regexp:

$ awk 'BEGIN{FS=OFS="\",\""} {sub($3" *","",$4)} 1' file
"Acura","CL","2.2","2dr Coupe","FWD","Automatic","Gasoline"
"Acura","CL","2.2 Premium","2dr Coupe","FWD","Manual","Gasoline"

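Because this uses $3 as a regexp, it is not robust: RE metacharacters such as . are treated as metacharacters rather than literal characters, so 2.2 also matches 232: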

$ echo '"Acura","CL","2.2","Big 12324 Coupe","FWD","Automatic","Gasoline"' |
    awk 'BEGIN{FS=OFS="\",\""} {sub($3,"",$4)} 1'
"Acura","CL","2.2","Big 14 Coupe","FWD","Automatic","Gasoline"

To make this robust, you should really do a string operation rather than a regexp operation:

$ awk 'BEGIN{FS=OFS="\",\""} s=index($4,$3){$4=substr($4,1,s-1) substr($4,s+length($3)); gsub(/ +/," ",$4); gsub(/^ | $/,"",$4)} 1' file
"Acura","CL","2.2","2dr Coupe","FWD","Automatic","Gasoline"
"Acura","CL","2.2 Premium","2dr Coupe","FWD","Manual","Gasoline"

If you only want to remove $3 when it occurs at the start of $4, just change s=index($4,$3) to (s=index($4,$3))==1.
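
As a minimal sketch of that start-only variant: the two sample rows below are hypothetical (the second was made up so that $3 sits in the middle of $4), and the action body is simplified to a substr() plus a leading-space trim, which is enough once the match is known to be at position 1.

```shell
# Hypothetical sample: row 1 has $3 at the start of $4, row 2 has it in the middle.
sample='"Acura","CL","2.2","2.2 2dr Coupe","FWD","Automatic","Gasoline"
"Acura","CL","2.2","Coupe 2.2 2dr","FWD","Automatic","Gasoline"'

# Remove $3 from $4 only when index() reports a match at position 1.
anchored=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=OFS="\",\""}
  (s=index($4,$3))==1 {
    $4 = substr($4, length($3)+1)   # drop the leading copy of $3
    sub(/^ +/, "", $4)              # drop the spaces that followed it
  } 1')
printf '%s\n' "$anchored"
```

Only the first row should change; the second keeps "Coupe 2.2 2dr" because the match is not at position 1.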

Answer 1: (score: 2)

Could you please try the following (written and tested only with the samples shown).

awk 'BEGIN{FS=OFS=","} {val=$3;gsub(/\"/,"",val);sub(val,"",$4);sub(/^" /,"\"",$4)} 1' Input_file
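
The one-liner above can be read step by step. The following is only a commented sketch of the same logic, run on one of the sample rows from the question (the variable names line and result are made up for the illustration). It still uses the value as a regexp, so metacharacters such as . are not matched literally, and because FS is a plain comma it assumes no field contains an embedded comma.

```shell
line='"Acura","CL","2.2 Premium","2.2 Premium 2dr Coupe","FWD","Manual","Gasoline"'

result=$(printf '%s\n' "$line" | awk 'BEGIN{FS=OFS=","}
{
  val = $3                  # with FS="," the field still carries its quotes
  gsub(/"/, "", val)        # strip the quotes to get the bare value
  sub(val, "", $4)          # remove the first match of val (as a regexp) from $4
  sub(/^" /, "\"", $4)      # turn the leftover quote+space into a bare quote
  print
}')
printf '%s\n' "$result"
```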