Question

I want to read a file using awk, but I'm stuck on the fourth field: it automatically breaks after a comma.

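Data (test.txt):

"A","B","ls","This,is,the,test"
"k","O","mv","This,is,the,2nd test"
"C","J","cd","This,is,the,3rd test"

My command:

cat test.txt | awk -F , '{ OFS="|" ;print $2,$3,$4 }'

Its output is cut off at the first comma inside the fourth field:

"B"|"ls"|"This
"O"|"mv"|"This
"J"|"cd"|"This

But the output should look like this:

"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"

Any ideas?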
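To see why the fourth field gets cut, here is a minimal Python sketch (Python is used only for illustration; the function name `naive_fields` is hypothetical) that splits the first sample line on every comma, which is effectively what `-F ,` does. The quotes are ignored, so the fourth value is spread over four fields:

```python
def naive_fields(line):
    """Split on every comma, ignoring quotes (what awk -F , does)."""
    return line.split(",")

line = '"A","B","ls","This,is,the,test"'
fields = naive_fields(line)
print(len(fields))             # 7 fields instead of the expected 4
print("|".join(fields[1:4]))   # "B"|"ls"|"This
```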

Answer 1

With awk, you can also use:

awk -F'\",\"' 'BEGIN{OFS="\"|\""}{print "\""$2,$3,$4}' filename

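Note: this only works if the sequence "," never occurs inside one of the quoted values, because that sequence is used as the field separator.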

Output:

"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"
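A rough Python equivalent of the first command (a sketch for illustration; `split_on_quote_comma` and `answer1_format` are hypothetical names) splits on the literal three-character sequence `","`, re-adds the leading quote by hand and relies on the trailing quote still being attached to the last field. The last line uses a made-up value `x","y`, written with CSV's doubled-quote escaping, to show the limitation from the note: the value is torn into two pieces.

```python
def split_on_quote_comma(line):
    """Mimic awk -F'","': split on the literal sequence ","."""
    return line.split('","')

def answer1_format(line):
    f = split_on_quote_comma(line)
    # awk fields $2..$4 are indices 1..3 here; the opening quote is added
    # back by hand, the closing quote is still attached to the last field.
    return '"' + '"|"'.join(f[1:4])

print(answer1_format('"A","B","ls","This,is,the,test"'))
# "B"|"ls"|"This,is,the,test"

# Hypothetical row whose fourth value contains ",": 5 pieces instead of 4.
print(len(split_on_quote_comma('"A","B","ls","x"",""y"')))  # 5
```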

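Or, slightly better, since every field then comes out without stray quote characters and the quotes are added back uniformly: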

awk -F'^\"|\",\"|\"$' 'BEGIN{OFS="\"|\""}{print "\""$3,$4,$5"\""}' filename
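The anchored separator can be sketched in Python with `re.split` (an illustration only; `answer1_better` is a hypothetical name). The leading and trailing quotes produce empty first and last fields, which is why the awk command prints `$3`, `$4` and `$5`:

```python
import re

# Same separator as the awk -F regex: a leading quote, a "," between
# fields, or a trailing quote.
SEP = re.compile(r'^"|","|"$')

def answer1_better(line):
    f = SEP.split(line)
    # f[0] and f[-1] are the empty strings before the first and after the
    # last quote, so awk's $3..$5 correspond to f[2]..f[4].
    return '"' + '"|"'.join(f[2:5]) + '"'

print(SEP.split('"A","B","ls","This,is,the,test"'))
# ['', 'A', 'B', 'ls', 'This,is,the,test', '']
print(answer1_better('"A","B","ls","This,is,the,test"'))
# "B"|"ls"|"This,is,the,test"
```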

Answer 2

With GNU awk, use FPAT:

$ awk -v FPAT='([^,]+)|(\"[^\"]+\")' -v OFS='|' '{print $2,$3,$4}' file
"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"

See http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content
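FPAT describes what a field looks like instead of what separates fields. A similar idea in Python is `re.findall` (a sketch; `fpat_fields` is a hypothetical name). One difference matters: awk regex matching follows the POSIX leftmost-longest rule, so the FPAT above picks the longer quoted alternative, whereas Python's `re` takes the first alternative that matches. Putting the quoted alternative first gives the same fields for this data:

```python
import re

# Quoted alternative first: Python's re tries alternatives left to right.
FIELD = re.compile(r'"[^"]+"|[^,]+')

def fpat_fields(line):
    """Return every substring that looks like a field."""
    return FIELD.findall(line)

line = '"A","B","ls","This,is,the,test"'
print(fpat_fields(line))
# ['"A"', '"B"', '"ls"', '"This,is,the,test"']
print("|".join(fpat_fields(line)[1:4]))
# "B"|"ls"|"This,is,the,test"
```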

With other awks, you could do:

$ cat tst.awk
BEGIN { OFS="|" }
{
    nf=0
    delete f
    while ( match($0,/([^,]+)|(\"[^\"]+\")/) ) {
        f[++nf] = substr($0,RSTART,RLENGTH)
        $0 = substr($0,RSTART+RLENGTH)
    }
    print f[2], f[3], f[4]
}

$ awk -f tst.awk file
"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"
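The `match()`/`substr()` loop in tst.awk can be mirrored step by step in Python (a sketch for illustration; `match_loop_fields` is a hypothetical name). `m.group()` plays the role of `substr($0,RSTART,RLENGTH)` and slicing at `m.end()` plays the role of `$0 = substr($0,RSTART+RLENGTH)`:

```python
import re

FIELD = re.compile(r'"[^"]+"|[^,]+')

def match_loop_fields(record):
    """Find the next field, keep it, drop everything up to its end, repeat."""
    fields = []
    while True:
        m = FIELD.search(record)
        if m is None:
            break
        fields.append(m.group())    # like substr($0,RSTART,RLENGTH)
        record = record[m.end():]   # like $0 = substr($0,RSTART+RLENGTH)
    return fields

line = '"C","J","cd","This,is,the,3rd test"'
print("|".join(match_loop_fields(line)[1:4]))
# "J"|"cd"|"This,is,the,3rd test"
```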

Answer 3

In awk:

awk -F'"' '{for(i=4;i<=8;i+=2) {if(i==4){s="\""$i"\""}else{s = s "|\"" $i"\""}}; print s}' test.txt

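Explanation:

-F'"' makes the double quote the field separator, so the values land in the even-numbered fields ($2, $4, $6, $8) and the "," between them in the odd-numbered ones.

The awk program, annotated: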

{
    ## use a for-loop to go over the fields
    ## skip the comma fields (i.e. increment by 2)
    ## OP wanted to start at field 2, which is field 4 when splitting on quotes
    ## OP wanted to end at field 4, which is field 8 when splitting on quotes
    for (i = 4; i <= 8; i += 2) {
        if (i == 4) {
            ## initialization:
            ## variable s holds the output, starting with the quoted field $i
            s = "\"" $i "\""
        } else {
            ## for the remaining fields $i,
            ## append '|' followed by $i wrapped in quotes
            s = s "|\"" $i "\""
        }
    }

    ## print output
    print s
}
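Splitting on the double quote can be tried directly in Python (a sketch; `quote_split_fields` is a hypothetical name). Python indices start at 0, so awk's fields 4, 6 and 8 are indices 3, 5 and 7:

```python
def quote_split_fields(line):
    # '"A","B"' -> ['', 'A', ',', 'B', '']: values sit between the quotes,
    # the "," separators in the pieces between them.
    parts = line.split('"')
    # awk fields $4, $6, $8 -> Python indices 3, 5, 7
    return "|".join('"' + parts[i] + '"' for i in range(3, 9, 2))

print('"A","B","ls","This,is,the,test"'.split('"'))
# ['', 'A', ',', 'B', ',', 'ls', ',', 'This,is,the,test', '']
print(quote_split_fields('"A","B","ls","This,is,the,test"'))
# "B"|"ls"|"This,is,the,test"
```

Like the awk version, this assumes no value contains a double quote of its own.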

Answer 4

I don't like awk for this task. My advice is to use a CSV parser; for example, Python has a built-in module (csv) that handles this. You could use it like this:

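import csv
import sys

# newline='' lets the csv module handle line endings itself
with open(sys.argv[1], 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    # '|' as delimiter, every field quoted; lineterminator='\n' avoids
    # the default '\r\n' line endings
    csvwriter = csv.writer(sys.stdout, delimiter='|', quoting=csv.QUOTE_ALL,
                           lineterminator='\n')
    for row in csvreader:
        # drop the first column, keep the rest
        csvwriter.writerow(row[1:])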

Then run it:

python3 script.py infile

which yields on stdout:

"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"
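For quick experiments the same reader/writer pair can work on an in-memory string via `io.StringIO` (a sketch; `pipe_join_tail` and the second sample row are invented for illustration). That row uses CSV's doubled-quote escaping inside a value, which the field-splitting awk approaches above are not designed for, while the csv module parses it and re-escapes it on output:

```python
import csv
import io

def pipe_join_tail(text):
    """Parse CSV text and rewrite columns 2 onward, '|'-delimited and fully quoted."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow(row[1:])
    return out.getvalue()

sample = '"A","B","ls","This,is,the,test"\n"k","O","mv","He said ""hi"", then left"\n'
print(pipe_join_tail(sample), end="")
# "B"|"ls"|"This,is,the,test"
# "O"|"mv"|"He said ""hi"", then left"
```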

Answer 5

awk '{ sub(/^..../, ""); gsub(/","/, "\042|\042") } 1' file

"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"
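What the two substitutions do can be sketched in Python (an illustration only; `strip_first_and_repipe` is a hypothetical name). `sub(/^..../,"")` removes exactly four characters, so it assumes the first field is a single character, and `\042` is the octal escape for a double quote, so the replacement text is `"|"`:

```python
def strip_first_and_repipe(line):
    # sub(/^..../, ""): drop the first four characters, e.g. '"A",'
    # (assumes the first field is exactly one character long)
    rest = line[4:]
    # gsub(/","/, "\042|\042"): replace every "," with "|"
    return rest.replace('","', '"|"')

print(strip_first_and_repipe('"A","B","ls","This,is,the,test"'))
# "B"|"ls"|"This,is,the,test"
```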
