How do I specify a text qualifier with awk or another Linux program?
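My data looks like this:
It is actually tab-delimited, but some fields contain a tab themselves. Fields are qualified with double quotes.
How do I specify that fields are not only separated by tabs, but also enclosed in quotes?
Here is my script so far: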
awk '{OF=OFS="\t"}{print $1,$7,$8,$10,$11,$21}' cyme.txt | grep -i pilates
Also, for practical purposes, here is an exact text copy of a sample of the data:
"723721093013" "AFL" "1" "" "15" "ALT ROCK...." "Hai!........................" "Creatures, The.............." 2 "N" 4 7.48 2004.02.17 0.0000 . . . . 2
"723721093112" "AFL" "1" "" "5" "ELECTRONIC.." "Crash And Burn.............." "Foxx, John/Gordon, Louis...." 1 "W" 4 11.98 2004.02.17 0.0000 . . . . 73
"819162013137" "AHY" "1" "" "101" "PUNK........" "Truth, Love and Liberty....." "FM359......................." 2 "H" 1 4.48 2014.01.14 0.0000 . . . . 39
"879198005148" "AHY" "1" "" "14" "PUNK........" "Re-Volts S/T................" "Re-Volts, The..............." 1 "J" 4 5.48 2007.12.11 0.0000 . . . . 10
"879198004288" "AHY" "1" "" "24" "PUNK........" "Read Between The Lines......" "Smalltown..................." 1 "N" 4 7.48 2009.12.01 0.0000 . . . . 17
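Please let me know if anything needs clarification.
I realize that, surprisingly, awk may not be the right tool for this job. If that is the case, I would be glad to know which other command I should use to process text files with field qualifiers.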
Answer 0 (score: 0)
If gawk is available, use a regex as the field separator:
> gawk '{for (i=1;i<=NF;i++){if ($i){printf("FN: %d Content: %s",i,$i)}}print "\n"}' FS='([\t]*?\"| +)' infile
FN: 2 Content: 723721093013FN: 5 Content: AFLFN: 8 Content: 1FN: 14 Content: 15FN: 17 Content: ALTFN: 18 Content: ROCK....FN: 21 Content: Hai!........................FN: 24 Content: Creatures,FN: 25 Content: The..............FN: 27 Content: 2FN: 29 Content: NFN: 31 Content: 4FN: 32 Content: 7.48FN: 33 Content: 2004.02.17FN: 34 Content: 0.0000FN: 35 Content: .FN: 36 Content: .FN: 37 Content: .FN: 38 Content: .FN: 39 Content: 2
FN: 2 Content: 723721093112FN: 5 Content: AFLFN: 8 Content: 1FN: 14 Content: 5FN: 17 Content: ELECTRONIC..FN: 20 Content: CrashFN: 21 Content: AndFN: 22 Content: Burn..............FN: 25 Content: Foxx,FN: 26 Content: John/Gordon,FN: 27 Content: Louis....FN: 29 Content: 1FN: 31 Content: WFN: 33 Content: 4FN: 34 Content: 11.98FN: 35 Content: 2004.02.17FN: 36 Content: 0.0000FN: 37 Content: .FN: 38 Content: .FN: 39 Content: .FN: 40 Content: .FN: 41 Content: 73
FN: 2 Content: 819162013137FN: 5 Content: AHYFN: 8 Content: 1FN: 14 Content: 101FN: 17 Content: PUNK........FN: 20 Content: Truth,FN: 21 Content: LoveFN: 22 Content: andFN: 23 Content: Liberty.....FN: 26 Content: FM359.......................FN: 28 Content: 2FN: 30 Content: HFN: 32 Content: 1FN: 33 Content: 4.48FN: 34 Content: 2014.01.14FN: 35 Content: 0.0000FN: 36 Content: .FN: 37 Content: .FN: 38 Content: .FN: 39 Content: .FN: 40 Content: 39
FN: 2 Content: 879198005148FN: 5 Content: AHYFN: 8 Content: 1FN: 14 Content: 14FN: 17 Content: PUNK........FN: 20 Content: Re-VoltsFN: 21 Content: S/T................FN: 24 Content: Re-Volts,FN: 25 Content: The...............FN: 27 Content: 1FN: 29 Content: JFN: 31 Content: 4FN: 32 Content: 5.48FN: 33 Content: 2007.12.11FN: 34 Content: 0.0000FN: 35 Content: .FN: 36 Content: .FN: 37 Content: .FN: 38 Content: .FN: 39 Content: 10
FN: 2 Content: 879198004288FN: 5 Content: AHYFN: 8 Content: 1FN: 14 Content: 24FN: 17 Content: PUNK........FN: 20 Content: ReadFN: 21 Content: BetweenFN: 22 Content: TheFN: 23 Content: Lines......FN: 26 Content: Smalltown...................FN: 28 Content: 1FN: 30 Content: NFN: 32 Content: 4FN: 33 Content: 7.48FN: 34 Content: 2009.12.01FN: 35 Content: 0.0000FN: 36 Content: .FN: 37 Content: .FN: 38 Content: .FN: 39 Content: .FN: 40 Content: 17
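Note that in the output above the regex also splits on spaces, so a quoted field such as "ALT ROCK...." ends up in two pieces (FN 17 and FN 18). As a rough alternative, here is a minimal sketch in portable awk that walks each line and treats everything between a pair of double quotes as one field, even if it contains tabs. The function names `quoted_tsv_cols` and `split_quoted` are made up for this example, and it assumes that quotes are never escaped inside a field and that the real file is tab-delimited as the question states. It also ignores an empty field after a trailing tab.

```shell
# Hypothetical helper: print selected columns of a tab-delimited file whose
# fields may be wrapped in double quotes (quoted fields may contain tabs).
# Usage: quoted_tsv_cols "1,7,8" file    (reads stdin when no file is given)
quoted_tsv_cols() {
    cols=$1
    shift
    awk -v cols="$cols" '
    # Split one record into out[1..n] and return n.
    # Assumption: a quoted field never contains a double quote.
    function split_quoted(line, out,    n, t) {
        split("", out)          # clear leftovers from the previous record
        n = 0
        while (line != "") {
            if (match(line, /^"[^"]*"/)) {
                # Quoted field: keep the text between the quotes.
                out[++n] = substr(line, 2, RLENGTH - 2)
                line = substr(line, RLENGTH + 1)
            } else {
                # Unquoted field: runs up to the next tab or end of line.
                t = index(line, "\t")
                if (t == 0) t = length(line) + 1
                out[++n] = substr(line, 1, t - 1)
                line = substr(line, t)
            }
            sub(/^\t/, "", line)   # drop the delimiter after the field
        }
        return n
    }
    BEGIN { nc = split(cols, want, ",") }
    {
        split_quoted($0, f)
        row = ""
        for (i = 1; i <= nc; i++)
            row = row (i > 1 ? "\t" : "") f[want[i]]
        print row
    }' "$@"
}
```

With that function defined, the pipeline from the question would become something like `quoted_tsv_cols 1,7,8,10,11,21 cyme.txt | grep -i pilates`. If gawk 4 or later is available, its `FPAT` variable (which describes what a field looks like rather than what separates fields) is another route worth checking in the gawk manual.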