Specifying a text qualifier and a delimiter in Linux

Time: 2014-03-17 17:33:25

Tags: linux sed awk grep

How can I specify a text qualifier using awk or another Linux program?

My data looks like this (a text copy of the sample is included further down):

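The file is tab-delimited, but some fields contain a tab themselves. Fields are qualified by double quotes.

How do I specify that fields are not only separated by tabs, but are also enclosed in quotes?

This is my script so far: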

 awk 'BEGIN{FS=OFS="\t"}{print $1,$7,$8,$10,$11,$21}' cyme.txt | grep -i pilates

Also, for practical purposes, here is an exact text copy of a sample of the data:

"723721093013"  "AFL"   "1" ""  "15"    "ALT ROCK...."  "Hai!........................"  "Creatures, The.............."  2   "N" 4   7.48    2004.02.17  0.0000  .  .    .  .    2
"723721093112"  "AFL"   "1" ""  "5" "ELECTRONIC.."  "Crash And Burn.............."  "Foxx, John/Gordon, Louis...."  1   "W" 4   11.98   2004.02.17  0.0000  .  .    .  .    73
"819162013137"  "AHY"   "1" ""  "101"   "PUNK........"  "Truth, Love and Liberty....."  "FM359......................."  2   "H" 1   4.48    2014.01.14  0.0000  .  .    .  .    39
"879198005148"  "AHY"   "1" ""  "14"    "PUNK........"  "Re-Volts S/T................"  "Re-Volts, The..............."  1   "J" 4   5.48    2007.12.11  0.0000  .  .    .  .    10
"879198004288"  "AHY"   "1" ""  "24"    "PUNK........"  "Read Between The Lines......"  "Smalltown..................."  1   "N" 4   7.48    2009.12.01  0.0000  .  .    .  .    17

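Let me know if anything needs clarification. How can I specify a text qualifier using awk or another Linux program?

I realize that, surprisingly, awk may not be the right tool for this job. If that is the case, I would be glad to know which other command to use for text files with field qualifiers.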

1 answer:

Answer 0: (score: 0)

If gawk is available, use a regex as the field separator:

> gawk '{for (i=1;i<=NF;i++){if ($i){printf("FN: %d Content: %s",i,$i)}}print "\n"}' FS='([\t]*?\"| +)' infile
FN: 2 Content: 723721093013FN: 5 Content: AFLFN: 8 Content: 1FN: 14 Content: 15FN: 17 Content: ALTFN: 18 Content: ROCK....FN: 21 Content: Hai!........................FN: 24 Content: Creatures,FN: 25 Content: The..............FN: 27 Content: 2FN: 29 Content: NFN: 31 Content: 4FN: 32 Content: 7.48FN: 33 Content: 2004.02.17FN: 34 Content: 0.0000FN: 35 Content: .FN: 36 Content: .FN: 37 Content: .FN: 38 Content: .FN: 39 Content: 2

FN: 2 Content: 723721093112FN: 5 Content: AFLFN: 8 Content: 1FN: 14 Content: 5FN: 17 Content: ELECTRONIC..FN: 20 Content: CrashFN: 21 Content: AndFN: 22 Content: Burn..............FN: 25 Content: Foxx,FN: 26 Content: John/Gordon,FN: 27 Content: Louis....FN: 29 Content: 1FN: 31 Content: WFN: 33 Content: 4FN: 34 Content: 11.98FN: 35 Content: 2004.02.17FN: 36 Content: 0.0000FN: 37 Content: .FN: 38 Content: .FN: 39 Content: .FN: 40 Content: .FN: 41 Content: 73

FN: 2 Content: 819162013137FN: 5 Content: AHYFN: 8 Content: 1FN: 14 Content: 101FN: 17 Content: PUNK........FN: 20 Content: Truth,FN: 21 Content: LoveFN: 22 Content: andFN: 23 Content: Liberty.....FN: 26 Content: FM359.......................FN: 28 Content: 2FN: 30 Content: HFN: 32 Content: 1FN: 33 Content: 4.48FN: 34 Content: 2014.01.14FN: 35 Content: 0.0000FN: 36 Content: .FN: 37 Content: .FN: 38 Content: .FN: 39 Content: .FN: 40 Content: 39

FN: 2 Content: 879198005148FN: 5 Content: AHYFN: 8 Content: 1FN: 14 Content: 14FN: 17 Content: PUNK........FN: 20 Content: Re-VoltsFN: 21 Content: S/T................FN: 24 Content: Re-Volts,FN: 25 Content: The...............FN: 27 Content: 1FN: 29 Content: JFN: 31 Content: 4FN: 32 Content: 5.48FN: 33 Content: 2007.12.11FN: 34 Content: 0.0000FN: 35 Content: .FN: 36 Content: .FN: 37 Content: .FN: 38 Content: .FN: 39 Content: 10

FN: 2 Content: 879198004288FN: 5 Content: AHYFN: 8 Content: 1FN: 14 Content: 24FN: 17 Content: PUNK........FN: 20 Content: ReadFN: 21 Content: BetweenFN: 22 Content: TheFN: 23 Content: Lines......FN: 26 Content: Smalltown...................FN: 28 Content: 1FN: 30 Content: NFN: 32 Content: 4FN: 33 Content: 7.48FN: 34 Content: 2009.12.01FN: 35 Content: 0.0000FN: 36 Content: .FN: 37 Content: .FN: 38 Content: .FN: 39 Content: .FN: 40 Content: 17
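
Note that in the output above the ` +` alternative in FS also splits on spaces inside quoted fields (for example "ALT" and "ROCK...." come out as fields 17 and 18). As a possible alternative, here is a minimal sketch that assumes Python 3 is available and that the file really is tab-delimited with double quotes as the text qualifier. It uses the standard csv module; the function name parse_tsv and the two-line sample are made up for illustration, and the sample deliberately puts a tab inside a quoted field.

```python
import csv
import io


def parse_tsv(text):
    """Split tab-delimited text into rows, treating "..." as a text qualifier."""
    # newline="" lets the csv module handle line endings itself.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter="\t", quotechar='"')
    return [row for row in reader]


# Hypothetical sample: the second field of the first row contains a tab.
sample = (
    '"723721093013"\t"ALT\tROCK...."\t2\t7.48\n'
    '"819162013137"\t"PUNK........"\t2\t4.48\n'
)

for row in parse_tsv(sample):
    # The embedded tab stays inside field 2 instead of creating a new field.
    print(row[0], row[1].replace("\t", " "), row[3], sep=" | ")
```

To apply the same idea to cyme.txt, one could open the file with open("cyme.txt", newline="") and pass the file object to csv.reader with the same delimiter and quotechar. Keep in mind that Python indexes from 0, so awk's $1,$7,$8,$10,$11,$21 would correspond to row[0], row[6], row[7], row[9], row[10], row[20]. Whether this matches the real file depends on how the quotes and tabs are actually laid out, so treat it as a starting point.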