Question

I have an input file like this:

COL1: VALUE1 , XYZ: 2, OWNER: (DSF) , FLG: DIT /-/-/ OX if 0X, proc=0xyyy23, NAME=AUDIT
COL1: VALUE2 , XYZ: 2, OWNER: (DSF) , FLG: DIT /-/-/ OX if 0X, proc=0xyy23, NAME=generic
XYZ:2, COL1: 289 , TREK:MRP, OWNER: (DSF) , FLG: DIT /-/-/ OX if 0X,  NAME=Oil, trial=TREE

I want output like this:

  COL1: VALUE1 , NAME=AUDIT
  COL1: VALUE2 , NAME=generic
  COL1: 289    , NAME=Oil

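How can I do this on the command line with awk/grep/sed, without relying on any of the extended awk versions such as nawk or gawk?

Basically, I want the values of COL1 and NAME (that is, the text after the : and the =), regardless of where they appear in the line.
Note that the position of the NAME column varies slightly from line to line.

This is what I could come up with: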

awk -F"," '{print $1, $6}' file.txt
COL1: VALUE1   NAME=AUDIT
COL1: VALUE2   NAME=generic
XYZ:2   NAME=Oil

Answer 1

You can try a Perl one-liner:

 perl -lne ' /(COL1:\s*\S+).+(NAME=\w+)/ and print "$1,\t$2" ' input_file

With your input:

$ cat sach.txt
COL1: VALUE1 , XYZ: 2, OWNER: (DSF) , FLG: DIT /-/-/ OX if 0X, proc=0xyyy23, NAME=AUDIT
COL1: VALUE2 , XYZ: 2, OWNER: (DSF) , FLG: DIT /-/-/ OX if 0X, proc=0xyy23, NAME=generic
XYZ:2, COL1: 289 , TREK:MRP, OWNER: (DSF) , FLG: DIT /-/-/ OX if 0X,  NAME=Oil, trial=TREE
$ perl -lne ' /(COL1:\s*\S+).+(NAME=\w+)/ and print "$1,\t$2" ' sach.txt
COL1: VALUE1,   NAME=AUDIT
COL1: VALUE2,   NAME=generic
COL1: 289,      NAME=Oil
$

Explanation:

perl -lne  # -n loops over the input lines without printing them automatically; -l strips the newline on input and adds it back on print

' /(COL1:\s*\S+).+(NAME=\w+)/  # Match the pattern; the first () is captured in $1 and the second () in $2
                               # First ()  matches COL1:\s*\S+ => COL1: followed by zero or more whitespace characters (\s*) and then one or more non-space characters (\S+)
                               # .+ => matches whatever lies between the first () and the second ()
                               # Second () matches NAME= followed by a word (\w+)


and                            # run the print only if the preceding match /..../ succeeded
print "$1,\t$2"                # print the captured $1 and $2, separated by a comma and a tab

' input_file

Answer 2

Could you please try the following (written and tested with GNU awk).

awk '
BEGIN{
  OFS=" , "
}
match($0,/COL[0-9]+: [^,]*/){
  val=substr($0,RSTART,RLENGTH)
  match($0,/NAME[^,]*/)
  print val OFS substr($0,RSTART,RLENGTH)
  val=""
}
'  Input_file

I have kept the match for the string NAME inside the match for COL on each line, so if a line does not contain the string COL, nothing is printed for it.

If you still want to print the NAME match when the string COL is not found in a line, try the following.

awk '
BEGIN{
  OFS=" , "
}
{
  val=""                 ##Resetting val on every line so a value from an earlier line is never reused.
}
match($0,/COL[0-9]+: [^,]*/){
  val=substr($0,RSTART,RLENGTH)
}
match($0,/NAME[^,]*/){
  if(val){
    printf "%s%s",val,OFS
  }
  print substr($0,RSTART,RLENGTH)
}
'    Input_file

Explanation: an annotated version of the first command.

awk '                                       ##Starting awk program here.
BEGIN{                                      ##Starting BEGIN section for awk code here.
  OFS=" , "                                 ##Setting OFS (output field separator) to space comma space.
}                                           ##Closing BEGIN section here.
match($0,/COL[0-9]+: [^,]*/){               ##Using the built-in match function to match the regex from COL up to the next comma.
  val=substr($0,RSTART,RLENGTH)             ##If a match is found, creating variable val holding the matched substring.
  match($0,/NAME[^,]*/)                     ##Using match again to match from NAME up to the next comma.
  print val OFS substr($0,RSTART,RLENGTH)   ##Printing val, OFS, and the substring of the current line that starts at RSTART and is RLENGTH characters long.
  val=""                                    ##Nullifying variable val here.
}
' Input_file                                ##Mentioning Input_file name here.

For reference on match(), RSTART and RLENGTH, see the awk man page:

man awk
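
Since this answer depends on what match() leaves behind in RSTART and RLENGTH, a small standalone sketch may help. It is only an illustration: the helper name show_match and the sample lines are invented here. It uses nothing beyond POSIX awk features.

```shell
# Hypothetical helper: report RSTART, RLENGTH and the matched text
# for the first NAME=... field on each input line.
show_match() {
  awk '{
    if (match($0, /NAME=[^,]*/)) {
      # RSTART is the 1-based start of the match, RLENGTH its length
      print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    } else {
      # without a match, RSTART is 0 and RLENGTH is -1
      print RSTART, RLENGTH
    }
  }'
}

printf '%s\n' 'COL1: 289 , NAME=Oil, trial=TREE' | show_match   # 13 8 NAME=Oil
printf '%s\n' 'no such field here' | show_match                 # 0 -1
```

Because substr() with RSTART=0 and RLENGTH=-1 yields an empty string, an unmatched NAME simply prints as nothing rather than raising an error.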

Answer 3

With grep, you could try something like this:

while IFS= read -r line; do COL=$(echo "$line" | grep -o "COL1:[^,]*"); NAME=$(echo "$line" | grep -o "NAME=[a-zA-Z]*"); echo "$COL, $NAME" >> new_file.txt; done < your_file.txt

The regexp in this example assumes that the value after COL1 is always followed by a "," (it takes every character after the : up to, but not including, that comma), so you might have to adapt it to fit your file. The same goes for the regexp used for NAME, which only accepts letters.
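
When adapting the COL1 pattern, keep in mind that .* is greedy and runs to the last comma on the line, while a bracket expression such as [^,]* stops before the first one. The sketch below shows the difference on a shortened sample line. The variable names are just for illustration, and grep -o is assumed to be available (it is in GNU and BSD grep).

```shell
line='COL1: VALUE1 , XYZ: 2, OWNER: (DSF) , NAME=AUDIT'

# Greedy: .* extends the match to the LAST comma on the line.
greedy=$(printf '%s\n' "$line" | grep -o 'COL1:.*,')

# Bracket expression: [^,]* stops right before the FIRST comma.
first=$(printf '%s\n' "$line" | grep -o 'COL1:[^,]*')

# Brackets around the values make trailing spaces visible.
printf '[%s]\n' "$greedy" "$first"
# [COL1: VALUE1 , XYZ: 2, OWNER: (DSF) ,]
# [COL1: VALUE1 ]
```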

Answer 4

Try this:

$ sed 'H;s/.*NAME=/NAME=/;s/ *,.*//;x;s/^.*COL1/COL1/;s/ *,.*//;G;s/\n/\t, /;' file
COL1: VALUE1    , NAME=AUDIT
COL1: VALUE2    , NAME=generic
COL1: 289       , NAME=Oil

This uses the hold space, with \t for alignment.
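
For readers unfamiliar with the hold space, here is a commented, step-by-step sketch of the same idea. It is a variation rather than the exact command above: it uses h (copy) instead of H (append), and a plain " , " separator instead of \t, since \t in a replacement is not understood by every sed. The function name is made up for illustration.

```shell
# Hypothetical helper: reduce each line to "COL1: value , NAME=value".
extract_with_hold_space() {
  sed '
    # copy the whole line into the hold space
    h
    # pattern space: strip everything before NAME= and after its value
    s/.*NAME=/NAME=/
    s/ *,.*//
    # swap: pattern space is the original line again, hold space keeps NAME=value
    x
    # pattern space: strip everything before COL1 and after its value
    s/.*COL1/COL1/
    s/ *,.*//
    # append the hold space after a newline, then turn that newline into the separator
    G
    s/\n/ , /
  '
}

printf '%s\n' 'XYZ:2, COL1: 289 , TREK:MRP, NAME=Oil, trial=TREE' | extract_with_hold_space
# COL1: 289 , NAME=Oil
```

Like the original command, this assumes every line contains both a COL1 field and a NAME field.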

Answer 5

With GNU sed:

$ sed -E 's/^([^,]+,\s*)?(col1:[^,]+).+(,\s*name=\w+).*/\2\3/i' file.txt
