AWK / SED / getline - how can this example be simplified/improved?

Time: 2014-06-22 06:46:28

Tags: awk sed getline

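I am trying to take a 3-column input file and split it up based on a condition in column 3. I think it is easier to show you than to explain:

Input file: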

outputfile1.txt
 26         NCC      1     # First Start
 38         NME      2
 44         NSC      1     # Start2
 56         NME      2
 62         NCC      1     # Start3
...
314         NCC      1     # Start17
326         NME      2
332         NSC      1     # Start18
344         NME      2
349         NME      2     # Final End

(The # comments are not part of the file; I added them to make things clearer.)

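Column 3 is used to determine a new "START" entry.

The "START" / "END" values come from column 1.

For "TITLE", I want all the values of column 2 between consecutive "STARTs".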
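As a quick check of the first rule, the sketch below lists the rows that column 3 marks as a new START. The three-row sample and the file name `sample.txt` are made up for illustration and only mimic the layout of outputfile1.txt.

```shell
# Hypothetical three-row sample in the same layout as outputfile1.txt
printf '%s\n' ' 26 NCC 1' ' 38 NME 2' ' 44 NSC 1' > sample.txt

# Print column 1 of every row whose column 3 is 1, i.e. each START value
starts=$(awk '$3 == 1 { print $1 }' sample.txt)
echo "$starts"
```

For this sample it prints 26 and 44, one per line; each END is then the next START minus one.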

Desired output

outputfile2.txt
START=26 ; END=43 ; TITLE=NCC_NME
START=44 ; END=61 ; TITLE=NSC_NME
START=62 ; END=79 ; TITLE=NCC_...
...
START=314 ; END=331 ; TITLE=NCC_NME
START=332 ; END=349 ; TITLE=NSC_NME

My rough script "almost" does this, but it creates 5 single-column temporary files in the process.

awk '{ print $1 }' outputfile1.txt | sed '$d' > tempfile1.txt
awk '{ print $1-1 }' outputfile1.txt | sed '$d' > tempfile2.txt
sed '$d' outputfile1.txt | awk 'NR{print $3-p}{p=$3}' > tempfile3.txt

awk '  { getline value < "tempfile1.txt" }
       { if (NR==1)
       print value ;
       else if( $1 != 1 )
       print value }' tempfile3.txt > tempfile4.txt

awk '  { getline value < "tempfile2.txt" }
       { if (NR==1)
       print value ;
       else if ( $1 != 1 )
       print value }' tempfile3.txt | sed '1d' > tempfile5.txt
awk 'END{print $1}' outputfile1.txt >> tempfile5.txt

awk '   { getline value < "tempfile5.txt" }
        {print "START="$0 " ; END="value}' tempfile4.txt > outputfile2.txt
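The last step above relies on `getline value < file` reading a second file in lockstep with the main input. A minimal sketch of just that pairing, using two made-up files (`starts.txt` and `ends.txt` are hypothetical stand-ins for tempfile4.txt and tempfile5.txt):

```shell
printf '%s\n' 26 44 > starts.txt   # hypothetical stand-in for tempfile4.txt
printf '%s\n' 43 61 > ends.txt     # hypothetical stand-in for tempfile5.txt

# For each line of starts.txt, getline pulls the next unread line of ends.txt
paired=$(awk '{ getline value < "ends.txt"; print "START=" $0 " ; END=" value }' starts.txt)
echo "$paired"
```

Note that `getline` returns 1 on success, 0 at end of file and -1 on error, so if the two files can differ in length it is safer to test `(getline value < "ends.txt") > 0` before using `value`.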

Contents of the temporary files

       |  temp1     temp2     temp3
NR=1   |  26        25        1
NR=2   |  38        37        1
NR=3   |  44        43        -1
NR=4   |  56        55        1
NR=5   |  62        61        -1
...    |  ...       ...       ...
NR=33  |  314       313       -1
NR=34  |  326       325       1
NR=35  |  332       331       -1
NR=36  |  344       343       1
----------------------------------
       | temp4     temp5
NR=1   |  26        43
NR=2   |  44        61
NR=3   |  62        79
...    |  ...       ...
NR=17  |  314       331
NR=18  |  332       349

Current output

outputfile2.txt
START=26 ; END=43
START=44 ; END=61
START=62 ; END=79
...
START=314 ; END=331
START=332 ; END=349

2 answers:

Answer 0 (score: 2)

Try:

awk '
  function print_range() {
    printf "START=%s ; END=%s ; TITLE=%s\n", start, end-1, title
  }

  {
    end=$1
  }

  # if column 3 is equal to 1, then there is a new start
  $3==1 {
    if(title) print_range()
    start=$1
    title=$2
    next
  }

  # if the label in field 2 is not part of the title then add it
  title!~"(^|_)" $2 "(_|$)" {
    title=title"_"$2
  }

  END {
    end++
    print_range()
  }
' file
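The `title !~ "(^|_)" $2 "(_|$)"` test is what keeps a label from being appended twice: it only matches when the label appears as a whole `_`-separated component of the title. A small sketch of that check in isolation, where the title/label pairs are invented purely to exercise the pattern:

```shell
# Hypothetical "title label" pairs, chosen only to exercise the pattern
result=$(printf '%s\n' 'NCC_NME NME' 'NCC_NMEX NME' 'NCC NCC' 'NSC NCC' |
  awk '{
    # $1 plays the role of the accumulated title, $2 the incoming label
    if ($1 !~ ("(^|_)" $2 "(_|$)")) print $1 "_" $2   # label is new: append it
    else                            print $1          # label already present
  }')
echo "$result"
```

Here `NCC_NMEX` still gets `_NME` appended, because `NME` is only a substring of `NMEX` and not a full component. The explicit parentheses around the concatenation are optional, since concatenation binds tighter than `!~` in awk, but they make the grouping obvious.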

Answer 1 (score: 1)

You can do everything in a single pass:

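awk '{
  if (NR == 1) {

     # on the first record we initialize our variables
     PREVIOUS_ONE=$1
     TITLE=$2
     PREVIOUS_THIRD=$3

  } else if (PREVIOUS_THIRD <= $3) {

     # the third column did not drop, so we are still in the same group:
     # append the label unless the title already contains it
     if (index("_" TITLE "_", "_" $2 "_") == 0) TITLE=TITLE"_"$2
     PREVIOUS_THIRD=$3

  } else {

     # the third column dropped, so a new group starts here:
     # print the finished group and reinitialize our variables
     print "START=" PREVIOUS_ONE " ; END=" ($1-1) " ; TITLE=" TITLE

     PREVIOUS_ONE=$1
     TITLE=$2
     PREVIOUS_THIRD=$3
  }

  # remember column 1 of the most recent row for the final END value
  LAST=$1
  }
  END {
     # the last group has no following start row, so print it here
     if (NR > 0) print "START=" PREVIOUS_ONE " ; END=" LAST " ; TITLE=" TITLE
  }' outputfile1.txt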
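One thing to watch with any single-pass approach of this kind: a group is only printed when the next start row arrives, so the last group has to be flushed from an `END` block. A minimal sketch of that idea on four made-up rows (the variable names `start` and `last` are illustrative, and TITLE handling is left out to keep it short):

```shell
out=$(printf '%s\n' '26 NCC 1' '38 NME 2' '44 NSC 1' '49 NME 2' |
  awk '
    $3 == 1 && NR > 1 { print "START=" start " ; END=" ($1 - 1) }  # close the previous group
    $3 == 1           { start = $1 }                               # open a new group
                      { last = $1 }                                # remember the final row
    END               { if (NR) print "START=" start " ; END=" last }  # flush the last group
  ')
echo "$out"
```

Without the `END` rule, only the first line (`START=26 ; END=43`) would be printed and the group starting at 44 would be lost.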