awk: generate a consecutive sequence (continued)

Date: 2014-08-07 14:08:58

Tags: awk

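I want to read the first field and generate a sequence based on the "&-" and "&&-" separators. I also want to read the first column and fill its empty values down with the previous non-empty value.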

However, the actual input file is not delimited by commas (FS=",") or by tabs (FS="\t"); its columns are aligned with spaces.

For example, if the DIGITS field is 210&-3, only 210 and 213 should be produced. If the DIGITS field is 210&&-3, then 210, 211, 212 and 213 should be produced.
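The expansion rule above can be sketched as a small shell function wrapping awk. It assumes, following the arithmetic in the answer's script (arr[1]-arr[i]), that "-N" means the base value plus N, that "&" adds a single value and that "&&" also adds every number in between. The function name expand_digits is hypothetical.

```sh
# Hypothetical helper: expand one DIGITS value into a comma-separated list.
# Assumption: "-N" means base + N; "&&" also fills in the numbers in between.
expand_digits() {
    printf '%s\n' "$1" | awk '
    {
        n = split($0, arr, /&/)       # "210&&-3" -> "210", "", "-3"
        base = arr[1] + 0
        out = base
        prev = base
        for (i = 2; i <= n; i++) {
            fill = (arr[i] == "")     # empty element means the separator was "&&"
            if (fill) i++             # the real value is in the next element
            val = base - arr[i]       # "-3" -> base + 3
            if (fill)
                for (j = prev + 1; j < val; j++) out = out "," j
            out = out "," val
            prev = val
        }
        print out
    }'
}

expand_digits '210&-3'            # 210,213
expand_digits '210&&-3'           # 210,211,212,213
expand_digits '2130&&-4&-6&&-8'   # 2130,2131,2132,2133,2134,2136,2137,2138
```

The empty string that split() produces between the two "&" characters is what signals a range, which is the same trick the scripts further down rely on.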

INPUT.TXT

DIGITS                   AL DEST         CHI CNT NEDEST       CORG  NCHA



  20                        0 ABC          1   N   DEFABC       0     CHARGE      
                            1 ABC          1   N   GHIABC       0     CHARGE      
                            2 ABC          1   N   JKLABC       0     CHARGE      
                            3 ABC          1   N   MNOABC       0     CHARGE      
                            4 ABC          1   N   PQRABC       0     CHARGE    
  2130&&-4&-6&&-8           0 ABC          1   N   DEFABC       0     CHARGE      
                            1 ABC          1   N   GHIABC       0     CHARGE      

So I follow the two steps below to get the required output.

Step 1: Read the first column, then fill its empty values down with the previous non-empty value.

awk 'a=/^ /{$0=(x)substr($0,length(x)+1)}!a{x=$1}1' Input.txt > Op_Step1.txt
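For comparison, here is a minimal sketch of the same fill-down idea that also drops the header and the blank lines, which Op_Step1.txt does not contain. It assumes the first line is the header and that a data line starting with a non-blank character begins a new DIGITS value; filldown is a hypothetical helper name, and the sample lines are shortened copies of the input.

```sh
# Hypothetical helper: fill the DIGITS column down, keeping the fixed-width layout.
# Assumptions: line 1 is the header; a line starting with a non-blank starts a new value.
filldown() {
    awk '
    NR == 1 || !NF { next }                # skip the header and blank lines
    /^[^ ]/ { x = $1; print; next }        # new DIGITS value: remember it
    { print x substr($0, length(x) + 1) }  # continuation: overwrite leading blanks
    '
}

printf '%s\n' \
  'DIGITS                   AL DEST' \
  '' \
  '20                        0 ABC' \
  '                          1 ABC' \
  '2130&&-4&-6&&-8           0 ABC' \
  '                          1 ABC' | filldown
```

Overwriting only the first length(x) characters, instead of assigning $1, keeps awk from rebuilding $0 with OFS, so the original column spacing survives.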

Op_Step1.txt

  20                        0 ABC          1   N   DEFABC       0     CHARGE
  20                        1 ABC          1   N   GHIABC       0     CHARGE
  20                        2 ABC          1   N   JKLABC       0     CHARGE
  20                        3 ABC          1   N   MNOABC       0     CHARGE
  20                        4 ABC          1   N   PQRABC       0     CHARGE
  2130&&-4&-6&&-8           0 ABC          1   N   DEFABC       0     CHARGE
  2130&&-4&-6&&-8           1 ABC          1   N   GHIABC       0     CHARGE

Step 2: Read the first field of Op_Step1.txt, then generate the sequence based on the "&-" and "&&-" separators.

Thanks to EdMorton for the script below:

$ awk -f tst.awk Op_Step1.txt

Because the input above is delimited by neither commas (FS=",") nor tabs (FS="\t"), the script below does not work:

BEGIN{ FS="\t" }
  {
      for (i=1;i<=NF;i++) {
          if ($i == "") {
              i++
              $i = $1 - $i
              for (j=(prev+1);j<$i;j++) {
                  print j
              }
          }
          else if ($i < 0) {
              $i = $1 - $i
          }

          print $i
          prev = $i
      }
}
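A quick check shows why setting FS="\t" cannot help with this data: a space-aligned line contains no tabs, so awk sees the whole line as a single field, whereas the default FS (runs of blanks) isolates the DIGITS value as $1. The sample line is copied from the input above.

```sh
# Show how awk splits one space-aligned line under two FS settings.
line='2130&&-4&-6&&-8           0 ABC          1   N   DEFABC       0     CHARGE'

# FS="\t": there are no tabs, so the whole line is one field.
printf '%s\n' "$line" | awk 'BEGIN { FS = "\t" } { print NF }'   # 1

# Default FS: fields are separated by runs of blanks, so $1 is the DIGITS value.
printf '%s\n' "$line" | awk '{ print NF; print $1 }'             # 8, then 2130&&-4&-6&&-8
```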

Desired output:

   20                        0 ABC          1   N   DEFABC       0     CHARGE
   20                        1 ABC          1   N   GHIABC       0     CHARGE
   20                        2 ABC          1   N   JKLABC       0     CHARGE
   20                        3 ABC          1   N   MNOABC       0     CHARGE
   20                        4 ABC          1   N   PQRABC       0     CHARGE
   2130                      0 ABC          1   N   DEFABC       0     CHARGE
   2131                      0 ABC          1   N   DEFABC       0     CHARGE
   2132                      0 ABC          1   N   DEFABC       0     CHARGE
   2133                      0 ABC          1   N   DEFABC       0     CHARGE
   2134                      0 ABC          1   N   DEFABC       0     CHARGE
   2136                      0 ABC          1   N   DEFABC       0     CHARGE
   2137                      0 ABC          1   N   DEFABC       0     CHARGE
   2138                      0 ABC          1   N   DEFABC       0     CHARGE
   2130                      1 ABC          1   N   GHIABC       0     CHARGE
   2131                      1 ABC          1   N   GHIABC       0     CHARGE
   2132                      1 ABC          1   N   GHIABC       0     CHARGE
   2133                      1 ABC          1   N   GHIABC       0     CHARGE
   2134                      1 ABC          1   N   GHIABC       0     CHARGE
   2136                      1 ABC          1   N   GHIABC       0     CHARGE
   2137                      1 ABC          1   N   GHIABC       0     CHARGE
   2138                      1 ABC          1   N   GHIABC       0     CHARGE

Any suggestions? Sorry for the long post!

Update: annotated script (AVN = my comments, EM = EdMorton's replies)

NR==1 || !NF { next }                # AVN: To skip the header or blank lines

/^[[:digit:]]/ {                     # AVN: To find lines whose first field starts with [0-9]
    blanks = range = $1              # AVN: Assign if the line begins with [0-9] and doesn't start with a blank
                                     # EM: saves the value of $1 in variable "range" and also saves it in variable "blanks"
    gsub(/./," ",blanks)             # AVN: To fill the empty field with the previously assigned value
                                     # EM: replaces every character in the variable "blanks" with a blank character.
    $0 = blanks substr($0,length(blanks)+1) # AVN: Not able to understand
                                     # EM: Replaces $1 with a string of the same length but all blanks so that when we
                                     # later need to change "2130&&-4&-6&&-8" to "2130", "2131", etc. we won't have
                                     # to deal with the original string "2130&&-4&-6&&-8" still being present in $0.
                                     # Remember we saved the original $1 value in the variable "range", so
                                     # it's OK to overwrite the characters in $0 now. We don't simply re-assign
                                     # $1, as that would cause $0 to be rebuilt using the current OFS value and
                                     # so destroy all of your original spacing.
}

{
    split(range,arr,/&/)             # AVN: Split on & and store the values in the arr array
    for (i=1;i in arr;i++) {         # AVN: Loop over the elements of arr
        if (arr[i] == "") {          # AVN: Not able to follow the array logic below
                                     # EM: split("2130&&-4&-6&&-8",arr,/&/) populates arr as
                                     # arr[1]="2130", arr[2]="", arr[3]="-4", arr[4]="-6", arr[5]="", arr[6]="-8"
                                     # That should help you understand the loop logic; if in doubt, add prints
                                     # to dump the array and other variable values, then update your comments.
            i++
            for (j=(prev+1);j<(arr[1]-arr[i]);j++) {
                print j substr($0,length(j)+1)
            }
        }

        if (arr[i] < 0) {
            arr[i] = arr[1] - arr[i]
        }

        print arr[i] substr($0,length(arr[i])+1)
        prev = arr[i]
    }
}

1 answer:

Answer 0 (score: 4)

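In the script you got from me, instead of setting FS to & and looping over the fields, split the DIGITS value with split($1,arr,/&/) and loop over the elements of arr (the script below saves $1 in range first and calls split(range,arr,/&/)).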
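To see what that split hands to the loop, this small sketch dumps the array for one data line. With the default FS, $1 is the DIGITS value, and each empty element marks a place where "&&" appeared, which is what the arr[i] == "" branch tests for.

```sh
# Dump the elements that split($1, arr, /&/) produces for one data line.
printf '%s\n' '2130&&-4&-6&&-8           0 ABC' |
awk '{
    n = split($1, arr, /&/)
    for (i = 1; i <= n; i++)
        printf "arr[%d]=\"%s\"\n", i, arr[i]
}'
# arr[1]="2130"
# arr[2]=""
# arr[3]="-4"
# arr[4]="-6"
# arr[5]=""
# arr[6]="-8"
```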

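Since you've put in the effort and worked through it yourself, and the remaining details aren't entirely obvious, here is the complete script: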

$ cat tst.awk
NR==1 || !NF { next }

/^[[:digit:]]/ {
    blanks = range = $1
    gsub(/./," ",blanks)
    $0 = blanks substr($0,length(blanks)+1)

}

{
    split(range,arr,/&/)
    for (i=1;i in arr;i++) {
        if (arr[i] == "") {
            i++
            for (j=(prev+1);j<(arr[1]-arr[i]);j++) {
                print j substr($0,length(j)+1)
            }
        }

        if (arr[i] < 0) {
            arr[i] = arr[1] - arr[i]
        }

        print arr[i] substr($0,length(arr[i])+1)
        prev = arr[i]
    }
}

$ cat file
DIGITS                   AL DEST         CHI CNT NEDEST       CORG  NCHA



20                        0 ABC          1   N   DEFABC       0     CHARGE
                          1 ABC          1   N   GHIABC       0     CHARGE
                          2 ABC          1   N   JKLABC       0     CHARGE
                          3 ABC          1   N   MNOABC       0     CHARGE
                          4 ABC          1   N   PQRABC       0     CHARGE
2130&&-4&-6&&-8           0 ABC          1   N   DEFABC       0     CHARGE
                          1 ABC          1   N   GHIABC       0     CHARGE

$ awk -f tst.awk file
20                        0 ABC          1   N   DEFABC       0     CHARGE
20                        1 ABC          1   N   GHIABC       0     CHARGE
20                        2 ABC          1   N   JKLABC       0     CHARGE
20                        3 ABC          1   N   MNOABC       0     CHARGE
20                        4 ABC          1   N   PQRABC       0     CHARGE
2130                      0 ABC          1   N   DEFABC       0     CHARGE
2131                      0 ABC          1   N   DEFABC       0     CHARGE
2132                      0 ABC          1   N   DEFABC       0     CHARGE
2133                      0 ABC          1   N   DEFABC       0     CHARGE
2134                      0 ABC          1   N   DEFABC       0     CHARGE
2136                      0 ABC          1   N   DEFABC       0     CHARGE
2137                      0 ABC          1   N   DEFABC       0     CHARGE
2138                      0 ABC          1   N   DEFABC       0     CHARGE
2130                      1 ABC          1   N   GHIABC       0     CHARGE
2131                      1 ABC          1   N   GHIABC       0     CHARGE
2132                      1 ABC          1   N   GHIABC       0     CHARGE
2133                      1 ABC          1   N   GHIABC       0     CHARGE
2134                      1 ABC          1   N   GHIABC       0     CHARGE
2136                      1 ABC          1   N   GHIABC       0     CHARGE
2137                      1 ABC          1   N   GHIABC       0     CHARGE
2138                      1 ABC          1   N   GHIABC       0     CHARGE