Using multiple arrays in Awk without repeating code

Time: 2017-03-31 11:26:34

Tags: arrays linux multidimensional-array awk gawk

I have this working code:

BEGIN { FS=";"; }   # field separator
{ 
    if (match($2, /[0-9]+/)) {           # matching `ID` value
        m=substr($2, RSTART, RLENGTH);
        a[m]++;                          # accumulating number of lines for each `ID`
        print > m"_count.txt";    # writing lines pertaining to certain `ID` into respective file
    } 
}
END {
    for(i in a) { 
        print "mv "i"_count.txt "i"_"a[i]".txt"  # renaming files with actual counts
    }
} 

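Now I need to change it to do something like the following. I have three arrays of IDs, and each array stands for a separate folder in which the results should be saved.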

BEGIN { FS=";"; }   # field separator
{
    array1=(125 258 698 874)
    array2=(956 887 4455 22)
    array3=(111 444 558 966 332)
    if ($1 == $2) {varR=$3} else {varR=$2}
    if (match(varR, /[0-9]+/)) {           # matching `ID` value
        if ( varR in array1 ) {
            FolderName = "folder1/"
            m1=substr(varR, RSTART, RLENGTH);
            a1[m1]++;                          # accumulating number of lines for each `ID`
            print > (FolderName m1)"_count.txt";    # writing lines pertaining to certain `ID` into respective file
        }
        if ( varR in array2 ) {
            FolderName = "folder2/"
            m2=substr(varR, RSTART, RLENGTH);
            a2[m2]++;                          # accumulating number of lines for each `ID`
            print > (FolderName m2)"_count.txt";    # writing lines pertaining to certain `ID` into respective file
        }
        if ( varR in array3 ) {
            FolderName = "folder3/"
            m3=substr(varR, RSTART, RLENGTH);
            a3[m3]++;                          # accumulating number of lines for each `ID`
            print > (FolderName m3)"_count.txt";    # writing lines pertaining to certain `ID` into respective file
        }
    } 
}
END {
    for(i in a1) { 
        print "mv "i"_count.txt "i"_"a1[i]".txt"  # renaming files with actual counts
    }
    for(i in a2) { 
        print "mv "i"_count.txt "i"_"a2[i]".txt"  # renaming files with actual counts
    }
    for(i in a3) { 
        print "mv "i"_count.txt "i"_"a3[i]".txt"  # renaming files with actual counts
    }
} 

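I need to save the lines for each matching ID to a txt file and put that file into the required folder. What if I have 100 arrays? Do I have to repeat the code for each one?

2 answers:

Answer 0 (score: 1)

Using GNU Awk's multidimensional array support, here is a simplified solution that demonstrates the technique you need: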

$ gawk '
  BEGIN {
      FS=";"   # field separator
      # Initialize the sub-arrays of the multi-dimensional array once, before any input is read.
      # The IDs are stored as keys so that they can be looked up with `in`.
      array[1][""]; split("125;258;698;874", aux, ";"); for (i in aux) array[1][aux[i]]
      array[2][""]; split("956;887;4455;22", aux, ";"); for (i in aux) array[2][aux[i]]
      array[3][""]; split("111;444;558;966;332", aux, ";"); for (i in aux) array[3][aux[i]]
      n = length(array) # The count of sub-arrays
  }
  {
      if ($1 == $2) {varR=$3} else {varR=$2}
      if (match(varR, /[0-9]+/)) {           # matching `ID` value
        for (i=1;i<=n;++i) {                 # loop over all arrays
          if (varR in array[i]) {            # look for the ID among the array keys
            print "folder" i
            break
          }
        }
      }
  }
' <<<'1;1;4455'
folder2
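  • For an explanation of the array-initialization and multidimensional-array techniques used in this command, see this answer of mine.

  • Note that the array initialization stores the numbers as the keys (indices) of the arrays array[<n>], because that is what a lookup with <value> in array[<n>] requires.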
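
To illustrate why the IDs have to end up as keys, here is a minimal sketch in portable Awk. The names aux, ids and the wrapper function show_keys_vs_values are only illustrative. split() stores the numbers as values under the indices 1..n, so an `in` test against that array fails; copying the values into the keys of a second array makes the test succeed.

```shell
# Illustrative sketch; the function and array names are hypothetical.
show_keys_vs_values() {
    awk 'BEGIN {
        # split() fills aux with VALUES: aux[1]="125", aux[2]="258", ...
        n = split("125;258;698;874", aux, ";")
        printf "258 in aux: %s\n", (("258" in aux) ? "yes" : "no")

        # Merely referencing ids[...] creates the element, so the IDs become KEYS.
        for (k = 1; k <= n; k++) ids[aux[k]]
        printf "258 in ids: %s\n", (("258" in ids) ? "yes" : "no")
    }'
}

show_keys_vs_values
# prints:
# 258 in aux: no
# 258 in ids: yes
```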

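What you tried

  • Awk has no array-initialization syntax; array1=(125 258 698 874) in your code creates a single string, "125258698874".

    • The surrounding () have no effect here (they only serve to control precedence).
    • Placing tokens next to each other in Awk, whether numbers or strings, performs string concatenation.
    • Perhaps you mistakenly assumed that Bash's array-initializer syntax also works in Awk.
  • ( varR in array1 ) looks for varR among the indices (keys) of array1, but if your array initialization worked the way it does in Bash, the IDs would be stored as values, so you would have to check the values instead.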
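
The concatenation behaviour described above is easy to confirm. The following short sketch (the wrapper function show_concat is a hypothetical name) assigns the expression from the question and prints the result together with its length:

```shell
# Illustrative sketch; show_concat is a hypothetical helper name.
show_concat() {
    awk 'BEGIN {
        array1 = (125 258 698 874)   # not an array: the four numbers are concatenated
        print array1                 # the resulting string
        print length(array1)         # its length in characters
    }'
}

show_concat
# prints:
# 125258698874
# 12
```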

Answer 1 (score: 0)

Do you really need separate arrays, or could you do something like this:

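BEGIN {
    # Keys have the form "<var>,<folder index>".
    a[1","1] = "abc";
    a[1","2] = "xyz";
    a[2","2] = "123";
    folders[1] = "folder1";
    folders[2] = "folder2";
    var = "1";
    for (f in folders) {
        if ((var","f) in a) {   # parentheses make the intended grouping explicit
            print a[var","f] " >> " folders[f] "/file_" var;
        }
    }
}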
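
If the folders do not need separate arrays at all, the same idea can be taken one step further with a single lookup table from ID to folder. The following is only a sketch under assumptions, not code from either answer. The names groups, folder, count and the shell function split_by_group are made up for illustration, the ID lists are the ones from the question, and the target folders are assumed to exist already. Unlike the question's code, it looks up the extracted numeric ID rather than the whole field.

```shell
# Illustrative sketch; reads records from stdin and expects folder1..folder3 to exist.
split_by_group() {
    awk -F';' '
    BEGIN {
        # One line per group (hypothetical layout); a new group is one more line here.
        groups[1] = "125 258 698 874"
        groups[2] = "956 887 4455 22"
        groups[3] = "111 444 558 966 332"
        for (g in groups) {
            n = split(groups[g], ids, " ")
            for (k = 1; k <= n; k++) folder[ids[k]] = "folder" g   # ID -> folder name
        }
    }
    {
        varR = ($1 == $2) ? $3 : $2
        if (match(varR, /[0-9]+/)) {
            id = substr(varR, RSTART, RLENGTH)
            if (id in folder) {
                count[id]++                                  # lines seen for this ID
                print > (folder[id] "/" id "_count.txt")     # append line to its file
            }
        }
    }
    END {
        # Emit the rename commands, as in the question; the order is unspecified.
        for (id in count)
            print "mv " folder[id] "/" id "_count.txt " folder[id] "/" id "_" count[id] ".txt"
    }'
}

# Example use (input.txt is a placeholder name):
#   mkdir -p folder1 folder2 folder3 && split_by_group < input.txt
```

With this layout the main rule and the END block stay the same no matter how many groups there are; only the BEGIN block grows.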