Sort the contents of an awk associative array element

Date: 2015-03-02 14:16:58

Tags: linux bash shell awk associative-array

Originally, the file contents were as follows:

1.2.3.4: 1,3,4
1.2.3.5: 9,8,7,6
1.2.3.4: 4,5,6
1.2.3.6: 1,1,1

After my failed attempt at sorting, I have this:

1.2.3.4: 1,3,4,4,5,6,
1.2.3.5: 9,8,7,6,
1.2.3.6: 1,1,1,

I would like to get it into the following format:

1.2.3.4: 1,3,4,5,6
1.2.3.5: 6,7,8,9
1.2.3.6: 1

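But how can I access each comma-separated value inside each array element and sort those values, so that they end up unique and in ascending order with duplicates removed? So far, the only shell script I have managed to write accesses each element only as a whole: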

awk -F' ' 'NF>1{a[$1] = a[$1]$2","}END{for(i in a){print i" "a[i] | "sort -t: -k1 "}}' c.txt

1 answer:

Answer 0 (score: 3)

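Edit: I first used the intermediate data as input, because the original data had not been posted yet, but of course the result can also be produced from the original data. Again with GNU awk: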

gawk -F '[ ,]' 'BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" } { for(i = 2; i <= NF; ++i) a[$1][$i]; } END { for(ip in a) { line = ip " "; for(n in a[ip]) { line = line n "," } sub(/,$/, "", line); print line } }' filename

The code works as follows:

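BEGIN {
  PROCINFO["sorted_in"] = "@ind_num_asc"  # GNU-specific: sorted array
                                          # traversal, numerically ascending
}
{
  for(i = 2; i <= NF; ++i) a[$1][$i]      # remember numbers by ip (array of
                                          # arrays, GNU-specific). referencing
                                          # an element creates it, so
                                          # duplicates are removed here.
}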
END {                                     # in the end:
  for(ip in a) {                          # for all ips:
    line = ip " "                         # construct the line: IP
    for(n in a[ip]) {                     # numbers in order
      line = line n ","
    }
    sub(/,$/, "", line)                   # remove trailing comma
    print line                            # print the result.
  }
}
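
The two ideas the walkthrough relies on, storing values as array indices to drop duplicates and letting PROCINFO["sorted_in"] fix the traversal order, can be tried in isolation. The following is only a minimal sketch: the function name sorted_unique and the array name seen are made up for illustration, and it assumes gawk is installed.

```shell
# Hypothetical helper (not part of the answer): reads one number per line
# and prints each distinct number once, in ascending numeric order.
# Assumes GNU awk (gawk) is available on the PATH.
sorted_unique() {
  gawk 'BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" }  # numeric index order
        { seen[$1] }                      # a bare reference creates the index
        END { for (n in seen) print n }'  # traversal follows sorted_in
}

printf '10\n9\n10\n2\n' | sorted_unique
# prints 2, 9 and 10, one per line
```

With a string ordering instead of "@ind_num_asc", 10 would be placed before 2 and 9, which is why the numeric variant is used here.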

Old answer for the intermediate data:

With GNU awk, assuming the data is formatted exactly as in the question (including the trailing comma):

gawk -F '[ ,]' 'BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" } { delete a; for(i = 2; i < NF; ++i) a[$i]; line = $1 " "; for(i in a) line = line i ","; sub(/,$/, "", line); print line; }' filename
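
The loop bound i < NF in this one-liner depends on the trailing comma producing an empty last field. A quick way to check that, sketched here with a made-up helper name (last_field) and plain awk:

```shell
# Hypothetical helper (not part of the answer): for each input line, report
# the number of fields and the content of the last field when splitting on
# single spaces and commas.
last_field() {
  awk -F '[ ,]' '{ printf "NF=%d last=[%s]\n", NF, $NF }'
}

printf '1.2.3.4: 1,3,4,4,5,6,\n' | last_field
# prints: NF=8 last=[]
```

Without the trailing comma the last field would hold a number, and the loop would have to run up to and including NF.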

Each line is split into fields at spaces and commas; the gawk one-liner then works as follows:

BEGIN { 
  PROCINFO["sorted_in"] = "@ind_num_asc"  # GNU-specific: sorted array
                                          # traversal, numerically ascending
}
{
  delete a
  for(i = 2; i < NF; ++i) { a[$i] }       # remember the fields in a line.
                                          # duplicates are removed here.
                                          # note that it's < NF instead of
                                          # <= NF because the trailing comma
                                          # leaves us with an empty last
                                          # field.

  line = $1 " "                           # start building line: IP field
  for(i in a) {                           # append numbers separated by
    line = line i ","                     # commas
  }
  sub(/,$/, "", line)                     # remove last trailing comma
  print line                              # print result.
}
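
Both gawk versions rely on GNU extensions. Where only a POSIX awk is available, one possible alternative is to let sort do the ordering and deduplication. This is a different technique from the answer's, shown only as a sketch: the function name dedup_sort is hypothetical, and it assumes the first field of every line is a key such as "1.2.3.4:" that contains no spaces or commas.

```shell
# Hypothetical helper (not part of the answer): reads "ip: n,n,..." lines
# from the given files, or from stdin if none are given, and prints one line
# per ip with its numbers deduplicated and sorted ascending.
#
# Stage 1 (awk):  emit one "ip number" pair per line, skipping empty fields.
# Stage 2 (sort): order by ip as a string, then by number; -u drops pairs
#                 that compare equal on both keys.
# Stage 3 (awk):  join the numbers of each ip into a comma-separated list.
dedup_sort() {
  awk -F '[ ,]' '{ for (i = 2; i <= NF; ++i) if ($i != "") print $1, $i }' "$@" |
    LC_ALL=C sort -u -k1,1 -k2,2n |
    awk '{
      if (NR == 1 || $1 != ip) {
        if (NR > 1) printf "\n"
        printf "%s %s", $1, $2
        ip = $1
      } else {
        printf ",%s", $2
      }
    }
    END { if (NR > 0) printf "\n" }'
}

printf '%s\n' '1.2.3.4: 1,3,4' '1.2.3.5: 9,8,7,6' '1.2.3.4: 4,5,6' '1.2.3.6: 1,1,1' |
  dedup_sort
# prints:
# 1.2.3.4: 1,3,4,5,6
# 1.2.3.5: 6,7,8,9
# 1.2.3.6: 1
```

Because empty fields are skipped in the first stage, the same function also accepts the intermediate data with its trailing commas.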