Transposing data in Unix

Time: 2015-09-07 09:17:00

Tags: awk sed transpose

I have hourly data archived in this format:

2015-09-03 02:00:00 to 2015-09-03 02:59:59|ABC|673
2015-09-03 02:00:00 to 2015-09-03 02:59:59|AABC|52
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ABCD|787
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ADFGE|35
2015-09-03 02:00:00 to 2015-09-03 02:59:59|AGER|41
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ETECFF|1384
2015-09-03 02:00:00 to 2015-09-03 02:59:59|TRIFD|38
2015-09-03 02:00:00 to 2015-09-03 02:59:59|CVGFFHG|166
2015-09-03 03:00:00 to 2015-09-03 03:59:59|FJREER|36
2015-09-03 03:00:00 to 2015-09-03 03:59:59|DFSD|31
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ASBF|38
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABC|36
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AABC|35
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABCD|33
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ADFGE|39
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AGER|33
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ETECFF|537
2015-09-03 03:00:00 to 2015-09-03 03:59:59|TRIFD|620635
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABC|37
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AABC|702
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABCD|319
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ADFGE|33
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AGER|306
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ETECFF|34
2015-09-03 03:00:00 to 2015-09-03 03:59:59|TRIFD|44
2015-09-03 03:00:00 to 2015-09-03 03:59:59|CVGFFHG|599
2015-09-03 03:00:00 to 2015-09-03 03:59:59|FJREER|30
2015-09-03 03:00:00 to 2015-09-03 03:59:59|DFSD|82

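I want to transpose the data as follows:

1. Column 1 should become the column header
2. Column 2 should become the row header
3. Column 3 is the data, summed when the same pair of column 1 and column 2 occurs more than once
4. Any absence of data should be represented as 0 (zero)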

This is how the transposed data should look:

|2015-09-03 02:00:00 to 2015-09-03 02:59:59|2015-09-03 03:00:00 to 2015-09-03 03:59:59
AABC|52|737
ABC|673|73
ABCD|787|352
ADFGE|35|72
AGER|41|339
ASBF|0|38
CVGFFHG|166|599
DFSD|0|113
ETECFF|1384|571
FJREER|0|66
TRIFD|38|620679

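I tried using sed, but that did not work. I am not at an advanced level yet, so I need help.

3 answers:

Answer 0: (score: 1)

Here is an awk solution. The 2D array values accumulates every line that shares the same keyword key and the same header column index i. In END, the totals are printed for each key and column. The array cols is used to detect a new header column. hdrs keeps the headers in the right order for output. keys only keeps the list of all keywords. Note that asorti, used to sort the keys, is a GNU awk (gawk) extension.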

awk -F'|' '
{ hdr = $1; key = $2; val = $3;
  if(cols[hdr]==0){
    cols[hdr] = ++column;
    hdrs[column] = hdr;
  }
  i = cols[hdr]
  keys[key] = 1
  values[i, key] += val
}
END{
  for(i = 1;i<=column;i++)
   printf  "|%s", hdrs[i]
  printf "\n"
  n = asorti(keys,sort)
  for(j = 1;j<=n;j++){
     key = sort[j]
     printf "%s",key
     for(i = 1;i<=column;i++)
      printf "|%s", values[i, key]+0
     printf "\n"
  }
}'
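
Where asorti is unavailable, the same approach can probably be kept by sorting the keys by hand. The following is a minimal sketch in POSIX awk, not a tested drop-in replacement; the function name transpose_portable and the array sorted are assumptions made for this example and do not come from the answer.

```shell
# Hypothetical helper: same logic as the answer above, without gawk's asorti.
transpose_portable() {
  awk -F'|' '
    {
      if (!($1 in cols)) {            # first time this header is seen
        cols[$1] = ++column
        hdrs[column] = $1
      }
      keys[$2] = 1
      values[cols[$1], $2] += $3      # sum repeated (header, key) pairs
    }
    END {
      # Insertion sort of the keys (string comparison)
      n = 0
      for (key in keys) {
        j = ++n
        while (j > 1 && sorted[j - 1] > key) {
          sorted[j] = sorted[j - 1]
          j--
        }
        sorted[j] = key
      }
      for (i = 1; i <= column; i++) printf "|%s", hdrs[i]
      printf "\n"
      for (j = 1; j <= n; j++) {
        printf "%s", sorted[j]
        for (i = 1; i <= column; i++)
          printf "|%s", values[i, sorted[j]] + 0   # missing cell becomes 0
        printf "\n"
      }
    }' "$@"
}
```

It reads the files given as arguments, or standard input when there are none, for example transpose_portable YourFile. Insertion sort is quadratic, which should be fine for a handful of row names but is a design shortcut rather than a recommendation for large key sets.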

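Answer 1: (score: 0)

I think that in awk you can create an array indexed by strings, that is, a dictionary with column 1 as the key.

Each element of that array should hold another string-indexed array, with column 2 as the key.

Then process each line, creating new array elements where necessary, and add column 3 to the stored value.

For help with the awk syntax: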

http://www.thegeekstuff.com/2010/03/awk-arrays-explained-with-5-practical-examples/

See Example 1 in section 5 for how simple the final solution can be.
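
The idea above can be sketched as follows. POSIX awk has no true nested arrays (gawk 4 and later does), so this sketch assumes the usual emulation: a single array whose subscript joins column 1 and column 2 with SUBSEP. The function name sum_by_pair is hypothetical, and the sketch only produces the summed values per pair, not the final transposed layout.

```shell
# Hypothetical helper: sum column 3 for every (column 1, column 2) pair.
sum_by_pair() {
  awk -F'|' '
    { total[$1, $2] += $3 }           # subscript is $1 SUBSEP $2
    END {
      for (k in total) {
        split(k, part, SUBSEP)        # part[1] = column 1, part[2] = column 2
        printf "%s|%s|%d\n", part[1], part[2], total[k]
      }
    }' "$@" | sort                    # for-in order is unspecified, so sort outside awk
}
```

Running sum_by_pair YourFile should print one line per pair with the total in the third field; turning that into rows and columns still needs the header bookkeeping shown in answer 0.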

Answer 2: (score: 0)

Another awk solution:

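awk -F '|' '
  {
  # Sum repeated entries for the same (header, record) pair
  Data[ $1, $2] += $3
  # Remember each header and record name once, in order of first appearance
  if( index( "|" Headers, "|" $1 "|" ) == 0 ) Headers = Headers $1 "|"
  if( index( "|" Records, "|" $2 "|" ) == 0 ) Records = Records $2 "|"
  }
END {
  # Drop the trailing separator so split() yields no empty last element
  sub( /\|$/, "", Headers )
  sub( /\|$/, "", Records )
  cHeader = split( Headers, aHeader, "|" )
  cRecord = split( Records, aRecord, "|" )

  print "|" Headers

  for( iRecord = 1; iRecord <= cRecord; iRecord++) {
     printf "%s", aRecord[ iRecord]
     for( iHeader = 1; iHeader <= cHeader; iHeader++ ) {
        # A missing element is empty; + 0 turns it into 0
        printf "|%s", Data[ aHeader[ iHeader], aRecord[ iRecord] ] + 0
        }
     print ""
     }
  }
' YourFile
  • Uses the strings Headers and Records to remember the column and row names, and the multi-dimensional array Data for the values.
  • Uses += to sum repeated entries and + 0 when printing to force 0 for missing data.
  • Rows are printed in order of first appearance, not sorted.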
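
Requirement 4 (missing data shown as 0) depends on how awk treats array elements that were never assigned. A small demonstration, with the function name missing_demo and the array seen invented for this example: an unassigned element is an empty string that becomes 0 in numeric context, which is what values[i, key]+0 in answer 0 relies on.

```shell
# Hypothetical demo of awk's handling of unassigned array elements.
missing_demo() {
  awk 'BEGIN {
    seen["x"] = 5
    print seen["x"] + 0    # stored value
    print seen["y"] + 0    # never assigned: empty string coerced to 0
    print ("z" in seen)    # membership test, does not create the element
  }'
}
missing_demo
```

Note that merely referencing seen["y"] creates that element, whereas the in test does not, which matters if the array is later traversed with for-in.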