awk选择数据行

时间:2011-07-25 09:04:12

标签: awk selection rows

我必须使用awk处理以下数据文件:

YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663

我需要知道每年和域名的总访问量和注册量。我的问题是我找不到只选择前四行和后四行的方法,有人能给我一些如何实现这一点的提示吗?

示例输出(仅限访问次数):

VISITS
Domain1     73834
Domain2     413822
Domain3     1205541
Domain4     309758

        1995    1996    1997    1998    1999    2000
All     47197   218694  1161273 123067  99842   292882

3 个答案:

答案 0 :(得分:1)

您可以匹配“VISITS”和“SIGNUPS”行,并设置一个变量,指示您正在处理的记录类型。

一个例子:

BEGIN {
    FS = ":";
}
/^YEARS/ {
    for (i = 2 ; i <= NF; i++) {
        year[i] = $i;
    }
    next;
}
/^VISITS/ {
    mode = "VISITS";
    next;
}
/^SIGNUPS/ {
    mode = "SIGNUPS";
    next;
}
{
    for (i = 2; i <= NF; i++) {
        # output "VISITS"/"SIGNUPS", domain, year, value
        print mode, $1, year[i], $i;
    }
}

答案 1 :(得分:1)

awk -F: 'END { out( ) }
/^YEARS/ {
  for ( i = 1; ++i <= NF; ) {
    y[i] = $i
    yh = yh ? yh OFS $i : $i
    }
    ny = NF; next   
  }
NF == 1 { 
  m && out( ); m = $1
  }
{
  ym[y[1]] = "ALL:"
  for ( i = 1; ++i <= NF; ) {
    d[$1] += $i; ym[y[i]] += $i
    }   
  } 
func out( ) {
  print m
  for ( D in d ) print D, d[D]
  printf "\n%s\n", OFS yh
  for ( i = 0; ++i <= ny; )
    printf "%s", ( ym[y[i]] ( i < ny ? OFS : RS ) )
  print x; split( x, d ); split( x, ym )  
  }' OFS='\t' infile

使用GNU awk,您可以使用:

delete d; delete ym

而不是:

split( x, d ); split( x, ym )

答案 2 :(得分:1)

当您说“仅选择前四行和后四行”时,我认为您的意思是分别处理访问和注册:

awk -F: '
$1 == "YEARS"   {for (i=2; i<=NF; i++) {yr[i] = $i}; next}
$1 == "VISITS"  {visits = 1; signups = 0; next}
$1 == "SIGNUPS" {visits = 0; signups = 1; next}
visits { 
  for (i=2; i<=NF; i++) {
    v_d[$1] += $i     # visits by domain
    v_y[yr[i]] += $i  # visits by year
  }
}
signups {
  for (i=2; i<=NF; i++) {
    s_d[$1] += $i     # signups by domain
    s_y[yr[i]] += $i  # signups by year
  }
}
END {
  OFS=FS
  print "VISITS"
  for (d in v_d) print d, v_d[d]
  for (y in v_y) print y, v_y[y]
  print "SIGNUPS"
  for (d in s_d) print d, s_d[d]
  for (y in s_y) print y, s_y[y]
}'

根据您的输入,此输出

VISITS
Domain1:73834
Domain2:413822
Domain3:1205541
Domain4:249758
1999:99842
2000:292882
1995:47197
1996:218694
1997:1161273
1998:123067
SIGNUPS
Domain1:6847
Domain2:90384
Domain3:516885
Domain4:68384
1999:27795
2000:10289
1995:15277
1996:92201
1997:482988
1998:53950