I have to use awk to process the following data file:
YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663
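I need the total visits and signups per year and per domain. My problem is that I can't find a way to select only the first four rows and then the last four rows. Can anyone give me a hint on how to do that?
Example output (visits only):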
VISITS
Domain1 73834
Domain2 413822
Domain3 1205541
Domain4 249758
1995 1996 1997 1998 1999 2000
All 47197 218694 1161273 123067 99842 292882
Answer 0 (score: 1)
You can match the "VISITS" and "SIGNUPS" lines and set a variable that records which type of record you are currently processing.
An example:
BEGIN {
    FS = ":";
}
/^YEARS/ {
    for (i = 2; i <= NF; i++) {
        year[i] = $i;
    }
    next;
}
/^VISITS/ {
    mode = "VISITS";
    next;
}
/^SIGNUPS/ {
    mode = "SIGNUPS";
    next;
}
{
    for (i = 2; i <= NF; i++) {
        # output "VISITS"/"SIGNUPS", domain, year, value
        print mode, $1, year[i], $i;
    }
}
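The script above prints one line per value rather than totals. Building on the same mode variable, here is a minimal sketch that accumulates per-domain sums instead. The `sum` and `order` arrays and the shell variable names are my own choices for illustration, not part of the answer; the data is the sample from the question.

```shell
# Sample data from the question, kept in a shell variable for the demo.
data='YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663'

totals=$(printf '%s\n' "$data" | awk -F: '
    /^YEARS/ { next }                   # year header is not needed for domain totals
    NF == 1  { mode = $1; next }        # a one-field line (VISITS, SIGNUPS) switches the mode
    {
        key = mode " " $1
        if (!(key in sum)) order[++n] = key       # remember first-seen order
        for (i = 2; i <= NF; i++) sum[key] += $i  # add every yearly value
    }
    END { for (k = 1; k <= n; k++) print order[k], sum[order[k]] }
')
printf '%s\n' "$totals"
```

Keeping the keys in `order` means the totals come out in input order, which a plain `for (k in sum)` loop would not guarantee.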
Answer 1 (score: 1)
awk -F: 'END { out( ) }
/^YEARS/ {
    for ( i = 1; ++i <= NF; ) {
        y[i] = $i
        yh = yh ? yh OFS $i : $i
    }
    ny = NF; next
}
NF == 1 {
    m && out( ); m = $1
}
{
    ym[y[1]] = "ALL:"
    for ( i = 1; ++i <= NF; ) {
        d[$1] += $i; ym[y[i]] += $i
    }
}
function out( ) {    # "function" is the portable POSIX keyword
    print m
    for ( D in d ) print D, d[D]
    printf "\n%s\n", OFS yh
    for ( i = 0; ++i <= ny; )
        printf "%s", ( ym[y[i]] ( i < ny ? OFS : RS ) )
    print x; split( x, d ); split( x, ym )
}' OFS='\t' infile
With GNU awk you can use:
    delete d; delete ym
instead of:
    split( x, d ); split( x, ym )
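The `split( x, d )` form works because `x` is never assigned, so it is an empty string, and `split` removes every existing element of the target array before filling it. A minimal sketch of that idiom follows; the array `a` and the counter `n` are made up for the demonstration.

```shell
result=$(awk 'BEGIN {
    a["x"] = 1; a["y"] = 2      # populate a throwaway array
    split("", a)                # splitting an empty string leaves the array empty
    n = 0
    for (k in a) n++            # count what is left
    print n
}')
printf '%s\n' "$result"
```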
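Answer 2 (score: 1)
When you say "select only the first four rows and then the last four rows", I take it you mean processing the visits and the signups separately: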
awk -F: '
    $1 == "YEARS"   {for (i=2; i<=NF; i++) {yr[i] = $i}; next}
    $1 == "VISITS"  {visits = 1; signups = 0; next}
    $1 == "SIGNUPS" {visits = 0; signups = 1; next}
    visits {
        for (i=2; i<=NF; i++) {
            v_d[$1] += $i       # visits by domain
            v_y[yr[i]] += $i    # visits by year
        }
    }
    signups {
        for (i=2; i<=NF; i++) {
            s_d[$1] += $i       # signups by domain
            s_y[yr[i]] += $i    # signups by year
        }
    }
    END {
        OFS=FS
        print "VISITS"
        for (d in v_d) print d, v_d[d]
        for (y in v_y) print y, v_y[y]
        print "SIGNUPS"
        for (d in s_d) print d, s_d[d]
        for (y in s_y) print y, s_y[y]
    }'
With your input, this outputs:
VISITS
Domain1:73834
Domain2:413822
Domain3:1205541
Domain4:249758
1999:99842
2000:292882
1995:47197
1996:218694
1997:1161273
1998:123067
SIGNUPS
Domain1:6847
Domain2:90384
Domain3:516885
Domain4:68384
1999:27795
2000:10289
1995:15277
1996:92201
1997:482988
1998:53950
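The years come out as 1999, 2000, 1995, ... because awk does not define the order in which `for (y in v_y)` visits the keys. If the order matters, one option is to remember how many columns the YEARS line had and walk the `yr` array by index in the END block. A minimal sketch for the visits-by-year part only; `ny`, `mode` and the shell variable names are my own additions, not part of the answer.

```shell
# Sample data from the question, kept in a shell variable for the demo.
data='YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663'

report=$(printf '%s\n' "$data" | awk -F: '
    $1 == "YEARS" { for (i = 2; i <= NF; i++) yr[i] = $i; ny = NF; next }
    NF == 1       { mode = $1; next }             # VISITS or SIGNUPS
    mode == "VISITS" {
        for (i = 2; i <= NF; i++) v_y[yr[i]] += $i    # visits by year
    }
    END {
        # index loop instead of for-in, so the years keep their input order
        for (i = 2; i <= ny; i++) print yr[i] ":" v_y[yr[i]]
    }
')
printf '%s\n' "$report"
```

Piping the original output through `sort` would also work for the year lines, but it would mix them with the section headers and domain lines, so the index loop is the simpler choice here.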