AWK program to find the average rainfall for three states

Time: 2010-10-16 21:00:51

Tags: awk gawk

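I want to find the average rainfall of three states, CA, TX and AX, for a particular month from January through December. The input file is tab-separated, with the format: city name, the state, and then average rainfall amounts from January through December, and then an annual average for all months (the sample lines also have a third column, 30, before the monthly values). For example, it might look like this: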

AVOCA   PA  30  2.10    2.15    2.55    2.97    3.65    3.98    3.79    3.32     3.31   2.79    3.06    2.51    36.18
BAKERSFIELD CA  30  0.86    1.06    1.04    0.57    0.20    0.10    0.01    0.09    0.17    0.29    0.70    0.63    5.72
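To try the scripts in the answers, the two sample lines can be written to a real tab-separated file. This is a sketch under one assumption: the lines above are shown with spaces, so the here-document uses single spaces and `tr` turns them into tabs, as the question describes. It writes `rainfall.txt` in the current directory. The field count confirms the layout: 16 fields per line, so January is `$4`, February is `$5`, December is `$15` and the annual value is `$16`.

```shell
#!/bin/sh
# Recreate the question's two sample lines as a tab-separated file.
# Assumption: every single space in the here-document becomes a tab via tr.
tr ' ' '\t' > rainfall.txt <<'EOF'
AVOCA PA 30 2.10 2.15 2.55 2.97 3.65 3.98 3.79 3.32 3.31 2.79 3.06 2.51 36.18
BAKERSFIELD CA 30 0.86 1.06 1.04 0.57 0.20 0.10 0.01 0.09 0.17 0.29 0.70 0.63 5.72
EOF

# Print city, state and field count: 3 leading fields + 12 months + 1 annual = 16.
awk -F '\t' '{ print $1, $2, NF }' rainfall.txt
```

This prints `AVOCA PA 16` and `BAKERSFIELD CA 16`.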

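What I want to do is get the sum of the average rainfall for a particular month over n years, and then find the average for the states CA, TX and AX.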
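Read the way both answers implement it (a count and a sum per state), the quantity wanted for a state s and month m is the mean of that month's column over all lines for s. A sketch of the formula, writing $N_s$ for the number of lines whose state field is s and $r_{i,m}$ for the month-m value on line i (field m + 3, since the monthly values start at field 4):

$$\bar{r}_{s,m} = \frac{1}{N_s} \sum_{i \,:\, \mathrm{state}_i = s} r_{i,m}$$

With only the two sample lines, $N_{CA} = 1$, so the February average for CA is simply 1.06.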

I wrote the awk script below to do this, but it does not give me the expected output:

/^CA$/ {CA++; CA_SUM+= $5} # ^CA$ - Regular Expression to match the word CA only 
/^TX$/ {TX++; TX_SUM+= $5} # ^TX$ - Regular Expression to match the word TX only  
/^AX$/ {AX++; AX_SUM+= $5} # ^AX$ - Regular Expression to match the word AX only 
END {
     CA_avg = CA_SUM/CA;
     TX_avg = TX_SUM/TX;
     AX_avg = AX_SUM/AX; 
     printf("CA Rainfall: %5.2f",CA_avg);
     printf("CA Rainfall: %5.2f",TX_avg);
     printf("CA Rainfall: %5.2f",AX_avg);
    }

I invoke the program with the command awk 'FS="\t"'-f awk1.awk rainfall.txt and see no output.

Question: Where am I going wrong? Any suggestions and corrected code would be appreciated.

2 answers:

Answer 0 (score: 3)

The pattern /^CA$/ means the characters "C" and "A" are the only characters on the line. You want:

$2 == "CA" {CA++; CA_SUM+= $5}
# etc.
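A quick way to see the difference is to run both patterns on a shortened BAKERSFIELD record, tab-separated as the question describes. `/^CA$/` is anchored to the start and end of the whole record, so it never matches a full data line, while `$2 == "CA"` tests only the second field. The shortened record is an illustration, not real data.

```shell
#!/bin/sh
# One tab-separated record, shortened to the first five fields.
line=$(printf 'BAKERSFIELD\tCA\t30\t0.86\t1.06')

# Anchored regex: ^ and $ refer to the whole record, so nothing is printed.
printf '%s\n' "$line" | awk -F '\t' '/^CA$/ { print "regex matched" }'

# Field comparison: only $2 is tested, so this prints "field matched: 1.06".
printf '%s\n' "$line" | awk -F '\t' '$2 == "CA" { print "field matched:", $5 }'
```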

However, this is DRYer:

{ count[$2]++; sum[$2] += $5 }
END {
    for (state in count) {
        printf("%s Rainfall: %5.2f\n", state, sum[state]/count[state])
    }
}
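The array version can be wrapped in a small shell function to try it on the sample lines. This is a sketch: `avg_by_state` is a hypothetical name, and the column is passed in as a parameter instead of the fixed `$5` (which, with the layout above, is the February column). Because the order of `for (state in count)` is unspecified in awk, the output is piped through `sort`.

```shell
#!/bin/sh
# Hypothetical helper: average of column $1 per state (field 2), tab-separated stdin.
avg_by_state() {
    awk -F '\t' -v col="$1" '
        { count[$2]++; sum[$2] += $col }
        END {
            for (state in count)
                printf("%s Rainfall: %5.2f\n", state, sum[state]/count[state])
        }' | sort   # for-in order is unspecified, so sort for stable output
}

# Usage on the two sample lines (spaces turned into tabs); column 5 is February.
printf '%s\n' \
    'AVOCA PA 30 2.10 2.15 2.55 2.97 3.65 3.98 3.79 3.32 3.31 2.79 3.06 2.51 36.18' \
    'BAKERSFIELD CA 30 0.86 1.06 1.04 0.57 0.20 0.10 0.01 0.09 0.17 0.29 0.70 0.63 5.72' |
    tr ' ' '\t' | avg_by_state 5
```

This prints `CA Rainfall:  1.06` followed by `PA Rainfall:  2.15`.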

Also, this looks wrong: awk 'FS="\t"'-f awk1.awk rainfall.txt
Try: awk -F '\t' -f awk1.awk rainfall.txt
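To see why the original command prints nothing, look at the words the shell actually passes to awk. There is no space between 'FS="\t"' and -f, so the shell glues them into one word, and awk receives FS="\t"-f as its program text, with awk1.awk and rainfall.txt treated as input files (the script file is never loaded). awk parses that program as the pattern FS = ("\t" - f): the string "\t" converts to the number 0 and the unset variable f is 0, so the pattern is 0 (false) and no line is printed. A sketch that shows each step:

```shell
#!/bin/sh
# Show the words the shell hands to awk for the question's command line.
printf '[%s]\n' 'FS="\t"'-f awk1.awk rainfall.txt
# [FS="\t"-f]
# [awk1.awk]
# [rainfall.txt]

# The glued word used as a whole awk program: the pattern is 0, so no output.
printf 'BAKERSFIELD\tCA\t30\n' | awk 'FS="\t"'-f

# Setting the separator with -F instead prints the state field: CA
printf 'BAKERSFIELD\tCA\t30\n' | awk -F '\t' '{ print $2 }'
```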


In response to the comment:

awk -F '\t' -v month=2 -v states="CA,AZ,TX" '
    BEGIN {
        month_col = month + 3  # assume January is month 1, i.e. field 4
        n = split(states, wanted_states, /,/)
    }
    { count[$2]++; sum[$2] += $month_col }
    END {
        # Loop over indices 1..n so each wanted state is visited in order.
        for (i = 1; i <= n; i++) {
            state = wanted_states[i]
            if (state in count)
                printf("%s Rainfall: %5.2f\n", state, sum[state]/count[state])
            else
                print state " Rainfall: no data"
        }
    }
' rainfall.txt
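A general awk point about looping over what split() returns: a for (x in array) loop iterates over the array's indices, not its values. split() stores the pieces under the indices 1, 2, 3, ..., so such a loop sees the numbers 1 to 3 rather than "CA", "AZ" and "TX". A counted loop up to split()'s return value visits the values in their original order, as this small sketch shows.

```shell
#!/bin/sh
awk 'BEGIN {
    n = split("CA,AZ,TX", wanted, ",")

    # for-in walks the indices (1, 2, 3) in an unspecified order.
    for (k in wanted) print "index:", k

    # A counted loop walks the values in order: CA, AZ, TX.
    for (i = 1; i <= n; i++) print "value:", wanted[i]
}'
```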

Answer 1 (score: 2)

Your regular expressions should be:

/ CA / {CA++; CA_SUM+= $5} # / CA / - matches CA with a space on each side
/ TX / {TX++; TX_SUM+= $5} # / TX / - matches TX with a space on each side
/ AX / {AX++; AX_SUM+= $5} # / AX / - matches AX with a space on each side

/^AX$/ matches only when AX is the only word on the line.
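One caveat worth checking with / CA /: the spaces in that pattern are literal space characters. If the file is tab-separated, as the question says, the state code is surrounded by tabs and the pattern will not match. A sketch comparing it with a regex that spells the tabs out (the $2 == "CA" test from Answer 0 avoids the issue as well), using a shortened illustrative record:

```shell
#!/bin/sh
line=$(printf 'BAKERSFIELD\tCA\t30\t0.86\t1.06')

# Literal spaces in the regex: no match, because the fields are separated by tabs.
printf '%s\n' "$line" | awk '/ CA / { print "space regex matched" }'

# \t in the regex matches the tab characters, so this prints "tab regex matched".
printf '%s\n' "$line" | awk '/\tCA\t/ { print "tab regex matched" }'
```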

HTH!

Edit:

/ CA / {CA++; CA_SUM+= $5} # / CA / - matches CA with a space on each side
/ TX / {TX++; TX_SUM+= $5} # / TX / - matches TX with a space on each side
/ AX / {AX++; AX_SUM+= $5} # / AX / - matches AX with a space on each side
END {
 # Divide only when at least one line matched, to avoid division by zero.
 if(CA!=0){CA_avg = CA_SUM/CA;     printf("CA Rainfall: %5.2f\n",CA_avg);}
 if(TX!=0){TX_avg = TX_SUM/TX;     printf("TX Rainfall: %5.2f\n",TX_avg);}
 if(AX!=0){AX_avg = AX_SUM/AX;     printf("AX Rainfall: %5.2f\n",AX_avg);}
}