I have a very large dataset, and I need to get the mean for each Station_ID across the months.
Here is a sample of the data:
DF <- read.table(text="Station_ID January February March April May June July August September October November December Year
1 17578 30.04 12.95 33.29 134.38 167.40 89.48 49.75 65.78 50.15 30.35 70.72 20.68 1896
2 18982 29.66 13.03 33.31 134.20 167.40 89.48 47.64 65.57 49.87 29.98 70.57 20.55 1896"
, header = TRUE)
Which produces this:
Station_ID January February March April May June July August September October November December Year
1 17578 30.04 12.95 33.29 134.38 167.4 89.48 49.75 65.78 50.15 30.35 70.72 20.68 1896
2 18982 29.66 13.03 33.31 134.20 167.4 89.48 47.64 65.57 49.87 29.98 70.57 20.55 1896
This is my desired output:
Station_ID AVGPPT_1896
1 17578 62.91
2 18982 60.89
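Any help would be greatly appreciated. Thanks.
Answer 0 (score: 2)
Here is one option using dplyr and tidyr. First reshape the data from wide to long format (with tidyr's gather function), then group by Station_ID and take the mean of the monthly values.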
library(tidyr)
library(dplyr)
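# Wide -> long: one row per station and month; Station_ID and Year stay as id columns
gather(DF, Month, Value, -c(Station_ID, Year)) %>%
  group_by(Station_ID) %>%               # one group per station
  summarise(AVGPPT_1896 = mean(Value))   # mean of the 12 monthly values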
#Source: local data frame [2 x 2]
#
# Station_ID AVGPPT_1896
#1 17578 62.91417
#2 18982 62.60500
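In current versions of tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same reshape-then-summarise approach with pivot_longer(), assuming tidyr 1.0.0 or later and the two-row sample from the question; the intermediate names long and result are just illustrative.

```r
library(tidyr)
library(dplyr)

# Two-row sample from the question
DF <- read.table(text = "Station_ID January February March April May June July August September October November December Year
17578 30.04 12.95 33.29 134.38 167.40 89.48 49.75 65.78 50.15 30.35 70.72 20.68 1896
18982 29.66 13.03 33.31 134.20 167.40 89.48 47.64 65.57 49.87 29.98 70.57 20.55 1896",
  header = TRUE)

# Wide -> long: only the twelve month columns are stacked,
# so Station_ID and Year are carried along as id columns
long <- pivot_longer(DF, cols = all_of(month.name),
                     names_to = "Month", values_to = "Value")

# Average the monthly values within each station
result <- long %>%
  group_by(Station_ID) %>%
  summarise(AVGPPT_1896 = mean(Value))

print(result)
```

If the full dataset holds several years per station, grouping with group_by(Station_ID, Year) keeps one mean per station and year instead of averaging across years.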
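Answer 1 (score: 2)
You can try this:
DF$AVGPPT_1896 <- rowMeans(DF[, -c(1, ncol(DF))])  # drop Station_ID (first column) and Year (last column)
or
DF$AVGPPT_1896 <- rowMeans(DF[, month.name])  # month.name is R's built-in vector of month names
Both give: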
> DF[,c("Station_ID","AVGPPT_1896")]
Station_ID AVGPPT_1896
1 17578 62.91417
2 18982 62.60500
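A base-R sketch that builds a separate two-column table shaped like the desired output (instead of adding a column to DF), rounded to two decimals as in the question. Selecting the month columns by name with month.name means columns such as Year are never averaged in; na.rm = TRUE is an assumption that missing months should simply be skipped, and avg and out are illustrative names.

```r
# Two-row sample from the question
DF <- read.table(text = "Station_ID January February March April May June July August September October November December Year
17578 30.04 12.95 33.29 134.38 167.40 89.48 49.75 65.78 50.15 30.35 70.72 20.68 1896
18982 29.66 13.03 33.31 134.20 167.40 89.48 47.64 65.57 49.87 29.98 70.57 20.55 1896",
  header = TRUE)

# Mean of the twelve month columns for each row (NA months are skipped)
avg <- rowMeans(DF[, month.name], na.rm = TRUE)

# Two-column result like the desired output, rounded to two decimals
out <- data.frame(Station_ID = DF$Station_ID, AVGPPT_1896 = round(avg, 2))
print(out)
```

rowMeans() is vectorised over rows, so on a very large data frame it is typically much faster than looping over stations.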