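I'm trying to do the following. My dataset looks like this: it contains dates in POSIXct format, hourly wind speed, and hourly wind direction (the data frame is called wind_DNSeason). My goal is to get frequency counts of wind-speed classes by season and by daylight (day vs. night).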
                 date wspd_havg10m_kn avg_wdir
1 2013-12-06 00:25:00        9.835853       50
2 2013-12-06 01:25:00       10.506479       56
3 2013-12-06 02:25:00       11.847732       55
4 2013-12-06 03:25:00        8.494600       53
5 2013-12-06 04:25:00       13.188985       47
6 2013-12-06 05:25:00       13.188985       60
I add the season based on the date:
wind_DNSeason$season<-time2season(wind_DNSeason$date, out.fmt="seasons", type="default")
Then I use the openair package to split the data into daytime and nighttime:
wind_DNSeason$daylight <- cutData(wind, type = "daylight", local.hour.offset = -8, latitude = 54.312519, longitude = -130.305405, local.tz= "Canada/Pacific")
I know about the aggregate function, but I doubt I'm using it correctly:
aggregate(wspd_havg10m_kn ~ season + daylight, wind_DNSeason, length)
This gives me the number of observations per group, but that's not what I want. Am I trying to do too much in one step?
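I need the counts per wind-speed class (see the breaks below) for daytime and nighttime within each season, because I want to create bar plots of the different frequencies.
breaks = c(0,1,3,6,10,16,21,27,33,40,47)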
Ideally I would get something that looks like this, from which I could easily compute the percentages to plot in a bar chart:
  season  daylight  total_count wspd<=1 wspd>1,<=3 wspd>3,<=6 etc
1 autumm  daylight          854     151         34         56
2 spring  daylight         2580     456         56         98
3 summer  daylight         1722      34        344         09
4 winter  daylight          852     545         55         55
5 autumm  nighttime        1030      55          6        777
6 spring  nighttime        1825      89         89        344
7 summer  nighttime         827     344         55         66
8 winter  nighttime        1533      34         66        777
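Any ideas? Any help is appreciated!
I tried using dplyr and I think I'm very close, but somehow the frequencies don't seem to add up correctly. This is how I applied the suggested code: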
a<-wind_DNSeason %>% group_by(season,daylight) %>%
mutate(count=n(),"wspd<=1" = sum(wspd_havg10m_kn<=1),
"wspd>1,<=3" = sum(wspd_havg10m_kn > 1 & wspd_havg10m_kn <= 3, na.rm=TRUE),
"wspd>3,<=6" = sum(wspd_havg10m_kn > 3 & wspd_havg10m_kn <= 6,na.rm=TRUE),
"wspd>6,<=10" = sum(wspd_havg10m_kn > 6 & wspd_havg10m_kn <= 10,na.rm=TRUE),
"wspd>10,<=16" = sum(wspd_havg10m_kn > 10 & wspd_havg10m_kn <= 16,na.rm=TRUE),
"wspd>16,<=21" = sum(wspd_havg10m_kn > 16 & wspd_havg10m_kn <= 21,na.rm=TRUE),
"wspd>21,<=27" = sum(wspd_havg10m_kn > 21 & wspd_havg10m_kn <= 27,na.rm=TRUE),
"wspd>27,<=33" = sum(wspd_havg10m_kn > 27 & wspd_havg10m_kn <= 33,na.rm=TRUE),
"wspd>33,<=40" = sum(wspd_havg10m_kn > 33 & wspd_havg10m_kn <= 40,na.rm=TRUE),
"wspd>40,<=47" = sum(wspd_havg10m_kn > 33 & wspd_havg10m_kn <= 47,na.rm=TRUE))
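The output looks like this. I picked two unique rows, because the values are repeated across the whole data frame (e.g. for winter and nighttime):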
date wspd_havg10m_kn avg_wdir daylight season count wspd<=1 wspd>1,<=3 wspd>3,<=6 wspd>6,<=10 wspd>10,<=16 wspd>16,<=21 wspd>21,<=27 wspd>27,<=33 wspd>33,<=40 wspd>40,<=47
1 2013-12-06 00:25:00 9.8358531 50 nighttime winter 2751 NA 59 185 315 551 260 106 47 6 6
2 2013-12-06 12:25:00 7.3768898 57 daylight winter 1449 NA 13 73 251 322 133 46 13 0 0
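Shouldn't the frequencies of the different classes add up to the total count? The full data frame contains 13368 time steps, but if I add up the frequencies of each class I only get 11165. There are no wind speeds above the largest class. What am I missing?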
Answer 0 (score: 1)
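Here is a dplyr solution:
library(dplyr)
wind_DNSeason %>%
  group_by(season, daylight) %>%
  # one row per season/daylight group; na.rm = TRUE keeps missing speeds from turning a count into NA
  summarise(count = n(),
            "wspd<=1" = sum(wspd_havg10m_kn <= 1, na.rm = TRUE),
            "wspd>1,<=3" = sum(wspd_havg10m_kn > 1 & wspd_havg10m_kn <= 3, na.rm = TRUE),
            "wspd>3,<=6" = sum(wspd_havg10m_kn > 3 & wspd_havg10m_kn <= 6, na.rm = TRUE)
  )
You can add as many wind-speed columns as you need by filling in the names and conditions. Note that count = n() includes rows with a missing wind speed, so the class columns only add up to count when there are no NA values in the group.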
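As a more compact variant, the classes can be derived from the breaks listed in the question with cut() and then counted per group. The sketch below is only an illustration: wind_toy and its values are made up to stand in for wind_DNSeason, and class_counts is a hypothetical name. Because missing speeds land in their own NA class, the counts add up to the number of rows, which may help explain a mismatch like the one in the question's follow-up (there, the wspd<=1 column is NA and the last condition reuses > 33).

```r
library(dplyr)

# Toy data standing in for wind_DNSeason (values are made up for illustration)
wind_toy <- data.frame(
  season          = c("winter", "winter", "winter", "winter", "summer", "summer"),
  daylight        = c("daylight", "daylight", "nighttime", "nighttime", "daylight", "daylight"),
  wspd_havg10m_kn = c(0.5, 2.0, 9.8, NA, 3.0, 46.9)
)

# Class boundaries taken from the question
breaks <- c(0, 1, 3, 6, 10, 16, 21, 27, 33, 40, 47)

class_counts <- wind_toy %>%
  # right-closed intervals (a, b]; include.lowest = TRUE also puts 0 into the first class
  mutate(wspd_class = cut(wspd_havg10m_kn, breaks = breaks, include.lowest = TRUE)) %>%
  # long format: one row per season/daylight/class; missing speeds form an NA class
  count(season, daylight, wspd_class, name = "n")

class_counts
```

The result is in long format (one row per group and class), which is usually convenient for plotting; classes with no observations are simply absent.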
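Answer 1 (score: 0)
You mentioned plyr in the comments, so you could do the following:
library("plyr")
# n = rows per group; na.rm = TRUE ignores missing wind speeds
ddply(wind_DNSeason, .(season, daylight), summarize, n = length(wspd_havg10m_kn),
"wspd<=1" = sum(wspd_havg10m_kn <= 1, na.rm = TRUE))
Also, if you want to create these computed columns automatically, you could do the following: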
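calc = function(x) {
  cuts = c(1, 3, 6, 10)  # class boundaries; extend as needed
  res = data.frame(n = nrow(x))  # total number of rows in this group
  for (i in seq_len(length(cuts) - 1)) {
    # column name such as "wspd>1, <=3"
    nm = sprintf("wspd>%d, <=%d", cuts[i], cuts[i + 1])
    # count speeds in (cuts[i], cuts[i + 1]], ignoring missing values
    val = sum(x$wspd_havg10m_kn > cuts[i] & x$wspd_havg10m_kn <= cuts[i + 1], na.rm = TRUE)
    res[, nm] = val
  }
  return(res)
}
ddply(wind_DNSeason, .(season, daylight), calc)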
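Since the question ultimately asks for percentages to plot, here is a minimal base R sketch of the same idea using cut(), table() and prop.table(). It is only an illustration under stated assumptions: wind_toy and its values are made up to stand in for wind_DNSeason, and group, counts and pct are hypothetical names.

```r
# Toy data standing in for wind_DNSeason (values are made up for illustration)
wind_toy <- data.frame(
  season          = c("winter", "winter", "winter", "summer", "summer"),
  daylight        = c("daylight", "daylight", "nighttime", "daylight", "daylight"),
  wspd_havg10m_kn = c(0.5, 2.0, 9.8, 3.0, 46.9)
)

# Class boundaries taken from the question
breaks <- c(0, 1, 3, 6, 10, 16, 21, 27, 33, 40, 47)

# Right-closed classes (a, b]; include.lowest = TRUE also puts 0 into the first class
wind_toy$wspd_class <- cut(wind_toy$wspd_havg10m_kn, breaks = breaks,
                           include.lowest = TRUE)

# One label per season/daylight combination that actually occurs
wind_toy$group <- interaction(wind_toy$season, wind_toy$daylight, drop = TRUE)

# Rows: season/daylight groups; columns: wind-speed classes (NA speeds are dropped by table())
counts <- table(wind_toy$group, wind_toy$wspd_class)

# Percentage of each class within its group (each row sums to 100)
pct <- 100 * prop.table(counts, margin = 1)

round(pct, 1)
```

A matrix like pct could then be passed to something like barplot(t(pct), beside = TRUE) to draw one cluster of bars per group.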