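I have thousands of entries in hkdata.2, and I want to write a loop that sums the total exposed mxtemp from another data frame, weather.data, for each member of each houseID in hkdata.2.
Can any expert help me with this?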
weather.data
        date mpressure mxtemp
1 2008-01-01    1025.3   15.7
2 2008-01-02    1025.6   16.0
3 2008-01-03    1023.6   18.1
4 2008-01-04    1021.8   18.4
5 2008-01-05    1020.1   20.9
6 2008-01-06    1019.7   20.7
7 2008-01-07    1018.4   24.0
8 2008-01-08    1016.7   23.7
hkdata.2
row.names houseID member male   date.end date.begin
        1       1      1    1 2008-01-07 2008-01-02
        2       1      2    0 2008-01-06 2008-01-04
For each member, I want the sum of mxtemp over the interval from date.begin to date.end, displayed like this:
hkdata.2
row.names houseID member   date.end date.begin Total.exposed.mxtemp
        1       1      1 2008-01-07 2008-01-02                118.1
        2       1      2 2008-01-06 2008-01-04                 60.0
Total.exposed.mxtemp is the sum of mxtemp over the corresponding interval (from date.begin to date.end, inclusive). For example, for row.names 1: 118.1 = 16 + 18.1 + 18.4 + 20.9 + 20.7 + 24.
My code looks like this:
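As a quick sanity check on that arithmetic, here is a minimal sketch that types the six mxtemp values in by hand (the vector name `exposed` is only for illustration and is not part of my real data):

```r
# mxtemp values for 2008-01-02 through 2008-01-07, copied from weather.data above
exposed <- c(16.0, 18.1, 18.4, 20.9, 20.7, 24.0)

# Sum over the inclusive interval; this is the value wanted for row.names 1
total <- sum(exposed)
print(total)
```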
> cbind(hkdata.2, t(sapply(apply(hkdata.2, 1, function(x)
+ weather.data[weather.data$date >= x[6] &
+ weather.data$date <= x[5], c("mxtemp")]), colSums)))
Then I got this error:
Error in FUN(X[[1L]], ...) :
'x' must be an array of at least two dimensions
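Any help from an expert would be appreciated!
Answer 0 (score: 0)
Here is one possibility that produces the result you describe. I'm not sure it is 100% idiomatic dplyr, since it works with two different data frames at once, but it seems to work.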
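library(dplyr)
# Group so that date.begin and date.end are single values inside mutate().
# Note: this relies on each (houseID, member) group holding exactly one row,
# as in the example data.
hkdata.2 <- hkdata.2 %>%
  group_by(houseID, member) %>%
  # Sum mxtemp for all weather rows dated within [date.begin, date.end]
  mutate(Totalmxtemp = sum(weather.data$mxtemp[weather.data$date >= date.begin &
                                                 weather.data$date <= date.end]))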
hkdata.2
#Source: local data frame [2 x 7]
#Groups: houseID, member
#
# row.names houseID member male date.end date.begin Totalmxtemp
#1 1 1 1 1 2008-01-07 2008-01-02 118.1
#2 2 1 2 0 2008-01-06 2008-01-04 60.0
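For comparison, the same per-row sum can be sketched in base R without grouping. The error in the question arises because `colSums` requires an array with at least two dimensions, while each element it receives there is a plain numeric vector of mxtemp values, so `sum` is the appropriate function. The block below is a minimal sketch, not a definitive implementation: it rebuilds the example data by hand, assumes both date columns are of class `Date`, omits the `row.names` column for brevity, and uses the helper name `in_range` purely for illustration.

```r
# Example data copied from the question; dates are assumed to be of class Date.
weather.data <- data.frame(
  date      = as.Date("2008-01-01") + 0:7,
  mpressure = c(1025.3, 1025.6, 1023.6, 1021.8, 1020.1, 1019.7, 1018.4, 1016.7),
  mxtemp    = c(15.7, 16.0, 18.1, 18.4, 20.9, 20.7, 24.0, 23.7)
)
hkdata.2 <- data.frame(
  houseID    = c(1, 1),
  member     = c(1, 2),
  male       = c(1, 0),
  date.end   = as.Date(c("2008-01-07", "2008-01-06")),
  date.begin = as.Date(c("2008-01-02", "2008-01-04"))
)

# For each row i, sum mxtemp over the inclusive interval [date.begin, date.end].
# vapply guarantees exactly one numeric value per row.
hkdata.2$Total.exposed.mxtemp <- vapply(
  seq_len(nrow(hkdata.2)),
  function(i) {
    in_range <- weather.data$date >= hkdata.2$date.begin[i] &
      weather.data$date <= hkdata.2$date.end[i]
    sum(weather.data$mxtemp[in_range])
  },
  numeric(1)
)

print(hkdata.2)
```

Because this version works row by row, it does not depend on each (houseID, member) pair appearing only once. With thousands of rows it should still be reasonably fast, although a join-based approach may scale better for very large data.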