Summing values within intervals from another data.frame in R

Time: 2014-07-17 15:30:50

Tags: r date loops intervals

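I have thousands of entries in hkdata.2, and I want to create a loop that sums the mxtemp each person was exposed to, taken from another data frame, weather.data, for each member of each houseID in hkdata.2.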

Can anyone help me solve this?

weather.data
date   mpressure mxtemp     
1   2008-01-01  1025.3  15.7        
2   2008-01-02  1025.6  16.0        
3   2008-01-03  1023.6  18.1        
4   2008-01-04  1021.8  18.4        
5   2008-01-05  1020.1  20.9        
6   2008-01-06  1019.7  20.7        
7   2008-01-07  1018.4  24.0        
8   2008-01-08  1016.7  23.7

hkdata.2
row.names   houseID member  male       date.end date.begin 
1             1       1      1      2008-01-07  2008-01-02      
2             1       2      0      2008-01-06  2008-01-04

For each member, I want to sum mxtemp over the interval from date.begin to date.end and display the result like this:

hkdata.2
row.names   houseID member          date.end    date.begin  Total.exposed.mxtemp
1             1       1           2008-01-07    2008-01-02     118.1
2             1       2           2008-01-06    2008-01-04     60

Total.exposed.mxtemp is the sum of mxtemp over the corresponding interval (from date.begin to date.end, inclusive). For example, for row.names 1: 118.1 = 16 + 18.1 + 18.4 + 20.9 + 20.7 + 24.
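The arithmetic for row.names 1 can be checked directly with a short sketch. It rebuilds the weather.data sample from above and assumes the date column is stored as class Date (the question does not say how it is stored); the names begin, end, in_interval and total are only illustrative.

```r
# Rebuild the weather.data sample shown above.
# Assumption: the date column is of class Date.
weather.data <- data.frame(
  date      = as.Date("2008-01-01") + 0:7,
  mpressure = c(1025.3, 1025.6, 1023.6, 1021.8, 1020.1, 1019.7, 1018.4, 1016.7),
  mxtemp    = c(15.7, 16.0, 18.1, 18.4, 20.9, 20.7, 24.0, 23.7)
)

# Interval for houseID 1, member 1 (illustrative variable names)
begin <- as.Date("2008-01-02")
end   <- as.Date("2008-01-07")

# Logical mask of the weather rows that fall inside the closed interval
in_interval <- weather.data$date >= begin & weather.data$date <= end

# Sum mxtemp over those rows
total <- sum(weather.data$mxtemp[in_interval])
print(total)
```

Both endpoints are included, which matches the six terms listed in the example.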

My code looks like this:

> cbind(hkdata.2, t(sapply(apply(hkdata.2, 1, function(x)
+   weather.data[weather.data$date >= x[6] &
+                  weather.data$date <= x[5], c("mxtemp")]), colSums)))

Then I got this error:

Error in FUN(X[[1L]], ...) : 
  'x' must be an array of at least two dimensions

Any help would be appreciated!

1 answer:

Answer 0: (score: 0)

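Here is one possibility that produces the result you describe. I am not sure whether it is 100% idiomatic dplyr, since I am working with two different data frames, but it seems to work anyway.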

library(dplyr)

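# For each (houseID, member) pair, sum mxtemp over the weather rows whose
# date falls within [date.begin, date.end].
# Note: this relies on each group containing a single row, so that
# date.begin and date.end are scalars inside mutate().
hkdata.2 <- hkdata.2 %>%
  group_by(houseID, member) %>%
  mutate(Totalmxtemp = sum(weather.data$mxtemp[weather.data$date >= date.begin &
                                             weather.data$date <= date.end]))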

hkdata.2
#Source: local data frame [2 x 7]
#Groups: houseID, member
#
#  row.names houseID member male   date.end date.begin Totalmxtemp
#1         1       1      1    1 2008-01-07 2008-01-02       118.1
#2         2       1      2    0 2008-01-06 2008-01-04        60.0
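
For comparison, here is a base R sketch of the same idea that works row by row. The error in the question most likely arises because apply() returns a list of plain numeric vectors when the intervals differ in length, and colSums() only accepts objects with at least two dimensions; sum() is the function that fits a vector. The sample data is rebuilt from the question, and the date columns are assumed to be of class Date (the question does not state this). The column name Total.exposed.mxtemp follows the desired output in the question.

```r
# Sample data rebuilt from the question.
# Assumption: all date columns are of class Date.
weather.data <- data.frame(
  date      = as.Date("2008-01-01") + 0:7,
  mpressure = c(1025.3, 1025.6, 1023.6, 1021.8, 1020.1, 1019.7, 1018.4, 1016.7),
  mxtemp    = c(15.7, 16.0, 18.1, 18.4, 20.9, 20.7, 24.0, 23.7)
)

hkdata.2 <- data.frame(
  houseID    = c(1, 1),
  member     = c(1, 2),
  male       = c(1, 0),
  date.end   = as.Date(c("2008-01-07", "2008-01-06")),
  date.begin = as.Date(c("2008-01-02", "2008-01-04"))
)

# For every row of hkdata.2, sum mxtemp over the weather rows whose date
# lies in the closed interval [date.begin, date.end].
hkdata.2$Total.exposed.mxtemp <- vapply(
  seq_len(nrow(hkdata.2)),
  function(i) {
    in_interval <- weather.data$date >= hkdata.2$date.begin[i] &
                   weather.data$date <= hkdata.2$date.end[i]
    sum(weather.data$mxtemp[in_interval])
  },
  numeric(1)
)

print(hkdata.2)
```

Because it indexes one row at a time, this version does not depend on each (houseID, member) pair appearing only once. With thousands of rows it should still be workable, though a non-equi join (for example in data.table) may scale better for very large inputs.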