Using the ave function in R - counting the number of unique months per person

Time: 2014-01-29 12:35:47

Tags: r

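I want to count the number of rows per person (hai_dispense_number) per month (month) in the data frame below. My overall goal is to see whether the average number of rows increases from April to September. I am fairly sure I should use the ave function to create a count variable, but none of my attempts have worked; see the attempts below. Once I have the counts, I think I will be able to use ddply to produce a summary of the average for each month. Below is a toy df; the column 'obs' is my desired output.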

df
         hai_dispense_number date_of_claim hai_atc     month obs
9972511   Patient HAI0002664    2010-04-07 A10BA02     april   1
11376245  Patient HAI0002664    2010-05-04 A10BA02       may   1
12508505  Patient HAI0002664    2010-05-31 A10BA02       may   2
13480611  Patient HAI0002664    2010-06-30 A10BA02      june   1
13486327  Patient HAI0002664    2010-06-30 A10BH03      june   2
13567944  Patient HAI0002664    2010-06-08 A10BA02      june   3
15003657  Patient HAI0002664    2010-07-27 A10BA02      july   1
15003658  Patient HAI0002664    2010-07-27 A10BH03      july   2
16600413  Patient HAI0002664    2010-08-31 A10BB09    august   1
16600866  Patient HAI0002664    2010-08-23 A10BA02    august   2
16600867  Patient HAI0002664    2010-08-23 A10BH03    august   3
17537505  Patient HAI0002664    2010-08-27 A10BB09    august   4
19176349  Patient HAI0002664    2010-09-17 A10BB09 september   1
19176350  Patient HAI0002664    2010-09-17 A10BH03 september   2
19176358  Patient HAI0002664    2010-09-17 A10BA02 september   3
17765433  Patient HAI0006637    2010-09-17 A10BA02 september   4
12953451  Patient HAI0007418    2010-06-04 A10BA02      june   1
15786889  Patient HAI0007418    2010-07-28 A10BB09      july   1
15787103  Patient HAI0007418    2010-07-12 A10BB09      july   2
15787233  Patient HAI0007418    2010-07-05 A10BA02      july   3
15878776  Patient HAI0007418    2010-07-08 A10BB09      july   4
15908690  Patient HAI0007418    2010-07-23 A10BB09      july   5
17363576  Patient HAI0007418    2010-08-20 A10BB09    august   1
17554737  Patient HAI0007418    2010-08-13 A10BB09    august   2

Previous attempts

df$obs<-with(df, ave(month, hai_dispense_number, FUN=seq_along))  ##doesn't split by month

df$obs<-with(df, ave(month, hai_dispense_number, FUN=cumsum))  ##gives all NA values, think seq_along is actually what I want

df$obs <- ave(df$month, df$month, FUN=seq_along)  ##this is better than the previous two, but doesn't seem to split by person

ddply(df,~month,summarise,mean=mean(obs)) ##this works absolutely fine, just need to counts right first!

I would value any input anyone can offer. It seems I am getting something fundamental wrong here.

1 answer:

Answer 0: (score: 2)

OK, I have reduced your data to:

> head(df)
            patient month
9972511  HAI0002664 april
11376245 HAI0002664   may
12508505 HAI0002664   may
13480611 HAI0002664  june
13486327 HAI0002664  june
13567944 HAI0002664  june

That is all we need, since we only use the patient identifier and the month. To get the new column you want, try the following:

library(plyr)

> ddply(df, .(patient, month), mutate, obs = 1:length(month))
      patient     month obs
1  HAI0002664     april   1
2  HAI0002664    august   1
3  HAI0002664    august   2
4  HAI0002664    august   3
5  HAI0002664    august   4
6  HAI0002664      july   1
7  HAI0002664      july   2
8  HAI0002664      june   1
9  HAI0002664      june   2
10 HAI0002664      june   3
11 HAI0002664       may   1
12 HAI0002664       may   2
13 HAI0002664 september   1
14 HAI0002664 september   2
15 HAI0002664 september   3
16 HAI0006637 september   1
17 HAI0007418    august   1
18 HAI0007418    august   2
19 HAI0007418      july   1
20 HAI0007418      july   2
21 HAI0007418      july   3
22 HAI0007418      july   4
23 HAI0007418      july   5
24 HAI0007418      june   1
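
Since the question asked about ave specifically, the same counter can probably be built in base R as well. The following is a minimal sketch, assuming the reduced columns patient and month shown above; the small toy data frame is made up for illustration. ave accepts more than one grouping variable, and passing a numeric vector as its first argument avoids the coercion problems that appear when month itself (a character or factor column) is passed in.

```r
# Toy data (hypothetical subset) with the same columns as the reduced df above
toy <- data.frame(
  patient = c("HAI0002664", "HAI0002664", "HAI0002664",
              "HAI0006637", "HAI0007418", "HAI0007418"),
  month   = c("april", "may", "may", "september", "july", "july"),
  stringsAsFactors = FALSE
)

# Group by patient AND month. The first argument is only a numeric placeholder
# that seq_along() overwrites with 1, 2, ... within each group.
toy$obs <- ave(seq_along(toy$month), toy$patient, toy$month, FUN = seq_along)

toy
```

Unlike the ddply call, ave returns its result in the original row order, so the data frame is not re-sorted by patient and month.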

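Incidentally, I assume that the obs = 4 for September in your sample output is a typo, since the patient identifier changes from that of the previous three rows (2664 to 6637).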
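
On the overall goal from the question (whether the average number of rows per patient rises from April to September), it may be worth noting that the mean of a running counter is not the same as the mean of the per-patient totals: a patient with three rows in a month contributes obs values 1, 2, 3, whose mean is 2 rather than 3. One possible approach, sketched here in base R under the assumption that the columns are named patient and month, is to count rows per patient-month first and then average those counts by month. The patient IDs P1 and P2 are hypothetical.

```r
# Toy data with hypothetical patient IDs
toy <- data.frame(
  patient = c("P1", "P1", "P1", "P2", "P2", "P2", "P2"),
  month   = c("april", "may", "may", "april", "april", "may", "may"),
  stringsAsFactors = FALSE
)

# Step 1: one row per patient-month holding the number of rows in that group
toy$n <- 1L
counts <- aggregate(n ~ patient + month, data = toy, FUN = sum)

# Step 2: average those counts across patients within each month
monthly <- aggregate(n ~ month, data = counts, FUN = mean)
monthly
```

In this sketch a patient with no rows in a given month simply does not appear for that month, so the average is taken only over patients who had at least one row.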