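I want to count the number of rows per person (hai_dispense_number) per month (month) in the data frame below. My overall goal is to see whether the average number of rows increases from April to September. I'm fairly sure I should be using the ave function to create a count variable, but none of my attempts have worked; see them below. Once I have the counts right, I think I can use ddply to get a summary of the mean per month. Below is a toy df; the column 'obs' is my desired output.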
df
hai_dispense_number date_of_claim hai_atc month obs
9972511 Patient HAI0002664 2010-04-07 A10BA02 april 1
11376245 Patient HAI0002664 2010-05-04 A10BA02 may 1
12508505 Patient HAI0002664 2010-05-31 A10BA02 may 2
13480611 Patient HAI0002664 2010-06-30 A10BA02 june 1
13486327 Patient HAI0002664 2010-06-30 A10BH03 june 2
13567944 Patient HAI0002664 2010-06-08 A10BA02 june 3
15003657 Patient HAI0002664 2010-07-27 A10BA02 july 1
15003658 Patient HAI0002664 2010-07-27 A10BH03 july 2
16600413 Patient HAI0002664 2010-08-31 A10BB09 august 1
16600866 Patient HAI0002664 2010-08-23 A10BA02 august 2
16600867 Patient HAI0002664 2010-08-23 A10BH03 august 3
17537505 Patient HAI0002664 2010-08-27 A10BB09 august 4
19176349 Patient HAI0002664 2010-09-17 A10BB09 september 1
19176350 Patient HAI0002664 2010-09-17 A10BH03 september 2
19176358 Patient HAI0002664 2010-09-17 A10BA02 september 3
17765433 Patient HAI0006637 2010-09-17 A10BA02 september 4
12953451 Patient HAI0007418 2010-06-04 A10BA02 june 1
15786889 Patient HAI0007418 2010-07-28 A10BB09 july 1
15787103 Patient HAI0007418 2010-07-12 A10BB09 july 2
15787233 Patient HAI0007418 2010-07-05 A10BA02 july 3
15878776 Patient HAI0007418 2010-07-08 A10BB09 july 4
15908690 Patient HAI0007418 2010-07-23 A10BB09 july 5
17363576 Patient HAI0007418 2010-08-20 A10BB09 august 1
17554737 Patient HAI0007418 2010-08-13 A10BB09 august 2
Previous attempts
df$obs<-with(df, ave(month, hai_dispense_number, FUN=seq_along)) ##doesn't split by month
df$obs<-with(df, ave(month, hai_dispense_number, FUN=cumsum)) ##gives all NA values, think seq_along is actually what I want
df$obs <- ave(df$month, df$month, FUN=seq_along) ##this is better than the previous two, but doesn't seem to split by person
ddply(df,~month,summarise,mean=mean(obs)) ##this works absolutely fine, just need to get the counts right first!
I would appreciate any input anyone can offer. It seems I'm getting something fundamentally wrong here.
Answer 0 (score: 2)
OK, I have reduced your data to:
> head(df)
patient month
9972511 HAI0002664 april
11376245 HAI0002664 may
12508505 HAI0002664 may
13480611 HAI0002664 june
13486327 HAI0002664 june
13567944 HAI0002664 june
That's all we need, since we're only working with the patient identifier and the month. To get the new column you want, try the following:
library(plyr)
> ddply(df, .(patient, month), mutate, obs = 1:length(month))
patient month obs
1 HAI0002664 april 1
2 HAI0002664 august 1
3 HAI0002664 august 2
4 HAI0002664 august 3
5 HAI0002664 august 4
6 HAI0002664 july 1
7 HAI0002664 july 2
8 HAI0002664 june 1
9 HAI0002664 june 2
10 HAI0002664 june 3
11 HAI0002664 may 1
12 HAI0002664 may 2
13 HAI0002664 september 1
14 HAI0002664 september 2
15 HAI0002664 september 3
16 HAI0006637 september 1
17 HAI0007418 august 1
18 HAI0007418 august 2
19 HAI0007418 july 1
20 HAI0007418 july 2
21 HAI0007418 july 3
22 HAI0007418 july 4
23 HAI0007418 july 5
24 HAI0007418 june 1
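Since the question asks about ave, here is a minimal base R sketch of the same per-patient, per-month numbering. It assumes the reduced column names patient and month used above; the toy rows are made up for illustration. The likely reason the earlier attempts misbehaved is that ave writes the result of FUN back into a copy of its first argument, so passing the month column itself is fragile, and only one grouping variable was supplied each time.

```r
# Toy data in the shape of the reduced data frame (rows are illustrative)
df <- data.frame(
  patient = c("HAI0002664", "HAI0002664", "HAI0002664", "HAI0006637", "HAI0007418"),
  month   = c("april", "may", "may", "september", "june"),
  stringsAsFactors = FALSE
)

# Number the rows within each patient/month group.
# The first argument is a plain integer vector so the counts stay numeric;
# both grouping variables are passed so the split is by patient AND month.
df$obs <- ave(seq_len(nrow(df)), df$patient, df$month, FUN = seq_along)
df
```

Unlike the ddply call, this keeps the rows in their original order.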
Incidentally, I assume that obs = 4 for September in your sample output is a typo, since the patient identifier changes from the previous three rows (2664 to 6637).
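For the overall goal (the average number of rows per person in each month), a running index is not strictly needed: the mean of obs within a group of size n is (n + 1) / 2, not n. The following is only a sketch, assuming the quantity of interest is the group size; the patient labels "A" and "B" are hypothetical, and a patient with no rows in a month is simply absent rather than counted as zero.

```r
# Hypothetical toy data: two patients, two months
df <- data.frame(
  patient = c("A", "A", "A", "B", "B", "B", "B"),
  month   = c("april", "may", "may", "april", "april", "april", "may"),
  stringsAsFactors = FALSE
)

# One row per patient/month holding the number of rows in that group
counts <- aggregate(n ~ patient + month, data = transform(df, n = 1L), FUN = sum)

# Mean number of rows per patient for each month
monthly <- aggregate(n ~ month, data = counts, FUN = mean)
monthly
```

With this toy data, april averages 2 rows per patient and may averages 1.5.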