I have the following data.frame:
head(entries,10)
      Provider.Region year.start month.start day.start  Provider.Status
23511      North West       0010          05        17 Deregistered (V)
23512      North West       0010          05        17 Deregistered (V)
23709   West Midlands       0010          06        01       Registered
23562          London       0010          06        10       Registered
23563          London       0010          06        10       Registered
23566          London       0010          06        10       Registered
23764   West Midlands       0010          06        10 Deregistered (V)
23508          London       0010          06        11 Deregistered (V)
23555   West Midlands       0010          06        11       Registered
23497      South East       0010          06        14 Deregistered (V)
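I would like to count, per month and region, how often each level of the factor Provider.Status occurs. The output I want should look like this: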
head(entries.1, 3)
  time        region Deregistered (V) Registered
5-0010    North West                2          0
6-0010 West Midlands                2          1
6-0010        London                1          3
So far I have been using dplyr, as follows:
library(dplyr)
entries %>%
group_by(Provider.Region, year.start, month.start) %>%
mutate(counts_status = n())
but it still does not produce the output I expect; instead it gives something like this:
Source: local data frame [23,775 x 6]
Groups: Provider.Region, year.start, month.start [606]
   Provider.Region year.start month.start  Provider.Status counts_status
            (fctr)     (fctr)      (fctr)           (fctr)         (int)
 1      North West       0010          05 Deregistered (V)             2
 2      North West       0010          05 Deregistered (V)             2
 3   West Midlands       0010          06       Registered             4
 4          London       0010          06       Registered             7
 5          London       0010          06       Registered             7
 6          London       0010          06       Registered             7
 7   West Midlands       0010          06 Deregistered (V)             4
 8          London       0010          06 Deregistered (V)             7
 9   West Midlands       0010          06       Registered             4
10      South East       0010          06 Deregistered (V)            10
..             ...        ...         ...              ...           ...
Is there a compact way to create variables from these counts? Many thanks in advance.
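For comparison with the dplyr attempt above: mutate() attaches a group count to every original row, whereas count() (a shorthand for group_by() plus summarise(n = n())) collapses each group to a single row. The sketch below counts per year, month, region and status, then spreads the status levels into columns with tidyr's pivot_wider(). It assumes tidyr 1.1.0 or later (for a scalar values_fill), and the entries data frame is a hand-made reconstruction of the ten rows printed above, using character columns instead of factors.

```r
library(dplyr)
library(tidyr)

# Hand-made stand-in for head(entries, 10) from the question (character columns, not factors)
entries <- data.frame(
  Provider.Region = c("North West", "North West", "West Midlands", "London", "London",
                      "London", "West Midlands", "London", "West Midlands", "South East"),
  year.start      = "0010",
  month.start     = c("05", "05", rep("06", 8)),
  Provider.Status = c("Deregistered (V)", "Deregistered (V)", "Registered", "Registered",
                      "Registered", "Registered", "Deregistered (V)", "Deregistered (V)",
                      "Registered", "Deregistered (V)"),
  stringsAsFactors = FALSE
)

entries.1 <- entries %>%
  # one row per year/month/region/status combination, with its row count in column n
  count(year.start, month.start, Provider.Region, Provider.Status) %>%
  # one column per status level; combinations that never occur become 0
  pivot_wider(names_from = Provider.Status, values_from = n, values_fill = 0L) %>%
  # "month-year" label as in the desired output, e.g. "5-0010"
  mutate(time = paste(as.integer(month.start), year.start, sep = "-")) %>%
  select(time, region = Provider.Region, `Deregistered (V)`, Registered)

print(entries.1)
```

On these ten rows the result has one row for North West in May and one each for London, South East and West Midlands in June, sorted by month and then region.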
Answer 0 (score: 2)
This can be done with the dcast function from either the reshape2 or the data.table package:
# mydf is the data frame from the question (called entries there)
library(reshape2)
dcast(mydf, paste(year.start, month.start, sep = "-") + Provider.Region ~ Provider.Status)

library(data.table)
dcast(setDT(mydf), paste(year.start, month.start, sep = "-") + Provider.Region ~ Provider.Status)
Output of the last one:
   year.start Provider.Region Deregistered (V) Registered
1:    0010-05      North West                2          0
2:    0010-06          London                1          3
3:    0010-06      South East                1          0
4:    0010-06   West Midlands                1          2
When you run the code above, you will get the following messages:
Using 'Provider.Status' as value column. Use 'value.var' to override
Aggregate function missing, defaulting to 'length'
This is nothing to worry about, but to prevent them you can specify value.var and the aggregation function explicitly:
dcast(setDT(mydf),
paste(year.start,month.start,sep="-") + Provider.Region ~ Provider.Status,
value.var = "Provider.Status", fun.aggregate = length)
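A minimal reproducible sketch of the explicit data.table call above. The ten rows are a hand-made reconstruction of head(entries, 10) from the question (character columns instead of factors), so only these rows are counted. As a simplification, the year-month key is built as an ordinary column before casting instead of being written inside the formula; with these rows the counts should match the four-row output shown earlier.

```r
library(data.table)

# Hand-made stand-in for head(entries, 10) from the question (character columns, not factors)
mydf <- data.frame(
  Provider.Region = c("North West", "North West", "West Midlands", "London", "London",
                      "London", "West Midlands", "London", "West Midlands", "South East"),
  year.start      = "0010",
  month.start     = c("05", "05", rep("06", 8)),
  Provider.Status = c("Deregistered (V)", "Deregistered (V)", "Registered", "Registered",
                      "Registered", "Registered", "Deregistered (V)", "Deregistered (V)",
                      "Registered", "Deregistered (V)"),
  stringsAsFactors = FALSE
)

setDT(mydf)                                                # convert to a data.table in place
mydf[, time := paste(year.start, month.start, sep = "-")]  # year-month key, e.g. "0010-05"

# length() counts the rows that fall into each cell; because fun.aggregate is given,
# empty cells are filled with length() of a zero-length vector, i.e. 0
wide <- dcast(mydf, time + Provider.Region ~ Provider.Status,
              value.var = "Provider.Status", fun.aggregate = length)
print(wide)
```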
Answer 1 (score: 1)
You can use the reshape2 package to produce a table like this:
library(reshape2)
d <- data.frame(region=rep(c("A", "B", "C"), each=2), timepoint = c(1, 1, 1, 1, 2, 2), provider=rep(c("D", "R"), 3), count_status = 1:6)
dcast(d, region + timepoint ~ provider, value.var = "count_status")
This gives the following output:
  region timepoint D R
1      A         1 1 2
2      B         1 3 4
3      C         2 5 6
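The example above assumes the counts already exist as a column. To apply it to the question's data, one possible sketch first builds such a column with base R's aggregate(): a helper column count_status (a name chosen here to mirror the example) is set to 1 for every row and summed per region, year, month and status. Then dcast() spreads the statuses into columns, and fill = 0 puts a zero where a region had no rows for a status. The sample rows are again a hand-made reconstruction of head(entries, 10).

```r
library(reshape2)

# Hand-made stand-in for head(entries, 10) from the question (character columns, not factors)
entries <- data.frame(
  Provider.Region = c("North West", "North West", "West Midlands", "London", "London",
                      "London", "West Midlands", "London", "West Midlands", "South East"),
  year.start      = "0010",
  month.start     = c("05", "05", rep("06", 8)),
  Provider.Status = c("Deregistered (V)", "Deregistered (V)", "Registered", "Registered",
                      "Registered", "Registered", "Deregistered (V)", "Deregistered (V)",
                      "Registered", "Deregistered (V)"),
  stringsAsFactors = FALSE
)

# One row per region/year/month/status with the number of matching rows in count_status
counts <- aggregate(count_status ~ Provider.Region + year.start + month.start + Provider.Status,
                    data = transform(entries, count_status = 1L), FUN = sum)

# Each cell holds at most one value, so no aggregation function is needed;
# fill = 0 covers status levels that never occur for a region/month
wide <- dcast(counts, Provider.Region + year.start + month.start ~ Provider.Status,
              value.var = "count_status", fill = 0)
print(wide)
```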