Pivoting data in R

Posted: 2015-05-12 17:55:26

Tags: r

I have a data frame:

dat<- data.frame(date = c("2015-01-01","2015-01-01","2015-01-01", "2015-01-01","2015-02-02","2015-02-02","2015-02-02","2015-02-02","2015-02-02"), val= c(10,20,30,50,300,100,200,200,400), type= c("A","A","B","C","A","A","B","C","C") )
dat

       date val type
1 2015-01-01  10    A
2 2015-01-01  20    A
3 2015-01-01  30    B
4 2015-01-01  50    C
5 2015-02-02 300    A
6 2015-02-02 100    A
7 2015-02-02 200    B
8 2015-02-02 200    C
9 2015-02-02 400    C

I'd like one row per day, with the mean for each type, so the output would be:

Date          A    B    C
2015-01-01   15   30   50
2015-02-02  200  200  300

Also, how can I get the counts, with a result like this:

Date          A    B    C
2015-01-01    2    1    1
2015-02-02    2    1    2

5 answers:

Answer 0 (score: 5)

library(reshape2)
dcast(data = dat, formula = date ~ type, fun.aggregate = mean, value.var = "val")

#         date   A   B   C
# 1 2015-01-01  15  30  50
# 2 2015-02-02 200 200 300

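With dcast, the left-hand side of the formula defines the rows, the right-hand side defines the columns, value.var is the name of the column whose values fill the cells, and fun.aggregate is how those values are summarized. If you don't supply fun.aggregate and a cell contains more than one value, dcast defaults to length, i.e. the number of values. You asked for means, so we use mean. You could also use min, max, sd, IQR, or any function that takes a vector and returns a single value.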
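To make the fun.aggregate point concrete, here is a minimal sketch (assuming the reshape2 package is installed) that answers the counting part of the question by swapping in length, plus sum as another vector-to-single-value function. The object names counts and sums are just illustrative.

```r
# Assumes the reshape2 package is installed.
library(reshape2)

dat <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
           "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02"),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Same formula as above: dates become rows, types become columns.
# Only the aggregator changes; length counts the values in each cell.
counts <- dcast(data = dat, formula = date ~ type,
                fun.aggregate = length, value.var = "val")

# Any function mapping a vector to one value works, e.g. sum.
sums <- dcast(data = dat, formula = date ~ type,
              fun.aggregate = sum, value.var = "val")

print(counts)
print(sums)
```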

Answer 1 (score: 4)

For the updated question (the counts), you can also use

table(dat[c(1,3)])
#             type
# date         A B C
#   2015-01-01 2 1 1
#   2015-02-02 2 1 2
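If you want the table() result in the same date/A/B/C layout as the question, one base R sketch is to convert it with as.data.frame.matrix(). Selecting the columns by name rather than by position (c(1,3)) is my own choice here, not part of the answer above; the names tab and counts_df are illustrative.

```r
dat <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
           "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02"),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Cross-tabulate dates against types (same as table(dat[c(1, 3)])).
tab <- table(dat[c("date", "type")])

# as.data.frame.matrix() keeps the wide layout; plain as.data.frame()
# on a table would give long format (one row per date/type pair).
counts <- as.data.frame.matrix(tab)

# Move the dates out of the row names into a proper column.
counts_df <- data.frame(date = rownames(counts), counts, row.names = NULL)
print(counts_df)
```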

dplyr/tidyr

For the first question, I think @Gregor's solution is the best (so far). A possible dplyr/tidyr option would be

library(dplyr)
library(tidyr)
dat %>% group_by(date, type) %>% summarise(val = mean(val)) %>% spread(type, val)
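For the counts part of the question, the same dplyr/tidyr pipeline can be sketched with count() in place of group_by() + summarise(). This assumes dplyr and tidyr are installed; count() stores the tallies in a column named n, which spread() then turns into one column per type.

```r
# Assumes the dplyr and tidyr packages are installed.
library(dplyr)
library(tidyr)

dat <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
           "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02"),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# count() tallies rows per (date, type); spread() widens type into columns.
counts <- dat %>%
  count(date, type) %>%
  spread(type, n)
print(counts)
```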

base R

A base R option could be

with(dat, tapply(val, list(date, type), FUN=mean))
#              A   B   C
# 2015-01-01  15  30  50
# 2015-02-02 200 200 300

That is 50 characters, versus 44 for a compact dcast(... call, so not too bad :-)
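The same tapply() call gives the counts if you swap FUN to length. One caveat worth knowing: date/type combinations with no rows come back as NA rather than 0. In this sketch I drop row 7 (the only B row for 2015-02-02) purely to show that; cnt and cnt_gap are illustrative names.

```r
dat <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
           "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02"),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Counts: identical to the mean version, only FUN changes.
cnt <- with(dat, tapply(val, list(date, type), FUN = length))
print(cnt)

# Empty cells are NA, not 0. Removing row 7 leaves no B value on
# 2015-02-02, so replace the NA with 0 to get a true count.
cnt_gap <- with(dat[-7, ], tapply(val, list(date, type), FUN = length))
cnt_gap[is.na(cnt_gap)] <- 0L
print(cnt_gap)
```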


Answer 2 (score: 3)

Personally, I'd go with Gregor's reshape2 solution. But for completeness, here is a base R solution.

agg <- with(dat, aggregate(val, by = list(date = date, type = type), FUN = mean))

out <- reshape(agg, timevar = "type", idvar = "date", direction = "wide")

out
#         date x.A x.B x.C
# 1 2015-01-01  15  30  50
# 2 2015-02-02 200 200 300

If you want to remove the x. prefix from the column names, you can strip it with gsub:

colnames(out) <- gsub("^x\\.", "", colnames(out))

To get the counts instead, replace FUN = mean with FUN = length in the call to aggregate.
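Putting that together, a minimal base R sketch of the count version (aggregate, then reshape, then the gsub clean-up) might look like this; agg_n and out_n are just illustrative names.

```r
dat <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
           "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02"),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Count rows per (date, type) pair; the result column is named "x".
agg_n <- with(dat, aggregate(val, by = list(date = date, type = type), FUN = length))

# Spread the types into columns x.A, x.B, x.C, one row per date.
out_n <- reshape(agg_n, timevar = "type", idvar = "date", direction = "wide")

# Drop the "x." prefix so the columns read date, A, B, C.
colnames(out_n) <- gsub("^x\\.", "", colnames(out_n))
print(out_n)
```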

Answer 3 (score: 1)

With data.table v1.9.5 (current devel), we can do:

require(data.table) ## v1.9.5+
dcast(setDT(dat), date ~ type, fun = list(mean, length), value.var="val")
#          date A_mean_val B_mean_val C_mean_val A_length_val B_length_val C_length_val
# 1: 2015-01-01         15         30         50            2            1            1
# 2: 2015-02-02        200        200        300            2            1            2

Installation instructions here.

Answer 4 (score: 0)

I'll add a pivot_wider solution, which is designed to replace the earlier tidyverse option, spread().

Using the values_fn argument of pivot_wider, we can do the following:

library(tidyr) # At least 1.0.0

dat %>% pivot_wider(names_from = type, values_from = val, values_fn = list(val = mean))
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <dbl> <dbl> <dbl>
#> 1 2015-01-01    15    30    50
#> 2 2015-02-02   200   200   300

dat %>% pivot_wider(names_from = type, values_from = val, values_fn = list(val = length))
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <int> <int> <int>
#> 1 2015-01-01     2     1     1
#> 2 2015-02-02     2     1     2

Of course, if we want to get fancy, we can do both at once:

library(purrr)
library(rlang)

map(quos(mean, length), 
    ~pivot_wider(dat, names_from = type, values_from = val, values_fn = list(val = eval_tidy(.))))
#> [[1]]
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <dbl> <dbl> <dbl>
#> 1 2015-01-01    15    30    50
#> 2 2015-02-02   200   200   300
#> 
#> [[2]]
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <int> <int> <int>
#> 1 2015-01-01     2     1     1
#> 2 2015-02-02     2     1     2
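If you'd rather not pull in purrr and rlang, a rough base R counterpart is to lapply() over a named list of functions with tapply(). Note this returns date-by-type matrices rather than tibbles; the list names mean and count are my own choice.

```r
dat <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
           "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02"),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Named list of summary functions; add more (sd, max, ...) as needed.
fns <- list(mean = mean, count = length)

# One date-by-type matrix per function, analogous to the map() call above.
tables <- lapply(fns, function(f) with(dat, tapply(val, list(date, type), FUN = f)))
print(tables)
```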

Created on 2019-12-04 by the reprex package (v0.3.0)

Note that if you're concerned about speed, it may be worth updating to the dev version of tidyr.