Question

I have a data frame:

dat<- data.frame(date = c("2015-01-01","2015-01-01","2015-01-01", "2015-01-01","2015-02-02","2015-02-02","2015-02-02","2015-02-02","2015-02-02"), val= c(10,20,30,50,300,100,200,200,400), type= c("A","A","B","C","A","A","B","C","C") )
dat

       date val type
1 2015-01-01  10    A
2 2015-01-01  20    A
3 2015-01-01  30    B
4 2015-01-01  50    C
5 2015-02-02 300    A
6 2015-02-02 100    A
7 2015-02-02 200    B
8 2015-02-02 200    C
9 2015-02-02 400    C

I would like one row per date, with the mean of val for each type, so the output would be:

Date          A    B    C
2015-01-01   15   30   50
2015-02-02  200  200  300

Also, how can I get the counts, with a result like this:

Date          A    B    C
2015-01-01    2    1    1
2015-02-02    2    1    2

Answer 1

library(reshape2)
dcast(data = dat, formula = date ~ type, fun.aggregate = mean, value.var = "val")

#         date   A   B   C
# 1 2015-01-01  15  30  50
# 2 2015-02-02 200 200 300

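With dcast, the LHS of the formula defines the rows and the RHS defines the columns, value.var is the name of the column whose values fill the table, and fun.aggregate is how those values are combined. The default fun.aggregate is length, i.e. the number of values. You asked for the mean, so we use mean. You could also use min, max, sd, IQR, or any function that takes a vector and returns a single value.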
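Applied to the second part of the question, passing length as fun.aggregate should give the per-type counts. A minimal sketch, assuming reshape2 is installed; dat is recreated from the question so the snippet runs on its own, and the name counts is just illustrative:

```r
library(reshape2)

# The data frame from the question
dat <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
           "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02"),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Same formula as before: dates become rows, types become columns;
# length() counts how many values fall into each date/type cell
counts <- dcast(data = dat, formula = date ~ type,
                fun.aggregate = length, value.var = "val")
counts
```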

Answer 2

For the updated question (counts), you can also use table():

table(dat[c(1,3)])
#             type
# date         A B C
#   2015-01-01 2 1 1
#   2015-02-02 2 1 2
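table() returns a contingency table object rather than a data frame. If a data frame with an explicit date column is needed, one possible base R conversion goes through as.data.frame.matrix(), which keeps the wide layout (plain as.data.frame() on a table gives a long Var1/Var2/Freq layout instead). A sketch; the names counts_tab and counts_df are illustrative:

```r
# The data frame from the question
dat <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
           "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02"),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Cross-tabulate date against type; each cell is a row count
counts_tab <- table(dat$date, dat$type)

# Keep the wide date x type layout as a data frame (dates end up as row names)
counts_df <- as.data.frame.matrix(counts_tab)

# Move the dates out of the row names into a regular column
counts_df <- data.frame(date = rownames(counts_df), counts_df, row.names = NULL)
counts_df
```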

dplyr/tidyr

For the first question, I think @Gregor's solution is the best (so far). A possible dplyr/tidyr option would be:

library(dplyr)
library(tidyr)
dat %>% group_by(date, type) %>% summarise(val = mean(val)) %>% spread(type, val)
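For the counting part, the pipeline can be shortened with dplyr's count(), which groups by the given columns and returns one row per combination with the number of rows in an n column. A sketch, assuming dplyr and tidyr are installed (counts is an illustrative name):

```r
library(dplyr)
library(tidyr)

# The data frame from the question
dat <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
           "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02"),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

counts <- dat %>%
  count(date, type) %>%   # one row per date/type pair, n = number of rows
  spread(type, n)         # one column per type, filled with the counts
counts
```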

base R

Or a base R option could be:

with(dat, tapply(val, list(date, type), FUN = mean))
#              A   B   C
# 2015-01-01  15  30  50
# 2015-02-02 200 200 300

(nchar = 44 for this versus nchar = 50 for the dcast(.. call, so not too bad :-))
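Swapping FUN = mean for FUN = length in the same tapply() call should answer the counting part too. One caveat: tapply() returns a matrix, and date/type combinations with no rows come out as NA rather than 0, so the sketch below replaces any NA with 0 (not needed for this particular data, where every combination occurs):

```r
# The data frame from the question
dat <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
           "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02"),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Rows are dates, columns are types; each cell counts the values in that group
counts <- with(dat, tapply(val, list(date, type), FUN = length))

# Empty groups would be NA; treat them as zero counts
counts[is.na(counts)] <- 0
counts
```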


Answer 3

Personally, I would use Gregor's reshape2 solution. But for completeness, I'll include a base R solution.

agg <- with(dat, aggregate(val, by = list(date = date, type = type), FUN = mean))

out <- reshape(agg, timevar = "type", idvar = "date", direction = "wide")

out
#         date x.A x.B x.C
# 1 2015-01-01  15  30  50
# 2 2015-02-02 200 200 300

If you want to remove the x. prefix from the column names, you can strip it with gsub.

colnames(out) <- gsub("^x\\.", "", colnames(out))

To get the counts, replace FUN = mean with FUN = length in the call to aggregate.
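Putting that substitution together with the reshape() and gsub() steps gives a base R sketch for the counts (agg_n and out_n are illustrative names):

```r
# The data frame from the question
dat <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
           "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02"),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# One row per date/type pair; column x holds the number of values in that pair
agg_n <- with(dat, aggregate(val, by = list(date = date, type = type), FUN = length))

# Spread the type column into x.A, x.B, x.C
out_n <- reshape(agg_n, timevar = "type", idvar = "date", direction = "wide")

# Drop the "x." prefix from the wide column names
colnames(out_n) <- gsub("^x\\.", "", colnames(out_n))
out_n
```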

Answer 4

With data.table v1.9.5 (the current development version), we can do:

require(data.table) ## v1.9.5+
dcast(setDT(dat), date ~ type, fun = list(mean, length), value.var="val")
#          date A_mean_val B_mean_val C_mean_val A_length_val B_length_val C_length_val
# 1: 2015-01-01         15         30         50            2            1            1
# 2: 2015-02-02        200        200        300            2            1            2

Installation instructions here.

Answer 5

I'll add a pivot_wider solution, since it is intended to replace the earlier tidyverse option, spread().

Using pivot_wider with the values_fn option, we can do the following:

library(tidyr) # At least 1.0.0

dat %>% pivot_wider(names_from = type, values_from = val, values_fn = list(val = mean))
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <dbl> <dbl> <dbl>
#> 1 2015-01-01    15    30    50
#> 2 2015-02-02   200   200   300

and

dat %>% pivot_wider(names_from = type, values_from = val, values_fn = list(val = length))
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <int> <int> <int>
#> 1 2015-01-01     2     1     1
#> 2 2015-02-02     2     1     2

Of course, if we want to get fancy, we can do both at once:

library(purrr)
library(rlang)

map(quos(mean, length), 
    ~pivot_wider(dat, names_from = type, values_from = val, values_fn = list(val = eval_tidy(.))))
#> [[1]]
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <dbl> <dbl> <dbl>
#> 1 2015-01-01    15    30    50
#> 2 2015-02-02   200   200   300
#> 
#> [[2]]
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <int> <int> <int>
#> 1 2015-01-01     2     1     1
#> 2 2015-02-02     2     1     2
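Because values_fn simply takes a function, the quos()/eval_tidy() step is not strictly needed; mapping over a plain named list of functions should give the same two tables, with the list names carried over to the result. A sketch, assuming tidyr (>= 1.0.0) and purrr are installed (summaries and res are illustrative names):

```r
library(tidyr)
library(purrr)

# The data frame from the question
dat <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
           "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02", "2015-02-02"),
  val  = c(10, 20, 30, 50, 300, 100, 200, 200, 400),
  type = c("A", "A", "B", "C", "A", "A", "B", "C", "C")
)

# Named list of summary functions; map() keeps the names on its output
summaries <- list(mean = mean, length = length)

# Build one wide table per summary function (.x is the current function)
res <- map(summaries, ~ pivot_wider(dat, names_from = type, values_from = val,
                                    values_fn = list(val = .x)))
res$mean
res$length
```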

Created on 2019-12-04 by the reprex package (v0.3.0)

Note that if you are concerned about speed, it may be worth updating to the dev version of tidyr.

Pivoting data in R
