Question

I need to rearrange a data frame that currently looks like this:

> counts
       year     score   freq rounded_year
    1: 1618         0     25         1620
    2: 1619         2      1         1620
    3: 1619         0     20         1620
    4: 1620         1      6         1620
    5: 1620         0     70         1620
   ---                                   
11570: 1994       107      1         1990
11571: 1994       101      2         1990
11572: 1994        10    194         1990
11573: 1994         1  30736         1990
11574: 1994         0 711064         1990

What I actually need is, for each decade (rounded_year), the summed freq for every unique value of score. So the data frame should look like this:

rounded_year  0       1      2   3  [...] total
1620          115     6      1   0        122
---
1990          711064  30736  0   0        741997

I've experimented with aggregate and ddply, but so far without success. I hope it's clear what I mean; I don't know how to describe it better.

Any ideas?
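On the aggregate attempt mentioned above: aggregate on its own returns long format (one row per decade and score), so a second widening step is still needed. A minimal base-R sketch, not taken from the answer below, uses xtabs, which sums freq and spreads score into columns in a single call and fills missing combinations with 0. The small `counts` data frame is a hypothetical stand-in for the real data.

```r
# Hypothetical sample with the same columns as the real `counts`
counts <- data.frame(year = c(1618, 1619, 1620, 1994, 1994, 1994),
                     score = c(0, 1, 0, 2, 2, 3),
                     freq = c(3, 5, 2, 6, 7, 8),
                     rounded_year = c(1620, 1620, 1620, 1990, 1990, 1990))

# Sum freq for every (rounded_year, score) pair; absent pairs become 0
tab <- xtabs(freq ~ rounded_year + score, data = counts)

# Convert the contingency table to a plain data frame (columns "0", "1", ...)
wide <- as.data.frame.matrix(tab)

# Append the per-decade total as the last column
wide$total <- rowSums(wide)

# Turn the row names (decades) back into a regular column
wide <- cbind(rounded_year = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```

Because xtabs already sums over duplicate (decade, score) rows, no separate aggregate call is needed here.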

Answer 1

A simple example using dplyr and tidyr:

dt = data.frame(year = c(1618,1619,1620,1994,1994,1994),
                score = c(0,1,0,2,2,3),
                freq = c(3,5,2,6,7,8),
                rounded_year = c(1620,1620,1620,1990,1990,1990))

dt

#    year score freq rounded_year
# 1 1618     0    3         1620
# 2 1619     1    5         1620
# 3 1620     0    2         1620
# 4 1994     2    6         1990
# 5 1994     2    7         1990
# 6 1994     3    8         1990


library(dplyr)
library(tidyr)

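dt %>%
  group_by(rounded_year, score) %>%   # one group per (decade, score) pair
  summarise(freq = sum(freq)) %>%     # sum freq; result stays grouped by rounded_year
  mutate(total = sum(freq)) %>%       # per-decade total over all scores
  spread(score, freq, fill = 0)       # one column per score; missing pairs become 0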


# Source: local data frame [2 x 6]
# 
#    rounded_year total     0     1     2     3
#           (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
# 1         1620    10     5     5     0     0
# 2         1990    21     0     0    13     8
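In current tidyr, spread() is superseded by pivot_wider(). A sketch of the same pipeline with the newer verbs, assuming dplyr >= 1.0.0 (for `.groups` and relocate()) and tidyr >= 1.1.0 (for a scalar `values_fill` and `names_sort`); it also moves total to the last column to match the layout asked for in the question.

```r
library(dplyr)
library(tidyr)

dt <- data.frame(year = c(1618, 1619, 1620, 1994, 1994, 1994),
                 score = c(0, 1, 0, 2, 2, 3),
                 freq = c(3, 5, 2, 6, 7, 8),
                 rounded_year = c(1620, 1620, 1620, 1990, 1990, 1990))

wide <- dt %>%
  group_by(rounded_year, score) %>%
  summarise(freq = sum(freq), .groups = "drop_last") %>%  # still grouped by rounded_year
  mutate(total = sum(freq)) %>%                           # per-decade total
  ungroup() %>%
  pivot_wider(names_from = score, values_from = freq,
              values_fill = 0, names_sort = TRUE) %>%     # score columns in numeric order
  relocate(total, .after = last_col())                    # put total at the end
wide
```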

If you prefer data.table (the dataset you posted looks like a data.table rather than a plain data frame), you can use:

library(data.table)
library(tidyr)

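# setDT() converts dt to a data.table in place; then sum freq per (decade, score)
dt = setDT(dt)[, .(freq = sum(freq)), by = c("rounded_year", "score")]
# Add the per-decade total as a new column (by reference)
dt = dt[, total := sum(freq), by = "rounded_year"]
# Widen: one column per score, filling missing combinations with 0
dt = spread(dt, score, freq, fill = 0)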
dt

#    rounded_year total 0 1  2 3
# 1:         1620    10 5 5  0 0
# 2:         1990    21 0 0 13 8
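data.table also has its own reshaping function, dcast(), so tidyr is not strictly required. A sketch of that alternative (not part of the original answer): dcast() sums freq per cell via `fun.aggregate`, and the total is then added across the score columns.

```r
library(data.table)

dt <- data.table(year = c(1618, 1619, 1620, 1994, 1994, 1994),
                 score = c(0, 1, 0, 2, 2, 3),
                 freq = c(3, 5, 2, 6, 7, 8),
                 rounded_year = c(1620, 1620, 1620, 1990, 1990, 1990))

# One row per decade, one column per score; cells are sums of freq, empty cells 0
wide <- dcast(dt, rounded_year ~ score, value.var = "freq",
              fun.aggregate = sum, fill = 0)

# Row-wise total over all score columns, added by reference
score_cols <- setdiff(names(wide), "rounded_year")
wide[, total := rowSums(.SD), .SDcols = score_cols]
wide
```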

Rearranging a data frame in R with aggregated values
