Question

This question builds on another question I asked earlier. Given the following MWE:

library(data.table)
library(dplyr)

test <- as.data.table(data.frame(event_id = c("A","B","A","A","B"),
                                 income = c(1,2,3,4,5),
                                 location = c("PlaceX","PlaceY","PlaceX","PlaceX","PlaceY")))

test
   event_id income location
1:        A      1   PlaceX
2:        B      2   PlaceY
3:        A      3   PlaceX
4:        A      4   PlaceX
5:        B      5   PlaceY

How would I obtain:

  event_id mean_inc    loc_PlaceX    loc_PlaceY
    (fctr)   (fctr)     (numeric)     (numeric)
1        A 2.666667             3             0
2        B 3.500000             0             2

What I have so far:

test %>%
  group_by(event_id, location) %>%
  summarise(mean_inc = mean(income))

Source: local data table [2 x 3]
Groups: event_id

  event_id location mean_inc
    (fctr)   (fctr)    (dbl)
1        A   PlaceX 2.666667
2        B   PlaceY 3.500000

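Note that I have about 10 columns that I need to break out this way, as I tried to do with the location column above. There are also several million rows.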

Answer 1

Since the OP shows a data.table, this can be done with data.table methods:

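# Add the per-event mean income as a new column (modifies test by reference)
test[, mean_inc := mean(income), by = event_id]
# Cast to wide format, counting the rows for each location
dcast(test, event_id + mean_inc ~ location, value.var = "income", fun.aggregate = length)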
#     event_id mean_inc PlaceX PlaceY
#1:        A 2.666667      3      0
#2:        B 3.500000      0      2
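
The question mentions roughly 10 columns that need the same treatment. A minimal sketch of how the approach above might be repeated over several columns follows. The second column `channel`, its values, and the helper names `cat_cols`, `result`, and `counts` are hypothetical and are not part of the original data. The count columns are prefixed with the source column name (`location_PlaceX`) rather than the `loc_` prefix shown in the question.

```r
library(data.table)

# Toy data from the question plus a hypothetical second categorical column `channel`
test <- data.table(event_id = c("A", "B", "A", "A", "B"),
                   income   = c(1, 2, 3, 4, 5),
                   location = c("PlaceX", "PlaceY", "PlaceX", "PlaceX", "PlaceY"),
                   channel  = c("web", "web", "shop", "web", "shop"))

# Columns to spread into one count column per level (assumed names)
cat_cols <- c("location", "channel")

# Start with one row per event_id holding the mean income
result <- test[, .(mean_inc = mean(income)), by = event_id]

for (col in cat_cols) {
  # Count rows per event_id and per level of `col`
  counts <- dcast(test, as.formula(paste("event_id ~", col)),
                  value.var = "income", fun.aggregate = length)
  # Prefix the new columns with the source column name to avoid clashes
  lvls <- setdiff(names(counts), "event_id")
  setnames(counts, old = lvls, new = paste(col, lvls, sep = "_"))
  # Attach the counts to the running result
  result <- merge(result, counts, by = "event_id")
}

print(result)
```

Each pass casts a single column, so every intermediate table stays at one row per `event_id`. Whether this is fast enough for several million rows would need to be measured on the real data.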

R & dplyr: aggregating and building variables
