Using dcast to summarize melted data into long format

Posted: 2018-03-28 17:58:07

Tags: r aggregate melt dcast

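I got this working with one subset of my data frame, but I can't get it to work with another. The data covers about 4000 orders, spanning months 0 to 8, with sentiment scores from 0 to 5.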

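The goal is to melt the data using 'order' and 'month.of.service' as ids and then compute the average sentiment for each month. The data frame looks like this: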

order | month | sentiment
123   |   0   |     3
123   |   0   |     4
123   |   1   |     3
124   |   0   |     2

I want the result to look like this:

order | month | sentiment
123   |   0   |    3.5
123   |   1   |    3
124   |   0   |    2

This is the actual code I used:

library(reshape2)  # assumed: melt()/dcast() from reshape2 (data.table also provides them)

sentiment.md <- melt(sentiment, id = c('Related.order', 'Lifespan'))
sentiment.dc <- dcast(sentiment.md, Related.order + Lifespan ~ value, sum)

> head(sentiment.md)
  Related.order Lifespan  variable value
1         12771        0 Sentiment     5
2         11188        1 Sentiment     3
3         12236        3 Sentiment     5
4         12925        0 Sentiment     5
5         12151        3 Sentiment     5
6         12338        0 Sentiment     5

> head(sentiment.dc)
  Related.order Lifespan   0   1   2   3   4   5
1          4976        0 NaN NaN NaN   3 NaN NaN
2          4976        1 NaN NaN NaN   3 NaN NaN
3          4976        2 NaN NaN NaN NaN   4 NaN
4          4976        3 NaN NaN NaN NaN   4 NaN
5          4976        4 NaN NaN NaN NaN   4 NaN
6          4976        5 NaN NaN NaN NaN   4 NaN

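To show what I'm after, here is the same procedure applied to the only other column in the data frame, interactions, which does produce the format I want: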

interactions.md <- melt(interactions, id = c('Related.order', 'Lifespan'))
interactions.dc <- dcast(interactions.md, Related.order + Lifespan ~ value, sum)

> head(interactions.md)
  Related.order Lifespan variable value
1         12771        0    Event     1
2         11188        1    Event     1
3         12236        3    Event     1
4         12925        0    Event     1
5         12151        3    Event     1
6         12338        0    Event     1
> head(interactions.dc)
  Related.order Lifespan 1
1          4976        0 6
2          4976        1 3
3          4976        2 3
4          4976        3 1
5          4976        4 2
6          4976        5 2

I thought I might be using the wrong structure or something, but I can't pin anything down. For reference, here is a screenshot from RStudio:

[RStudio screenshot not included]

Thanks in advance for your help.

2 answers:

Answer 0 (score: 4)

Perhaps you want some kind of aggregation/collapse rather than dcast:

library(data.table)
setDT(df)[, .(sentiment = mean(sentiment)), by = .(order, month)]
#   order month sentiment
#1:   123     0       3.5
#2:   123     1       3.0
#3:   124     0       2.0

If you really want to use dcast, you could try:

dcast(df, order + month ~ ., mean, value.var = "sentiment")
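Applied to the question's own melt() output, the problem is the right-hand side `~ value`: dcast turns every distinct sentiment score (0 to 5) into its own column, which is why the result above has columns `0` to `5`. The interactions case only looked right because Event is always 1, so there was a single column. A minimal sketch of the fix, assuming reshape2 is the package in use and using a made-up four-row data frame with the question's column names: cast on `variable` (which only holds "Sentiment") and aggregate with `mean` instead of `sum`.

```r
library(reshape2)  # assumed: the question's melt()/dcast() come from reshape2

# Made-up sample using the question's column names (not the asker's real data)
sentiment <- data.frame(
  Related.order = c(123, 123, 123, 124),
  Lifespan      = c(0, 0, 1, 0),
  Sentiment     = c(3, 4, 3, 2)
)

# Long format: one row per (order, month) with columns `variable` and `value`
sentiment.md <- melt(sentiment, id.vars = c("Related.order", "Lifespan"))

# Cast on `variable` rather than `value`, and average instead of summing
sentiment.dc <- dcast(sentiment.md, Related.order + Lifespan ~ variable, mean)

# One row per order/month pair, with the mean score in a `Sentiment` column
sentiment.dc
```

With `sum` in place of `mean`, the two month-0 scores for order 123 would be added (giving 7) rather than averaged.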

Or with dplyr:

library(dplyr)
df %>% group_by(order, month) %>% summarise(sentiment = mean(sentiment))

These are just a few of the many ways to aggregate in R.

Data:

df <- structure(list(order = c(123L, 123L, 123L, 124L), month = c(0L, 
0L, 1L, 0L), sentiment = c(3L, 4L, 3L, 2L)), .Names = c("order", 
"month", "sentiment"), row.names = c(NA, -4L), class = "data.frame")

Answer 1 (score: 2)

Using base R, with aggregate:

aggregate(sentiment ~ month + order, sentiment, mean, na.rm = TRUE)[c(2, 1, 3)]
#  order month sentiment
#1   123     0       3.5
#2   123     1       3.0
#3   124     0       2.0
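The `[c(2, 1, 3)]` only reorders the columns. It works because aggregate() lets the first grouping variable in the formula vary fastest, so with `month + order` the rows already come out grouped by order. A sketch of the alternative, with a hypothetical copy of the sample data: put `order` first in the formula and sort the rows explicitly.

```r
# Hypothetical copy of the question's sample data
df <- data.frame(
  order     = c(123L, 123L, 123L, 124L),
  month     = c(0L, 0L, 1L, 0L),
  sentiment = c(3L, 4L, 3L, 2L)
)

# Columns follow formula order, but the FIRST grouping variable varies fastest,
# so rows come out as (123, 0), (124, 0), (123, 1)
res <- aggregate(sentiment ~ order + month, data = df, FUN = mean)

# Sort by order, then month, and reset the row names
res <- res[order(res$order, res$month), ]
rownames(res) <- NULL
res
```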

Data:

sentiment <- read.table(text = "
order | month | sentiment
123   |   0   |     3
123   |   0   |     4
123   |   1   |     3
124   |   0   |     2
", header = TRUE, sep = "|")