Keeping duplicate entries when using group_by() in dplyr

Asked: 2017-11-28 16:06:33

Tags: r dplyr data-manipulation

library(dplyr) ## loads the dplyr package

# tibble() replaces the deprecated data_frame(); dplyr re-exports it
mydataWithWeeksAndWeights <- tibble(ended = c("14/11/2016",
                                              "14/11/2016",
                                              "14/11/2016",
                                              "02/01/2017",
                                              "02/01/2017",
                                              "15/11/2017",
                                              "15/11/2017",
                                              "16/11/2017",
                                              "16/11/2017"),
                                    week = c(46, 46, 46, 1, 1, 46, 46, 46, 46),
                                    satisfactionLevel = c("Very dissatisfied",
                                                          "Very satisfied",
                                                          "Satisfied",
                                                          "Dissatisfied",
                                                          "Very dissatisfied",
                                                          "Very satisfied",
                                                          "Very dissatisfied",
                                                          "Very Satisfied",
                                                          "Very satisfied"),
                                    weight = c(0, 1, 0.75, 0.25, 0, 1, 0, 1, 1))

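When I call pivotTable <- mydataWithWeeksAndWeights %>% group_by(week, weight) %>% count(satisfactionLevel), it counts satisfactionLevel across all week-46 entries together. The problem is that the first three week-46 rows refer to 2016 and the remaining ones refer to 2017. I want to keep these duplicate week entries separate.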

2 answers:

Answer 0: (score: 2)

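I can't be sure my code does what you want, because you didn't provide the expected output, but I think what you need to do is add a year column and include it in group_by(), so that week 46 of 2016 is distinguished from week 46 of 2017.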
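A minimal sketch of that idea, assuming the year is simply typed in by hand as an extra column. The shortened data frame d, its values, and the names year and merged are illustrative and not taken from the original post:

```r
library(dplyr)

# Hypothetical, shortened version of the question's data with a hand-written year column
d <- tibble(
  year = c(2016, 2016, 2017, 2017, 2017),
  week = c(46, 46, 46, 46, 1),
  satisfactionLevel = c("Very satisfied", "Satisfied",
                        "Very satisfied", "Very satisfied", "Dissatisfied"),
  weight = c(1, 0.75, 1, 1, 0.25)
)

# Grouping by week and weight alone merges week 46 of 2016 with week 46 of 2017
merged <- d %>%
  group_by(week, weight) %>%
  count(satisfactionLevel)

# Adding year to group_by() keeps the two years apart
pivotTable <- d %>%
  group_by(year, week, weight) %>%
  count(satisfactionLevel)
```

In this sketch, merged lumps the "Very satisfied" rows of both years into a single count, while pivotTable reports 2016 and 2017 on separate rows.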

Edit: if you need the year to be derived automatically from the end date, I have added a bit based on @docendodiscimus's comment.
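One way this could look, as a sketch rather than the answer's original code: parse ended as a Date with as.Date() and pull the four-digit year out with format(). The shortened data frame d is hypothetical, and year is an assumed column name:

```r
library(dplyr)

# Hypothetical, shortened version of the question's data
d <- tibble(
  ended = c("14/11/2016", "14/11/2016", "02/01/2017", "15/11/2017", "16/11/2017"),
  week = c(46, 46, 1, 46, 46),
  satisfactionLevel = c("Very satisfied", "Satisfied", "Dissatisfied",
                        "Very satisfied", "Very satisfied"),
  weight = c(1, 0.75, 0.25, 1, 1)
)

pivotTable <- d %>%
  mutate(ended = as.Date(ended, format = "%d/%m/%Y"),  # parse day/month/year strings
         year  = format(ended, "%Y")) %>%              # four-digit year, as character
  group_by(year, week, weight) %>%
  count(satisfactionLevel)
```

Note that format() returns the year as a character string; wrap it in as.integer() if a numeric year is needed.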


Answer 1: (score: 0)

Here is what I would do: convert "ended" to Date format and use the aggregate function:

# just to shorten df-name
df <- mydataWithWeeksAndWeights 

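# reformat and add column with year
df$ended <- as.Date(df$ended, format = "%d/%m/%Y")
df$year <- format(df$ended, "%Y")

# actual aggregating: sum the weights per year, satisfaction level and week
# (naming the list elements gives readable column names instead of Group.1, Group.2, ...)
aggregate(df$weight,
          by = list(year = df$year, satisfactionLevel = df$satisfactionLevel, week = df$week),
          FUN = sum)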
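Since the question uses dplyr, the same aggregation can also be sketched with group_by() and summarise(). Note that, like the aggregate() call above, this sums the weights rather than counting rows. The shortened data frame d and the names weightSums and totalWeight are illustrative assumptions:

```r
library(dplyr)

# Hypothetical, shortened version of the question's data
d <- tibble(
  ended = c("14/11/2016", "14/11/2016", "15/11/2017", "16/11/2017"),
  week = c(46, 46, 46, 46),
  satisfactionLevel = c("Very satisfied", "Satisfied",
                        "Very satisfied", "Very satisfied"),
  weight = c(1, 0.75, 1, 1)
)

weightSums <- d %>%
  # derive the year from the parsed end date
  mutate(year = format(as.Date(ended, format = "%d/%m/%Y"), "%Y")) %>%
  group_by(year, week, satisfactionLevel) %>%
  summarise(totalWeight = sum(weight)) %>%  # one row per year/week/level
  ungroup()
```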

Hope this helps!