Error in R with months as factor levels. Is this a bug or a logic flaw?

Time: 2016-12-16 21:13:20

Tags: r dplyr levels

Imagine a data frame (this is an illustrative sample):

s <- c("January", "February", "March", "January", "March", "April")
t <- c(5, 3, 2, 3, 3, 7)
df1 <- as.data.frame(s)
df1[ , 2] <- t

Now, for plotting purposes, I want to aggregate by month. If I write the following code and then summarize:

library(dplyr)
df1$s <- factor(df1$s, levels = month.name)
# group by the month column `s` (the data frame has no column `a`)
summary <- df1 %>% group_by(s) %>% summarize(Sales = sum(V2))

The output is correct but not in calendar order:

April     7
February  3
January   8
March     5

However, if I do the following:

df1$s <- as.factor(df1$s)
levels(df1$s) <- c("January", "February", "March", "April")
Summary <- df1 %>% group_by(s) %>% summarize(Sales = sum(V2))

The output is:

January    7
February   3
March      8
April      5

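The totals are wrong, but the order is correct. Why does this happen?

It appears to sort the months alphabetically and then relabel the month column without changing the other values.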
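The relabelling described above can be reproduced in base R. This is only a minimal sketch reusing the sample vector from the question; it assumes `s` is turned into a factor with as.factor() first (R 4.0 and later no longer convert strings to factors automatically), and the names `f`, `g` and `relabelled` are purely illustrative.

```r
# Sample month column from the question.
s <- c("January", "February", "March", "January", "March", "April")
f <- as.factor(s)

# By default the levels are sorted alphabetically.
levels(f)
# "April" "February" "January" "March"

# Assigning to levels() renames the existing levels position by position;
# it does not reorder them:
#   "April" -> "January", "February" -> "February",
#   "January" -> "March", "March"    -> "April"
g <- f
levels(g) <- c("January", "February", "March", "April")

# Original label next to the label each row carries after the rename.
relabelled <- data.frame(original = s, after = as.character(g))
relabelled
```

Every row that was "January" is now labelled "March", which would explain why the January total of 8 shows up under March in the second output.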

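1 answer:

Answer 0: (score: 2)

If you want to reorder the factor levels, you can use the forcats package to manipulate the level order. As you can see at the end of this post, the levels of your factor are not in month order. So I used fct_relevel() to change the level order and then did the calculation.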

library(dplyr)
library(forcats)

df1 %>%
  mutate(s = fct_relevel(s, month.name[1:4])) %>%
  group_by(s) %>%
  summarise(Sales = sum(V2)) -> out

out

#             s Sales
#    <fctr> <dbl>
#1  January     8
#2 February     3
#3    March     5
#4    April     7

# Check level order

#levels(out$s)
#[1] "January"  "February" "March"    "April"

#levels(df1$s)
#[1] "April"    "February" "January"  "March"
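For comparison, the same reordering can be sketched in base R without forcats. This is an alternative illustration, not the method used in the answer above: it assumes the column still holds the original month names, and the names `sales`, `month` and `totals` are made up for the example.

```r
# Sample data from the question.
s <- c("January", "February", "March", "January", "March", "April")
sales <- c(5, 3, 2, 3, 3, 7)

# Passing levels = ... to factor() sets the level order while every value
# keeps its own label (unlike assigning to levels() afterwards).
month <- factor(s, levels = month.name[1:4])

# Sum the sales per month; groups follow the factor's level order.
totals <- aggregate(sales ~ month, data = data.frame(month, sales), FUN = sum)
totals
#      month sales
# 1  January     8
# 2 February     3
# 3    March     5
# 4    April     7
```

The design point is that factor(x, levels = ...) matches values to levels by name, whereas `levels(x) <- ...` overwrites the level names by position.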