Imagine the following data frame (this is an illustrative sample):
s <- c("January", "February", "March", "January", "March", "April")
t <- c(5, 3, 2, 3, 3, 7)
df1 <- as.data.frame(s)
df1[ , 2] <- t
Now, for plotting purposes, I want to aggregate by month. If I run the following and then summarise:
library(dplyr)
# as.factor() leaves the levels in alphabetical order
df1$s <- as.factor(df1$s)
summary <- df1 %>% group_by(s) %>% summarize(Sales = sum(V2))
The output is correct but not in month order:
April 7
February 3
January 8
March 5
However, if I do the following:
df1$s <- as.factor(df1$s)
levels(df1$s) <- c("January", "February", "March", "April")
Summary <- df1 %>% group_by(s) %>% summarize(Sales = sum(V2))
The output is:
January 7
February 3
March 8
April 5
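The sums are wrong, but the order is right. Why does this happen?
It seems to sort the months alphabetically and then relabel the month column without changing the other values.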
Answer 0 (score: 2)
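Assigning to levels() only renames the existing levels in place; it does not reorder them. Because your levels are in alphabetical order, "April" gets renamed to "January", "January" to "March", and so on, which is why the sums land under the wrong months. If you want to reorder the factor levels, you can use the forcats package to manipulate the level order. As you can see at the end of this post, your factor levels are not in month order. So I used fct_relevel() to change the level order and then did the calculation.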
library(dplyr)
library(forcats)
df1 %>%
mutate(s = fct_relevel(s, month.name[1:4])) %>%
group_by(s) %>%
summarise(Sales = sum(V2)) -> out
out
# s Sales
# <fctr> <dbl>
#1 January 8
#2 February 3
#3 March 5
#4 April 7
# Check level order
#levels(out$s)
#[1] "January" "February" "March" "April"
#levels(df1$s)
#[1] "April" "February" "January" "March"
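
For comparison, here is a minimal base R sketch of the same idea without forcats. It rebuilds the question's data as plain vectors (the names f, relabelled, reordered and sales are only illustrative) and contrasts assigning to levels(), which renames levels, with factor(x, levels = ...), which reorders them. It assumes nothing beyond base R.

```r
# Same illustrative data as in the question
s <- c("January", "February", "March", "January", "March", "April")
sales <- c(5, 3, 2, 3, 3, 7)

# as.factor() sorts the levels alphabetically:
# "April" "February" "January" "March"
f <- as.factor(s)

# Assigning to levels() renames level i to the i-th new name.
# The underlying integer codes do not change, so the data is relabelled:
# every "April" becomes "January", every "January" becomes "March", etc.
relabelled <- f
levels(relabelled) <- c("January", "February", "March", "April")

# factor(x, levels = ...) changes only the level order, not the data
reordered <- factor(s, levels = month.name[1:4])

# Sum the sales per level, in level order
sales_relabelled <- tapply(sales, relabelled, sum)  # wrong sums, month order
sales_reordered  <- tapply(sales, reordered, sum)   # correct sums, month order

sales_relabelled
sales_reordered
```

In the dplyr pipeline from the question, the equivalent change would be to replace the as.factor() and levels() lines with df1$s <- factor(df1$s, levels = month.name) before group_by(s).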