How do I convert months to a factor while keeping them in calendar order?

Asked: 2018-02-22 05:22:56

Tags: r lubridate

I have a raw data frame (df) containing about 10 years of data (1994-2003). head(df) looks like this:

Sl.no       Date Year Month Season            val1            val2     val3
1     1 1993-12-01 1993   Dec Winter          21.0            16.0      3.0
2     2 1994-01-01 1994   Jan Winter          21.0            15.5      0.0
3     3 1994-02-01 1994   Feb Winter          21.0            18.5      0.0
4     4 1994-03-01 1994   Mar Spring          30.0            24.0      1.9
5     5 1994-04-01 1994   Apr Spring          35.5            27.0      0.5
6     6 1994-05-01 1994   May Spring          36.0            30.0      1.5

Because I want to convert Month to a factor in order to draw a boxplot, I used:

df$Month <- as.factor(format(df$Date, "%b"))
levels(df$Month) <- c("Jan","Feb","Mar", "Apr", "May", "Jun", "Jul",
"Aug", "Sep", "Oct", "Nov", "Dec")

However, the output looks like this (the months are no longer in the same order as in the original df):

Sl.no       Date Year Month Season          val1             val2      val3
1     1 1993-12-01 1993   Mar Winter          21.0            16.0      3.0
2     2 1994-01-01 1994   May Winter          21.0            15.5      0.0
3     3 1994-02-01 1994   Apr Winter          21.0            18.5      0.0
4     4 1994-03-01 1994   Aug Spring          30.0            24.0      1.9
5     5 1994-04-01 1994   Jan Spring          35.5            27.0      0.5
6     6 1994-05-01 1994   Sep Spring          36.0            30.0      1.5

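So in the df above, the Month values are scrambled and no longer match the Date column, whereas they should follow the dates in order.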

How can I fix this? Any help would be greatly appreciated. Kind regards

1 answer:

Answer 0: (score: 0)

Use

df$Month <- factor(format(df$Date, "%b"), month.abb, ordered = TRUE)
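As a rough illustration, here is that line applied to a small data frame. The data frame is a hypothetical miniature of the one in the question (only Date and val1 are reproduced), and the sketch assumes an English locale, because format(x, "%b") returns locale-dependent month names that must match month.abb.

```r
# Hypothetical miniature of the data frame from the question
df <- data.frame(
  Date = as.Date(c("1993-12-01", "1994-01-01", "1994-02-01",
                   "1994-03-01", "1994-04-01", "1994-05-01")),
  val1 = c(21.0, 21.0, 21.0, 30.0, 35.5, 36.0)
)

# Supply the desired order as `levels` when the factor is created,
# instead of overwriting levels() afterwards
df$Month <- factor(format(df$Date, "%b"), levels = month.abb, ordered = TRUE)

as.character(df$Month)  # labels still agree with the Date column
levels(df$Month)        # calendar order, Jan through Dec
```

The point of the sketch is that the labels stay attached to the right rows while only the level order changes.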

演示您遇到的问题:

set.seed(1)
M <- sample(month.abb, 20, TRUE)
M
#  [1] "Apr" "May" "Jul" "Nov" "Mar" "Nov" "Dec" "Aug" "Aug" "Jan" "Mar" "Mar" "Sep" "May"
# [15] "Oct" "Jun" "Sep" "Dec" "May" "Oct"

your_attempt <- as.factor(M)
your_attempt
#  [1] Apr May Jul Nov Mar Nov Dec Aug Aug Jan Mar Mar Sep May Oct Jun Sep Dec May Oct
# Levels: Apr Aug Dec Jan Jul Jun Mar May Nov Oct Sep

## At this step, you're basically asking R to replace "Apr" with "Jan",
##   "Aug" with "Feb", and so on. Not what you're looking for....
levels(your_attempt) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", 
                          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

your_attempt
#  [1] Jan Aug May Sep Jul Sep Mar Feb Feb Apr Jul Jul Nov Aug Oct Jun Nov Mar Aug Oct
# Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

## ordered = TRUE not necessarily required. Depends on what you want to do
new_attempt <- factor(M, levels = month.abb, ordered = TRUE)
new_attempt
#  [1] Apr May Jul Nov Mar Nov Dec Aug Aug Jan Mar Mar Sep May Oct Jun Sep Dec May Oct
# Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec
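One caveat that may be worth keeping in mind: format(x, "%b") depends on the session locale, so in a non-English locale the labels would not match month.abb and the factor would come out as NA. The following is a minimal sketch of a locale-independent variant that indexes month.abb by month number; the vectors dates and val1 are made up for illustration and are not from the original post.

```r
# Made-up sample values, shaped like the question's Date and val1 columns
dates <- as.Date(c("1993-12-01", "1994-01-01", "1994-02-01", "1994-03-01"))
val1  <- c(21.0, 21.0, 21.0, 30.0)

# as.POSIXlt()$mon is the 0-based month number, so add 1 to index month.abb
month_num <- as.POSIXlt(dates)$mon + 1
Month <- factor(month.abb[month_num], levels = month.abb)

as.character(Month)

# Grouping functions follow the level order, not the row order
names(split(val1, droplevels(Month)))

# boxplot(val1 ~ Month) would lay out its boxes in that same level order
```

Since the question is tagged lubridate, lubridate::month(dates, label = TRUE) should give a comparable ordered factor, although its labels also follow the locale unless one is specified.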