I have a dataset that looks like this:
dataset = data.frame(Site=c(rep('A',3),rep('B',3),rep('C',3)),MonthYear = c(rep(c('May 19','Apr 19','Mar 19'),3)),Date=c(rep(c('2019-05-31','2019-04-30','2019-03-31'),3)),Measure=c(rep(c('Service','Speed','Efficiency'),3)),Score=runif(9,0,1))
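My goal is to reshape this dataset with the spread function.
After doing so, however, I want the spread columns sorted in ascending order according to the Date column.
That means the spread columns should appear in this order: Mar 19, Apr 19, May 19.
Here is my attempt: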
library(dplyr)
library(tidyr)
final = dataset %>% spread(MonthYear,Score)
My attempt produces spread columns in alphabetical order rather than chronological order.
Thanks in advance for your input.
Answer 0: (score: 1)
Setting the factor levels in the desired order does the job.
library(tidyr)
dataset = data.frame(Site=c(rep('A',3),rep('B',3),rep('C',3)),MonthYear = c(rep(c('May 19','Apr 19','Mar 19'),3)),Date=c(rep(c('2019-05-31','2019-04-30','2019-03-31'),3)),Measure=c(rep(c('Service','Speed','Efficiency'),3)),Score=runif(9,0,1))
dataset$MonthYear <- factor(dataset$MonthYear, levels = c("Mar 19", "Apr 19", "May 19"))
spread(dataset, key = MonthYear, value = Score)
Site Date Measure Mar 19 Apr 19 May 19
1 A 2019-03-31 Efficiency 0.09789678 NA NA
2 A 2019-04-30 Speed NA 0.4645101 NA
3 A 2019-05-31 Service NA NA 0.89602042
4 B 2019-03-31 Efficiency 0.59516115 NA NA
5 B 2019-04-30 Speed NA 0.5208239 NA
6 B 2019-05-31 Service NA NA 0.45334636
7 C 2019-03-31 Efficiency 0.93941294 NA NA
8 C 2019-04-30 Speed NA 0.5439323 NA
9 C 2019-05-31 Service NA NA 0.07971263
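To see why setting the levels matters, here is a minimal base R sketch (the variable names are made up for illustration). It assumes only that factor() sorts its levels when none are given, which is why the month labels come out alphabetical unless the order is stated explicitly.

```r
# Month labels in the order they occur in the question's data
month_year <- c("May 19", "Apr 19", "Mar 19")

# Without an explicit 'levels' argument, factor() sorts the unique values,
# so the levels come out alphabetical rather than chronological.
default_levels <- levels(factor(month_year))
default_levels
# [1] "Apr 19" "Mar 19" "May 19"

# Supplying the levels by hand fixes the order the reshaped columns follow.
chronological <- factor(month_year, levels = c("Mar 19", "Apr 19", "May 19"))
levels(chronological)
# [1] "Mar 19" "Apr 19" "May 19"
```

Hard-coding the levels is fine for three months; for longer ranges it is probably easier to derive them from the Date column.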
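Answer 1: (score: 1)
The only problem is that dataset$MonthYear is a factor whose levels are not ordered the way you want. (In R 4.0 and later it is a character vector by default, because data.frame() no longer converts strings to factors; the fix below works either way.)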
#Find Order by Date column
dLvl <- unique(dataset$MonthYear[order(dataset$Date)])
levels(dataset$MonthYear)
#[1] "Apr 19" "Mar 19" "May 19"
dataset$MonthYear <- factor(dataset$MonthYear, levels = dLvl)
levels(dataset$MonthYear)
#[1] "Mar 19" "Apr 19" "May 19"
final = dataset %>% spread(MonthYear,Score)
final
# Site Date Measure Mar 19 Apr 19 May 19
#1 A 2019-03-31 Efficiency 0.9928678 NA NA
#2 A 2019-04-30 Speed NA 0.1457551 NA
#3 A 2019-05-31 Service NA NA 0.6047312
#4 B 2019-03-31 Efficiency 0.4419907 NA NA
#5 B 2019-04-30 Speed NA 0.5799068 NA
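The level-ordering step can also be wrapped in a small function. This is only a sketch: levels_by_date is a hypothetical helper name, not part of tidyr or base R, and it converts the dates with as.Date() instead of relying on ISO-formatted strings happening to sort correctly as text.

```r
# Hypothetical helper: return the unique labels, ordered by their dates.
levels_by_date <- function(labels, dates) {
  labels <- as.character(labels)  # accept factors as well as strings
  dates <- as.Date(dates)         # compare as real dates, not as text
  unique(labels[order(dates)])
}

# Same values as the MonthYear and Date columns in the question
month_year <- rep(c("May 19", "Apr 19", "Mar 19"), 3)
date <- rep(c("2019-05-31", "2019-04-30", "2019-03-31"), 3)

lv <- levels_by_date(month_year, date)
month_year_f <- factor(month_year, levels = lv)
levels(month_year_f)
# [1] "Mar 19" "Apr 19" "May 19"
```

With the data frame from the question, this would be used as dataset$MonthYear <- factor(dataset$MonthYear, levels = levels_by_date(dataset$MonthYear, dataset$Date)) before calling spread.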
Answer 2: (score: 1)
If you convert the column names to dates, you can sort the columns by the order of those dates.
df <-
dataset %>%
spread(MonthYear,Score)
col_dts <- as.Date(paste0('01', names(df)), format = '%d%b %y')
df <- df[order(!is.na(col_dts), col_dts)]
df
# Site Date Measure Mar 19 Apr 19 May 19
# 1 A 2019-03-31 Efficiency 0.76653679 NA NA
# 2 A 2019-04-30 Speed NA 0.0416291 NA
# 3 A 2019-05-31 Service NA NA 0.3885358
# 4 B 2019-03-31 Efficiency 0.02538343 NA NA
# 5 B 2019-04-30 Speed NA 0.7264234 NA
# 6 B 2019-05-31 Service NA NA 0.5128166
# 7 C 2019-03-31 Efficiency 0.50107038 NA NA
# 8 C 2019-04-30 Speed NA 0.9013112 NA
# 9 C 2019-05-31 Service NA NA 0.3678922
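The column-sorting step can be checked on the names alone. The sketch below assumes the column names produced by spread above; the Sys.setlocale call is an addition, included because %b matches month abbreviations in the current locale and would fail to parse "Mar" or "May" under a non-English one.

```r
# Assumption: force the C locale so English month abbreviations parse
invisible(Sys.setlocale("LC_TIME", "C"))

# Column names as spread() returns them (alphabetical month columns)
col_names <- c("Site", "Date", "Measure", "Apr 19", "Mar 19", "May 19")

# Names that are not a month-year label fail to parse and become NA
col_dts <- as.Date(paste0("01", col_names), format = "%d%b %y")
is.na(col_dts)
# [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

# First key: non-date columns (FALSE) before date columns (TRUE).
# Second key: the parsed dates, ascending. Ties keep their original order.
idx <- order(!is.na(col_dts), col_dts)
col_names[idx]
# [1] "Site"    "Date"    "Measure" "Mar 19"  "Apr 19"  "May 19"
```

Indexing the data frame with idx, as in df[idx], then rearranges the columns without touching the rows.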
Alternatively, you can change the factor levels according to the order of the date values.
new_levels <-
with(dataset, {
mons <- unique(MonthYear)
ord <- order(as.Date(paste0('01', mons), format = '%d%b %y'))
mons[ord]})
dataset$MonthYear <- factor(dataset$MonthYear, levels = new_levels)
dataset %>%
spread(MonthYear,Score)
# Site Date Measure Mar 19 Apr 19 May 19
# 1 A 2019-03-31 Efficiency 0.76653679 NA NA
# 2 A 2019-04-30 Speed NA 0.0416291 NA
# 3 A 2019-05-31 Service NA NA 0.3885358
# 4 B 2019-03-31 Efficiency 0.02538343 NA NA
# 5 B 2019-04-30 Speed NA 0.7264234 NA
# 6 B 2019-05-31 Service NA NA 0.5128166
# 7 C 2019-03-31 Efficiency 0.50107038 NA NA
# 8 C 2019-04-30 Speed NA 0.9013112 NA
# 9 C 2019-05-31 Service NA NA 0.3678922
You can also use reorder together with dcast (I am not sure why it does not work with spread).
library(data.table)
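# reorder() needs one score per row; order() returns a permutation of row
# positions, so key the reordering on the dates themselves instead
dataset %>%
dcast(Site + Date + Measure ~ reorder(MonthYear, as.numeric(as.Date(Date))),
value.var = 'Score')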
# Site Date Measure Mar 19 Apr 19 May 19
# 1 A 2019-03-31 Efficiency 0.76653679 NA NA
# 2 A 2019-04-30 Speed NA 0.0416291 NA
# 3 A 2019-05-31 Service NA NA 0.3885358
# 4 B 2019-03-31 Efficiency 0.02538343 NA NA
# 5 B 2019-04-30 Speed NA 0.7264234 NA
# 6 B 2019-05-31 Service NA NA 0.5128166
# 7 C 2019-03-31 Efficiency 0.50107038 NA NA
# 8 C 2019-04-30 Speed NA 0.9013112 NA
# 9 C 2019-05-31 Service NA NA 0.3678922
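A note on the sort key passed to reorder: it computes a per-level summary (the mean by default) of its second argument, so that argument should hold one score per row, such as the numeric date. A permutation from order() is not such a score, and gives the right answer only when the rows happen to be arranged favourably. A small base R sketch with made-up rows:

```r
# Three rows that are deliberately not in date order
labels <- c("May 19", "Mar 19", "Apr 19")
dates <- as.Date(c("2019-05-31", "2019-03-31", "2019-04-30"))

# Scoring each row by its own date gives chronological levels
by_date <- levels(reorder(labels, as.numeric(dates)))
by_date
# [1] "Mar 19" "Apr 19" "May 19"

# order() returns row positions (2 3 1 here), not a score for each row,
# so using it as the key scrambles the levels
by_order <- levels(reorder(labels, order(dates)))
by_order
# [1] "Apr 19" "May 19" "Mar 19"
```

rank(dates) would also work as a key, since it returns one rank per row.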