R: spreading data while preserving order

Time: 2019-06-25 16:43:45

Tags: r spread

I have a dataset that looks like this:

dataset = data.frame(Site=c(rep('A',3),rep('B',3),rep('C',3)),MonthYear = c(rep(c('May 19','Apr 19','Mar 19'),3)),Date=c(rep(c('2019-05-31','2019-04-30','2019-03-31'),3)),Measure=c(rep(c('Service','Speed','Efficiency'),3)),Score=runif(9,0,1))

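My goal is to reshape this dataset with the spread function. After doing so, however, I want the spread columns sorted in ascending order according to the Date column.

This means the spread columns should appear in the following order: Mar 19, Apr 19, May 19

Here is my attempt: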

library(dplyr)
library(tidyr)

final = dataset %>% spread(MonthYear,Score) 

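My attempt produces spread columns in alphabetical order rather than chronological order.

Thanks in advance for your input.

3 answers:

Answer 0: (score: 1)

Setting the factor levels in the desired order does the job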

library(tidyr)

dataset = data.frame(Site=c(rep('A',3),rep('B',3),rep('C',3)),MonthYear = c(rep(c('May 19','Apr 19','Mar 19'),3)),Date=c(rep(c('2019-05-31','2019-04-30','2019-03-31'),3)),Measure=c(rep(c('Service','Speed','Efficiency'),3)),Score=runif(9,0,1))
dataset$MonthYear <- factor(dataset$MonthYear, levels = c("Mar 19", "Apr 19", "May 19"))

spread(dataset, key = MonthYear, value = Score)

  Site       Date    Measure     Mar 19    Apr 19     May 19
1    A 2019-03-31 Efficiency 0.09789678        NA         NA
2    A 2019-04-30      Speed         NA 0.4645101         NA
3    A 2019-05-31    Service         NA        NA 0.89602042
4    B 2019-03-31 Efficiency 0.59516115        NA         NA
5    B 2019-04-30      Speed         NA 0.5208239         NA
6    B 2019-05-31    Service         NA        NA 0.45334636
7    C 2019-03-31 Efficiency 0.93941294        NA         NA
8    C 2019-04-30      Speed         NA 0.5439323         NA
9    C 2019-05-31    Service         NA        NA 0.07971263
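
As a side note, here is a minimal sketch of an alternative that avoids typing the levels by hand. It assumes tidyr 1.0.0 or later, where pivot_wider() is available, and it relies on my understanding that pivot_wider() creates the new columns in order of first appearance in the data (it is not part of the original answer). The set.seed() call and stringsAsFactors = FALSE are additions for reproducibility only.

```r
library(tidyr)

set.seed(1)  # only so the random scores are reproducible
dataset <- data.frame(
  Site      = rep(c("A", "B", "C"), each = 3),
  MonthYear = rep(c("May 19", "Apr 19", "Mar 19"), 3),
  Date      = rep(c("2019-05-31", "2019-04-30", "2019-03-31"), 3),
  Measure   = rep(c("Service", "Speed", "Efficiency"), 3),
  Score     = runif(9, 0, 1),
  stringsAsFactors = FALSE
)

# Sort rows chronologically so "Mar 19" is the first MonthYear encountered
sorted <- dataset[order(as.Date(dataset$Date)), ]

# Assumption: new columns follow the order in which MonthYear values appear
wide <- pivot_wider(sorted, names_from = MonthYear, values_from = Score)
names(wide)
```

If that assumption holds, names(wide) should list Mar 19, Apr 19, May 19 after the Site, Date and Measure columns.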

Answer 1: (score: 1)

The only problem is that dataset$MonthYear is a factor whose levels are not ordered the way you want.

#Find Order by Date column
dLvl <- unique(dataset$MonthYear[order(dataset$Date)])
levels(dataset$MonthYear)
#[1] "Apr 19" "Mar 19" "May 19"
dataset$MonthYear <- factor(dataset$MonthYear, levels = dLvl)
levels(dataset$MonthYear)
#[1] "Mar 19" "Apr 19" "May 19"
final = dataset %>% spread(MonthYear,Score) 
final
# Site       Date    Measure    Mar 19    Apr 19    May 19
#1    A 2019-03-31 Efficiency 0.9928678        NA        NA
#2    A 2019-04-30      Speed        NA 0.1457551        NA
#3    A 2019-05-31    Service        NA        NA 0.6047312
#4    B 2019-03-31 Efficiency 0.4419907        NA        NA
#5    B 2019-04-30      Speed        NA 0.5799068        NA

Answer 2: (score: 1)

If you convert the column names to dates, you can sort the columns by the order of those dates

df <- 
  dataset %>% 
    spread(MonthYear,Score)

col_dts <- as.Date(paste0('01', names(df)), format = '%d%b %y')
df <- df[order(!is.na(col_dts), col_dts)]

df    
#   Site       Date    Measure     Mar 19    Apr 19    May 19
# 1    A 2019-03-31 Efficiency 0.76653679        NA        NA
# 2    A 2019-04-30      Speed         NA 0.0416291        NA
# 3    A 2019-05-31    Service         NA        NA 0.3885358
# 4    B 2019-03-31 Efficiency 0.02538343        NA        NA
# 5    B 2019-04-30      Speed         NA 0.7264234        NA
# 6    B 2019-05-31    Service         NA        NA 0.5128166
# 7    C 2019-03-31 Efficiency 0.50107038        NA        NA
# 8    C 2019-04-30      Speed         NA 0.9013112        NA
# 9    C 2019-05-31    Service         NA        NA 0.3678922
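
To make the order(!is.na(col_dts), col_dts) step above easier to follow, here is a small base R sketch on the column names alone. The col_names vector is a hypothetical stand-in for names(df), and the Sys.setlocale() call is an added assumption so that %b parses English month abbreviations regardless of the session locale.

```r
# Assumption: force an English-style time locale so "%b" matches "Mar", "Apr", "May"
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical stand-in for names(df) after spread (alphabetical month order)
col_names <- c("Site", "Date", "Measure", "Apr 19", "Mar 19", "May 19")

# Names that are not month-year labels fail to parse and become NA
col_dts <- as.Date(paste0("01", col_names), format = "%d%b %y")

# First key: FALSE (non-date columns) sorts before TRUE (date columns)
# Second key: the parsed dates, ascending; tied NAs keep their original order
ord <- order(!is.na(col_dts), col_dts)

col_names[ord]
# [1] "Site"    "Date"    "Measure" "Mar 19"  "Apr 19"  "May 19"
```

The id columns stay in front because their parsed dates are all NA, so only the first sort key applies to them.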

Or you can change the factor levels according to the order of the date values

new_levels <- 
  with(dataset, {
        mons <- unique(MonthYear)
        ord <- order(as.Date(paste0('01', mons), format = '%d%b %y'))
        mons[ord]})

dataset$MonthYear <- factor(dataset$MonthYear, levels = new_levels)

dataset %>% 
  spread(MonthYear,Score)

#   Site       Date    Measure     Mar 19    Apr 19    May 19
# 1    A 2019-03-31 Efficiency 0.76653679        NA        NA
# 2    A 2019-04-30      Speed         NA 0.0416291        NA
# 3    A 2019-05-31    Service         NA        NA 0.3885358
# 4    B 2019-03-31 Efficiency 0.02538343        NA        NA
# 5    B 2019-04-30      Speed         NA 0.7264234        NA
# 6    B 2019-05-31    Service         NA        NA 0.5128166
# 7    C 2019-03-31 Efficiency 0.50107038        NA        NA
# 8    C 2019-04-30      Speed         NA 0.9013112        NA
# 9    C 2019-05-31    Service         NA        NA 0.3678922

You can also use reorder together with dcast (not sure why it doesn't work with spread)

library(data.table)

dataset %>% 
  dcast(Site + Date + Measure ~ reorder(MonthYear, -order(Date)), 
        value.var = 'Score')

#   Site       Date    Measure     Mar 19    Apr 19    May 19
# 1    A 2019-03-31 Efficiency 0.76653679        NA        NA
# 2    A 2019-04-30      Speed         NA 0.0416291        NA
# 3    A 2019-05-31    Service         NA        NA 0.3885358
# 4    B 2019-03-31 Efficiency 0.02538343        NA        NA
# 5    B 2019-04-30      Speed         NA 0.7264234        NA
# 6    B 2019-05-31    Service         NA        NA 0.5128166
# 7    C 2019-03-31 Efficiency 0.50107038        NA        NA
# 8    C 2019-04-30      Speed         NA 0.9013112        NA
# 9    C 2019-05-31    Service         NA        NA 0.3678922
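
For what it's worth, here is a minimal base R sketch of what reorder does to the factor levels, independent of dcast. It uses the numeric value of the dates as the ordering score instead of -order(Date); that substitution is my own choice for illustration, and the month_year and dates vectors are hypothetical copies of the MonthYear and Date columns.

```r
# Hypothetical copies of the MonthYear and Date columns from the question
month_year <- rep(c("May 19", "Apr 19", "Mar 19"), 3)
dates      <- as.Date(rep(c("2019-05-31", "2019-04-30", "2019-03-31"), 3))

# reorder() sorts the levels by the mean score within each level,
# so earlier dates give smaller scores and therefore earlier levels
f <- reorder(factor(month_year), as.numeric(dates))

levels(f)
# [1] "Mar 19" "Apr 19" "May 19"
```

A factor built this way could presumably be assigned back to dataset$MonthYear before reshaping, which would amount to the same level-reordering approach shown earlier.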