Reshape and expand data in R

Time: 2018-10-11 20:42:37

Tags: r

I have the following dataset:

my.data <- read.table(text = '
                  ID  tmc_code  wDay    time_category   TTTR
                  1  121-04711  weekday Afternoon   1.1
                  2  121-04711  weekend Evening     1.3
                  3  121-04711  weekday Morning 1.1
                  4  121-04712  weekend Afternoon   1.101626016
                  5  121-04712  weekday Evening 1.281124498
                  6  121-04712  weekday Morning 1.080645161
                  ', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')
my.data

I would like to get a wide-format result like this:

#result
  #          tmc_code    wDay    TTTR_Afternoon TTTR_Evening  TTTR_Morning
  #          121-04711  weekday         1.1         NA            1.1
  #          121-04711  weekend         NA          1.3           NA
  #          121-04712  weekday         NA       1.281124498    1.080645161
  #          121-04712  weekend    1.101626016      NA            NA

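As we can see, this takes more than a plain reshape: the process actually expands the data, turning 6 values into 9, with the gaps filled by NA.

The following reshape call does not work for this case: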

w.my.data <- reshape(my.data, idvar = "tmc_code", timevar = "time_category", direction = "wide")
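
A minimal sketch of why this call misses the target, assuming `my.data` as defined above (repeated here so the snippet runs on its own). With `idvar = "tmc_code"` alone, `reshape` treats every remaining column (`ID`, `wDay`, `TTTR`) as something to spread across `time_category`, so `wDay` ends up in the column names instead of identifying rows.

```r
my.data <- read.table(text = '
ID  tmc_code   wDay     time_category  TTTR
1   121-04711  weekday  Afternoon      1.1
2   121-04711  weekend  Evening        1.3
3   121-04711  weekday  Morning        1.1
4   121-04712  weekend  Afternoon      1.101626016
5   121-04712  weekday  Evening        1.281124498
6   121-04712  weekday  Morning        1.080645161
', header = TRUE, stringsAsFactors = FALSE)

# Only tmc_code identifies a row, so ID, wDay and TTTR are all widened
w <- reshape(my.data, idvar = "tmc_code", timevar = "time_category",
             direction = "wide")
dim(w)    # 2 rows (one per tmc_code) and 10 columns
names(w)  # includes ID.Afternoon, wDay.Afternoon, TTTR.Afternoon, ...

# The desired output has one row per (tmc_code, wDay) pair instead
key <- unique(my.data[c("tmc_code", "wDay")])
nrow(key) # 4
```

This suggests that both `tmc_code` and `wDay` need to act as row identifiers.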

Does anyone have a better idea? Many thanks!

2 answers:

Answer 0: (score: 5)

You can use the reshape2 package:

> reshape2::dcast(my.data, tmc_code + wDay ~ paste("TTTR", time_category, sep="_"))

Using TTTR as value column: use value.var to override.
   tmc_code    wDay TTTR_Afternoon TTTR_Evening TTTR_Morning
1 121-04711 weekday       1.100000           NA     1.100000
2 121-04711 weekend             NA     1.300000           NA
3 121-04712 weekday             NA     1.281124     1.080645
4 121-04712 weekend       1.101626           NA           NA
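
The "Using TTTR as value column" message above appears because `dcast` had to guess which column holds the values. A small sketch, assuming the reshape2 package is installed, that names the value column explicitly through the `value.var` argument:

```r
library(reshape2)

my.data <- read.table(text = '
ID  tmc_code   wDay     time_category  TTTR
1   121-04711  weekday  Afternoon      1.1
2   121-04711  weekend  Evening        1.3
3   121-04711  weekday  Morning        1.1
4   121-04712  weekend  Afternoon      1.101626016
5   121-04712  weekday  Evening        1.281124498
6   121-04712  weekday  Morning        1.080645161
', header = TRUE, stringsAsFactors = FALSE)

# Same formula as above; value.var states which column fills the cells,
# so dcast no longer needs to guess
wide <- dcast(my.data,
              tmc_code + wDay ~ paste("TTTR", time_category, sep = "_"),
              value.var = "TTTR")
wide
```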

Oh, apparently it also works with reshape, which additionally gives a helpful warning that the variation in ID is being ignored here:

> reshape(my.data, idvar = c("tmc_code", "wDay"), timevar = "time_category", v.names = "TTTR", direction = "wide")

   ID  tmc_code    wDay TTTR.Afternoon TTTR.Evening TTTR.Morning
1:  1 121-04711 weekday       1.100000           NA     1.100000
2:  2 121-04711 weekend             NA     1.300000           NA
3:  4 121-04712 weekend       1.101626           NA           NA
4:  5 121-04712 weekday             NA     1.281124     1.080645
Warning message:
In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying,  :
  some constant variables (ID) are really varying
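
If the `ID` column is not needed in the result, one way to avoid the warning is to drop it before reshaping. This is a sketch in base R only, using the `drop` and `sep` arguments of `reshape`; the choice of `sep = "_"` is just to match the column names requested in the question.

```r
my.data <- read.table(text = '
ID  tmc_code   wDay     time_category  TTTR
1   121-04711  weekday  Afternoon      1.1
2   121-04711  weekend  Evening        1.3
3   121-04711  weekday  Morning        1.1
4   121-04712  weekend  Afternoon      1.101626016
5   121-04712  weekday  Evening        1.281124498
6   121-04712  weekday  Morning        1.080645161
', header = TRUE, stringsAsFactors = FALSE)

wide <- reshape(my.data,
                idvar     = c("tmc_code", "wDay"),  # one output row per pair
                timevar   = "time_category",        # becomes the column suffix
                v.names   = "TTTR",                 # the only column to widen
                drop      = "ID",                   # removed before reshaping
                sep       = "_",                    # TTTR_Afternoon, not TTTR.Afternoon
                direction = "wide")
wide
```

Rows appear in order of first occurrence of each `tmc_code`/`wDay` pair, so sort afterwards if a particular order matters.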

Answer 1: (score: 5)

Similar to @Frank's answer, but using tidyr::spread:

library(tidyverse)

my.data %>% 
  select(-ID) %>% # Be sure no important info is lost/misrepresented in dropping ID
  mutate(time_category = paste0("TTTR", "_", time_category)) %>%
  spread(time_category, TTTR)

   tmc_code    wDay TTTR_Afternoon TTTR_Evening TTTR_Morning
1 121-04711 weekday       1.100000           NA     1.100000
2 121-04711 weekend             NA     1.300000           NA
3 121-04712 weekday             NA     1.281124     1.080645
4 121-04712 weekend       1.101626           NA           NA
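
In tidyr 1.0.0 and later, `spread` is superseded by `pivot_wider`. A hedged sketch of the same reshaping with that function, assuming a recent tidyr is installed; `id_cols` keeps only `tmc_code` and `wDay` as row identifiers, so `ID` is left out without a separate `select` step.

```r
library(tidyr)

my.data <- read.table(text = '
ID  tmc_code   wDay     time_category  TTTR
1   121-04711  weekday  Afternoon      1.1
2   121-04711  weekend  Evening        1.3
3   121-04711  weekday  Morning        1.1
4   121-04712  weekend  Afternoon      1.101626016
5   121-04712  weekday  Evening        1.281124498
6   121-04712  weekday  Morning        1.080645161
', header = TRUE, stringsAsFactors = FALSE)

wide <- pivot_wider(my.data,
                    id_cols      = c(tmc_code, wDay),  # row identifiers; ID is dropped
                    names_from   = time_category,      # new column names come from here
                    values_from  = TTTR,               # cell values come from here
                    names_prefix = "TTTR_")            # yields TTTR_Afternoon, etc.
wide
```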