Tidy a dataset by gathering multiple columns?

Date: 2017-11-13 19:48:50

Tags: r tidyr

I want to tidy my dataset by reshaping the data from this:

age gender  education       previous_comp_exp   tutorial_time   qID.1    time_taken.1   qID.2    time_taken.2   
18  Male    Undergraduate   casual gamer        62.17926        sor9     39.61206       sor8     19.4892
24  Male    Undergraduate   casual gamer        85.01288        sor9     50.92343       sor8     16.15616

into this:

age gender  education       previous_comp_exp   tutorial_time   qID      time_taken 
18  Male    Undergraduate   casual gamer        62.17926        sor9     39.61206       
18  Male    Undergraduate   casual gamer        62.17926        sor8     19.4892
24  Male    Undergraduate   casual gamer        85.01288        sor9     50.92343       
24  Male    Undergraduate   casual gamer        85.01288        sor8     16.15616

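I have tried gather(), but I can only get it to work for one column at a time, and I keep getting this warning:

Warning message: attributes are not identical across measure variables; they will be dropped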

Any ideas?
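As far as I can tell, the warning comes from the column types rather than from the gather() call itself: qID.1 and qID.2 are factors with different level sets, while the time_taken columns are plain numerics with no attributes at all, so the stacked columns cannot share one set of attributes. A minimal base R sketch, using a trimmed copy of the question's data (the names `measure`, `attrs`, `same_attrs` and `df_chr` are only illustrative), that checks this and converts the factors to character beforehand:

```r
# Trimmed copy of the question's data; the qID columns are factors
df <- data.frame(
  age = c(18L, 24L),
  qID.1 = factor(c("sor9", "sor9")),
  time_taken.1 = c(39.61206, 50.92343),
  qID.2 = factor(c("sor8", "sor8")),
  time_taken.2 = c(19.4892, 16.15616)
)

# The columns gather() would stack into a single value column
measure <- c("qID.1", "time_taken.1", "qID.2", "time_taken.2")

# Compare every column's attributes with the first one: the factors carry
# different "levels", and the numeric columns carry no attributes (NULL)
attrs <- lapply(df[measure], attributes)
same_attrs <- all(vapply(attrs, identical, logical(1), attrs[[1]]))

# One way around it: turn factors into plain character vectors first
df_chr <- df
df_chr[] <- lapply(df_chr, function(x) if (is.factor(x)) as.character(x) else x)
attrs_chr <- lapply(df_chr[measure], attributes)
```

Converting first should avoid the attribute warning, but the stacked value column will still be character, so time_taken has to be converted back to numeric after reshaping.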

2 Answers:

Answer 0 (score: 11)

来自melt的{​​{1}}(请参阅data.table):

?patterns

<强>结果:

library(data.table)

melt(setDT(df), measure = patterns("^qID", "^time_taken"),
     value.name = c("qID", "time_taken"))

age gender education previous_comp_exp tutorial_time variable qID time_taken 1: 18 Male Undergraduate casual_gamer 62.17926 1 sor9 39.61206 2: 24 Male Undergraduate casual_gamer 85.01288 1 sor9 50.92343 3: 18 Male Undergraduate casual_gamer 62.17926 2 sor8 19.48920 4: 24 Male Undergraduate casual_gamer 85.01288 2 sor8 16.15616
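To make the patterns() call less magical: each regular expression picks out one group of columns by name, and melt() stacks the k-th column of every group together, which is what the variable column (1, 2) in the result indexes. A rough base R illustration of that matching only, not data.table's actual implementation (`cols`, `groups` and `pairs` are illustrative names):

```r
# Column names of the question's wide data
cols <- c("age", "gender", "education", "previous_comp_exp", "tutorial_time",
          "qID.1", "time_taken.1", "qID.2", "time_taken.2")

# One regular expression per output column, as in patterns("^qID", "^time_taken");
# grep() returns the positions of the matching names
groups <- lapply(c(qID = "^qID", time_taken = "^time_taken"), grep, x = cols)

# Pair up the k-th match of every group: each pair becomes one "variable" level
pairs <- lapply(seq_along(groups[[1]]),
                function(k) cols[vapply(groups, `[`, integer(1), k)])
```

Here `pairs[[1]]` holds the qID.1/time_taken.1 columns and `pairs[[2]]` the qID.2/time_taken.2 columns, matching variable 1 and 2 in the output.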

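With tidyr:

library(dplyr)
library(tidyr)

df %>%
  # stack all qID.* and time_taken.* columns into key/value pairs
  gather(variable, value, qID.1:time_taken.2) %>%
  # drop the ".1"/".2" suffix so both sets share the keys "qID" and "time_taken"
  mutate(variable = sub("\\.\\d$", "", variable)) %>%
  # number the rows within each key so spread() has no duplicate identifiers
  group_by(variable) %>%
  mutate(ID = row_number()) %>%
  # one column per key again; convert = TRUE re-types the new columns
  spread(variable, value, convert = TRUE) %>%
  select(-ID)

Result:

# A tibble: 4 x 7
    age gender     education previous_comp_exp tutorial_time   qID time_taken
  <int> <fctr>        <fctr>            <fctr>         <dbl> <chr>      <dbl>
1    18   Male Undergraduate      casual_gamer      62.17926  sor9   39.61206
2    18   Male Undergraduate      casual_gamer      62.17926  sor8   19.48920
3    24   Male Undergraduate      casual_gamer      85.01288  sor9   50.92343
4    24   Male Undergraduate      casual_gamer      85.01288  sor8   16.15616

Note: in the tidyr approach, convert = TRUE converts time_taken back to numeric, because it is coerced to character when it is gathered together with the qID columns.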
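The two less obvious steps in the tidyr pipeline are the suffix stripping with its per-key row number, and convert = TRUE. A small base R sketch of the same idea on a hand-built key/value vector standing in for gather()'s output (`variable`, `value`, `key`, `id`, `pick` and `wide` are illustrative names; ave() plays the role of group_by() + row_number()):

```r
# Stand-in for gather()'s output; mixing text and numbers in one vector
# coerces everything to character
variable <- rep(c("qID.1", "time_taken.1", "qID.2", "time_taken.2"), each = 2)
value <- c("sor9", "sor9", 39.61206, 50.92343, "sor8", "sor8", 19.4892, 16.15616)

# Same regex as the answer: drop a trailing ".<digit>" suffix
key <- sub("\\.\\d$", "", variable)

# Row number within each key, like group_by(variable) %>% mutate(ID = row_number())
id <- ave(seq_along(key), key, FUN = seq_along)

# Rough equivalent of spread(..., convert = TRUE): one column per key, aligned on id,
# with type.convert() turning the character time_taken values back into numbers
ids <- sort(unique(id))
pick <- function(k) value[key == k][match(ids, id[key == k])]
wide <- data.frame(id = ids,
                   qID = pick("qID"),
                   time_taken = type.convert(pick("time_taken"), as.is = TRUE),
                   stringsAsFactors = FALSE)
```

spread()'s convert = TRUE runs type.convert() on each new column, which is the step reproduced here by hand.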

Data:

df = structure(list(age = c(18L, 24L), gender = structure(c(1L, 1L
), .Label = "Male", class = "factor"), education = structure(c(1L, 
1L), .Label = "Undergraduate", class = "factor"), previous_comp_exp = structure(c(1L, 
1L), .Label = "casual_gamer", class = "factor"), tutorial_time = c(62.17926, 
85.01288), qID.1 = structure(c(1L, 1L), .Label = "sor9", class = "factor"), 
    time_taken.1 = c(39.61206, 50.92343), qID.2 = structure(c(1L, 
    1L), .Label = "sor8", class = "factor"), time_taken.2 = c(19.4892, 
    16.15616)), .Names = c("age", "gender", "education", "previous_comp_exp", 
"tutorial_time", "qID.1", "time_taken.1", "qID.2", "time_taken.2"
), class = "data.frame", row.names = c(NA, -2L))

Answer 1 (score: 6)

In base R, you can use the powerful reshape() function to convert the data from wide to long format in a single statement:

# dx is the wide data frame from the question (e.g. the df defined above)
reshape(dx, direction = "long",
        varying = list(grep("qID", colnames(dx)),
                       grep("time_taken", colnames(dx))),
        v.names = c("qID", "time_taken"))

     age gender     education previous_comp_exp tutorial_time time  qID time_taken id
1.1  18   Male Undergraduate      casual_gamer      62.17926    1 sor9   39.61206  1
2.1  24   Male Undergraduate      casual_gamer      85.01288    1 sor9   50.92343  2
1.2  18   Male Undergraduate      casual_gamer      62.17926    2 sor8   19.48920  1
2.2  24   Male Undergraduate      casual_gamer      85.01288    2 sor8   16.15616  2
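Because the wide column names follow a name.time pattern, reshape() can also guess the groups itself: with sep = "." (which is also its default), "qID.1" is split into the variable qID and the time 1. A sketch under that assumption, on a trimmed copy of the question's data:

```r
# Trimmed copy of the question's data
df <- data.frame(
  age = c(18L, 24L),
  tutorial_time = c(62.17926, 85.01288),
  qID.1 = factor(c("sor9", "sor9")),
  time_taken.1 = c(39.61206, 50.92343),
  qID.2 = factor(c("sor8", "sor8")),
  time_taken.2 = c(19.4892, 16.15616)
)

# No v.names given: reshape() splits each varying name at "." to find
# the output columns (qID, time_taken) and the times (1, 2)
long <- reshape(df, direction = "long",
                varying = c("qID.1", "time_taken.1", "qID.2", "time_taken.2"),
                sep = ".")
```

This should give the same row order as the explicit varying/v.names call: all rows for time 1, then all rows for time 2.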