I want to tidy my dataset by reshaping it from this:
age gender education previous_comp_exp tutorial_time qID.1 time_taken.1 qID.2 time_taken.2
18 Male Undergraduate casual gamer 62.17926 sor9 39.61206 sor8 19.4892
24 Male Undergraduate casual gamer 85.01288 sor9 50.92343 sor8 16.15616
into this:
age gender education previous_comp_exp tutorial_time qID time_taken
18 Male Undergraduate casual gamer 62.17926 sor9 39.61206
18 Male Undergraduate casual gamer 62.17926 sor8 19.4892
24 Male Undergraduate casual gamer 85.01288 sor9 50.92343
24 Male Undergraduate casual gamer 85.01288 sor8 16.15616
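I have tried using gather(), but I can only get it to work for one column at a time, and I keep getting this warning:
Warning message: attributes are not identical across measure variables; they will be dropped
Any ideas?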
Answer 0 (score: 11)
Using melt from data.table (see ?patterns):
library(data.table)
melt(setDT(df), measure = patterns("^qID", "^time_taken"),
     value.name = c("qID", "time_taken"))
Result:
   age gender education previous_comp_exp tutorial_time variable qID time_taken
1: 18 Male Undergraduate casual_gamer 62.17926 1 sor9 39.61206
2: 24 Male Undergraduate casual_gamer 85.01288 1 sor9 50.92343
3: 18 Male Undergraduate casual_gamer 62.17926 2 sor8 19.48920
4: 24 Male Undergraduate casual_gamer 85.01288 2 sor8 16.15616
Or with tidyr:
library(dplyr)
library(tidyr)
df %>%
  gather(variable, value, qID.1:time_taken.2) %>%
  mutate(variable = sub("\\.\\d$", "", variable)) %>%
  group_by(variable) %>%
  mutate(ID = row_number()) %>%
  spread(variable, value, convert = TRUE) %>%
  select(-ID)
Result:
# A tibble: 4 x 7
age gender education previous_comp_exp tutorial_time qID time_taken
<int> <fctr> <fctr> <fctr> <dbl> <chr> <dbl>
1 18 Male Undergraduate casual_gamer 62.17926 sor9 39.61206
2 18 Male Undergraduate casual_gamer 62.17926 sor8 19.48920
3 24 Male Undergraduate casual_gamer 85.01288 sor9 50.92343
4 24 Male Undergraduate casual_gamer 85.01288 sor8 16.15616
Note:
With the tidyr approach, convert = TRUE is used to convert time_taken back to numeric, because it is coerced to character when it is gathered into the same value column as the qID columns.
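If tidyr 1.0.0 or later is available (an assumption about your setup, since the answer above predates it), pivot_longer() may be a shorter route to the same shape. The sketch below is not part of the original answer. It rebuilds the two-row example data so that it runs on its own, and the column name "set" is a placeholder chosen here for the numeric suffix. Because the ".value" marker sends each name prefix to its own output column, time_taken is never mixed with qID and should stay numeric without convert = TRUE.

```r
library(tidyr)

# Same two-row example data as in the answer, rebuilt with data.frame()
df <- data.frame(
  age = c(18L, 24L),
  gender = factor(c("Male", "Male")),
  education = factor(c("Undergraduate", "Undergraduate")),
  previous_comp_exp = factor(c("casual_gamer", "casual_gamer")),
  tutorial_time = c(62.17926, 85.01288),
  qID.1 = factor(c("sor9", "sor9")),
  time_taken.1 = c(39.61206, 50.92343),
  qID.2 = factor(c("sor8", "sor8")),
  time_taken.2 = c(19.4892, 16.15616)
)

# Split each measure column name at the dot: the part before the dot
# becomes the output column (".value"), the part after it goes into
# "set" (a hypothetical name for the question index).
long <- pivot_longer(
  df,
  cols = matches("^(qID|time_taken)\\.\\d+$"),
  names_to = c(".value", "set"),
  names_sep = "\\."
)

# Drop the index column to match the layout asked for in the question
long$set <- NULL
long
```

Rows come out grouped by the original row (both questions for age 18, then both for age 24), which is the order requested in the question.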
Data:
df = structure(list(age = c(18L, 24L), gender = structure(c(1L, 1L
), .Label = "Male", class = "factor"), education = structure(c(1L,
1L), .Label = "Undergraduate", class = "factor"), previous_comp_exp = structure(c(1L,
1L), .Label = "casual_gamer", class = "factor"), tutorial_time = c(62.17926,
85.01288), qID.1 = structure(c(1L, 1L), .Label = "sor9", class = "factor"),
time_taken.1 = c(39.61206, 50.92343), qID.2 = structure(c(1L,
1L), .Label = "sor8", class = "factor"), time_taken.2 = c(19.4892,
16.15616)), .Names = c("age", "gender", "education", "previous_comp_exp",
"tutorial_time", "qID.1", "time_taken.1", "qID.2", "time_taken.2"
), class = "data.frame", row.names = c(NA, -2L))
Answer 1 (score: 6)
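In base R, you can use the powerful reshape function to convert the data from wide to long format in a single statement (here dx is the data frame from the question):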
reshape(dx,direction="long",
varying=list(grep("qID",colnames(dx)),
grep("time_taken",colnames(dx))),
v.names=c("qID","time_taken"))
age gender education previous_comp_exp tutorial_time time qID time_taken id
1.1 18 Male Undergraduate casual_gamer 62.17926 1 sor9 39.61206 1
2.1 24 Male Undergraduate casual_gamer 85.01288 1 sor9 50.92343 2
1.2 18 Male Undergraduate casual_gamer 62.17926 2 sor8 19.48920 1
2.2 24 Male Undergraduate casual_gamer 85.01288 2 sor8 16.15616 2
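The reshape output above carries two helper columns, time and id, plus compound row names, none of which appear in the layout the question asks for. A minimal sketch of one way to tidy that up follows. It is not part of the original answer. It rebuilds the example data as dx so that it runs on its own, and sorting by id and then time is an assumption made here to reproduce the row order shown in the question.

```r
# Same two-row example data, stored as dx to match the answer above
dx <- data.frame(
  age = c(18L, 24L),
  gender = factor(c("Male", "Male")),
  education = factor(c("Undergraduate", "Undergraduate")),
  previous_comp_exp = factor(c("casual_gamer", "casual_gamer")),
  tutorial_time = c(62.17926, 85.01288),
  qID.1 = factor(c("sor9", "sor9")),
  time_taken.1 = c(39.61206, 50.92343),
  qID.2 = factor(c("sor8", "sor8")),
  time_taken.2 = c(19.4892, 16.15616)
)

# The reshape call from the answer, unchanged apart from spacing
long <- reshape(dx, direction = "long",
                varying = list(grep("qID", colnames(dx)),
                               grep("time_taken", colnames(dx))),
                v.names = c("qID", "time_taken"))

# Sort so that each person's rows sit together, then drop the helper
# columns and the "1.1", "2.1", ... row names that reshape adds
long <- long[order(long$id, long$time), setdiff(names(long), c("time", "id"))]
rownames(long) <- NULL
long
```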