My data is in the following format:
structure(list(choice = structure(c(1L, 1L, 2L, 1L), .Label = c("option1",
"option2"), class = "factor"), option1var1 = structure(c(1L,
1L, 1L, 1L), .Label = "A", class = "factor"), option1var2 = structure(c(1L,
1L, 1L, 2L), .Label = c("B", "H"), class = "factor"), option2var1 = structure(c(1L,
1L, 2L, 3L), .Label = c("C", "F", "I"), class = "factor"), option2var2 = structure(1:4, .Label = c("D",
"E", "G", "K"), class = "factor")), .Names = c("choice", "option1var1",
"option1var2", "option2var1", "option2var2"), class = "data.frame", row.names = c(NA,
-4L))
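As the dput output shows, the data frame has five columns and one row per respondent; there is no explicit respondent ID column. The first column, choice, records which option the respondent chose (option1 or option2). Columns 2 and 3 hold the attributes associated with option1, and columns 4 and 5 hold the attributes associated with option2.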
I want to transform the data frame so that it looks like this:
structure(list(respondent = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
choice = c(1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L), option = structure(c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("option1", "option2"
), class = "factor"), var1 = structure(c(1L, 2L, 1L, 2L,
1L, 3L, 1L, 4L), .Label = c("A", "C", "F", "I"), class = "factor"),
var2 = structure(c(1L, 2L, 1L, 3L, 1L, 4L, 5L, 6L), .Label = c("B",
"D", "E", "G", "H", "K"), class = "factor")), .Names = c("respondent",
"choice", "option", "var1", "var2"), class = "data.frame", row.names = c(NA,
-8L))
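This requires splitting each row into two rows, one holding the option1 data and the other holding the option2 data, adding a respondent column, and creating a new numeric variable, choice, that is 1 for the option the respondent chose and 0 for the other one.
I couldn't find any information about this kind of transformation, either here or in the R documentation I looked at. Does anyone know how to do this?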
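The split described above (two rows per respondent plus a 0/1 choice flag) can be sketched in base R with no packages. This is only an illustrative sketch: the input is retyped from the dput with plain character columns instead of factors, and option_rows() is a hypothetical helper name introduced here.

```r
# Input: one row per respondent, values retyped from the dput above
# (character columns instead of factors, for readability)
df1 <- data.frame(
  choice      = c("option1", "option1", "option2", "option1"),
  option1var1 = c("A", "A", "A", "A"),
  option1var2 = c("B", "B", "B", "H"),
  option2var1 = c("C", "C", "F", "I"),
  option2var2 = c("D", "E", "G", "K"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build the long-format rows for one option by picking
# that option's attribute columns, renaming them to var1/var2, and flagging
# whether this option was the one the respondent chose.
option_rows <- function(df, opt) {
  data.frame(
    respondent = seq_len(nrow(df)),
    choice     = as.integer(as.character(df$choice) == opt),
    option     = opt,
    var1       = df[[paste0(opt, "var1")]],
    var2       = df[[paste0(opt, "var2")]],
    stringsAsFactors = FALSE
  )
}

# Stack the option1 rows on top of the option2 rows
long <- rbind(option_rows(df1, "option1"), option_rows(df1, "option2"))

# Sort so each respondent's two rows are adjacent (option1 before option2)
long <- long[order(long$respondent, long$option), ]
rownames(long) <- NULL
long
```

Because the attribute columns are looked up by name with paste0(), the same helper would work for any number of options as long as the columns follow the optionNvarM naming pattern.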
Answer (score: 3)
Assume the original data frame is df1 and the final output is df2.
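library(tidyverse)
df2 <- df1 %>%
  mutate(respondent = 1:n()) %>%                            # one ID per original row
  gather(Option, Value, starts_with("option")) %>%          # long format: one row per option/attribute pair
  separate(Option, into = c("option", "Var"), sep = 7) %>%  # split "option1var1" into "option1" and "var1"
  mutate(choice = ifelse(choice == option, 1L, 0L)) %>%     # 1 if this row's option was the chosen one
  spread(Var, Value) %>%                                    # back to one column per attribute (var1, var2)
  select(respondent, choice, option, starts_with("var")) %>%
  arrange(respondent, option)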
df2
# respondent choice option var1 var2
# 1 1 1 option1 A B
# 2 1 0 option2 C D
# 3 2 1 option1 A B
# 4 2 0 option2 C E
# 5 3 0 option1 A B
# 6 3 1 option2 F G
# 7 4 1 option1 A H
# 8 4 0 option2 I K
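gather() and spread() still work, but since tidyr 1.0.0 they are superseded by pivot_longer() and pivot_wider(). A sketch of the same reshape with pivot_longer(), assuming tidyr >= 1.0.0 and dplyr are installed and using the same retyped character input as a stand-in for the dput: the ".value" entry in names_to tells pivot_longer() to turn the var1/var2 part of each column name into output columns directly, so no separate spread step is needed.

```r
library(dplyr)
library(tidyr)

# Same data as the dput in the question, retyped with character columns
df1 <- data.frame(
  choice      = c("option1", "option1", "option2", "option1"),
  option1var1 = c("A", "A", "A", "A"),
  option1var2 = c("B", "B", "B", "H"),
  option2var1 = c("C", "C", "F", "I"),
  option2var2 = c("D", "E", "G", "K"),
  stringsAsFactors = FALSE
)

df2 <- df1 %>%
  mutate(respondent = row_number()) %>%          # one ID per original row
  pivot_longer(
    cols = starts_with("option"),
    # "option2var1" -> option = "option2"; "var1" becomes a column name
    names_to = c("option", ".value"),
    names_pattern = "(option\\d+)(var\\d+)"
  ) %>%
  mutate(choice = as.integer(choice == option)) %>%  # 1 for the chosen option, 0 otherwise
  select(respondent, choice, option, var1, var2) %>%
  arrange(respondent, option)

df2
```

The result is a tibble with character var1/var2 columns rather than the factors shown in the desired dput; wrap them in factor() afterwards if factors are needed.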