Concatenate dichotomous columns into a semicolon-separated column

Time: 2017-04-04 11:50:06

Tags: r string data.table

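My data frame contains the results of a multiple-choice question. Each item is coded 0 (not mentioned) or 1 (mentioned). The columns are named as follows:

F1.2_1, F1.2_2, F1.2_3, F1.2_4, F1.2_5, F1.2_99, etc.

I want to concatenate these values: the new column should be a semicolon-separated string of the selected items. So if a row has 1 in F1.2_1, F1.2_4 and F1.2_5, the result should be: 1;4;5

The digits after the underscore in each dichotomous column name are the item code to use in the string.

Any idea how to achieve this with R (and data.table)? Thanks for your help!

Edit

Here is a sample DF with the desired result: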

DF <- structure(list(F1.2_1 = c(0L, 1L, 0L, 1L), F1.2_2 = c(1L, 0L, 
0L, 1L), F1.2_3 = c(0L, 1L, 0L, 1L), F1.2_4 = c(0L, 1L, 0L, 0L
), F1.2_5 = c(0L, 0L, 0L, 0L), F1.2_99 = c(0L, 0L, 1L, 0L), desired_result = structure(c(3L, 
2L, 4L, 1L), .Label = c("1;2;3", "1;3;4", "2", "99"), class = "factor")), .Names = c("F1.2_1", 
"F1.2_2", "F1.2_3", "F1.2_4", "F1.2_5", "F1.2_99", "desired_result"
), class = "data.frame", row.names = c(NA, -4L))




  F1.2_1 F1.2_2 F1.2_3 F1.2_4 F1.2_5 F1.2_99 desired_result
1      0      1      0      0      0       0              2
2      1      0      1      1      0       0          1;3;4
3      0      0      0      0      0       1             99
4      1      1      1      0      0       0          1;2;3

2 answers:

Answer 0: (score: 1)

We can try

 j1 <- do.call(paste, c(as.integer(sub(".*_", "", 
              names(DF)[-7]))[col(DF[-7])]*DF[-7], sep=";"))

 DF$newCol <- gsub("^;+|;+$", "", gsub(";*0;|0$|^0", ";", j1))
 DF$newCol
 #[1] "2"     "1;3;4" "99"    "1;2;3"
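
A note on the regex clean-up above: it appears to rely on zeros never sitting next to each other between two selected items, and on no item code ending in 0. For a pasted row such as 1;0;0;4;0;0 the substitutions seem to leave "1;;4", and a code like 10 would lose its trailing 0. Below is a minimal sketch of an alternative that skips the regex step entirely. It assumes all item columns start with F1.2_; the fifth row and the names item_cols and codes are hypothetical, added only for illustration.

```r
# Sample data from the question plus one hypothetical fifth row
# (items 1 and 4 selected) to exercise the consecutive-zero case.
DF <- data.frame(
  F1.2_1  = c(0L, 1L, 0L, 1L, 1L),
  F1.2_2  = c(1L, 0L, 0L, 1L, 0L),
  F1.2_3  = c(0L, 1L, 0L, 1L, 0L),
  F1.2_4  = c(0L, 1L, 0L, 0L, 1L),
  F1.2_5  = c(0L, 0L, 0L, 0L, 0L),
  F1.2_99 = c(0L, 0L, 1L, 0L, 0L)
)

# Select the item columns by name pattern (assumed prefix "F1.2_").
item_cols <- grep("^F1\\.2_", names(DF), value = TRUE)

# The item code is whatever follows the last underscore.
codes <- sub(".*_", "", item_cols)

# For each row, keep the codes whose column equals 1 and paste them together.
DF$newCol <- unname(apply(DF[item_cols] == 1, 1,
                          function(sel) paste(codes[sel], collapse = ";")))
DF$newCol
```

A row with no item selected yields an empty string here rather than NA.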

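Answer 1: (score: 1)

In a comment, the OP asked how to handle more than one multiple-choice question.

The following approach can handle any number of questions and any number of choices per question. It uses melt() and dcast() from the data.table package.

Sample input data

Let's assume the input data.frame DT for the extended case contains two questions, one with 6 choices and the other with 4 choices: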

DT
#   F1.2_1 F1.2_2 F1.2_3 F1.2_4 F1.2_5 F1.2_99 F2.7_1 F2.7_2 F2.7_3 F2.7_11
#1:      0      1      0      0      0       0      0      1      1       0
#2:      1      0      1      1      0       0      1      1      1       1
#3:      0      0      0      0      0       1      1      0      1       0
#4:      1      1      1      0      0       0      1      0      1       1

Code

library(data.table)

# coerce to data.table and add row number for later join
setDT(DT)[, rn := .I]

# reshape from wide to long format
molten <- melt(DT, id.vars = "rn")

# alternatively, the measure cols can be specified (in case of other id vars)
# molten <- melt(DT, measure.vars = patterns("^F"))

# split question id and choice id
molten[, c("question_id", "choice_id") := tstrsplit(variable, "_")]

# reshape only selected choices from long to wide format,
# thereby pasting together the ids of the selected choices for each question
result <- dcast(molten[value == 1], rn ~ question_id, paste, collapse = ";", 
                fill = NA, value.var = "choice_id")

# final join for demonstration only, remove row number as no longer needed
DT[result, on = "rn"][, rn := NULL][]
#   F1.2_1 F1.2_2 F1.2_3 F1.2_4 F1.2_5 F1.2_99 F2.7_1 F2.7_2 F2.7_3 F2.7_11  F1.2     F2.7
#1:      0      1      0      0      0       0      0      1      1       0     2      2;3
#2:      1      0      1      1      0       0      1      1      1       1 1;3;4 1;2;3;11
#3:      0      0      0      0      0       1      1      0      1       0    99      1;3
#4:      1      1      1      0      0       0      1      0      1       1 1;2;3   1;3;11

For each question, the final result shows which choices were selected in each row.
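
One caveat worth checking: DT[result, on = "rn"] returns one row per row of result, so a respondent who selected nothing in any question would not appear in the output, because such a row never survives the value == 1 filter. A minimal sketch of keeping every respondent, assuming a left join via merge() is acceptable; the small DT and the name out are hypothetical, made up for illustration:

```r
library(data.table)

# Hypothetical data: the third respondent selected nothing at all.
DT <- data.table(
  F1.2_1 = c(0L, 1L, 0L),
  F1.2_2 = c(1L, 0L, 0L),
  F2.7_1 = c(0L, 1L, 0L),
  F2.7_3 = c(1L, 1L, 0L)
)
DT[, rn := .I]

# Same reshape steps as in the answer above.
molten <- melt(DT, id.vars = "rn")
molten[, c("question_id", "choice_id") := tstrsplit(variable, "_", fixed = TRUE)]
result <- dcast(molten[value == 1], rn ~ question_id,
                fun.aggregate = paste, collapse = ";",
                fill = NA_character_, value.var = "choice_id")

# Left join: keep all rows of DT, with NA where nothing was selected.
out <- merge(DT, result, by = "rn", all.x = TRUE)
out
```

Rows without any selection then carry NA in the per-question columns instead of being dropped.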

Reproducible data

The sample data can be created with

DT <- structure(list(F1.2_1 = c(0L, 1L, 0L, 1L), F1.2_2 = c(1L, 0L, 
0L, 1L), F1.2_3 = c(0L, 1L, 0L, 1L), F1.2_4 = c(0L, 1L, 0L, 0L
), F1.2_5 = c(0L, 0L, 0L, 0L), F1.2_99 = c(0L, 0L, 1L, 0L), F2.7_1 = c(0L, 
1L, 1L, 1L), F2.7_2 = c(1L, 1L, 0L, 0L), F2.7_3 = c(1L, 1L, 1L, 
1L), F2.7_11 = c(0L, 1L, 0L, 1L)), .Names = c("F1.2_1", "F1.2_2", 
"F1.2_3", "F1.2_4", "F1.2_5", "F1.2_99", "F2.7_1", "F2.7_2", 
"F2.7_3", "F2.7_11"), row.names = c(NA, -4L), class = "data.frame")