Question

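I have a large data frame of coders who rated constructs of interest. Ultimately, I want to run kappa reliability on each pair of coders (and then take a weighted mean). First, I need a way to take the single data frame below (test_data) and create multiple data frames, one for each combination of coders (pair1, pair2, pair3, etc.), which I will eventually pass through a larger function to assess kappa reliability.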

test_data <- data.frame(turn = c("s1: text string", "s2: text string" , "s1: text string", "s2: text string", "s1: text string", "s2: text string", "s1: text string"),
                        id = c(12, 12, 12, 15, 15, 17, 17),
                        coder1_1 = c("high", "low", "med", "high", "high", "high", "low"),
                        coder2_1 = c("high", "low", "med", "high", "med", "high", "low"),
                        coder3_1 = c("med", "med", "med", "high", "low", "high", "med"),
                        coder4_1 = c("high", "low", "med", "high", "med", "high", "low")
)

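I would like to create 6 separate data frames, one for each coder pair, while keeping the first two columns (turn and id) in every data frame.

For example, data frame "pair1" would be: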

             turn id coder1_1 coder2_1 
1 s1: text string 12     high     high
2 s2: text string 12      low      low
3 s1: text string 12      med      med
4 s2: text string 15     high     high
5 s1: text string 15     high      med
6 s2: text string 17     high     high
7 s1: text string 17      low      low

The next data frame would be "pair2":

             turn id coder1_1 coder3_1 
1 s1: text string 12     high      med
2 s2: text string 12      low      med
3 s1: text string 12      med      med
4 s2: text string 15     high     high
5 s1: text string 15     high      low
6 s2: text string 17     high     high
7 s1: text string 17      low      med

And so on for all pairwise comparisons among the four coders (6 in total).

I have had limited success with combn, because combn(names(test_data[,c(3:6)]),2,simplify=FALSE) only creates a list of column names and does not keep turn and id.

Any help would be greatly appreciated.

Answer 1

We can use combn on the data itself and cbind the first two columns inside FUN:

combn(test_data[3:6], 2, simplify = FALSE, 
        FUN = function(x) cbind(test_data[1:2], x))
#[[1]]
#             turn id coder1_1 coder2_1
#1 s1: text string 12     high     high
#2 s2: text string 12      low      low
#3 s1: text string 12      med      med
#4 s2: text string 15     high     high
#5 s1: text string 15     high      med
#6 s2: text string 17     high     high
#7 s1: text string 17      low      low

#[[2]]
#             turn id coder1_1 coder3_1
#1 s1: text string 12     high      med
#2 s2: text string 12      low      med
#3 s1: text string 12      med      med
#4 s2: text string 15     high     high
#5 s1: text string 15     high      low
#6 s2: text string 17     high     high
#7 s1: text string 17      low      med

#[[3]]
#             turn id coder1_1 coder4_1
#1 s1: text string 12     high     high
#2 s2: text string 12      low      low
#3 s1: text string 12      med      med
#4 s2: text string 15     high     high
#5 s1: text string 15     high      med
#6 s2: text string 17     high     high
#7 s1: text string 17      low      low

#...
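
If the pair1, pair2, ... labels from the question are wanted, the list returned by combn can simply be named. This is a minimal sketch, assuming the test_data from the question; pair_list is a name chosen here for illustration and does not appear in the original answer.

```r
# Data from the question, repeated so the sketch runs on its own.
test_data <- data.frame(
  turn = c("s1: text string", "s2: text string", "s1: text string",
           "s2: text string", "s1: text string", "s2: text string",
           "s1: text string"),
  id = c(12, 12, 12, 15, 15, 17, 17),
  coder1_1 = c("high", "low", "med", "high", "high", "high", "low"),
  coder2_1 = c("high", "low", "med", "high", "med", "high", "low"),
  coder3_1 = c("med", "med", "med", "high", "low", "high", "med"),
  coder4_1 = c("high", "low", "med", "high", "med", "high", "low"),
  stringsAsFactors = FALSE
)

# One data frame per coder pair, each keeping turn and id in front.
pair_list <- combn(test_data[3:6], 2, simplify = FALSE,
                   FUN = function(x) cbind(test_data[1:2], x))

# Label the elements pair1 ... pair6 (pair_list is a hypothetical name).
names(pair_list) <- paste0("pair", seq_along(pair_list))

names(pair_list$pair2)
# "turn" "id" "coder1_1" "coder3_1"
```

Keeping the data frames in one named list is usually easier to loop over than six separate objects; if separate objects are really needed, list2env(pair_list, envir = globalenv()) is one way to create them.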

Answer 2

An alternative:

pairs <- combn(grep("coder", colnames(test_data), value = TRUE), 2, simplify = FALSE)
str(pairs)
# List of 6
#  $ : chr [1:2] "coder1_1" "coder2_1"
#  $ : chr [1:2] "coder1_1" "coder3_1"
#  $ : chr [1:2] "coder1_1" "coder4_1"
#  $ : chr [1:2] "coder2_1" "coder3_1"
#  $ : chr [1:2] "coder2_1" "coder4_1"
#  $ : chr [1:2] "coder3_1" "coder4_1"

lapply(pairs, function(p) test_data[,c("turn", "id", p)])
# [[1]]
#              turn id coder1_1 coder2_1
# 1 s1: text string 12     high     high
# 2 s2: text string 12      low      low
# 3 s1: text string 12      med      med
# 4 s2: text string 15     high     high
# 5 s1: text string 15     high      med
# 6 s2: text string 17     high     high
# 7 s1: text string 17      low      low
# [[2]]
#              turn id coder1_1 coder3_1
# 1 s1: text string 12     high      med
# 2 s2: text string 12      low      med
### ...

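Or, building on the combn trick from @akrun's answer, with lapply (note that this places the two coder columns before turn and id):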

lapply(combn(test_data[,3:6], 2, simplify = FALSE),
       cbind, test_data[,1:2])
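
Since the stated goal is a kappa per coder pair, the list of pair data frames can be fed straight into a function. The following is only a sketch under stated assumptions: cohen_kappa is a hypothetical helper written here in base R for unweighted Cohen's kappa, not the asker's larger function, and the weighted mean mentioned in the question is not implemented.

```r
# Data from the question, repeated so the sketch runs on its own.
test_data <- data.frame(
  turn = c("s1: text string", "s2: text string", "s1: text string",
           "s2: text string", "s1: text string", "s2: text string",
           "s1: text string"),
  id = c(12, 12, 12, 15, 15, 17, 17),
  coder1_1 = c("high", "low", "med", "high", "high", "high", "low"),
  coder2_1 = c("high", "low", "med", "high", "med", "high", "low"),
  coder3_1 = c("med", "med", "med", "high", "low", "high", "med"),
  coder4_1 = c("high", "low", "med", "high", "med", "high", "low"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unweighted Cohen's kappa for two rating vectors.
cohen_kappa <- function(a, b) {
  lv  <- union(a, b)                         # shared set of categories
  tab <- table(factor(a, levels = lv), factor(b, levels = lv))
  p   <- tab / sum(tab)                      # joint proportions
  po  <- sum(diag(p))                        # observed agreement
  pe  <- sum(rowSums(p) * colSums(p))        # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Pairs of coder column names, then one data frame per pair (as in the answer).
pairs    <- combn(grep("^coder", names(test_data), value = TRUE), 2,
                  simplify = FALSE)
pair_dfs <- lapply(pairs, function(p) test_data[, c("turn", "id", p)])

# Columns 3 and 4 of each data frame hold the two coders' ratings.
kappas <- vapply(pair_dfs, function(d) cohen_kappa(d[[3]], d[[4]]), numeric(1))
names(kappas) <- vapply(pairs, paste, character(1), collapse = " vs ")
kappas
```

In this toy data coder2_1 and coder4_1 give identical ratings, so their kappa is 1. The helper returns NaN when chance agreement is exactly 1 (both coders using a single category throughout), which a real analysis would need to handle.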
