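How can I select from a data frame using the column order stored in two other data frames in R?

Asked: 2019-09-30 09:25:52

Tags: r

I have three data frames with the same number of rows. The code snippet is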

df1 <- data.frame(Id = c(12345, 12346, 12347), X1 = c('3', '2', '1'), X2 = c('1,2', '1,3', '1'))
df1
     Id X1  X2
1 12345  3 1,2
2 12346  2 1,3
3 12347  1   1

df2 <- data.frame(Id = c(12345, 12346, 12347), X1_1 = c(3, 2, 1), X1_2 = c(1, 1, 2), X1_3 = c(2, 3, 3), X2_1 = c(1, 1, 1), X2_2 = c(2, 3, 3), X2_3 = c(3, 2, 2))
df2
     Id X1_1 X1_2 X1_3 X2_1 X2_2 X2_3
1 12345    3    1    2    1    2    3
2 12346    2    1    3    1    3    2
3 12347    1    2    3    1    3    2

df3 <- data.frame(Id = c(12345, 12346, 12347), X1 = c(1, 2, 1), X2 = c(2, 1, 2))
df3
     Id X1 X2
1 12345  1  2
2 12346  2  1
3 12347  1  2

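df1 stores the column numbers of df2 from which I need to get elements. df1$X1 refers to the X1_... subset of df2 columns, df1$X2 refers to the X2_... subset of df2 columns, and so on. Take the first row of the example: df1$X1 = 3, so I need the element from df2$X1_3 (the third column of the X1_ subset). That element is 2. Then df1$X2 = 1,2, so I need two elements, the first from df2$X2_1 and the second from df2$X2_2. They are 1 and 2. All elements collected for the first row should be stored as a single vector in the first element of the desired list, and likewise for every other row.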
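To make the lookup rule concrete, here is a minimal sketch that resolves only the first row by hand. It assumes the example df1 and df2 shown above (rebuilt here so the snippet runs on its own) and ignores the ordering from df3; the names cols_x1, cols_x2 and first_row are only illustrative.

```r
# Example data from the question; stringsAsFactors = FALSE keeps X1/X2 as character
df1 <- data.frame(Id = c(12345, 12346, 12347), X1 = c('3', '2', '1'),
                  X2 = c('1,2', '1,3', '1'), stringsAsFactors = FALSE)
df2 <- data.frame(Id = c(12345, 12346, 12347), X1_1 = c(3, 2, 1), X1_2 = c(1, 1, 2),
                  X1_3 = c(2, 3, 3), X2_1 = c(1, 1, 1), X2_2 = c(2, 3, 3),
                  X2_3 = c(3, 2, 2))

# Split the comma-separated indices and glue them onto the column prefix
cols_x1 <- paste0("X1_", strsplit(df1$X1[1], ",")[[1]])  # "X1_3"
cols_x2 <- paste0("X2_", strsplit(df1$X2[1], ",")[[1]])  # "X2_1" "X2_2"

# Pull the matching cells from row 1 of df2 as a plain vector
first_row <- unlist(df2[1, c(cols_x1, cols_x2)], use.names = FALSE)
first_row
# [1] 2 1 2
```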

This is like the question with two data frames, but now df3 stores the final order of the elements, so the list I need is

[[1]]
[1] 2 1 2

[[2]]
[1] 1 2 1

[[3]]
[1] 1 1

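What is an elegant way to create this list of elements in R?

UPD: All three data frames contain NAs. After replacing NA with zero, the first row of the real data frames may look like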

1)

    df1[1, ]
         Id X1 X5 X6 X7 X8 X13 X14 X16 X19 X2        X3 X4 X9 X11 X12 X15 X18
1 123450744  1  5  1  3  2   0   0   0   3  6 1,2,4,6,7  1  0   0   0   0   5

    df2[1, ]
         Id X1_1 X1_2 X2_1 X2_2 X2_3 X2_4 X2_5 X2_6 X2_7 X2_8 X2_9 X3_1 X3_2 X3_3 X3_4 X3_5 X3_6 X3_7 X3_8 X3_9 X3_10 X3_11 X3_12 X4_1 X4_2 X4_3 X4_4 X4_5 X4_6 X4_7
1 123450744    1    2    4    2    5    7    3    6    1    8    9    1    7   10    6   11    5    8    9    4     3     2    12   11    1    6    4    2    5    8
  X4_8 X4_9 X4_10 X4_11 X4_12 X5_1 X5_2 X5_3 X5_4 X5_5 X5_6 X6_1 X6_2 X6_3 X6_4 X6_5 X6_6 X6_7 X7_1 X7_2 X7_3 X8_1 X8_2 X9_1 X9_2 X9_3 X10_1 X10_2 X10_3 X10_4 X10_5
1   10    7     3     9    12    4    2    5    1    3    6    1    2    3    4    5    6    7    1    2    3    1    2    0    0    0     0     0     0     0     0
  X10_6 X10_7 X10_8 X11_1 X11_2 X11_3 X11_4 X11_5 X11_6 X11_7 X11_8 X11_9 X11_10 X11_11 X11_12 X11_13 X12_1 X12_2 X12_3 X12_4 X12_5 X12_6 X12_7 X12_8 X12_9 X12_10
1     0     0     0     0     0     0     0     0     0     0     0     0      0      0      0      0     0     0     0     0     0     0     0     0     0      0
  X12_11 X12_12 X13_1 X13_2 X13_3 X13_4 X13_5 X14_1 X14_2 X14_3 X14_4 X14_5 X15_1 X15_2 X15_3 X15_4 X15_5 X15_6 X15_7 X15_8 X15_9 X15_10 X15_11 X15_12 X16_1 X16_2
1      0      0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0      0      0      0     0     0
  X18_1 X18_2 X18_3 X18_4 X18_5 X18_6 X18_7 X19_1 X19_2 X19_3 X19_4 X19_5
1     3     5     4     2     6     1     7     1     2     3     4     5

    df3[1, ]
         Id X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19
1 123450744  1  2  3  4  5  6  7  8  0   0   0   0   0   0   0   0   0   9  10

2)

df1  <- read.table(header=TRUE, as.is = TRUE, text="Id X1 X5 X6 X7 X8 X13 X14 X16 X19 X2        X3 X4 X9 X11 X12 X15 X18
 1 123450744  1  1  4  3  1   2   1   2   4 1,2,4,5,6,7,8  2 1,2,3,6,7 1,2,3   7 1,3,4 4,5,7,8,9   1")
 df2  <- read.table(header=TRUE, as.is = TRUE, text="Id X1_1 X1_2 X2_1 X2_2 X2_3 X2_4 X2_5 X2_6 X2_7 X2_8 X2_9 X3_1 X3_2 X3_3 X3_4 X3_5 X3_6 X3_7 X3_8 X3_9 X3_10 X3_11 X3_12 X4_1 X4_2 X4_3 X4_4 X4_5 X4_6 X4_7 X4_8 X4_9 X4_10 X4_11 X4_12 X5_1 X5_2 X5_3 X5_4 X5_5 X5_6 X6_1 X6_2 X6_3 X6_4 X6_5 X6_6 X6_7 X7_1 X7_2 X7_3 X8_1 X8_2 X9_1 X9_2 X9_3 X10_1 X10_2 X10_3 X10_4 X10_5 X10_6 X10_7 X10_8 X11_1 X11_2 X11_3 X11_4 X11_5 X11_6 X11_7 X11_8 X11_9 X11_10 X11_11 X11_12 X11_13 X12_1 X12_2 X12_3 X12_4 X12_5 X12_6 X12_7 X12_8 X12_9 X12_10 X12_11 X12_12 X13_1 X13_2 X13_3 X13_4 X13_5 X14_1 X14_2 X14_3 X14_4 X14_5 X15_1 X15_2 X15_3 X15_4 X15_5 X15_6 X15_7 X15_8 X15_9 X15_10 X15_11 X15_12 X16_1 X16_2 X18_1 X18_2 X18_3 X18_4 X18_5 X18_6 X18_7 X19_1 X19_2 X19_3 X19_4 X19_5
 1 123450744  1    2    2    7    8    3    1    5    6    4    9    5    6    8   10    7    3    1    9    2    11     4    12    5    8    1    3    6    9    4    7    2    10    11    12    3    5    4    1    2    6    1    2    3    4    5    6    7    1    2    3    1    2    1    2    3     3     4     2     8     5    1     7     6    10     3     6    12     7     9     8     4     5      1      2     11     13     6     7    10     9     4     3     5     2     1     11    8     12     1     2     3     4     5     1     2     3     4     5     7     4     2     6     3     5     1    10     8      9     11     12     1     2   5     4     1     2     6     3     7     1     2     3     4     5")
 df3  <- read.table(header=TRUE, as.is = TRUE, text="Id X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19
 1 123450744  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19")
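 # Convert column by column; as.character() on a whole data frame deparses multi-row columns
 df1[,-1]  <- lapply(df1[,-1], as.character)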

1 Answer:

Answer 0 (score: 1)

Based on the Answer to your previous question, it can be done like this:

df1[,-1] <- lapply(df1[,-1], as.character)
df3[df3==0]  <- NA

lapply(setNames(df1$Id, df1$Id), function(Id) {
  x  <- unlist(df3[df3$Id==Id,-1])
  x <- names(sort(x[!is.na(x)])) #When NA is indicating that this column should not be used
  x  <- x[x %in% colnames(df1)] #This is only needed when df3 has colnames which are not in df1
  x  <- unlist(sapply(seq_along(x), function(j) {paste0(x[j], "_", strsplit(df1[df1$Id==Id, x[j]], ",")[[1]])}))
  df2[df2$Id==Id, x]
})
#For the original Question
#$`12345`
#  X1_3 X2_1 X2_2
#1    2    1    2
#
#$`12346`
#  X2_1 X2_3 X1_2
#2    1    2    1
#
#$`12347`
#  X1_1 X2_1
#3    1    1
#
#For UPD 1)
#$`123450744`
#  X1_1 X2_6 X3_1 X3_2 X3_4 X3_6 X3_7 X4_1 X5_5 X6_1 X7_3 X8_2 X18_5 X19_3
#1    1    6    1    7    6    5    8   11    3    1    3    2     6     3
#
#For UPD 2)
#$`123450744`
#  X1_1 X2_1 X2_2 X2_4 X2_5 X2_6 X2_7 X2_8 X3_2 X4_1 X4_2 X4_3 X4_6 X4_7 X5_1 X6_4 X7_3 X8_1 X9_1 X9_2 X9_3 X11_7 X12_1 X12_3 X12_4 X13_2 X14_1 X15_4 X15_5 X15_7 X15_8 X15_9 X16_2 X18_1 X19_4
#1    1    2    7    3    1    5    6    4    6    5    8    1    9    4    3    4    3    1    1    2    3     8     6    10     9     2     1     6     3     1    10     8     2     5     4
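The call above returns one-row data frames rather than the bare vectors shown in the question. If plain vectors are wanted, a variant along the following lines might work. It is only a sketch: it assumes the three data frames list their rows in the same order (rows are matched by position, not by Id), it uses the original example data, and pick_row is a hypothetical helper name.

```r
# Original example data; stringsAsFactors = FALSE keeps X1/X2 as character
df1 <- data.frame(Id = c(12345, 12346, 12347), X1 = c('3', '2', '1'),
                  X2 = c('1,2', '1,3', '1'), stringsAsFactors = FALSE)
df2 <- data.frame(Id = c(12345, 12346, 12347), X1_1 = c(3, 2, 1), X1_2 = c(1, 1, 2),
                  X1_3 = c(2, 3, 3), X2_1 = c(1, 1, 1), X2_2 = c(2, 3, 3),
                  X2_3 = c(3, 2, 2))
df3 <- data.frame(Id = c(12345, 12346, 12347), X1 = c(1, 2, 1), X2 = c(2, 1, 2))

df3[df3 == 0] <- NA  # zeros stand in for NA, i.e. "do not use this column"

# Hypothetical helper: resolve row i (assumes rows are aligned across df1, df2, df3)
pick_row <- function(i) {
  ord <- unlist(df3[i, -1])                 # named vector of ranks for this row
  ord <- names(sort(ord[!is.na(ord)]))      # column names in the requested order
  ord <- ord[ord %in% colnames(df1)]        # keep only columns that df1 describes
  cols <- unlist(lapply(ord, function(nm) {
    paste0(nm, "_", strsplit(df1[i, nm], ",")[[1]])
  }))
  unlist(df2[i, cols], use.names = FALSE)   # plain vector instead of a data frame
}

res <- lapply(seq_len(nrow(df1)), pick_row)
res
# [[1]]
# [1] 2 1 2
#
# [[2]]
# [1] 1 2 1
#
# [[3]]
# [1] 1 1
```

If the rows are not guaranteed to be aligned, matching on Id as in the code above is the safer choice.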

Data:

#Original
df1 <- data.frame(Id = c(12345, 12346, 12347), X1 = c('3', '2', '1'), X2 = c('1,2', '1,3', '1'))
df2 <- data.frame(Id = c(12345, 12346, 12347), X1_1 = c(3, 2, 1), X1_2 = c(1, 1, 2), X1_3 = c(2, 3, 3), X2_1 = c(1, 1, 1), X2_2 = c(2, 3, 3), X2_3 = c(3, 2, 2))
df3 <- data.frame(Id = c(12345, 12346, 12347), X1 = c(1, 2, 1), X2 = c(2, 1, 2))

#UPD 1)
df1  <- read.table(header=TRUE, as.is = TRUE, text="Id X1 X5 X6 X7 X8 X13 X14 X16 X19 X2        X3 X4 X9 X11 X12 X15 X18
1 123450744  1  5  1  3  2   0   0   0   3  6 1,2,4,6,7  1  0   0   0   0   5")
df2  <- read.table(header=TRUE, as.is = TRUE, text="Id X1_1 X1_2 X2_1 X2_2 X2_3 X2_4 X2_5 X2_6 X2_7 X2_8 X2_9 X3_1 X3_2 X3_3 X3_4 X3_5 X3_6 X3_7 X3_8 X3_9 X3_10 X3_11 X3_12 X4_1 X4_2 X4_3 X4_4 X4_5 X4_6 X4_7 X4_8 X4_9 X4_10 X4_11 X4_12 X5_1 X5_2 X5_3 X5_4 X5_5 X5_6 X6_1 X6_2 X6_3 X6_4 X6_5 X6_6 X6_7 X7_1 X7_2 X7_3 X8_1 X8_2 X9_1 X9_2 X9_3 X10_1 X10_2 X10_3 X10_4 X10_5 X10_6 X10_7 X10_8 X11_1 X11_2 X11_3 X11_4 X11_5 X11_6 X11_7 X11_8 X11_9 X11_10 X11_11 X11_12 X11_13 X12_1 X12_2 X12_3 X12_4 X12_5 X12_6 X12_7 X12_8 X12_9 X12_10 X12_11 X12_12 X13_1 X13_2 X13_3 X13_4 X13_5 X14_1 X14_2 X14_3 X14_4 X14_5 X15_1 X15_2 X15_3 X15_4 X15_5 X15_6 X15_7 X15_8 X15_9 X15_10 X15_11 X15_12 X16_1 X16_2 X18_1 X18_2 X18_3 X18_4 X18_5 X18_6 X18_7 X19_1 X19_2 X19_3 X19_4 X19_5
1 123450744    1    2    4    2    5    7    3    6    1    8    9    1    7   10    6   11    5    8    9    4     3     2    12   11    1    6    4    2    5    8   10    7     3     9    12    4    2    5    1    3    6    1    2    3    4    5    6    7    1    2    3    1    2    0    0    0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0      0      0      0      0     0     0     0     0     0     0     0     0     0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0      0      0      0     0     0     3     5     4     2     6     1     7     1     2     3     4     5")
df3  <- read.table(header=TRUE, as.is = TRUE, text="Id X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19
1 123450744  1  2  3  4  5  6  7  8  0   0   0   0   0   0   0   0   0   9  10")

#UPD 2)
df1  <- read.table(header=TRUE, as.is = TRUE, text="Id X1 X5 X6 X7 X8 X13 X14 X16 X19 X2        X3 X4 X9 X11 X12 X15 X18
 1 123450744  1  1  4  3  1   2   1   2   4 1,2,4,5,6,7,8  2 1,2,3,6,7 1,2,3   7 1,3,4 4,5,7,8,9   1")
df2  <- read.table(header=TRUE, as.is = TRUE, text="Id X1_1 X1_2 X2_1 X2_2 X2_3 X2_4 X2_5 X2_6 X2_7 X2_8 X2_9 X3_1 X3_2 X3_3 X3_4 X3_5 X3_6 X3_7 X3_8 X3_9 X3_10 X3_11 X3_12 X4_1 X4_2 X4_3 X4_4 X4_5 X4_6 X4_7 X4_8 X4_9 X4_10 X4_11 X4_12 X5_1 X5_2 X5_3 X5_4 X5_5 X5_6 X6_1 X6_2 X6_3 X6_4 X6_5 X6_6 X6_7 X7_1 X7_2 X7_3 X8_1 X8_2 X9_1 X9_2 X9_3 X10_1 X10_2 X10_3 X10_4 X10_5 X10_6 X10_7 X10_8 X11_1 X11_2 X11_3 X11_4 X11_5 X11_6 X11_7 X11_8 X11_9 X11_10 X11_11 X11_12 X11_13 X12_1 X12_2 X12_3 X12_4 X12_5 X12_6 X12_7 X12_8 X12_9 X12_10 X12_11 X12_12 X13_1 X13_2 X13_3 X13_4 X13_5 X14_1 X14_2 X14_3 X14_4 X14_5 X15_1 X15_2 X15_3 X15_4 X15_5 X15_6 X15_7 X15_8 X15_9 X15_10 X15_11 X15_12 X16_1 X16_2 X18_1 X18_2 X18_3 X18_4 X18_5 X18_6 X18_7 X19_1 X19_2 X19_3 X19_4 X19_5
 1 123450744  1    2    2    7    8    3    1    5    6    4    9    5    6    8   10    7    3    1    9    2    11     4    12    5    8    1    3    6    9    4    7    2    10    11    12    3    5    4    1    2    6    1    2    3    4    5    6    7    1    2    3    1    2    1    2    3     3     4     2     8     5    1     7     6    10     3     6    12     7     9     8     4     5      1      2     11     13     6     7    10     9     4     3     5     2     1     11    8     12     1     2     3     4     5     1     2     3     4     5     7     4     2     6     3     5     1    10     8      9     11     12     1     2   5     4     1     2     6     3     7     1     2     3     4     5")
df3  <- read.table(header=TRUE, as.is = TRUE, text="Id X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19
 1 123450744  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19")