Question

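My data has two columns whose values should match. Given that, I need to do the following:

Load the two columns
Compare A and B to find the matching values
Stack the output of columns A and B into two new columns so that matching values sit side by side and unmatched values go at the end of each column.

Is there a faster way to do this that works for any type of data (integer, float, or character)?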

Input data table

Output data table

Answer 1

Here is one way. I may have overcomplicated this.

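#Sample data (an assumption: df is not defined in this answer; these values are taken from the second answer)
df <- data.frame(A = 1:10, B = c(1, 3:11))
#Copy the data frame so the original stays unchanged
df1 <- df
#Position in B of each value of A, and in A of each value of B (NA where there is no match)
inds1 <- match(df$A, df$B)
inds2 <- match(df$B, df$A)
#Reorder B so that each matched value sits next to its partner in A
df1$B <- df$B[inds1]
#Sort by B; rows where B is NA (no match) go last
df1 <- df1[order(df1$B), ]
#Fill the NA slots in B with the values of B that have no match in A
df1$B[is.na(df1$B)] <- df$B[is.na(inds2)]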

df1
#    A  B
#1   1  1
#3   3  3
#4   4  4
#5   5  5
#6   6  6
#7   7  7
#8   8  8
#9   9  9
#10 10 10
#2   2 11
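
To see what the two match() calls contribute, here is a minimal sketch on small character vectors. The vectors a and b are made up for illustration and are not part of the question's data; it also assumes values are unique within each vector, since match() only reports the first hit.

```r
# Hypothetical toy vectors, chosen only to illustrate match()
a <- c("x", "y", "z")
b <- c("z", "x", "w")

# Position in b of each element of a; NA means "not found in b"
a_in_b <- match(a, b)   # 2 NA 1
# Position in a of each element of b; NA means "not found in a"
b_in_a <- match(b, a)   # 3 1 NA

# The NA positions identify the leftovers on each side
only_a <- a[is.na(a_in_b)]   # "y"
only_b <- b[is.na(b_in_a)]   # "w"
```

The same calls work unchanged on integer or character input, which is why the approach is not tied to one data type.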

Answer 2

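You can find the values that match between columns A and B, which gives you the upper part of the desired output. Then append the values that do not match: for column A, these are the values for which match against column B returns NA; for column B, these are the values at positions that do not appear in the match index: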

x  <- data.frame(A=1:10, B=c(1,3:11)) #create your dataset

idx <- match(x$A, x$B)
idxNA  <- is.na(idx)
data.frame(C=c(x$A[!idxNA], x$A[idxNA]), D=c(x$B[idx[!idxNA]], x$B[!seq_along(x$B) %in% idx]))
#    C  D
#1   1  1
#2   3  3
#3   4  4
#4   5  5
#5   6  6
#6   7  7
#7   8  8
#8   9  9
#9  10 10
#10  2 11
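
Since the question asks about integer, float, and character data, the same steps can be wrapped in a small helper. This is only a sketch: the name align_columns and the fruit vectors are hypothetical, and it assumes both inputs have the same length and no duplicated values within a column.

```r
# Hypothetical helper: matched values first (side by side), unmatched values last.
# Assumes length(a) == length(b) and no duplicates within a or within b.
align_columns <- function(a, b) {
  idx <- match(a, b)        # position in b of each value of a, NA if absent
  matched <- !is.na(idx)
  data.frame(
    A = c(a[matched], a[!matched]),
    B = c(b[idx[matched]], b[!(seq_along(b) %in% idx)]),
    stringsAsFactors = FALSE
  )
}

# Character input
fruit <- align_columns(c("apple", "kiwi", "pear"), c("pear", "apple", "plum"))

# Numeric input, same data as above
nums <- align_columns(1:10, c(1, 3:11))
```

One caveat for floating-point columns: match() compares doubles exactly, so values that differ only by rounding error (for example 0.1 + 0.2 versus 0.3) are treated as unmatched. Rounding both columns first may be needed in that case.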

If sorted output is needed, A and B have to be sorted first.
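
A minimal sketch of that sorted variant, using a small unsorted data frame invented for illustration (it is not the question's data), again assuming no duplicates within a column:

```r
# Hypothetical unsorted input
x <- data.frame(A = c(3, 1, 2), B = c(2, 4, 3))

# Sort each column before matching
a <- sort(x$A)
b <- sort(x$B)

idx   <- match(a, b)
idxNA <- is.na(idx)

# Matched values come out in ascending order, followed by the unmatched ones
res <- data.frame(C = c(a[!idxNA], a[idxNA]),
                  D = c(b[idx[!idxNA]], b[!(seq_along(b) %in% idx)]))
```

With this input, the matched pairs 2 and 3 appear first in both columns, followed by the unmatched 1 in C and 4 in D.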

Compare two columns to match values and align them side by side
