Question

I am trying to merge two xdf files after subsetting a long table with duplicate ids based on a variable.

Say I have two columns: id and type

I subset the original xdf table on, say, type = 'type1' to get the first xdf file, and on, say, type = 'type2' to get the second xdf file.

The first xdf file looks like this (there are many different ids, but I show only one in the example below):

id type1
__ ____
1    5

The second xdf file looks like this (there are many different ids, but I show only one in the example below):

id type2
__ ____
1    3

Then I merge the two xdf files into another xdf file:

rxMerge(file1, file2, outFile = final, autoSort = FALSE, matchVars = 'id', type = 'full', overwrite = TRUE)

I get two records for id = 1 in the merged file:

id type1 type2
__ ____ ______
1    5    NA
1    NA    3

I was expecting a single record for id = 1, with type1 = 5 and type2 = 3.

What am I doing wrong?

Answer 1

Hmm... your example works for me in RRE 7.4.1:

# Example data
x <- data.frame(id = 1, type1 = 5)
y <- data.frame(id = 1, type2 = 3)

# Creating XDFs for the example data
file1 <- tempfile(fileext = ".xdf")
rxImport(inData = x, outFile = file1)

file2 <- tempfile(fileext = ".xdf")
rxImport(inData = y, outFile = file2)

# Merging into a third XDF
final <- tempfile(fileext = ".xdf")

rxMerge(inData1 = file1, 
        inData2 = file2, 
        outFile = final, 
        autoSort = FALSE, 
        matchVars = 'id',
        type = 'full',
        overwrite = TRUE)

# Check the output
rxDataStep(final)
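
For comparison, here is a minimal sketch of the same full outer join in base R, using merge() on in-memory data frames rather than rxMerge on XDF files. It is not RevoScaleR code; it only illustrates the result a full join on id would normally be expected to give for this example. The variable name merged is introduced purely for illustration.

```r
# Same example data as above, kept in memory as plain data frames
x <- data.frame(id = 1, type1 = 5)
y <- data.frame(id = 1, type2 = 3)

# all = TRUE requests a full outer join on the key column
merged <- merge(x, y, by = "id", all = TRUE)

print(merged)
#   id type1 type2
# 1  1     5     3
```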

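So it's hard to know what might be going on. What happens if you set autoSort = TRUE? Which version of RRE are you running? (You can get the version number by loading RevoScaleR and running sessionInfo().)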
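
Since autoSort = FALSE presumably relies on both inputs already being ordered by the match variable, and the original table has repeated ids, one quick check is whether the key in each subset is sorted and unique. Below is a rough base R sketch, assuming the subsets are small enough to read into data frames (for instance with rxDataStep(file1), as in the check above). The helper check_key and the sample values are hypothetical and not part of RevoScaleR.

```r
# Hypothetical helper (not part of RevoScaleR): summarise the state of a merge key
check_key <- function(df, key = "id") {
  ids <- df[[key]]
  list(
    sorted     = !is.unsorted(ids),      # TRUE if ids are in non-decreasing order
    duplicates = anyDuplicated(ids) > 0  # TRUE if any id occurs more than once
  )
}

# Made-up stand-ins for the two subsets; the first is deliberately out of order
x <- data.frame(id = c(3, 1, 2), type1 = c(7, 5, 6))
y <- data.frame(id = c(1, 2, 3), type2 = c(3, 4, 8))

res_x <- check_key(x)
res_y <- check_key(y)
print(res_x)
print(res_y)
```

If either input reports sorted = FALSE, rerunning the merge with autoSort = TRUE would be the first thing to try. If duplicates = TRUE, several output rows per id are to be expected from any join on that key.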

Revolution / rxMerge and duplication of rows
