Compare and merge columns of different lengths, with NA in the empty rows

Date: 2016-10-17 15:14:45

Tags: r merge

I tried to find a solution on SOF (Stack Overflow), but I could not find anything...

I have two data frames with this kind of data.

    > df
        1  |UNIMOD:730
        2  |UNIMOD:4
        3  |UNIMOD:214
        4  |UNIMOD:21
        5  |UNIMOD:35
              .
              .
              .
            n+1500

And another one:

    > df2
            1  |UNIMOD:730
            2  |UNIMOD:4
            3  |UNIMOD:21
            4  |UNIMOD:35
                  .
                  .
                  .
                n+500

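What I want is output like this, where the columns are merged and the values compared, with NA added where a value is not present in the other data frame. There are no duplicated values.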

    > df
        1  |UNIMOD:730 | UNIMOD:730
        2  |UNIMOD:4   | UNIMOD:4
        3  |UNIMOD:214 | NA
        4  |UNIMOD:21  | UNIMOD:21
        5  |UNIMOD:35  | UNIMOD:35
              .            .
              .            .
              .            .            
            n+1500       n+1500   

I tried using merge, but that function just merges one column across all my data, and if I use this:

left_join(df, df2, c("sequence"="sequence"))

I just get the same result.

Here is a reproducible example:

df <- data.frame(modifications=c("null", "0-UNIMOD:214", "2-UNIMOD:3","12-UNIMOD:24","1-UNIMOD:44","0-UNIMOD:12", "0-UNIMOD:123", "13-UNIMOD:212"))

df2 <- data.frame(modifications=c("null", "0-UNIMOD:24", "2-UNIMOD:3","12-UNIMOD:24","1-UNIMOD:44","0-UNIMOD:12"))

1 answer:

Answer 0 (score: 1)

Is this what you are after (base R only, using ?match)?
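
As a minimal sketch of the idea behind match (the vectors x and y below are made-up illustration values, not the asker's data): for each element of its first argument, match returns the position of the first equal element in the second argument, or NA when there is none. Indexing by that result lines the second vector up with the first and leaves NA in the gaps.

```r
# Hypothetical toy vectors, for illustration only
x <- c("a", "b", "c", "d")
y <- c("a", "c", "e")

# Position in y of each element of x; NA where the element is absent from y
idx <- match(x, y)
print(idx)

# Indexing y by idx aligns it with x, leaving NA where there is no match
aligned <- data.frame(x = x, y = y[idx])
print(aligned)
```

The same indexing trick works on whole rows of a data frame, which is what the code that follows relies on.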

# Your data with added two columns
df1 <- cbind.data.frame(modifications=c("null", "0-UNIMOD:214", "2-UNIMOD:3","12-UNIMOD:24","1-UNIMOD:44","0-UNIMOD:12", "0-UNIMOD:123", "13-UNIMOD:212"),
            df1col2 = "something",
            df1col3 = "val1");

df2 <- cbind.data.frame(modifications=c("null", "0-UNIMOD:24", "2-UNIMOD:3","12-UNIMOD:24","1-UNIMOD:44","0-UNIMOD:12"),
            df2col2 = "anotherthing",
            df2col3 = "val2");


# Merge df1 and df2 (rows of df2 are aligned to df1 via match)
df <- cbind.data.frame(df1, df2[match(df1$modifications, df2$modifications), ]);
     modifications   df1col2 df1col3 modifications      df2col2 df2col3
1             null something    val1          null anotherthing    val2
NA    0-UNIMOD:214 something    val1          <NA>         <NA>    <NA>
3       2-UNIMOD:3 something    val1    2-UNIMOD:3 anotherthing    val2
4     12-UNIMOD:24 something    val1  12-UNIMOD:24 anotherthing    val2
5      1-UNIMOD:44 something    val1   1-UNIMOD:44 anotherthing    val2
6      0-UNIMOD:12 something    val1   0-UNIMOD:12 anotherthing    val2
NA.1  0-UNIMOD:123 something    val1          <NA>         <NA>    <NA>
NA.2 13-UNIMOD:212 something    val1          <NA>         <NA>    <NA>

# Or merge and remove the duplicate modifications column (if necessary)
df <- cbind.data.frame(df1, df2[match(df1$modifications, df2$modifications), -1]);
print(df);
     modifications   df1col2 df1col3      df2col2 df2col3
1             null something    val1 anotherthing    val2
NA    0-UNIMOD:214 something    val1         <NA>    <NA>
3       2-UNIMOD:3 something    val1 anotherthing    val2
4     12-UNIMOD:24 something    val1 anotherthing    val2
5      1-UNIMOD:44 something    val1 anotherthing    val2
6      0-UNIMOD:12 something    val1 anotherthing    val2
NA.1  0-UNIMOD:123 something    val1         <NA>    <NA>
NA.2 13-UNIMOD:212 something    val1         <NA>    <NA>
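
An alternative that may also fit, sketched here with base R merge and all.x = TRUE (a left join) on the question's reproducible example. The helper column in_df2 is a name introduced only for this illustration: it copies the key so that a second column remains after the join and shows NA where df2 has no matching value. Unlike the match approach, the row order of the result is not guaranteed to follow df.

```r
# The question's reproducible example
df  <- data.frame(modifications = c("null", "0-UNIMOD:214", "2-UNIMOD:3", "12-UNIMOD:24",
                                    "1-UNIMOD:44", "0-UNIMOD:12", "0-UNIMOD:123",
                                    "13-UNIMOD:212"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(modifications = c("null", "0-UNIMOD:24", "2-UNIMOD:3", "12-UNIMOD:24",
                                    "1-UNIMOD:44", "0-UNIMOD:12"),
                  stringsAsFactors = FALSE)

# Copy the key into a second column (hypothetical name) so it survives the join
df2$in_df2 <- df2$modifications

# Left join: keep every row of df, with NA where df2 has no match
res <- merge(df, df2, by = "modifications", all.x = TRUE)
print(res)
```

The join has to be made on a column that exists in both data frames, here modifications. With dplyr, left_join(df, df2, by = "modifications") should behave similarly.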