Merging two data frames in R

Date: 2014-01-13 13:01:33

Tags: r merge

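I am trying to merge two data frames in R. I want to add df2 to df so that the length of df is preserved, i.e. any non-matching rows are dropped. For each person, the smallest (most negative) integer should always match 'month1', the second smallest integer should match 'month2', the third smallest 'month3', and so on.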

df

names variable
1   jane       -4
2   jane       -3
3   jane       -2
4   john       -5
5   john       -4
6   john       -3
7   john       -2
8   john       -1
9   john        1
10  john        2
11  mary       -3
12  mary       -2
13  mary       -1
14  mary        1
15  mary        2
16  mary        3
17  mary        4
18   tom       -6
19   tom       -5
20   tom       -4
21   tom       -3
22   tom       -2
23   tom       -1
24   tom        1

df2
      noms  months
    1  jane  month1
    2  jane  month2
    3  jane  month3
    4  jane  month4
    5  jane  month5
    6  jane  month6
    7  jane  month7
    8  jane  month8
    9  jane  month9
    10 jane month10
    11 john  month1
    12 john  month2
    13 john  month3
    14 john  month4
    15 john  month5
    16 john  month6
    17 john  month7
    18 john  month8
    19 john  month9
    20 john month10
    21 mary  month1
    22 mary  month2
    23 mary  month3
    24 mary  month4
    25 mary  month5
    26 mary  month6
    27 mary  month7
    28 mary  month8
    29 mary  month9
    30 mary month10
    31  tom  month1
    32  tom  month2
    33  tom  month3
    34  tom  month4
    35  tom  month5
    36  tom  month6
    37  tom  month7
    38  tom  month8
    39  tom  month9
    40  tom month10

Desired output

 names variable months
1   jane       -4 month1
2   jane       -3 month2
3   jane       -2 month3
4   john       -5 month1
5   john       -4 month2
6   john       -3 month3
7   john       -2 month4
8   john       -1 month5
9   john        1 month6
10  john        2 month7
11  mary       -3 month1
12  mary       -2 month2
13  mary       -1 month3
14  mary        1 month4
15  mary        2 month5
16  mary        3 month6
17  mary        4 month7
18   tom       -6 month1
19   tom       -5 month2
20   tom       -4 month3
21   tom       -3 month4
22   tom       -2 month5
23   tom       -1 month6
24   tom        1 month7

This seems like it should be a simple merge, but the code isn't working for me. This is what I tried:

final <- merge(df, df2, by.x = "names", by.y = "noms", sort=F,all.x=T,all.y=F)

I also tried this:

x<-df2$names %in% df$noms   
y<-cbind(df2, x)                
matches<-y[y$x!=FALSE,]

I'm sure this is a basic problem, but my simple merge code will not work. Thanks in advance for any help.

2 answers:

Answer 0 (score: 4)

Here is another approach:

tab <- table(df$names) # count rows per name

# create vector with months
tmp <- unlist(lapply(names(tab), function(x) {
  head(df2$months[as.character(df2$noms) == x], tab[x])
}))

# create new data frame
final <- cbind(df, months = tmp)

Result:

   names variable months
1   jane       -4 month1
2   jane       -3 month2
3   jane       -2 month3
4   john       -5 month1
5   john       -4 month2
6   john       -3 month3
7   john       -2 month4
8   john       -1 month5
9   john        1 month6
10  john        2 month7
11  mary       -3 month1
12  mary       -2 month2
13  mary       -1 month3
14  mary        1 month4
15  mary        2 month5
16  mary        3 month6
17  mary        4 month7
18   tom       -6 month1
19   tom       -5 month2
20   tom       -4 month3
21   tom       -3 month4
22   tom       -2 month5
23   tom       -1 month6
24   tom        1 month7
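
For comparison, here is a minimal sketch of how the same result could be reached with merge() itself. The original merge joined on the name alone, so every row of df matched all ten months for that person. The sketch adds a second key: a helper column, here called pos (a hypothetical name, not in the original data), holding the rank of each value within its name in df and the position of each month within its name in df2. It assumes there are no tied values within a name and that df2 lists the months in order for each person.

```r
# Rebuild the example data from the question.
df <- data.frame(
  names = rep(c("jane", "john", "mary", "tom"), times = c(3, 7, 7, 7)),
  variable = c(-4:-2, -5:-1, 1:2, -3:-1, 1:4, -6:-1, 1),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  noms = rep(c("jane", "john", "mary", "tom"), each = 10),
  months = paste0("month", rep(1:10, times = 4)),
  stringsAsFactors = FALSE
)

# Helper key `pos` (hypothetical name): rank of each value within its name
# in df, and running position of each month within its name in df2.
df$pos  <- as.integer(ave(df$variable, df$names, FUN = rank))
df2$pos <- ave(seq_along(df2$noms), df2$noms, FUN = seq_along)

# Merge on name AND position; all.x = TRUE keeps every row of df,
# and unmatched rows of df2 are dropped.
final <- merge(df, df2, by.x = c("names", "pos"), by.y = c("noms", "pos"),
               all.x = TRUE)

# Restore the original row order and drop the helper column.
final <- final[order(final$names, final$pos), c("names", "variable", "months")]
rownames(final) <- NULL
```

Because the rank is computed from the values rather than from row order, this version should not depend on df being sorted beforehand.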

Answer 1 (score: 3)

An approach using data.table:

library(data.table)
dt <- data.table(df)
# number the rows within each name; seq_len(.N) gives 1..(group size)
dt[, months := paste0("month", seq_len(.N)), by = names]
head(dt,10)
#     names variable months
#  1:  jane       -4 month1
#  2:  jane       -3 month2
#  3:  jane       -2 month3
#  4:  john       -5 month1
#  5:  john       -4 month2
#  6:  john       -3 month3
#  7:  john       -2 month4
#  8:  john       -1 month5
#  9:  john        1 month6
# 10:  john        2 month7
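
The same within-group counter can be sketched in base R without any package. Note that this approach builds the labels directly instead of looking them up in df2, so it assumes the labels always follow the "monthN" pattern. The small shuffled data frame below is made up for illustration; sorting first ensures the smallest value per name receives month1.

```r
# Example data in shuffled order (made up for illustration).
df <- data.frame(
  names = c("john", "jane", "john", "jane", "jane"),
  variable = c(-1, -2, -5, -4, -3),
  stringsAsFactors = FALSE
)

# Sort by name, then by value, so the smallest value comes first per name.
df <- df[order(df$names, df$variable), ]
rownames(df) <- NULL

# Count rows within each name (1, 2, 3, ...) and turn the count into a label.
idx <- ave(seq_along(df$names), df$names, FUN = seq_along)
df$months <- paste0("month", idx)
```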