我正在尝试合并R中的两个数据帧。我想将df2添加到df,以便保持df的长度,即删除任何不匹配的行。我需要属于每个人的最小负整数始终匹配'month1',与month2匹配的第二个最小整数,与第3个月匹配的第三个最小整数,依此类推。
DF
names variable
1 jane -4
2 jane -3
3 jane -2
4 john -5
5 john -4
6 john -3
7 john -2
8 john -1
9 john 1
10 john 2
11 mary -3
12 mary -2
13 mary -1
14 mary 1
15 mary 2
16 mary 3
17 mary 4
18 tom -6
19 tom -5
20 tom -4
21 tom -3
22 tom -2
23 tom -1
24 tom 1
df2
noms months
1 jane month1
2 jane month2
3 jane month3
4 jane month4
5 jane month5
6 jane month6
7 jane month7
8 jane month8
9 jane month9
10 jane month10
11 john month1
12 john month2
13 john month3
14 john month4
15 john month5
16 john month6
17 john month7
18 john month8
19 john month9
20 john month10
21 mary month1
22 mary month2
23 mary month3
24 mary month4
25 mary month5
26 mary month6
27 mary month7
28 mary month8
29 mary month9
30 mary month10
31 tom month1
32 tom month2
33 tom month3
34 tom month4
35 tom month5
36 tom month6
37 tom month7
38 tom month8
39 tom month9
40 tom month10
渴望输出
names variable months
1 jane -4 month1
2 jane -3 month2
3 jane -2 month3
4 john -5 month1
5 john -4 month2
6 john -3 month3
7 john -2 month4
8 john -1 month5
9 john 1 month6
10 john 2 month7
11 mary -3 month1
12 mary -2 month2
13 mary -1 month3
14 mary 1 month4
15 mary 2 month5
16 mary 3 month6
17 mary 4 month7
18 tom -6 month1
19 tom -5 month2
20 tom -4 month3
21 tom -3 month4
22 tom -2 month5
23 tom -1 month6
24 tom 1 month7
这似乎应该是一个简单的合并,但代码不适合我。这就是我试过的
final <- merge(df, df2, by.x = "names", by.y = "noms", sort=F,all.x=T,all.y=F)
也试过这个
x<-df2$names %in% df$noms
y<-cbind(df2, x)
matches<-y[y$x!=FALSE,]
我确信这是一个基本问题,但我的简单合并代码将无法正常工作。预先感谢您的任何帮助。
答案 0 :(得分:4)
这是另一种方法:
tab <- table(df$names) # count rows per name
# create vector with months
tmp <- unlist(lapply(names(tab), function(x) {
head(df2$months[as.character(df2$noms) == x], tab[x])
}))
# create new data frame
final <- cbind(df, months = tmp)
结果:
names variable months
1 jane -4 month1
2 jane -3 month2
3 jane -2 month3
4 john -5 month1
5 john -4 month2
6 john -3 month3
7 john -2 month4
8 john -1 month5
9 john 1 month6
10 john 2 month7
11 mary -3 month1
12 mary -2 month2
13 mary -1 month3
14 mary 1 month4
15 mary 2 month5
16 mary 3 month6
17 mary 4 month7
18 tom -6 month1
19 tom -5 month2
20 tom -4 month3
21 tom -3 month4
22 tom -2 month5
23 tom -1 month6
24 tom 1 month7
答案 1 :(得分:3)
使用数据表的方法:
library(data.table)
dt <- data.table(df)
dt[,months:=paste0("month",row(dt)[,1]),by=names]
head(dt,10)
# names variable months
# 1: jane -4 month1
# 2: jane -3 month2
# 3: jane -2 month3
# 4: john -5 month1
# 5: john -4 month2
# 6: john -3 month3
# 7: john -2 month4
# 8: john -1 month5
# 9: john 1 month6
# 10: john 2 month7