Appending difference rows using a multi-column key

Date: 2014-05-23 02:00:51

Tags: r dataframe

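Suppose I have a data.frame in which several columns taken together (say a, b and c) form a key that identifies exactly two rows. The two rows sharing a key differ in the name column and in a set of value columns (x, y, z).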

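I want to take the difference of the value columns (alpha minus beta, as in the example below), keep the key columns, and give the name column the new value diff.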

For example, suppose I have the following data:

    a b c    x    y    z  name
 1  1 M J  0.0  1.0  2.0 alpha
 2  1 M K  0.1  0.9  2.0 alpha
 3  1 O J  0.2  0.8  2.0 alpha
 4  1 O K  0.3  0.7  2.0 alpha
 5  2 M J  0.4  0.6  2.0 alpha
 6  2 M K  0.5  0.5  2.0 alpha
 7  2 O J  0.6  0.4  2.0 alpha
 8  2 O K  0.7  0.3  2.0 alpha
 9  1 M J  0.0  2.0  1.0  beta
10  1 M K  0.1  1.9  3.0  beta
11  1 O J  0.2  1.8  1.0  beta
12  1 O K  0.3  1.7  3.0  beta
13  2 M J  0.4  1.6  1.0  beta
14  2 M K  0.5  1.5  3.0  beta
15  2 O J  0.6  1.4  1.0  beta
16  2 O K  0.7  1.3  3.0  beta
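For anyone who wants to try the answers below, the example data can be rebuilt in R roughly like this. The values are copied from the table; storing name as a character column rather than a factor is an assumption.

```r
# The question's example data, typed in from the table above
df <- data.frame(
  a    = rep(rep(c(1, 2), each = 4), times = 2),
  b    = rep(rep(c("M", "O"), each = 2), times = 4),
  c    = rep(c("J", "K"), times = 8),
  x    = rep(c(0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7), times = 2),
  y    = c(1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3,
           2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3),
  z    = c(rep(2, 8), rep(c(1, 3), times = 4)),
  # first 8 rows are alpha, last 8 are beta (name kept as character)
  name = rep(c("alpha", "beta"), each = 8),
  stringsAsFactors = FALSE
)
```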

Then I want the new data frame to be:

    a b c    x    y    z  name
 1  1 M J  0.0  1.0  2.0 alpha
 2  1 M K  0.1  0.9  2.0 alpha
 3  1 O J  0.2  0.8  2.0 alpha
 4  1 O K  0.3  0.7  2.0 alpha
 5  2 M J  0.4  0.6  2.0 alpha
 6  2 M K  0.5  0.5  2.0 alpha
 7  2 O J  0.6  0.4  2.0 alpha
 8  2 O K  0.7  0.3  2.0 alpha
 9  1 M J  0.0  2.0  1.0  beta
10  1 M K  0.1  1.9  3.0  beta
11  1 O J  0.2  1.8  1.0  beta
12  1 O K  0.3  1.7  3.0  beta
13  2 M J  0.4  1.6  1.0  beta
14  2 M K  0.5  1.5  3.0  beta
15  2 O J  0.6  1.4  1.0  beta
16  2 O K  0.7  1.3  3.0  beta
17  1 M J  0.0 -1.0  1.0  diff
18  1 M K  0.0 -1.0 -1.0  diff
19  1 O J  0.0 -1.0  1.0  diff
20  1 O K  0.0 -1.0 -1.0  diff
21  2 M J  0.0 -1.0  1.0  diff
22  2 M K  0.0 -1.0 -1.0  diff
23  2 O J  0.0 -1.0  1.0  diff
24  2 O K  0.0 -1.0 -1.0  diff

What is the simplest way to do this?

2 answers:

Answer 0 (score: 2)

You can build each column separately:

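# For each key (a, b, c), subtract the second row's value from the first;
# ave() repeats that per-key result on every row belonging to the key
colx = ave(df$x, paste(df$a, df$b, df$c), FUN=function(x) x[1]-x[2])
coly = ave(df$y, paste(df$a, df$b, df$c), FUN=function(x) x[1]-x[2])
colz = ave(df$z, paste(df$a, df$b, df$c), FUN=function(x) x[1]-x[2])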
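How ave() behaves here can be seen on a tiny vector with made-up values: FUN runs once per group, and its single result is copied back to every position of that group. That is also why x[1]-x[2] only yields alpha minus beta when, within each key, the alpha row comes first in the data. ave() accepts several grouping vectors directly, so the paste() call is optional.

```r
# ave() applies FUN within each group and writes that group's result back into
# every row of the group, so the output is as long as the input.
v   <- c(10, 3, 20, 5)
key <- c("k1", "k2", "k1", "k2")
res <- ave(v, key, FUN = function(x) x[1] - x[2])
print(res)   # -10 -2 -10 -2  (k1: 10 - 20, k2: 3 - 5)

# Several grouping vectors can be passed instead of a pasted key
g1 <- c(1, 1, 1, 1)
g2 <- c("J", "K", "J", "K")
res2 <- ave(v, g1, g2, FUN = function(x) x[1] - x[2])
```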

Then put them together:

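df2 = subset(df, name=="alpha")   # one row per key, used as a template
df2$name = "diff"
# The alpha rows fill the first half of df, so keep the first half of each result
df2$x = colx[1:(length(colx)/2)]
df2$y = coly[1:(length(coly)/2)]
df2$z = colz[1:(length(colz)/2)]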

Now append them to the original:

df = rbind(df, df2) 

This gives:

   a b c   x    y  z name
1  1 m j 0.0  1.0  2    a
2  1 m k 0.1  0.9  2    a
3  1 o j 0.2  0.8  2    a
4  1 o k 0.3  0.7  2    a
5  2 m j 0.4  0.6  2    a
6  2 m k 0.5  0.5  2    a
7  2 o j 0.6  0.4  2    a
8  2 o k 0.7  0.3  2    a
9  1 m j 0.0  2.0  1    b
10 1 m k 0.1  1.9  3    b
11 1 o j 0.2  1.8  1    b
12 1 o k 0.3  1.7  3    b
13 2 m j 0.4  1.6  1    b
14 2 m k 0.5  1.5  3    b
15 2 o j 0.6  1.4  1    b
16 2 o k 0.7  1.3  3    b
17 1 m j 0.0 -1.0  1 diff
18 1 m k 0.0 -1.0 -1 diff
19 1 o j 0.0 -1.0  1 diff
20 1 o k 0.0 -1.0 -1 diff
21 2 m j 0.0 -1.0  1 diff
22 2 m k 0.0 -1.0 -1 diff
23 2 o j 0.0 -1.0  1 diff
24 2 o k 0.0 -1.0 -1 diff
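The three near-identical ave() lines in this answer can be folded into a loop over the value columns. The sketch below does that; make_diff_rows, its keys/values arguments and the toy data are hypothetical names for illustration. It keeps the answer's assumption that within each key the alpha row comes before the beta row, with one small change: it selects the positions of the alpha rows instead of taking the first half of each vector.

```r
# Hypothetical helper generalising the answer above: one ave() call per value
# column. Assumes each key has one alpha row that precedes its beta row.
make_diff_rows <- function(df, keys = c("a", "b", "c"),
                           values = c("x", "y", "z")) {
  # One grouping label per row, e.g. "1 M J"
  key <- do.call(paste, unname(as.list(df[keys])))
  is_alpha <- df$name == "alpha"
  df2 <- df[is_alpha, ]              # template: one row per key
  df2$name <- "diff"
  for (v in values) {
    d <- ave(df[[v]], key, FUN = function(x) x[1] - x[2])
    df2[[v]] <- d[is_alpha]          # one alpha-minus-beta value per key
  }
  df2
}

# Usage on a small example
toy <- data.frame(a = c(1, 2, 1, 2), b = "M", c = "J",
                  x = c(0.5, 1, 2, 0.25), y = c(1, 2, 4, 3), z = 0,
                  name = c("alpha", "alpha", "beta", "beta"),
                  stringsAsFactors = FALSE)
out <- rbind(toy, make_diff_rows(toy))
```

With the question's data, rbind(df, make_diff_rows(df)) should give the same 24 rows as above, up to floating-point rounding in y.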

Answer 1 (score: 1)

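If your data is always sorted and balanced (all alpha rows first, then all beta rows, with the keys in the same order in both halves), this should work: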

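half <- 1:(nrow(df)/2)   # row indices of the first half (the alpha rows)
rbind(
    df,
    cbind( 
        df[half, 1:3],                                         # key columns a, b, c
        df[half, 4:6] - df[half+half[length(half)], 4:6],      # alpha minus beta for x, y, z
        name="diff"
    )
)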
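If the rows are not guaranteed to be sorted, one option (not taken from either answer) is to match the two rows of each key with merge() on the key columns and subtract afterwards. A minimal sketch; add_diff_rows and its arguments are hypothetical names, and exactly one alpha and one beta row per key is assumed.

```r
# Hypothetical helper: pair rows by the multi-column key with merge() instead
# of relying on row order, then subtract first-name minus second-name values.
add_diff_rows <- function(df, keys = c("a", "b", "c"), values = c("x", "y", "z"),
                          first = "alpha", second = "beta") {
  lhs <- df[df$name == first,  c(keys, values)]
  rhs <- df[df$name == second, c(keys, values)]
  # Inner join on the key; the shared value columns get .1 / .2 suffixes
  m <- merge(lhs, rhs, by = keys, suffixes = c(".1", ".2"))
  d <- m[keys]
  for (v in values) d[[v]] <- m[[paste0(v, ".1")]] - m[[paste0(v, ".2")]]
  d$name <- "diff"
  out <- rbind(df, d)
  rownames(out) <- NULL
  out
}

# Usage on an unsorted example: the beta rows come in the opposite order
toy <- data.frame(a = c(1, 2, 2, 1), b = "M", c = "J",
                  x = c(0.5, 1, 0.25, 2), y = c(1, 2, 3, 4), z = 0,
                  name = c("alpha", "alpha", "beta", "beta"),
                  stringsAsFactors = FALSE)
out <- add_diff_rows(toy)
```

merge() performs an inner join by default, so a key that appears under only one name simply gets no diff row rather than a misaligned one.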