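Suppose I have a data.frame in which several columns taken together (say a, b and c) form a key that identifies exactly two rows. The two rows differ in the name column, and each carries a set of value columns x, y and z.
I want to take the difference of the value columns within each key, keep the key columns, and set the name column to the new value diff.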
For example, suppose I have the following data:
a b c x y z name
1 1 M J 0.0 1.0 2.0 alpha
2 1 M K 0.1 0.9 2.0 alpha
3 1 O J 0.2 0.8 2.0 alpha
4 1 O K 0.3 0.7 2.0 alpha
5 2 M J 0.4 0.6 2.0 alpha
6 2 M K 0.5 0.5 2.0 alpha
7 2 O J 0.6 0.4 2.0 alpha
8 2 O K 0.7 0.3 2.0 alpha
9 1 M J 0.0 2.0 1.0 beta
10 1 M K 0.1 1.9 3.0 beta
11 1 O J 0.2 1.8 1.0 beta
12 1 O K 0.3 1.7 3.0 beta
13 2 M J 0.4 1.6 1.0 beta
14 2 M K 0.5 1.5 3.0 beta
15 2 O J 0.6 1.4 1.0 beta
16 2 O K 0.7 1.3 3.0 beta
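The example data can be rebuilt along these lines. This is only a sketch: the variable name df and the choice of character (rather than factor) columns are assumptions, not something the data above specifies.

```r
# Rebuild the 16-row example: 8 key combinations, once for alpha and once for beta.
df <- data.frame(
  a = rep(c(1, 2), each = 4, times = 2),
  b = rep(c("M", "O"), each = 2, times = 4),
  c = rep(c("J", "K"), times = 8),
  x = rep((0:7) / 10, times = 2),
  y = c((10:3) / 10, (20:13) / 10),
  z = c(rep(2, 8), rep(c(1, 3), times = 4)),
  name = rep(c("alpha", "beta"), each = 8),
  stringsAsFactors = FALSE  # assumption: keep b, c and name as plain strings
)

print(df)
```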
Then I would like the new data frame to be:
a b c x y z name
1 1 M J 0.0 1.0 2.0 alpha
2 1 M K 0.1 0.9 2.0 alpha
3 1 O J 0.2 0.8 2.0 alpha
4 1 O K 0.3 0.7 2.0 alpha
5 2 M J 0.4 0.6 2.0 alpha
6 2 M K 0.5 0.5 2.0 alpha
7 2 O J 0.6 0.4 2.0 alpha
8 2 O K 0.7 0.3 2.0 alpha
9 1 M J 0.0 2.0 1.0 beta
10 1 M K 0.1 1.9 3.0 beta
11 1 O J 0.2 1.8 1.0 beta
12 1 O K 0.3 1.7 3.0 beta
13 2 M J 0.4 1.6 1.0 beta
14 2 M K 0.5 1.5 3.0 beta
15 2 O J 0.6 1.4 1.0 beta
16 2 O K 0.7 1.3 3.0 beta
17 1 M J 0.0 -1.0 1.0 diff
18 1 M K 0.0 -1.0 -1.0 diff
19 1 O J 0.0 -1.0 1.0 diff
20 1 O K 0.0 -1.0 -1.0 diff
21 2 M J 0.0 -1.0 1.0 diff
22 2 M K 0.0 -1.0 -1.0 diff
23 2 O J 0.0 -1.0 1.0 diff
24 2 O K 0.0 -1.0 -1.0 diff
What is the simplest way to do this?
Answer 0 (score: 2)
You can build each column separately:
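# ave() returns a vector as long as its input, so each group's difference
# (first row minus second row, i.e. alpha minus beta) appears on both rows.
colx = ave(df$x, paste(df$a, df$b, df$c), FUN=function(x) x[1]-x[2])
coly = ave(df$y, paste(df$a, df$b, df$c), FUN=function(x) x[1]-x[2])
colz = ave(df$z, paste(df$a, df$b, df$c), FUN=function(x) x[1]-x[2])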
Then put them together, keeping only the first half of each vector (the alpha rows):
df2 = subset(df, name=="alpha")
df2$name = "diff"
df2$x = colx[1:(length(colx)/2)]
df2$y = coly[1:(length(coly)/2)]
df2$z = colz[1:(length(colz)/2)]
Now append the result to the original:
df = rbind(df, df2)
This gives:
a b c x y z name
1 1 m j 0.0 1.0 2 a
2 1 m k 0.1 0.9 2 a
3 1 o j 0.2 0.8 2 a
4 1 o k 0.3 0.7 2 a
5 2 m j 0.4 0.6 2 a
6 2 m k 0.5 0.5 2 a
7 2 o j 0.6 0.4 2 a
8 2 o k 0.7 0.3 2 a
9 1 m j 0.0 2.0 1 b
10 1 m k 0.1 1.9 3 b
11 1 o j 0.2 1.8 1 b
12 1 o k 0.3 1.7 3 b
13 2 m j 0.4 1.6 1 b
14 2 m k 0.5 1.5 3 b
15 2 o j 0.6 1.4 1 b
16 2 o k 0.7 1.3 3 b
17 1 m j 0.0 -1.0 1 diff
18 1 m k 0.0 -1.0 -1 diff
19 1 o j 0.0 -1.0 1 diff
20 1 o k 0.0 -1.0 -1 diff
21 2 m j 0.0 -1.0 1 diff
22 2 m k 0.0 -1.0 -1 diff
23 2 o j 0.0 -1.0 1 diff
24 2 o k 0.0 -1.0 -1 diff
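The steps above rely on the alpha rows coming first and in the same key order as the beta rows. If that cannot be guaranteed, one possible variant is to pair the rows with merge() on the key columns. This is a sketch rather than part of the answer above; the names keys, vals, m and diff_rows are made up for illustration, and the data is rebuilt inline with character columns as an assumption.

```r
# Example data, rebuilt here so the snippet runs on its own.
df <- data.frame(
  a = rep(c(1, 2), each = 4, times = 2),
  b = rep(c("M", "O"), each = 2, times = 4),
  c = rep(c("J", "K"), times = 8),
  x = rep((0:7) / 10, times = 2),
  y = c((10:3) / 10, (20:13) / 10),
  z = c(rep(2, 8), rep(c(1, 3), times = 4)),
  name = rep(c("alpha", "beta"), each = 8),
  stringsAsFactors = FALSE
)

keys <- c("a", "b", "c")  # columns that identify a pair of rows
vals <- c("x", "y", "z")  # columns to subtract

# Pair each alpha row with the beta row that has the same key.
alpha <- df[df$name == "alpha", c(keys, vals)]
beta  <- df[df$name == "beta",  c(keys, vals)]
m <- merge(alpha, beta, by = keys, suffixes = c(".alpha", ".beta"))

# Build the diff rows: keys as-is, values as alpha minus beta.
diff_rows <- m[keys]
for (v in vals) {
  diff_rows[[v]] <- m[[paste0(v, ".alpha")]] - m[[paste0(v, ".beta")]]
}
diff_rows$name <- "diff"

result <- rbind(df, diff_rows)
rownames(result) <- NULL  # renumber rows 1..24
print(result)
```

Because merge() matches on the key values, shuffling the rows of df beforehand should not change the diff rows, only the order of the original rows in the output.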
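Answer 1 (score: 1)
If your data is always sorted and balanced (the alpha rows first, followed by the beta rows in the same key order), then this should work: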
half <- 1:(nrow(df)/2)
rbind(
  df,
  cbind(
    df[half, 1:3],
    df[half, 4:6] - df[half + half[length(half)], 4:6],
    name = "diff"
  )
)
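Since this depends on the data being sorted and balanced, it may be worth checking that assumption before subtracting. The following is a sketch of one way to do so; diff_by_halves and key_id are hypothetical helper names, and the data is rebuilt inline with character columns as an assumption.

```r
# Example data, rebuilt here so the snippet runs on its own.
df <- data.frame(
  a = rep(c(1, 2), each = 4, times = 2),
  b = rep(c("M", "O"), each = 2, times = 4),
  c = rep(c("J", "K"), times = 8),
  x = rep((0:7) / 10, times = 2),
  y = c((10:3) / 10, (20:13) / 10),
  z = c(rep(2, 8), rep(c(1, 3), times = 4)),
  name = rep(c("alpha", "beta"), each = 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: subtract the second half from the first half,
# but only after confirming that the keys line up row by row.
diff_by_halves <- function(df, keys = c("a", "b", "c"), vals = c("x", "y", "z")) {
  n <- nrow(df)
  stopifnot(n %% 2 == 0)  # balanced: an even number of rows
  half <- seq_len(n / 2)
  top <- df[half, ]
  bottom <- df[half + n / 2, ]
  # Collapse the key columns of each row into a single string for comparison.
  key_id <- function(d) do.call(paste, c(unname(as.list(d[keys])), sep = "|"))
  if (!identical(key_id(top), key_id(bottom))) {
    stop("rows are not sorted and balanced on the key columns")
  }
  out <- top
  for (v in vals) out[[v]] <- top[[v]] - bottom[[v]]
  out$name <- "diff"
  result <- rbind(df, out)
  rownames(result) <- NULL  # renumber rows 1..(1.5 * n)
  result
}

result <- diff_by_halves(df)
print(result)
```

With this check in place, a data frame whose halves do not line up stops with an error instead of silently subtracting mismatched rows.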