I'm comparing a series of baseline and end-of-study differences between groups. For example, I might have the following dataset:
> baseline.comp
cluster 1970_pred 2008_pred ratio diff
9 Many Transitions, Middle Income 0.1156 0.0248 4.6613 0.0908
10 Many Transitions, Low Income 0.1779 0.0389 4.5733 0.1390
4 Dictatorships, High Income 0.1403 0.0307 4.5700 0.1096
7 One Transition, Middle Income 0.0801 0.0219 3.6575 0.0582
1 Democracies, High Income 0.0396 0.0116 3.4138 0.0280
5 Dictatorships, Middle Income 0.1252 0.0399 3.1378 0.0853
2 Democracies, Middle Income 0.0811 0.0291 2.7869 0.0520
8 One Transition, Low Income 0.1912 0.0775 2.4671 0.1137
3 Democracies, Low Income 0.1612 0.0698 2.3095 0.0914
6 Dictatorships, Low Income 0.1854 0.0821 2.2582 0.1033
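In this example, I want to compare the column 1970_pred with itself so that I get a table showing the differences in baseline conditions between these clusters. It would be a 10-by-10 table, but only the cells below the diagonal would hold actual numbers, reflecting the differences in these groups' initial conditions. I'd like to know whether R already has a function that does this.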
Thanks,
Antonio Pedro
Answer 0 (score: 2)
Try the following:
# This part is just to create your data:
baseline.comp <- read.table(text="
cluster 1970_pred 2008_pred ratio diff
9 'Many Transitions, Middle Income' 0.1156 0.0248 4.6613 0.0908
10 'Many Transitions, Low Income' 0.1779 0.0389 4.5733 0.1390
4 'Dictatorships, High Income' 0.1403 0.0307 4.5700 0.1096
7 'One Transition, Middle Income' 0.0801 0.0219 3.6575 0.0582
1 'Democracies, High Income' 0.0396 0.0116 3.4138 0.0280
5 'Dictatorships, Middle Income' 0.1252 0.0399 3.1378 0.0853
2 'Democracies, Middle Income' 0.0811 0.0291 2.7869 0.0520
8 'One Transition, Low Income' 0.1912 0.0775 2.4671 0.1137
3 'Democracies, Low Income' 0.1612 0.0698 2.3095 0.0914
6 'Dictatorships, Low Income' 0.1854 0.0821 2.2582 0.1033")
colnames(baseline.comp) <- c("cluster", "1970_pred", "2008_pred", "ratio", "diff")
# Now, we use outer
diff.1970 <- outer(baseline.comp$`1970_pred`, baseline.comp$`1970_pred`, "-")
# Just renaming the output matrix. I've used A through J to make
# the output more readable.
#colnames(diff.1970) <- baseline.comp$cluster
colnames(diff.1970) <- LETTERS[1:10]
#rownames(diff.1970) <- baseline.comp$cluster
rownames(diff.1970) <- LETTERS[1:10]
# Make sure only the lower half of the result contains non-zero values
> diff.1970 * lower.tri(diff.1970)
A B C D E F G H I J
A 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0
B 0.0623 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0
C 0.0247 -0.0376 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0
D -0.0355 -0.0978 -0.0602 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0
E -0.0760 -0.1383 -0.1007 -0.0405 0.0000 0.0000 0.0000 0.0000 0.0000 0
F 0.0096 -0.0527 -0.0151 0.0451 0.0856 0.0000 0.0000 0.0000 0.0000 0
G -0.0345 -0.0968 -0.0592 0.0010 0.0415 -0.0441 0.0000 0.0000 0.0000 0
H 0.0756 0.0133 0.0509 0.1111 0.1516 0.0660 0.1101 0.0000 0.0000 0
I 0.0456 -0.0167 0.0209 0.0811 0.1216 0.0360 0.0801 -0.0300 0.0000 0
J 0.0698 0.0075 0.0451 0.1053 0.1458 0.0602 0.1043 -0.0058 0.0242 0
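A few notes on this:
In general, it's not a good idea to have variable (or column) names that start with a number. That's why we had to rename the columns after using read.table: R automatically prepends an "X" to such names. Note also that I had to use backticks when referring to these column names in the outer call. It's best to avoid this situation altogether.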
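A small sketch of where the "X" comes from. read.table repairs column names with make.names() when check.names = TRUE (its default); passing check.names = FALSE keeps the name as written, at the cost of still needing backticks. The two-row inline data below is made up purely for illustration.

```r
# make.names() is the repair read.table() applies when check.names = TRUE (the default)
make.names("1970_pred")
#> [1] "X1970_pred"

# Hypothetical two-row example: header has one fewer field, so column 1 becomes row names
txt <- "cluster 1970_pred
1 A 0.1156
2 B 0.1779"

# Default behaviour: the numeric-leading name is repaired to X1970_pred
d1 <- read.table(text = txt)
names(d1)

# check.names = FALSE keeps the name as written, but it still needs backticks
d2 <- read.table(text = txt, check.names = FALSE)
names(d2)
d2$`1970_pred`

# Renaming to a syntactic name avoids the backticks altogether
names(d2)[names(d2) == "1970_pred"] <- "pred_1970"
d2$pred_1970
```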
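As for outer, I used a small variation. The usual call looks like x %o% y, which is the same as outer(x, y, "*"). In this case, however, we're interested in differences rather than products, so the function argument is "-" instead.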
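To make the variation concrete, here is a minimal sketch on the first three 1970_pred values from the question. With "-", entry [i, j] is x[i] - x[j], so the result is antisymmetric with a zero diagonal; the abs() variant at the end is just one illustrative alternative function.

```r
x <- c(0.1156, 0.1779, 0.1403)   # first three 1970_pred values from the question

# %o% is shorthand for outer(x, y, "*")
identical(x %o% x, outer(x, x, "*"))

# Passing "-" instead gives every pairwise difference: d[i, j] = x[i] - x[j]
d <- outer(x, x, "-")
d

# Any vectorised two-argument function works, e.g. absolute differences
outer(x, x, function(a, b) abs(a - b))
```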
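The last step is to multiply the result by lower.tri, which returns a TRUE/FALSE matrix in which everything below the diagonal is TRUE and everything else is FALSE. If you pass diag = TRUE as an argument, the diagonal will be TRUE as well, but that doesn't matter here, since the diagonal is always zero. Because R treats TRUE as 1 and FALSE as 0, multiplying the original matrix by lower.tri returns zero for everything except the values we're interested in (those below the diagonal).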
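A quick sketch of that coercion on a toy 3-by-3 integer matrix (made up for illustration), showing what lower.tri returns with and without diag = TRUE and what the multiplication keeps.

```r
m <- matrix(1:9, nrow = 3)   # columns are 1:3, 4:6, 7:9

lower.tri(m)               # TRUE strictly below the diagonal
lower.tri(m, diag = TRUE)  # also TRUE on the diagonal

# Arithmetic coerces TRUE/FALSE to 1/0, so only m[2,1], m[3,1] and m[3,2] survive
m * lower.tri(m)
```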
Answer 1 (score: 1)
outer is exactly what you're looking for.
baseline_diff <- outer(baseline.comp[['1970_pred']],baseline.comp[['1970_pred']], '-')
## if you want to set the dimension names (but they will be very long!)
# dimnames(baseline_diff) <- list(baseline.comp[['cluster']],
# baseline.comp[['cluster']])
baseline_diff
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.0000 -0.0623 -0.0247 0.0355 0.0760 -0.0096 0.0345 -0.0756 -0.0456 -0.0698
[2,] 0.0623 0.0000 0.0376 0.0978 0.1383 0.0527 0.0968 -0.0133 0.0167 -0.0075
[3,] 0.0247 -0.0376 0.0000 0.0602 0.1007 0.0151 0.0592 -0.0509 -0.0209 -0.0451
[4,] -0.0355 -0.0978 -0.0602 0.0000 0.0405 -0.0451 -0.0010 -0.1111 -0.0811 -0.1053
[5,] -0.0760 -0.1383 -0.1007 -0.0405 0.0000 -0.0856 -0.0415 -0.1516 -0.1216 -0.1458
[6,] 0.0096 -0.0527 -0.0151 0.0451 0.0856 0.0000 0.0441 -0.0660 -0.0360 -0.0602
[7,] -0.0345 -0.0968 -0.0592 0.0010 0.0415 -0.0441 0.0000 -0.1101 -0.0801 -0.1043
[8,] 0.0756 0.0133 0.0509 0.1111 0.1516 0.0660 0.1101 0.0000 0.0300 0.0058
[9,] 0.0456 -0.0167 0.0209 0.0811 0.1216 0.0360 0.0801 -0.0300 0.0000 -0.0242
[10,] 0.0698 0.0075 0.0451 0.1053 0.1458 0.0602 0.1043 -0.0058 0.0242 0.0000
To show only the lower (or upper) triangle, use tril (or triu) from the Matrix package:
library(Matrix)
tril(baseline_diff)
10 x 10 Matrix of class "dtrMatrix"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.0000 . . . . . . . . .
[2,] 0.0623 0.0000 . . . . . . . .
[3,] 0.0247 -0.0376 0.0000 . . . . . . .
[4,] -0.0355 -0.0978 -0.0602 0.0000 . . . . . .
[5,] -0.0760 -0.1383 -0.1007 -0.0405 0.0000 . . . . .
[6,] 0.0096 -0.0527 -0.0151 0.0451 0.0856 0.0000 . . . .
[7,] -0.0345 -0.0968 -0.0592 0.0010 0.0415 -0.0441 0.0000 . . .
[8,] 0.0756 0.0133 0.0509 0.1111 0.1516 0.0660 0.1101 0.0000 . .
[9,] 0.0456 -0.0167 0.0209 0.0811 0.1216 0.0360 0.0801 -0.0300 0.0000 .
[10,] 0.0698 0.0075 0.0451 0.1053 0.1458 0.0602 0.1043 -0.0058 0.0242 0.0000
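If you'd rather not add the Matrix dependency, a base-R sketch with a similar blank-above-the-diagonal display is to set the upper triangle to NA and print with na.print = "". The snippet rebuilds baseline_diff from the question's 1970_pred values (under a hypothetical vector name, pred_1970) so that it runs on its own.

```r
# 1970_pred values from the question, stored under a syntactic name
pred_1970 <- c(0.1156, 0.1779, 0.1403, 0.0801, 0.0396,
               0.1252, 0.0811, 0.1912, 0.1612, 0.1854)
baseline_diff <- outer(pred_1970, pred_1970, "-")

# Keep the diagonal (as tril() does) and blank out everything above it
masked <- baseline_diff
masked[upper.tri(masked)] <- NA

# print.default() can show NA cells as empty strings
print(round(masked, 4), na.print = "")
```

Using NA rather than 0 also keeps the unused cells from being mistaken for genuine zero differences if the matrix is summarised later.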