How do I create a table for group comparisons in R?

Time: 2012-08-16 03:25:49

Tags: r

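I am comparing a series of differences between groups at baseline and at the end of a study. For example, I might have the following data set: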

> baseline.comp
                             cluster 1970_pred 2008_pred  ratio   diff
 9  Many Transitions, Middle Income    0.1156    0.0248 4.6613 0.0908
10     Many Transitions, Low Income    0.1779    0.0389 4.5733 0.1390
 4       Dictatorships, High Income    0.1403    0.0307 4.5700 0.1096
 7    One Transition, Middle Income    0.0801    0.0219 3.6575 0.0582
 1         Democracies, High Income    0.0396    0.0116 3.4138 0.0280
 5     Dictatorships, Middle Income    0.1252    0.0399 3.1378 0.0853
 2       Democracies, Middle Income    0.0811    0.0291 2.7869 0.0520
 8       One Transition, Low Income    0.1912    0.0775 2.4671 0.1137
 3          Democracies, Low Income    0.1612    0.0698 2.3095 0.0914
 6        Dictatorships, Low Income    0.1854    0.0821 2.2582 0.1033

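In this example, I want to compare the column 1970_pred with itself, so that I get a table showing the differences in baseline conditions between these clusters. It would be a 10-by-10 table, but only the cells below the diagonal would hold actual numbers, reflecting the differences in the groups' initial conditions. I would like to know whether R already has a function that does this.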

Thanks,

Antonio Pedro

2 answers:

Answer 0 (score: 2)

Try the following:

# This part is just to create your data:

baseline.comp <- read.table(text="
                             cluster 1970_pred 2008_pred  ratio   diff
 9  'Many Transitions, Middle Income'    0.1156    0.0248 4.6613 0.0908
10     'Many Transitions, Low Income'    0.1779    0.0389 4.5733 0.1390
 4       'Dictatorships, High Income'    0.1403    0.0307 4.5700 0.1096
 7    'One Transition, Middle Income'    0.0801    0.0219 3.6575 0.0582
 1         'Democracies, High Income'    0.0396    0.0116 3.4138 0.0280
 5     'Dictatorships, Middle Income'    0.1252    0.0399 3.1378 0.0853
 2      'Democracies, Middle Income'    0.0811    0.0291 2.7869 0.0520
 8       'One Transition, Low Income'    0.1912    0.0775 2.4671 0.1137
 3          'Democracies, Low Income'    0.1612    0.0698 2.3095 0.0914
 6        'Dictatorships, Low Income'   0.1854    0.0821 2.2582 0.1033")

colnames(baseline.comp) <- c("cluster", "1970_pred", "2008_pred", "ratio", "diff")

# Now, we use outer

diff.1970 <- outer(baseline.comp$`1970_pred`, baseline.comp$`1970_pred`, "-")

# Just renaming the output matrix. I've used A through J to make 
# the output more readable.

#colnames(diff.1970) <- baseline.comp$cluster
colnames(diff.1970) <- LETTERS[1:10]
#rownames(diff.1970) <- baseline.comp$cluster
rownames(diff.1970) <- LETTERS[1:10]

# Make sure only the lower half of the result contains non-zero values

> diff.1970 * lower.tri(diff.1970)
        A       B       C       D      E       F      G       H      I J
A  0.0000  0.0000  0.0000  0.0000 0.0000  0.0000 0.0000  0.0000 0.0000 0
B  0.0623  0.0000  0.0000  0.0000 0.0000  0.0000 0.0000  0.0000 0.0000 0
C  0.0247 -0.0376  0.0000  0.0000 0.0000  0.0000 0.0000  0.0000 0.0000 0
D -0.0355 -0.0978 -0.0602  0.0000 0.0000  0.0000 0.0000  0.0000 0.0000 0
E -0.0760 -0.1383 -0.1007 -0.0405 0.0000  0.0000 0.0000  0.0000 0.0000 0
F  0.0096 -0.0527 -0.0151  0.0451 0.0856  0.0000 0.0000  0.0000 0.0000 0
G -0.0345 -0.0968 -0.0592  0.0010 0.0415 -0.0441 0.0000  0.0000 0.0000 0
H  0.0756  0.0133  0.0509  0.1111 0.1516  0.0660 0.1101  0.0000 0.0000 0
I  0.0456 -0.0167  0.0209  0.0811 0.1216  0.0360 0.0801 -0.0300 0.0000 0
J  0.0698  0.0075  0.0451  0.1053 0.1458  0.0602 0.1043 -0.0058 0.0242 0

A few notes on this:

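In general, it is not a good idea to have variable (or column) names that start with a digit. That is why we had to rename the columns after using read.table: R automatically prepends an "X" to names that begin with a number. Note that I had to use backticks to refer to these column names in the call to outer. It is best to avoid the situation altogether.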
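The renaming behaviour described above can be reproduced on a small scale. The following is a minimal sketch, assuming a two-row excerpt of the question's data; the object names `txt`, `df_default`, and `df_raw`, and the replacement names `pred_1970` and `pred_2008`, are made up for illustration.

```r
# Hypothetical two-row excerpt of the question's data (illustration only).
txt <- "1970_pred 2008_pred
0.1156 0.0248
0.1779 0.0389"

# Default: check.names = TRUE passes the header through make.names(),
# which prepends "X" to names that start with a digit.
df_default <- read.table(text = txt, header = TRUE)
names(df_default)

# check.names = FALSE keeps the header as written; backticks are then
# required when the column is accessed with $.
df_raw <- read.table(text = txt, header = TRUE, check.names = FALSE)
df_raw$`1970_pred`

# Renaming to syntactic names removes the need for backticks.
names(df_raw) <- c("pred_1970", "pred_2008")
df_raw$pred_1970
```

Renaming once, right after reading the data, is usually less error-prone than quoting the column everywhere it is used.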

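As for the outer function, I used a slight variation. The usual call looks like x %o% y, which is the same as outer(x, y, "*"). In this case, however, we are interested in differences rather than products.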
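To make the relationship between %o% and outer concrete, here is a small sketch using the first three 1970_pred values from the question. The vector name `x` and the result names are illustrative only.

```r
# First three 1970_pred values from the question's data.
x <- c(0.1156, 0.1779, 0.1403)

# %o% is shorthand for outer() with its default function, "*".
prod_a <- x %o% x
prod_b <- outer(x, x, "*")
all.equal(prod_a, prod_b)

# Passing "-" instead gives pairwise differences:
# element [i, j] is x[i] - x[j].
diff_mat <- outer(x, x, "-")
diff_mat[2, 1]  # x[2] - x[1]
```

Because element [i, j] equals x[i] - x[j], the matrix is antisymmetric: the upper triangle repeats the lower triangle with the sign flipped, which is why only one half needs to be shown.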

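The last step is to multiply the result by lower.tri, which returns a TRUE/FALSE matrix in which everything below the diagonal is TRUE and everything else is FALSE. If you pass diag = TRUE as an argument, the diagonal will also be TRUE, but that does not matter here because the diagonal is always zero. Since R treats TRUE as 1 and FALSE as 0 in arithmetic, multiplying the original matrix by lower.tri zeroes out everything except the values we are interested in (those below the diagonal).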
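A possible variation, not part of the answer above, is to set the unwanted cells to NA rather than zero, so that they cannot be mistaken for genuine zero differences. The sketch below uses three values from the question; the names `mask`, `zeroed`, and `blanked` are illustrative.

```r
# First three 1970_pred values from the question's data.
x <- c(0.1156, 0.1779, 0.1403)
diff_mat <- outer(x, x, "-")

# Logical mask: TRUE strictly below the diagonal, FALSE elsewhere.
mask <- lower.tri(diff_mat)

# The approach from the answer: the logical mask is coerced to 1/0.
zeroed <- diff_mat * mask

# Variation: blank out the diagonal and upper triangle with NA instead.
blanked <- diff_mat
blanked[!mask] <- NA

# na.print = "" hides the NA cells when printing.
print(round(blanked, 4), na.print = "")
```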

Answer 1 (score: 1)

outer is exactly what you are looking for.

baseline_diff <- outer(baseline.comp[['1970_pred']],baseline.comp[['1970_pred']], '-')
## if you want to set the dimension names (but they will be very long!)
# dimnames(baseline_diff) <- list(baseline.comp[['cluster']],
#                                  baseline.comp[['cluster']])
 baseline_diff
          [,1]    [,2]    [,3]    [,4]   [,5]    [,6]    [,7]    [,8]    [,9]   [,10]
 [1,]  0.0000 -0.0623 -0.0247  0.0355 0.0760 -0.0096  0.0345 -0.0756 -0.0456 -0.0698
 [2,]  0.0623  0.0000  0.0376  0.0978 0.1383  0.0527  0.0968 -0.0133  0.0167 -0.0075
 [3,]  0.0247 -0.0376  0.0000  0.0602 0.1007  0.0151  0.0592 -0.0509 -0.0209 -0.0451
 [4,] -0.0355 -0.0978 -0.0602  0.0000 0.0405 -0.0451 -0.0010 -0.1111 -0.0811 -0.1053
 [5,] -0.0760 -0.1383 -0.1007 -0.0405 0.0000 -0.0856 -0.0415 -0.1516 -0.1216 -0.1458
 [6,]  0.0096 -0.0527 -0.0151  0.0451 0.0856  0.0000  0.0441 -0.0660 -0.0360 -0.0602
 [7,] -0.0345 -0.0968 -0.0592  0.0010 0.0415 -0.0441  0.0000 -0.1101 -0.0801 -0.1043
 [8,]  0.0756  0.0133  0.0509  0.1111 0.1516  0.0660  0.1101  0.0000  0.0300  0.0058
 [9,]  0.0456 -0.0167  0.0209  0.0811 0.1216  0.0360  0.0801 -0.0300  0.0000 -0.0242
[10,]  0.0698  0.0075  0.0451  0.1053 0.1458  0.0602  0.1043 -0.0058  0.0242  0.0000

To display only the lower (or upper) triangle, use tril (or triu) from the Matrix package:

library(Matrix)

tril(baseline_diff)

10 x 10 Matrix of class "dtrMatrix"
      [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]    [,8]    [,9]    [,10]  
 [1,]  0.0000       .       .       .       .       .       .       .       .       .
 [2,]  0.0623  0.0000       .       .       .       .       .       .       .       .
 [3,]  0.0247 -0.0376  0.0000       .       .       .       .       .       .       .
 [4,] -0.0355 -0.0978 -0.0602  0.0000       .       .       .       .       .       .
 [5,] -0.0760 -0.1383 -0.1007 -0.0405  0.0000       .       .       .       .       .
 [6,]  0.0096 -0.0527 -0.0151  0.0451  0.0856  0.0000       .       .       .       .
 [7,] -0.0345 -0.0968 -0.0592  0.0010  0.0415 -0.0441  0.0000       .       .       .
 [8,]  0.0756  0.0133  0.0509  0.1111  0.1516  0.0660  0.1101  0.0000       .       .
 [9,]  0.0456 -0.0167  0.0209  0.0811  0.1216  0.0360  0.0801 -0.0300  0.0000       .
[10,]  0.0698  0.0075  0.0451  0.1053  0.1458  0.0602  0.1043 -0.0058  0.0242  0.0000
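If loading the Matrix package is not an option, base R's as.dist may serve as a rough substitute, since it keeps only the entries below the diagonal and prints them as a triangle. This is a sketch under that assumption, not something either answer proposes; the short labels A to C and the name `lower_only` are illustrative.

```r
# First three 1970_pred values from the question's data.
x <- c(0.1156, 0.1779, 0.1403)
diff_mat <- outer(x, x, "-")
dimnames(diff_mat) <- list(LETTERS[1:3], LETTERS[1:3])

# as.dist() stores only the strictly lower triangle (column by column)
# and takes its labels from the row names.
lower_only <- as.dist(diff_mat)
lower_only
```

One caveat: a "dist" object is meant for symmetric distances, so converting it back with as.matrix would mirror the lower triangle into the upper one and lose the sign flip. It is best treated here as a display device only.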