(Statistics) Normalizing a two-way table

Time: 2015-11-03 10:29:06

Tags: r matlab statistics

I have a table like this.

    X  X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015
1  SU 103.27 105.2  99.7 106.7  96.7 108.4  88.7 73.67
2  BS 100.17 104.5  97.6 103.6  91.7 106.2  85.5 73.66
3  DG 101.00 102.5  98.9 101.1  91.2 106.2  80.9 75.67
4  IC  97.80 103.4  97.2 102.4  88.4 103.3  85.7 70.00
5  DJ 106.20 103.1  99.1  97.7  90.7 106.2  77.5 74.00
6  GJ  97.47 101.7  98.6 101.2  89.9 105.6  81.7 73.33
7  US  99.80 105.6  98.2   0.0  81.7 103.6  84.3 68.00
8  GG  98.13 105.7  98.6 103.7  92.2 105.2  85.9 73.66
9  GO  96.13 101.2  96.8 101.7  86.4 105.7  78.1 72.66
10 CB 104.20 105.2 101.5 100.3  88.3 106.2  78.8 72.00
11 CN 107.20  95.0  96.1  98.7  88.2 103.7  78.5 71.33
12 GB  98.87 102.0  95.3 100.2  87.2 104.2  78.5 70.33
13 GN  99.57 103.3  95.6 102.6  89.2 103.7  83.2 72.00
14 JB  99.60  96.2  98.2  96.2  86.2 101.7  84.5 71.34
15 JN  93.83  98.6  98.8  95.2  87.2 102.7  83.9 70.33
16 JJ  93.63 101.7  93.2  98.1   0.0   0.0  83.9 71.00
17 SJ   0.00   0.0   0.0   0.0   0.0 106.5  81.9 73.34

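These are test scores collected each year in several South Korean provinces. Through 2013 the scores ranged over [0, 110]; from 2014 the range changed to [0, 100].

My goal is to normalize the test scores to a common range, or at least to some standardized scale.

Perhaps I could first convert the 2008 to 2013 scores to a 100-point scale, then subtract each column's mean and divide by each column's standard deviation. However, that only standardizes within each column.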
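The per-column approach described above could be sketched roughly as follows. This is only an illustration on a four-row subset of the table; the names `scores`, `year_cols` and `old_scale` are made up for the example, and zeros are treated as missing because a score of 0 means the test was not taken.

```r
# Four provinces and three years copied from the table; 0 means "not tested".
scores <- data.frame(
  X     = c("SU", "BS", "US", "JJ"),
  X2011 = c(106.7, 103.6, 0.0, 98.1),
  X2013 = c(108.4, 106.2, 103.6, 0.0),
  X2014 = c(88.7, 85.5, 84.3, 83.9)
)

year_cols <- c("X2011", "X2013", "X2014")
old_scale <- c("X2011", "X2013")  # years scored on [0, 110]

z <- scores
# Mark "not tested" entries as missing so they do not distort the statistics.
z[year_cols] <- lapply(z[year_cols], function(x) replace(x, x == 0, NA))
# Bring the [0, 110] years onto a 100-point scale.
z[old_scale] <- lapply(z[old_scale], function(x) x / 110 * 100)
# Standardize each column: subtract its mean, divide by its standard deviation.
z[year_cols] <- lapply(z[year_cols], function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
})
z
```

Note that a z-score does not change when a column is multiplied by a constant, so the rescaling step has no effect on the standardized values; it only matters if the rescaled scores are used directly.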

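Is there any way to unify (or standardize) the test scores across the whole table?

By the way, a test score of 0 means the test was not taken, so it must be ignored during normalization. Here is the data in CSV format for your convenience: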

,2008,2009,2010,2011,2012,2013,2014,2015
SU,103.27,105.2,99.7,106.7,96.7,108.4,88.7,73.67
BS,100.17,104.5,97.6,103.6,91.7,106.2,85.5,73.66
DG,101,102.5,98.9,101.1,91.2,106.2,80.9,75.67
IC,97.8,103.4,97.2,102.4,88.4,103.3,85.7,70
DJ,106.2,103.1,99.1,97.7,90.7,106.2,77.5,74
GJ,97.47,101.7,98.6,101.2,89.9,105.6,81.7,73.33
US,99.8,105.6,98.2,0,81.7,103.6,84.3,68
GG,98.13,105.7,98.6,103.7,92.2,105.2,85.9,73.66
GO,96.13,101.2,96.8,101.7,86.4,105.7,78.1,72.66
CB,104.2,105.2,101.5,100.3,88.3,106.2,78.8,72
CN,107.2,95,96.1,98.7,88.2,103.7,78.5,71.33
GB,98.87,102,95.3,100.2,87.2,104.2,78.5,70.33
GN,99.57,103.3,95.6,102.6,89.2,103.7,83.2,72
JB,99.6,96.2,98.2,96.2,86.2,101.7,84.5,71.34
JN,93.83,98.6,98.8,95.2,87.2,102.7,83.9,70.33
JJ,93.63,101.7,93.2,98.1,0,0,83.9,71
SJ,0,0,0,0,0,106.5,81.9,73.34 
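
As a rough sketch, the CSV above could be read into R with `read.csv`. The snippet embeds only three rows and the first three year columns, and the names `csv_text` and `df_csv` are illustrative. With the default `check.names = TRUE`, the empty first header becomes `X` and the year headers gain an `X` prefix, which is where the column names in the printed table come from.

```r
# A shortened copy of the CSV (three rows, three years) for illustration.
csv_text <- ",2008,2009,2010
SU,103.27,105.2,99.7
US,99.8,105.6,98.2
SJ,0,0,0"

# read.csv() turns the headers into syntactically valid R names.
df_csv <- read.csv(text = csv_text)
names(df_csv)
```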

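1 Answer:

Answer 0: (score: 4)

I think the best option is to convert columns 2 through 6, i.e. the columns in the [0, 110] range, to the [0, 100] range. That way everything is on the same scale. To do this: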

Data:

df <- read.table(header = TRUE, text = '    X  X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015
1  SU 103.27 105.2  99.7 106.7  96.7 108.4  88.7 73.67
2  BS 100.17 104.5  97.6 103.6  91.7 106.2  85.5 73.66
3  DG 101.00 102.5  98.9 101.1  91.2 106.2  80.9 75.67
4  IC  97.80 103.4  97.2 102.4  88.4 103.3  85.7 70.00
5  DJ 106.20 103.1  99.1  97.7  90.7 106.2  77.5 74.00
6  GJ  97.47 101.7  98.6 101.2  89.9 105.6  81.7 73.33
7  US  99.80 105.6  98.2   0.0  81.7 103.6  84.3 68.00
8  GG  98.13 105.7  98.6 103.7  92.2 105.2  85.9 73.66
9  GO  96.13 101.2  96.8 101.7  86.4 105.7  78.1 72.66
10 CB 104.20 105.2 101.5 100.3  88.3 106.2  78.8 72.00
11 CN 107.20  95.0  96.1  98.7  88.2 103.7  78.5 71.33
12 GB  98.87 102.0  95.3 100.2  87.2 104.2  78.5 70.33
13 GN  99.57 103.3  95.6 102.6  89.2 103.7  83.2 72.00
14 JB  99.60  96.2  98.2  96.2  86.2 101.7  84.5 71.34
15 JN  93.83  98.6  98.8  95.2  87.2 102.7  83.9 70.33
16 JJ  93.63 101.7  93.2  98.1   0.0   0.0  83.9 71.00
17 SJ   0.00   0.0   0.0   0.0   0.0 106.5  81.9 73.34')

You can do it like this:

# Rescale the columns on the [0, 110] scale (columns 2 to 6) to [0, 100]
df[2:6] <- lapply(df[2:6], function(x) {
   x / 110 * 100
})

Basically, you divide by 110, the maximum of the [0, 110] range, to map the scores into [0, 1], and then multiply by 100 to map them into [0, 100].
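
To make the arithmetic concrete, here is a minimal sketch of the same linear rescaling as a small helper. The function name `rescale_max` and its arguments are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: linearly map scores from [0, old_max] to [0, new_max].
rescale_max <- function(x, old_max = 110, new_max = 100) {
  x / old_max * new_max
}

# SU's 2008 score, the old maximum, and a zero:
rescale_max(c(103.27, 110, 0))
# 103.27 maps to about 93.88182, 110 maps to 100, and 0 stays 0
```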

Output:

> df
    X    X2008    X2009    X2010    X2011    X2012 X2013 X2014 X2015
1  SU 93.88182 95.63636 90.63636 97.00000 87.90909 108.4  88.7 73.67
2  BS 91.06364 95.00000 88.72727 94.18182 83.36364 106.2  85.5 73.66
3  DG 91.81818 93.18182 89.90909 91.90909 82.90909 106.2  80.9 75.67
4  IC 88.90909 94.00000 88.36364 93.09091 80.36364 103.3  85.7 70.00
5  DJ 96.54545 93.72727 90.09091 88.81818 82.45455 106.2  77.5 74.00
6  GJ 88.60909 92.45455 89.63636 92.00000 81.72727 105.6  81.7 73.33
7  US 90.72727 96.00000 89.27273  0.00000 74.27273 103.6  84.3 68.00
8  GG 89.20909 96.09091 89.63636 94.27273 83.81818 105.2  85.9 73.66
9  GO 87.39091 92.00000 88.00000 92.45455 78.54545 105.7  78.1 72.66
10 CB 94.72727 95.63636 92.27273 91.18182 80.27273 106.2  78.8 72.00
11 CN 97.45455 86.36364 87.36364 89.72727 80.18182 103.7  78.5 71.33
12 GB 89.88182 92.72727 86.63636 91.09091 79.27273 104.2  78.5 70.33
13 GN 90.51818 93.90909 86.90909 93.27273 81.09091 103.7  83.2 72.00
14 JB 90.54545 87.45455 89.27273 87.45455 78.36364 101.7  84.5 71.34
15 JN 85.30000 89.63636 89.81818 86.54545 79.27273 102.7  83.9 70.33
16 JJ 85.11818 92.45455 84.72727 89.18182  0.00000   0.0  83.9 71.00
17 SJ  0.00000  0.00000  0.00000  0.00000  0.00000 106.5  81.9 73.34

Now you can compare these years. Also, as you will notice, zeros remain zeros.
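
If the zeros should be excluded rather than carried along, and if 2013 is also treated as a [0, 110] year, as the question states, one possible variant is sketched below on a four-row subset of the data. The names `df2`, `year_cols` and `old_scale` are illustrative, and marking zeros as `NA` is an assumption about how "ignore" should be implemented.

```r
# Four provinces and three years copied from the data; 0 means "not tested".
df2 <- data.frame(
  X     = c("SU", "US", "JJ", "SJ"),
  X2012 = c(96.7, 81.7, 0.0, 0.0),
  X2013 = c(108.4, 103.6, 0.0, 106.5),
  X2014 = c(88.7, 84.3, 83.9, 81.9)
)

year_cols <- setdiff(names(df2), "X")
old_scale <- c("X2012", "X2013")  # assumption: 2013 is still on [0, 110]

# Replace "not tested" zeros with NA so summaries can skip them.
df2[year_cols] <- lapply(df2[year_cols], function(x) replace(x, x == 0, NA))
# Rescale only the [0, 110] years to [0, 100].
df2[old_scale] <- lapply(df2[old_scale], function(x) x / 110 * 100)

# Yearly means on the common scale, ignoring missing scores.
colMeans(df2[year_cols], na.rm = TRUE)
```

With `NA` in place of zero, functions such as `mean`, `sd` and `colMeans` can skip the untested provinces via `na.rm = TRUE`.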