Why do the frequencies from these two methods differ in R?

Asked: 2015-03-01 18:57:56

Tags: r

I am curious why the two methods shown below produce different frequencies even though they use the same dataset.

First method (cut(as.vector))

wd1<- apply(wd, 2, function(x) cut(((as.numeric(x)) 
                                    +  360/(16*2) )%% 360,seq(0,360,360/16) ,
                                   c('N', 'NNE', 'NE', 'ENE', 'E', 'ESE', '
                                     SE', 'SSE', 'S', 'SSW', 'SW', 'WSW',
                                     'W', 'WNW', 'NW', 'NNW')))
wd2<- as.data.frame(table(wd1))
wd3<- transform(wd2, cumFreq = cumsum(Freq), 
                relative = prop.table(Freq))

which produces

> wd3
    wd1  Freq cumFreq   relative
1  \nSE  2942    2942 0.01579292
2     E 11550   14492 0.06200144
3   ENE  5773   20265 0.03098998
4   ESE  5713   25978 0.03066790
5     N 11051   37029 0.05932276
6    NE  4725   41754 0.02536422
7   NNE  6196   47950 0.03326069
8   NNW 14880   62830 0.07987718
9    NW 18278   81108 0.09811795
10    S  6621   87729 0.03554212
11  SSE  3772   91501 0.02024844
12  SSW 10800  102301 0.05797537
13   SW 17004  119305 0.09127900
14    W 24903  144208 0.13368154
15  WNW 20603  164811 0.11059876
16  WSW 21475  186286 0.11527973
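
A note on the first row of this output: the label shown as "\nSE" most likely comes from the 'SE' string literal being split across two lines in the call, so the label itself contains a line break. The alphabetical row order also suggests that apply() returned a character matrix, so the compass order of the factor levels was lost. A minimal sketch of the same binning with every label on one line follows; the helper name to_compass and the test angles are made up for illustration, and include.lowest = TRUE is an addition that is not in the original call.

```r
# Compass labels, each written as a single unbroken string literal.
dirs <- c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
          "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")

# Hypothetical helper (not from the question): shift by half a sector
# (360 / 32 = 11.25 degrees) so each bin is centred on its direction.
to_compass <- function(deg) {
  shifted <- (as.numeric(deg) + 360 / (16 * 2)) %% 360
  cut(shifted, breaks = seq(0, 360, by = 360 / 16), labels = dirs,
      include.lowest = TRUE)  # assumption: keep a shifted value of exactly 0
}

x <- c(0, 10, 135, 140, 349)   # made-up test angles
res <- to_compass(x)
print(as.character(res))
```

Calling the helper on an unlisted vector, for example table(to_compass(unlist(wd))), should keep the factor levels and therefore the compass order in the table.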

Second method (cut(wd, breaks=))

breaks1 <- apply(wd, 2, function(x) (cut(as.numeric(x), breaks=
                                              (seq(0,360,360/16)))))
breaks2<- as.data.frame(table(breaks1))

breaks3<- transform(breaks2, cumFreq = cumsum(Freq), 
                relative = prop.table(Freq))

which produces

> breaks3
     breaks1  Freq cumFreq   relative
1   (0,22.5]  8110    8110 0.04358036
2  (112,135]  3314   11424 0.01780830
3  (135,158]  3084   14508 0.01657236
4  (158,180]  5039   19547 0.02707786
5  (180,202]  8387   27934 0.04506886
6  (202,225] 14246   42180 0.07655312
7  (22.5,45]  5257   47437 0.02824932
8  (225,248] 19194   66631 0.10314198
9  (248,270] 24301   90932 0.13058525
10 (270,292] 22526  113458 0.12104700
11 (292,315] 19631  133089 0.10549027
12 (315,338] 16401  149490 0.08813335
13 (338,360] 13185  162675 0.07085167
14 (45,67.5]  4614  167289 0.02479405
15 (67.5,90]  9173  176462 0.04929256
16  (90,112]  9631  186093 0.05175369
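
The labels in this table such as (112,135] look like a display effect rather than a change in the breaks: cut() formats its labels with dig.lab = 3 significant digits by default, so 112.5 is printed as 112 while 22.5 and 67.5 fit and are printed in full. A small sketch, using two made-up angles, that compares the default labels with dig.lab = 4:

```r
breaks <- seq(0, 360, by = 360 / 16)
x <- c(100, 150)   # made-up angles, only used to build the factor

# Default labels use dig.lab = 3, so 112.5 is shown as 112.
default_bins <- cut(x, breaks = breaks)

# More label digits show the actual break values.
full_bins <- cut(x, breaks = breaks, dig.lab = 4)

print(levels(default_bins)[5:7])
print(levels(full_bins)[5:7])

# The bin each value lands in is the same either way.
print(identical(as.integer(default_bins), as.integer(full_bins)))
```

Only the printed labels change; the breaks stay at multiples of 22.5 in both calls.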

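The total frequency should be 186286, as in the first method, but it is 186093, so I am sure some values are being omitted. Also, the intervals are not at exact multiples of 22.5 (as 360/16 implies); only a few bins are. Well, the breaks probably are, but R seems to be rounding the labels for every bin except those few. Why is that?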
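
One likely reason for the missing counts: by default cut() builds intervals that are open on the left, (a, b], so an angle of exactly 0 falls outside (0,22.5] and becomes NA, and table() drops NA values unless told otherwise. The first method is not affected because it adds 11.25 before binning. The gap of 186286 - 186093 = 193 would be consistent with 193 zeros in the data, although that cannot be confirmed from the sample alone. A minimal sketch with made-up angles:

```r
breaks <- seq(0, 360, by = 360 / 16)
x <- c(0, 1, 22.5, 180, 360)   # made-up angles, including an exact 0

# Default: intervals are (a, b], so 0 is outside the first bin and becomes NA.
default_bins <- cut(x, breaks = breaks)
n_dropped <- sum(is.na(default_bins))

# include.lowest = TRUE closes the first interval, giving [0, 22.5].
closed_bins <- cut(x, breaks = breaks, include.lowest = TRUE)

print(n_dropped)
print(sum(table(default_bins)))   # table() silently leaves the NA out
print(sum(table(closed_bins)))    # all values are counted
```

Passing useNA = "ifany" to table() is another way to make the dropped values visible without changing the bins.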

(dput)

dput(head(wd))
structure(list(X1000mb = c(86L, 130L, 75L, 59L, 56L, 69L), X925mb = c(70L, 
45L, 30L, 66L, 54L, 71L), X850mb = c(355L, 349L, 350L, 65L, 36L, 
56L), X700mb = c(331L, 342L, 329L, 35L, 1L, 44L), X600mb = c(328L, 
328L, 321L, 0L, 247L, 227L), X500mb = c(331L, 324L, 317L, 331L, 
251L, 241L), X400mb = c(340L, 328L, 310L, 296L, 261L, 246L), 
    X300mb = c(336L, 334L, 328L, 295L, 259L, 262L), X250mb = c(334L, 
    333L, 348L, 300L, 259L, 279L), X200mb = c(336L, 330L, 356L, 
    331L, 257L, 282L), X150mb = c(333L, 327L, 346L, 342L, 277L, 
    279L), X100mb = c(317L, 326L, 325L, 318L, 260L, 274L), X70mb = c(323L, 
    326L, 332L, 306L, 277L, 276L), X50mb = c(350L, 4L, 352L, 
    328L, 305L, 311L), X30mb = c(5L, 42L, 32L, 15L, 29L, 12L), 
    X20mb = c(3L, 42L, 48L, 30L, 46L, 45L), X10mb = c(28L, 25L, 
    4L, 14L, 104L, 76L)), .Names = c("X1000mb", "X925mb", "X850mb", 
"X700mb", "X600mb", "X500mb", "X400mb", "X300mb", "X250mb", "X200mb", 
"X150mb", "X100mb", "X70mb", "X50mb", "X30mb", "X20mb", "X10mb"
), row.names = c(NA, 6L), class = "data.frame")
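
Apart from the dropped zeros, the two methods do not use the same bins: the first shifts every angle by half a sector (11.25 degrees) so that each bin is centred on a compass direction, while the second bins the raw angles. The per-bin counts are therefore expected to differ. A rough sketch using two columns copied from the sample above; labels = FALSE is used here only to get plain bin numbers for comparison.

```r
# Two columns copied from the dput() sample in the question.
wd_sample <- data.frame(
  X600mb = c(328L, 328L, 321L, 0L, 247L, 227L),
  X50mb  = c(350L, 4L, 352L, 328L, 305L, 311L)
)
breaks <- seq(0, 360, by = 360 / 16)
x <- unlist(wd_sample, use.names = FALSE)

# Method 1: shift by half a sector (11.25 degrees), wrap at 360, then bin.
bin_shifted <- cut((x + 360 / (16 * 2)) %% 360, breaks = breaks, labels = FALSE)

# Method 2: bin the raw angles; an angle of exactly 0 gets NA here.
bin_raw <- cut(x, breaks = breaks, labels = FALSE)

comparison <- data.frame(angle = x, bin_shifted, bin_raw)
print(comparison)
```

In this sample an angle of 350 lands in bin 1 (the sector centred on north) with the shift, but in bin 16, the interval (337.5,360], without it.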

0 answers:

No answers