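I am trying to use rowSds() to calculate the standard deviation of each row, so that I can select the rows with a high SD and plot them.
My data frame is called xx and looks like this: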
head(xx,1)
Job variable 2012-02-23 2012-02-24 2012-02-25 2012-02-27 2012-02-28 2012-02-29 2012-03-01 2012-03-02 2012-03-03 2012-03-05 2012-03-06 2012-03-07 2012-03-08 2012-03-09 2012-03-10 2012-03-12 2012-03-13 2012-03-14
1 A Duration 152 424 NA 499 320 117 211 363 NA 605 76 309 204 185 NA 25 733 500
2012-03-15 2012-03-16 2012-03-17 2012-03-19 2012-03-20 2012-03-21 2012-03-22 2012-03-23 2012-03-24 2012-03-26 2012-03-27 2012-03-28 2012-03-29 2012-03-30 2012-03-31 2012-04-02 2012-04-03 2012-04-04 2012-04-05 2012-04-06
1 521 601 NA 229 758 421 334 659 NA 419 423 444 289 594 NA 327 533 183 211 235
2012-04-07 2012-04-09 2012-04-10 2012-04-11 2012-04-12 2012-04-13 2012-04-14 2012-04-16 2012-04-17 2012-04-18 2012-04-19 2012-04-20 2012-04-21 2012-04-23 2012-04-24 2012-04-25 2012-04-26 2012-04-27 2012-04-28 2012-04-30
1 NA 225 419 236 218 188 NA 205 547 153 196 200 NA 259 257 208 302 244 NA 806
2012-05-01 2012-05-02 2012-05-03 2012-05-04 2012-05-05 2012-05-07 2012-05-08 2012-05-09 2012-05-10 2012-05-11 2012-05-12 2012-05-14 2012-05-15 2012-05-16 2012-05-17 2012-05-18 2012-05-19 2012-05-21 2012-05-22 2012-05-23
1 402 492 1078 440 NA 382 576 1105 511 368 NA 360 381 1152 718 353 NA 408 413 935
2012-05-24 2012-05-25 2012-05-26 2012-05-28 2012-05-29 2012-05-30 2012-05-31 2012-06-01 2012-06-02 2012-06-04 2012-06-05 2012-06-06 2012-06-07 2012-06-08 2012-06-09 2012-06-11 2012-06-12 2012-06-13 2012-06-14 2012-06-15
1 306 277 NA 253 367 977 557 432 NA 328 521 467 972 1556 NA 386 1394 401 857 857
2012-06-16 2012-06-18 2012-06-19 2012-06-20 2012-06-21 2012-06-22 2012-06-23 2012-06-25 2012-06-26 2012-06-27 2012-06-28 2012-06-29 2012-06-30 2012-07-02 2012-07-03 2012-07-04 2012-07-05 2012-07-06 2012-07-07 2012-07-09
1 NA 1056 324 329 327 325 NA 341 268 231 245 301 NA 283 365 297 310 260 NA 254
2012-07-10 2012-07-11 2012-07-12 2012-07-13 2012-07-14 2012-07-16 2012-07-17 2012-07-18 2012-07-19 2012-07-20 2012-07-21 2012-07-23 2012-07-24 2012-07-25 2012-07-26 2012-07-27 2012-07-28 2012-07-30 2012-07-31 2012-08-01
1 283 395 273 273 NA 278 243 210 356 267 NA 442 483 271 327 271 NA 716 598 577
2012-08-02 2012-08-03 2012-08-06 2012-08-07 2012-08-08 2012-08-09 2012-08-10 2012-08-13 2012-08-14 2012-08-15 2012-08-16 2012-08-17 2012-08-20 2012-08-21 2012-08-22 2012-08-23 2012-08-24 2012-08-27 2012-08-28 2012-08-29
1 345 403 318 522 333 259 404 244 240 288 245 22 738 530 390 648 294 403 381 724
2012-08-30 2012-08-31 2012-09-03 2012-09-04 2012-09-05 2012-09-06 2012-09-07 2012-09-10 2012-09-11 2012-09-12 2012-09-13 2012-09-14 2012-09-17 2012-09-18 2012-09-19 2012-09-20 2012-09-21 2012-09-24 2012-09-25 2012-09-26
1 740 575 558 785 883 501 901 500 285 174 562 1047 603 990 289 173 253 512 236 278
2012-09-27 2012-09-28 2012-10-01 2012-10-02 2012-10-03 2012-10-04 2012-10-05 2012-10-08 2012-10-09 2012-10-10 2012-10-11
1 173 277 217 291 197 308 124 387 369 250 242
I am trying to calculate the standard deviation of each row and store it in a column named sd:
xx$sd<-rowSds(xx)
I get this error:
Error in apply(na.omit(as.matrix(x), ...), 1, FUN, ...) :
error in evaluating the argument 'X' in selecting a method for function 'apply': Error in na.omit(as.matrix(x), ...) :
error in evaluating the argument 'object' in selecting a method for function 'na.omit': Error in `colnames<-`(`*tmp*`, value = c("2012-02-23", "2012-02-24", "2012-02-25", :
length of 'dimnames' [2] not equal to array extent
Any ideas on how to omit the NA values when calculating the SD? Is my syntax correct?
Answer 0 (score: 28)
You can use the apply and transform functions:
set.seed(007)
X <- data.frame(matrix(sample(c(10:20, NA), 100, replace=TRUE), ncol=10))
transform(X, SD=apply(X,1, sd, na.rm = TRUE))
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 SD
1 NA 12 17 18 19 16 12 13 20 14 3.041381
2 14 12 13 13 14 18 16 17 20 10 3.020302
3 11 19 NA 12 19 19 19 20 12 20 3.865805
4 10 11 20 12 15 17 18 17 18 12 3.496029
5 12 15 NA 14 20 18 16 11 14 18 2.958040
6 19 11 10 20 13 14 17 16 10 16 3.596294
7 14 16 17 15 10 11 15 15 11 16 2.449490
8 NA 10 15 19 19 12 15 15 19 14 3.201562
9 11 NA NA 20 20 14 14 17 14 19 3.356763
10 15 13 14 15 NA 13 15 NA 15 12 1.195229
From ?apply you can see that the ... argument passes optional arguments on to FUN. In this case you can use na.rm=TRUE to omit the NA values.
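To make the role of ... concrete, here is a minimal sketch on a made-up 2 x 3 matrix (the matrix m and its values are assumptions for illustration, not the question's data). It compares the row SDs with and without na.rm = TRUE.

```r
# Hypothetical 2 x 3 matrix with one missing value (not from the question's data)
m <- matrix(c(1, 2, 3,
              4, NA, 6), nrow = 2, byrow = TRUE)

# Without na.rm, any row that contains an NA yields NA
sd_default <- apply(m, 1, sd)

# na.rm = TRUE is not an argument of apply() itself;
# it is forwarded through `...` to sd()
sd_no_na <- apply(m, 1, sd, na.rm = TRUE)

print(sd_default)
print(sd_no_na)
```

The second row is NA in the first result and a finite value in the second, because sd() only drops the missing entry when na.rm = TRUE reaches it.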
rowSds from the matrixStats package likewise needs na.rm=TRUE to omit the NA values:
library(matrixStats)
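# as.matrix(): rowSds() is designed for matrices, and recent matrixStats versions may reject a data frame
transform(X, SD=rowSds(as.matrix(X), na.rm=TRUE)) # same result as before.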
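The data frame in the question also has two label columns (Job and variable) in front of the date columns, so it may help to restrict the calculation to the numeric columns first. The following is only a sketch under that assumption: the stand-in xx, its values, and the cutoff of 50 are invented for illustration.

```r
# Stand-in for the question's xx: two label columns plus numeric date columns.
# The values below are invented for illustration.
xx <- data.frame(
  Job = c("A", "B", "C"),
  variable = "Duration",
  `2012-02-23` = c(152, 300, 10),
  `2012-02-24` = c(424, 310, NA),
  `2012-02-25` = c(NA, 305, 12),
  check.names = FALSE
)

# Keep only numeric columns so the label columns do not turn the matrix into character
num_cols <- vapply(xx, is.numeric, logical(1))

# Row-wise standard deviation, ignoring NA values
xx$sd <- apply(as.matrix(xx[, num_cols]), 1, sd, na.rm = TRUE)

# Select rows whose SD exceeds a cutoff (the cutoff 50 is arbitrary here)
high_sd <- xx[xx$sd > 50, ]
print(high_sd)
```

The rows in high_sd are the candidates for plotting. The cutoff is a judgment call; something like a quantile of xx$sd could be used instead.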
Answer 1 (score: 0)
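library(dplyr) # provides %>%, group_by(), row_number() and do()
set.seed(007)
X <- data.frame(matrix(sample(c(10:20, NA), 100, replace=TRUE), ncol=10))
# names of the columns to include in the SD
vars_to_sum = grep("X", names(X), value=TRUE)
X %>%
  group_by(row_number()) %>%
  do(data.frame(.,
                SD = sd(unlist(.[vars_to_sum]), na.rm=TRUE)))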
...this appends several row-number columns, so it is better to add a row ID explicitly and group by that:
X %>%
  mutate(ID = row_number()) %>%
  group_by(ID) %>%
  do(data.frame(., SD = sd(unlist(.[vars_to_sum]), na.rm=TRUE)))
This syntax also has the advantage that you can specify which columns to use.
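For comparison, the same per-row calculation over a chosen set of columns can be sketched with rowwise() and c_across(). This is an alternative formulation, not the approach of the answer above, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for c_across()

set.seed(7)
X <- data.frame(matrix(sample(c(10:20, NA), 100, replace = TRUE), ncol = 10))

# Names of the columns to include in the SD
vars_to_sum <- grep("X", names(X), value = TRUE)

# rowwise() makes mutate() operate one row at a time;
# c_across() gathers the chosen columns of that row into a vector
result <- X %>%
  rowwise() %>%
  mutate(SD = sd(c_across(all_of(vars_to_sum)), na.rm = TRUE)) %>%
  ungroup()

print(result)
```

Because the columns are selected by name, no helper ID column is needed, and changing vars_to_sum is enough to compute the SD over a different subset.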