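Select the top 50% of the data in each column

Time: 2016-04-28 22:15:21

Tags: r matrix filter time-series

Let's say this is my matrix "marx", with nrow = 400 and ncol = 250. I want to select half of the data (the top 50%) from each column, excluding NAs.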

          V272      V273       V274      V275       V276      V277
[1,] 0.2337847 0.2612946 0.41232797        NA 0.11931570 0.2543780
[2,] 0.3277191 0.3590431 0.06490879 0.2690663         NA 0.1632647
[3,]        NA 0.1536955 0.03604548 0.1361645         NA 0.2252554
[4,] 0.3483152 0.5342417 0.07404933        NA 0.14699876 0.2082977
[5,] 0.4213399 0.2511010 0.30502173 0.1189562 0.08962128 0.2919712
[6,] 0.1604953 0.2101048         NA        NA 0.01270747 0.2322928

I tried sample = length(x)/2 and a loop, but it still doesn't work. Does anyone have an idea?

2 answers:

Answer 0 (score: 2)

I would do it like this:

apply(x, 2, FUN = function(x) sort(x, decreasing = T)[1:floor(length(x)/2)])

Demo:

set.seed(47)
x = matrix(rnorm(100), 10)
x[1, 3] = NA
x
#              [,1]        [,2]        [,3]          [,4]        [,5]       [,6]        [,7]       [,8]
#  [1,]  1.99469634 -0.92245624          NA  0.4836041107  0.06116275  0.9697466  0.03838225  1.2174872
#  [2,]  0.71114251  0.03960243  0.24914817  0.1443376363 -0.10856462  1.6756248  0.06893424  0.7314502
#  [3,]  0.18540528  0.49382018 -0.34041599 -1.2004406274 -0.15469524  1.9882438  1.74017016  1.1339939
#  [4,] -0.28176501 -1.82822917  0.41719084  0.8852306473  0.95048417 -0.9870583  1.30627664  2.1879180
#  [5,]  0.10877555  0.09147291 -0.32646679  0.8869350447 -0.48769640 -1.8300307 -0.14493417  0.2212036
#  [6,] -1.08573747  0.67077922 -0.89029402  0.0006863592 -0.92024188  1.0081416  1.56234731 -0.9390224
#  [7,] -0.98548216 -0.08107805 -1.60815993 -0.6932373819  0.89797526 -0.8691044  1.24215371  0.8384429
#  [8,]  0.01513086  1.26424109 -2.32237229  0.2608364805 -0.35629514 -0.5151981  1.46129302  0.5291967
#  [9,] -0.25204590 -0.70338819 -1.96721918  0.5066869590  1.03190009 -0.5002165 -0.98583638 -1.0883085
# [10,] -1.46575030 -0.04057817  0.02752681  0.5643018376  0.66430042 -0.2725779  0.92561447 -0.7955874
#              [,9]        [,10]
#  [1,]  0.96832400  1.136878023
#  [2,]  0.18510415  0.004507257
#  [3,] -0.41257000  1.341705472
#  [4,] -0.83292772 -1.365424404
#  [5,]  0.95488318  0.926037646
#  [6,] -2.03609798 -0.497367640
#  [7,]  0.07445361 -0.860184103
#  [8,] -0.91453141 -0.060824754
#  [9,]  0.15602420  1.410276163
# [10,]  0.02934662  0.003944793

apply(x, 2, FUN = function(x) sort(x, decreasing = T)[1:floor(length(x)/2)])
#            [,1]       [,2]        [,3]      [,4]       [,5]       [,6]     [,7]      [,8]       [,9]
# [1,] 1.99469634 1.26424109  0.41719084 0.8869350 1.03190009  1.9882438 1.740170 2.1879180 0.96832400
# [2,] 0.71114251 0.67077922  0.24914817 0.8852306 0.95048417  1.6756248 1.562347 1.2174872 0.95488318
# [3,] 0.18540528 0.49382018  0.02752681 0.5643018 0.89797526  1.0081416 1.461293 1.1339939 0.18510415
# [4,] 0.10877555 0.09147291 -0.32646679 0.5066870 0.66430042  0.9697466 1.306277 0.8384429 0.15602420
# [5,] 0.01513086 0.03960243 -0.34041599 0.4836041 0.06116275 -0.2725779 1.242154 0.7314502 0.07445361
#            [,10]
# [1,] 1.410276163
# [2,] 1.341705472
# [3,] 1.136878023
# [4,] 0.926037646
# [5,] 0.004507257

Modified to return only half of the non-NA values:

apply(x, 2, FUN = function(x) sort(x, decreasing = TRUE)[seq_len(floor(sum(!is.na(x))/2))])

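This returns a list in which each element is a vector whose length is half the number of non-missing values in the corresponding original column, rounded down. If that length happens to be the same for every column, the result is simplified to a matrix, unless the length is 1, in which case it will be a vector.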
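If you would rather always get a list back, whatever the per-column lengths turn out to be, one option is to loop over the column indices with `lapply()` instead of `apply()` (in R 4.1.0 and later, `apply()` also accepts `simplify = FALSE`). A minimal sketch: `top_half()` is a hypothetical helper name, and `marx` is rebuilt here as a small random matrix with a few NAs, standing in for the real 400 × 250 data:

```r
# Hypothetical helper: the largest floor(n/2) values of a vector, where n
# counts only the non-NA entries. sort() drops NAs by default.
top_half <- function(v) {
  k <- floor(sum(!is.na(v)) / 2)
  head(sort(v, decreasing = TRUE), k)
}

# Small stand-in for the real 400 x 250 matrix "marx"
set.seed(1)
marx <- matrix(runif(24), nrow = 6)
marx[c(3, 8, 15, 16)] <- NA   # columns 1-3 now have 5, 5 and 4 non-NA values

# lapply() over column indices always returns a list, one element per column
res <- lapply(seq_len(ncol(marx)), function(j) top_half(marx[, j]))
lengths(res)
```

Here `lengths(res)` gives `2 2 2 3`, so `apply()` would have returned a list anyway; the difference is that the `lapply()` version never silently switches to a matrix or a vector when the lengths happen to match.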

Answer 1 (score: 0)

Take a look at the head() function.

b <- data.frame(1:7, 2:8)
head(b, n = floor(nrow(b) / 2))

This won't remove your NAs, though, so you could do

head(b[!is.na(b[,1]),1], n = nrow(b)/2)

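Then loop over the columns or use an apply function, changing both 1s in b[!is.na(b[,1]),1] to the index of the column you want. You will end up with a ragged array, because your NAs are scattered unevenly across the columns.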
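If you still want a rectangular result from that ragged list, one option is to pad each column's vector with NA up to the longest length. A minimal sketch, assuming the per-column results are already in a list; the list `res` below is made-up example data, and `pad_to_matrix()` is a hypothetical helper:

```r
# Hypothetical helper: turn a list of unequal-length vectors into a matrix,
# filling the bottom of the shorter columns with NA
pad_to_matrix <- function(lst) {
  n <- max(lengths(lst))
  padded <- lapply(lst, function(v) c(v, rep(NA, n - length(v))))
  out <- matrix(unlist(padded), nrow = n)
  colnames(out) <- names(lst)
  out
}

# Made-up ragged result: columns with 3, 1 and 2 kept values
res <- list(c(0.9, 0.7, 0.5), 0.8, c(0.6, 0.4))
m <- pad_to_matrix(res)
m
```

Padding at the bottom keeps the kept values sorted at the top of each column; it does not preserve the original row positions, so this is mainly useful for display or export.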

Edit: Having seen your comment, you should use order() instead, i.e.:

apply(b, 2, function(x) head(x[order(x, decreasing = TRUE)], n = length(x)/2))
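Note that order() puts NAs last by default (na.last = TRUE), and n = length(x)/2 counts the NAs too, so a column that is mostly NA can still return NAs. A minimal sketch that drops them instead, assuming a small made-up data frame `b` with some NAs; it uses `lapply()`, which iterates over data-frame columns directly rather than coercing the data frame to a matrix as `apply()` does:

```r
# Made-up example data with NAs
b <- data.frame(p = c(3, NA, 7, 1, NA, 5),
                q = c(2, 9, NA, 4, 6, 8))

# order(..., na.last = NA) removes NAs instead of putting them last;
# n is based on the non-NA count and rounded down with floor()
top <- lapply(b, function(x) {
  k <- floor(sum(!is.na(x)) / 2)
  head(x[order(x, decreasing = TRUE, na.last = NA)], n = k)
})
top
```

Column `p` has 4 non-NA values and `q` has 5, so both keep 2 values: `top$p` is `7 5` and `top$q` is `9 8`.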