Let's say this is my matrix "marx", with nrow = 400 and ncol = 250. I want to select half of the data (the top 50%) from each column, not counting NAs.
V272 V273 V274 V275 V276 V277
[1,] 0.2337847 0.2612946 0.41232797 NA 0.11931570 0.2543780
[2,] 0.3277191 0.3590431 0.06490879 0.2690663 NA 0.1632647
[3,] NA 0.1536955 0.03604548 0.1361645 NA 0.2252554
[4,] 0.3483152 0.5342417 0.07404933 NA 0.14699876 0.2082977
[5,] 0.4213399 0.2511010 0.30502173 0.1189562 0.08962128 0.2919712
[6,] 0.1604953 0.2101048 NA NA 0.01270747 0.2322928
I tried sample = length(x)/2 and a loop, but it still doesn't work. Does anyone have an idea?
Answer 0 (score: 2)
I would do it like this:
apply(x, 2, FUN = function(x) sort(x, decreasing = T)[1:floor(length(x)/2)])
Demo:
set.seed(47)
x = matrix(rnorm(100), 10)
x[1, 3] = NA
x
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] 1.99469634 -0.92245624 NA 0.4836041107 0.06116275 0.9697466 0.03838225 1.2174872
# [2,] 0.71114251 0.03960243 0.24914817 0.1443376363 -0.10856462 1.6756248 0.06893424 0.7314502
# [3,] 0.18540528 0.49382018 -0.34041599 -1.2004406274 -0.15469524 1.9882438 1.74017016 1.1339939
# [4,] -0.28176501 -1.82822917 0.41719084 0.8852306473 0.95048417 -0.9870583 1.30627664 2.1879180
# [5,] 0.10877555 0.09147291 -0.32646679 0.8869350447 -0.48769640 -1.8300307 -0.14493417 0.2212036
# [6,] -1.08573747 0.67077922 -0.89029402 0.0006863592 -0.92024188 1.0081416 1.56234731 -0.9390224
# [7,] -0.98548216 -0.08107805 -1.60815993 -0.6932373819 0.89797526 -0.8691044 1.24215371 0.8384429
# [8,] 0.01513086 1.26424109 -2.32237229 0.2608364805 -0.35629514 -0.5151981 1.46129302 0.5291967
# [9,] -0.25204590 -0.70338819 -1.96721918 0.5066869590 1.03190009 -0.5002165 -0.98583638 -1.0883085
# [10,] -1.46575030 -0.04057817 0.02752681 0.5643018376 0.66430042 -0.2725779 0.92561447 -0.7955874
# [,9] [,10]
# [1,] 0.96832400 1.136878023
# [2,] 0.18510415 0.004507257
# [3,] -0.41257000 1.341705472
# [4,] -0.83292772 -1.365424404
# [5,] 0.95488318 0.926037646
# [6,] -2.03609798 -0.497367640
# [7,] 0.07445361 -0.860184103
# [8,] -0.91453141 -0.060824754
# [9,] 0.15602420 1.410276163
# [10,] 0.02934662 0.003944793
apply(x, 2, FUN = function(x) sort(x, decreasing = T)[1:floor(length(x)/2)])
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 1.99469634 1.26424109 0.41719084 0.8869350 1.03190009 1.9882438 1.740170 2.1879180 0.96832400
# [2,] 0.71114251 0.67077922 0.24914817 0.8852306 0.95048417 1.6756248 1.562347 1.2174872 0.95488318
# [3,] 0.18540528 0.49382018 0.02752681 0.5643018 0.89797526 1.0081416 1.461293 1.1339939 0.18510415
# [4,] 0.10877555 0.09147291 -0.32646679 0.5066870 0.66430042 0.9697466 1.306277 0.8384429 0.15602420
# [5,] 0.01513086 0.03960243 -0.34041599 0.4836041 0.06116275 -0.2725779 1.242154 0.7314502 0.07445361
# [,10]
# [1,] 1.410276163
# [2,] 1.341705472
# [3,] 1.136878023
# [4,] 0.926037646
# [5,] 0.004507257
Modified to return only half of the non-NA values:
apply(x, 2, FUN = function(x) sort(x, decreasing = TRUE)[seq_len(floor(sum(!is.na(x)) / 2))])
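This returns a list in which each item is a vector whose length is half the number of non-missing values in the corresponding original column (rounded down). If that length happens to be the same for every column, the result is simplified to a matrix, unless the length is 1, in which case it will be a vector.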
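To make the list-versus-matrix behaviour concrete, here is a small sketch. The matrix `m` and the helper name `top_half` are made up for illustration and are not part of the question; the helper uses `head()` instead of index arithmetic, which is just one way to write it.

```r
# Hypothetical helper: top half of the non-NA values in one column
top_half <- function(col) {
  vals <- sort(col, decreasing = TRUE)   # sort() drops NAs by default
  head(vals, floor(length(vals) / 2))    # also fine when nothing is left to keep
}

# Made-up 6 x 3 matrix with a different number of NAs in each column
m <- matrix(c(1, 2, 3, 4, 5, 6,
              1, 2, 3, 4, NA, NA,
              1, 2, NA, NA, NA, NA), nrow = 6)

res <- apply(m, 2, top_half)
is.list(res)   # the per-column lengths differ, so apply() cannot simplify
lengths(res)   # number of values kept per column
```

If every column had the same number of non-NA values, the same call would come back as a matrix instead.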
Answer 1 (score: 0)
Take a look at the head() function.
b <- data.frame(1:4, 5:8)  # both columns need the same number of rows
head(b, n = nrow(b) / 2)
This won't remove your NAs, though, so you could do
head(b[!is.na(b[,1]),1], n = nrow(b)/2)
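and either iterate or use an apply function. Change the 1 in b[!is.na(b[,1]),1] (both places) to your column number. You will end up with a ragged array, because your NAs are scattered across the columns.
Edit: having seen your comment, you should use order(), i.e.: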
apply(b, 2, function(x) head(x[order(x, decreasing = TRUE)], n = length(x)/2))
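One thing to watch: `length(x)` counts NAs as well, so a column with many NAs would keep more than half of its real values. Below is a sketch of the same order-based idea that counts only the non-missing values. The data frame `d` is made up for illustration, and `lapply()` is swapped in for `apply()` here so that the ragged result stays a plain list.

```r
# Made-up data frame with NAs scattered across the columns
d <- data.frame(a = c(0.2, NA, 0.5, 0.1, NA, NA),
                b = c(0.3, 0.9, NA, 0.4, 0.7, 0.6))

# order(..., decreasing = TRUE) puts NAs last by default (na.last = TRUE),
# so the first n sorted values are never NA as long as n <= number of non-NAs
top <- lapply(d, function(x) {
  n <- floor(sum(!is.na(x)) / 2)           # half of the non-NA count, rounded down
  head(x[order(x, decreasing = TRUE)], n)
})

top$a
top$b
```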