Delete multiple rows based on missing values in fewer rows - cannot allocate vector of size

Time: 2015-06-23 21:43:03

Tags: r

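I have an R data frame containing data from several subjects, each of whom was tested several times. To run statistics on the set, the data have a factor for subject ("id"), one row per observation (about 40,000 rows), and about 200 variables.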

allData <- data.frame(id       = rep(1:4, 3),
                      session  = rep(1:3, each = 4),
                      measure1 = sample(c(NA, 1:11)),
                      measure2 = sample(c(NA, 1:11)),
                      measure3 = sample(c(NA, 1:11)),
                      measure4 = sample(c(NA, 1:11)))
allData                      
#    id session measure1 measure2 measure3 measure4
# 1   1       1        3        7       10        6
# 2   2       1        4        4        9        9
# 3   3       1        6        6        7       10
# 4   4       1        1        5        2        3
# 5   1       2       NA       NA        5       11
# 6   2       2        7       10        6        5
# 7   3       2        9        8        4        2
# 8   4       2        2        9        1        7
# 9   1       3        5        1        3        8
# 10  2       3        8        3        8        1
# 11  3       3       11       11       11        4
# 12  4       3       10        2       NA       NA

I need to remove all rows for ids 1 and 4, because for each of those ids at least one row has an NA in the "measureX" columns (X = 1, ..., 4).
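A minimal base R sketch of that rule. It assumes the printed values above are hard-coded (the original uses sample(), so every run differs) and treats every column whose name starts with "measure" as a column to check:

```r
# Reproduce the printed allData (values copied from the output above)
allData <- data.frame(
  id       = rep(1:4, 3),
  session  = rep(1:3, each = 4),
  measure1 = c(3, 4, 6, 1, NA, 7, 9, 2, 5, 8, 11, 10),
  measure2 = c(7, 4, 6, 5, NA, 10, 8, 9, 1, 3, 11, 2),
  measure3 = c(10, 9, 7, 2, 5, 6, 4, 1, 3, 8, 11, NA),
  measure4 = c(6, 9, 10, 3, 11, 5, 2, 7, 8, 1, 4, NA)
)

# Columns to check for missing values
measureCols <- grep("^measure", names(allData), value = TRUE)

# TRUE for every row with at least one NA in a measure column
rowHasNA <- !complete.cases(allData[measureCols])

# ids that own at least one such row
badIds <- unique(allData$id[rowHasNA])
print(badIds)
# [1] 1 4

# Drop every row of those ids, not only the incomplete rows
cleaned <- allData[!allData$id %in% badIds, ]
print(cleaned)
```

The key point is the two-step structure: the NA test is done per row, but the deletion is done per id via %in%.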

flodel proposed a solution to this problem in https://stackoverflow.com/a/9917524/5042101 using the "plyr" package and its ddply function.

probeColumns = c('measure1','measure4')

library(plyr)
ddply(allData, "id",
      function(df)if(any(is.na(df[, probeColumns]))) NULL else df)
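ddply() splits allData by id, runs the function on each piece (returning NULL drops that piece), and binds the remaining pieces back together. A sketch of the same per-id rule using base R's ave() instead of plyr (a swapped-in technique, not flodel's code), with the printed sample values hard-coded as an assumption so the result is reproducible:

```r
probeColumns <- c("measure1", "measure4")

# Sample values copied from the printed allData in the question
allData <- data.frame(
  id       = rep(1:4, 3),
  session  = rep(1:3, each = 4),
  measure1 = c(3, 4, 6, 1, NA, 7, 9, 2, 5, 8, 11, 10),
  measure2 = c(7, 4, 6, 5, NA, 10, 8, 9, 1, 3, 11, 2),
  measure3 = c(10, 9, 7, 2, 5, 6, 4, 1, 3, 8, 11, NA),
  measure4 = c(6, 9, 10, 3, 11, 5, 2, 7, 8, 1, 4, NA)
)

# Row-level flag: does this row have an NA in any probe column?
rowHasNA <- !complete.cases(allData[probeColumns])

# Group-level flag: ave() applies any() within each id and
# writes that single result back onto every row of the id
idHasNA <- ave(rowHasNA, allData$id, FUN = any)

result <- allData[!idHasNA, ]
print(result)
```

Because ave() returns a vector aligned row-for-row with allData, the group decision can be used directly as a row filter without any rbind step.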

The problem: my data set has about 40,000 rows and 200 columns. When I try this with a single column, I get the error: C stack usage 10027284.

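I am using R 3.1.3 in RStudio on Windows. When I try more columns, RStudio shuts down by itself or R freezes. Also, I do not have access to an administrator account on the computer.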
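Since ddply() calls an R function once per id and then binds the pieces together, one alternative worth trying at this size is a fully vectorized check over whole columns. The sketch below runs on hypothetical synthetic data: 40,000 rows, 1,000 ids, 200 measure columns and 4,000 randomly placed NAs are all assumptions chosen to mimic the sizes mentioned above, not the real data, and whether this avoids the error on the original machine is untested:

```r
set.seed(1)

# Hypothetical data of roughly the described size
nRow <- 40000
nVar <- 200
vals <- matrix(rnorm(nRow * nVar), nrow = nRow,
               dimnames = list(NULL, paste0("measure", seq_len(nVar))))
vals[sample(length(vals), 4000)] <- NA
bigData <- data.frame(id = rep(seq_len(1000), length.out = nRow), vals)

probeColumns <- c("measure1", "measure4")

# One vectorized pass: count NAs per row in the probe columns only
rowHasNA <- rowSums(is.na(bigData[probeColumns])) > 0

# ids with at least one incomplete row, then drop all of their rows
badIds  <- unique(bigData$id[rowHasNA])
cleaned <- bigData[!bigData$id %in% badIds, ]
```

Only the probe columns are passed to is.na(), so the intermediate logical matrix stays small even when the data frame has 200 columns.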

2 answers:

Answer 0 (score: 0)

I can't say exactly what the problem with plyr is (it might be a bug in the package). You can do this with apply:

> allData[apply(allData, 1, function(x) !any(is.na(x[probeColumns]))), ]
   id session measure1 measure2 measure3 measure4
1   1       1        1        1        2        4
2   2       1        5        4        6        1
3   3       1        9        8       NA        3
4   4       1       11        7        7        5
5   1       2        8        5       11        2
6   2       2        6       NA        5        8
7   3       2       10       10        3       10
9   1       3        4        9        4        9
10  2       3        2        6        8        7
11  3       3        3        3        9        6

Some explanation: apply(allData, c(1), function(x) !any(is.na(x[probeColumns]))) goes through allData row by row and checks whether any of the columns listed in probeColumns is NA in that row. It returns TRUE for rows with no NA in those columns, and that logical vector is then used to select the rows of allData.
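apply() first converts the data frame to a matrix and then calls the function once per row. A sketch checking that complete.cases() on the probe columns gives the same logical vector without the per-row calls. It uses the hard-coded sample values from the question (an assumption; the answer's own data were random):

```r
# Sample values copied from the printed allData in the question
allData <- data.frame(
  id       = rep(1:4, 3),
  session  = rep(1:3, each = 4),
  measure1 = c(3, 4, 6, 1, NA, 7, 9, 2, 5, 8, 11, 10),
  measure2 = c(7, 4, 6, 5, NA, 10, 8, 9, 1, 3, 11, 2),
  measure3 = c(10, 9, 7, 2, 5, 6, 4, 1, 3, 8, 11, NA),
  measure4 = c(6, 9, 10, 3, 11, 5, 2, 7, 8, 1, 4, NA)
)
probeColumns <- c("measure1", "measure4")

# The answer's row-wise test: TRUE if the row has no NA in the probe columns
viaApply <- apply(allData, 1, function(x) !any(is.na(x[probeColumns])))

# Vectorized equivalent on just the probe columns
viaCompleteCases <- complete.cases(allData[probeColumns])

print(identical(unname(viaApply), viaCompleteCases))
# [1] TRUE

print(allData[viaCompleteCases, ])
```

Like the answer's code, this filters rows rather than ids: the complete rows of ids 1 and 4 are kept.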

Answer 1 (score: 0)

My solution may be a bit clumsy, but here is the idea:

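  1. Find the positions of the NAs.
  2. Determine which ids they correspond to.
  3. Finally, remove all rows of every id that has at least one NA (in at least one column).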

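    library(dplyr)
    
    # ids that have at least one NA in any column (column 1 is id)
    ind <- allData[apply(allData, 1, function(x) any(is.na(x))), 1]
    
    # keep only the rows whose id is not in ind
    allData %>% filter(!id %in% ind)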
      id session measure1 measure2 measure3 measure4
    1  1       1        1        6        1        8
    2  2       1       10        2        7        2
    3  1       2       11        7        5       11
    4  2       2        5        5        4        7
    5  1       3        4        8        9        5
    6  2       3        8       11        3        9
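The three steps can also be followed literally in base R, without dplyr. In this sketch which(..., arr.ind = TRUE) returns the row and column positions of the NAs (step 1), the row positions are mapped to ids (step 2), and those ids are dropped (step 3). The sample values from the question are hard-coded as an assumption for reproducibility:

```r
# Sample values copied from the printed allData in the question
allData <- data.frame(
  id       = rep(1:4, 3),
  session  = rep(1:3, each = 4),
  measure1 = c(3, 4, 6, 1, NA, 7, 9, 2, 5, 8, 11, 10),
  measure2 = c(7, 4, 6, 5, NA, 10, 8, 9, 1, 3, 11, 2),
  measure3 = c(10, 9, 7, 2, 5, 6, 4, 1, 3, 8, 11, NA),
  measure4 = c(6, 9, 10, 3, 11, 5, 2, 7, 8, 1, 4, NA)
)

# Step 1: row/column positions of every NA (a matrix with "row" and "col")
naPos <- which(is.na(allData), arr.ind = TRUE)

# Step 2: ids that own those rows
ind <- unique(allData$id[naPos[, "row"]])

# Step 3: drop every row belonging to those ids
result <- allData[!allData$id %in% ind, ]
print(result)
```

Checking for sum(is.na(x)) > 0 (or any(is.na(x))) rather than equality with a specific count matters here, because rows 5 and 12 each contain two NAs.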