Question

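I have a column of data from which I draw a 50% random subsample. I am running a two-sided KS test to compare the distribution of the 50% subsample with that of the full data, to see whether the distributions still match well:

dat50=dat[sample(nrow(dat),replace=F,size=0.50*nrow(dat)),]
ks.test(dat[,1],dat50[,1], alternative="two.sided")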

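To reach my goal, I want to run this in a loop of 1000 iterations and get the mean p-value over 1000 random subsamples. This code gives me a single p-value for one 50% random subset of my sample: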

x <- numeric(100)
for (i in 1:100){
  x<- ks.test(dat[,7],dat50[,7], alternative="two.sided")
  x<-x$p.value
}

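I need code that runs 1000 times and stores the resulting (different) p-value from each run in a column, so that I can then average them. The code I have been trying to get working is: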

get.p.value <- function(df1, df2) {
  x <- rf(5, df1=df1, df2=df2)
  p.value <- ks.test(dat[,6],dat50[,6], alternative="two.sided")$p.value
}
replicate (2000, get.p.value(df1 = 5, df2 = 10))

However, this does not store multiple p-values.

I hope this is clear, and I would appreciate any help with solving this!


Answer 1

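In your for loop, you overwrite x on every iteration, which means you only keep the p-value from the last iteration. In addition, dat50 is created once outside the loop, so every iteration tests the same subsample. Try this: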

x <- numeric(100)
# draw a fresh 50% subsample on each iteration and store its p-value in x[i]
for (i in seq_along(x))
    x[i] <- ks.test(dat[,7], dat[sample(nrow(dat), replace=F, size=0.5*nrow(dat)),7])$p.value

You can get the same result with replicate:

 replicate(100, ks.test(dat[,7], dat[sample(nrow(dat), replace=F, size=0.5*nrow(dat)),7])$p.value)
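To get the mean p-value over 1000 subsamples that the question asks for, the replicate call can be assigned to a variable and averaged. The following is only a sketch under assumptions: `dat` here is a synthetic stand-in for the real data frame, and `subsample_p` and its `frac` argument are hypothetical names introduced for illustration. The important part is that the resampling happens inside the function, so each call draws a new subsample.

```r
set.seed(1)  # only so that this sketch is reproducible

# Hypothetical stand-in for the asker's data frame `dat` (200 rows, 7 columns)
dat <- data.frame(matrix(rnorm(200 * 7), ncol = 7))

# Draw a fresh subsample of rows (50% by default) and return the
# two-sided KS p-value comparing one column of the full data with
# the same column of the subsample.
subsample_p <- function(data, col, frac = 0.5) {
  idx <- sample(nrow(data), size = floor(frac * nrow(data)), replace = FALSE)
  ks.test(data[, col], data[idx, col], alternative = "two.sided")$p.value
}

# One p-value per random subsample, collected in a numeric vector.
# ks.test may warn about ties here, because every subsample value
# also appears in the full data.
p_values <- replicate(1000, subsample_p(dat, col = 7))

length(p_values)  # number of stored p-values
mean(p_values)    # the average the question asks for
```

With real data, drop the synthetic `dat` line and pass your own data frame and column index to `subsample_p`.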

Two-sided KS test in a loop, getting the p.value
