R: calculating in a loop over large data

Time: 2014-12-04 13:10:21

Tags: arrays r calculated-columns

  k=seq(10100,249250621,10)

  # a is a data.frame with 300000 rows and 5 columns, in this format:

  chr1 100000851 + 2 100000925

  chr1 100001273 + 3 100001347

..............................

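1. Now I want to calculate:

For each a[i,5], find every k[j] for which a[i,5] falls inside the interval

 (k[j]-75,k[j]+75)

then combine the matches into a new data.frame(), with a[i,6]=k[j].

2. I have written two pieces of code, but I don't know where I went wrong: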

1)

     b=function(x){

     x1=a[which(a[,5]-(x-75)>0&a[,5]-(x+75)<0,]

     x2=cbind(x1,x)

   }

   c=apply(k,1,function(x)a(x))

  2)

    for(i in 1:length(k)){

      if(length(N1<-which(a[,5]-(k-75)>0&a[,5]-(k+75)<0))>0){

        for(j in N1){

           x1=cbind(k,a[j,])

           x2=rbind(x2,x1)

        }

       }

      }

But both of them are wrong.

Any suggestions would be greatly appreciated!
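
A few likely problems in the two attempts: in 1), the which( call is missing its closing parenthesis, apply(k,1,...) cannot iterate over a plain vector, and the function is defined as b but called as a; in 2), the loop uses k instead of k[i], and x2 is never initialised before rbind. The following is only a minimal sketch of what attempt 1) seems to be aiming for, on small made-up data (the column names V1..V5 and the toy values are assumptions, not the real data):

```r
# Toy data, assumed for illustration only
a <- data.frame(V1 = c("chr1", "chr2", "chr1", "chr2"), V2 = 1:4, V3 = "+",
                V4 = 1:4, V5 = c(215, 502, 345, 1007))
k <- c(200, 300, 400, 500, 800, 1000)

# Rows of `a` whose 5th column lies strictly inside (x - 75, x + 75),
# with x appended as an extra column named k; NULL when nothing matches
b <- function(x) {
  hit <- a[, 5] > x - 75 & a[, 5] < x + 75
  if (!any(hit)) return(NULL)
  cbind(a[hit, , drop = FALSE], k = x)
}

# k is a plain vector, so iterate with lapply() rather than apply(k, 1, ...)
pieces <- lapply(k, b)
res <- do.call(rbind, Filter(Negate(is.null), pieces))
res
```

Collecting the pieces in a list and binding them once also avoids growing a data.frame with rbind inside a loop, which tends to get slow as the result grows.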

1 answer:

Answer 0: (score: 0)

From the comments, this is what I understood you to be after:

#sample "a" and "k" (col2 is random, so it will differ between runs)
a = data.frame(col1 = paste("chr", rep(1:2, 2), sep = ""), col2 = sample(4),
               col3 = "+", col4 = 1:4, col5 = c(215, 502, 345, 1007))
a
k = c(200, 300, 400, 500, 800, 1000)

a[[6]] = sapply(a[[5]], function(x) 
                            paste(k[x >= (k - 75) & x <= (k + 75)], collapse = ", "))
a
#  col1 col2 col3 col4 col5       V6
#1 chr1    4    +    1  215      200
#2 chr2    2    +    2  502      500
#3 chr1    3    +    3  345 300, 400
#4 chr2    1    +    4 1007     1000
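
With the full-size k from the question (roughly 25 million values spaced 10 apart), comparing every a[i,5] against all of k may be slow. Because that k is a regular grid, the matching values can instead be computed arithmetically without building k at all. A rough sketch, assuming the same closed interval as in the code above and a k that is fully described by its start, end and step; match_grid and its parameter names are hypothetical:

```r
# Hypothetical helper: for each x, list the grid values k with
# x - half_width <= k <= x + half_width, where k = k_start, k_start + step, ...
match_grid <- function(x, k_start = 10100, k_end = 249250621, step = 10,
                       half_width = 75) {
  n_k <- floor((k_end - k_start) / step)   # 0-based index of the last grid point
  # First and last grid index inside the window, clamped to the grid
  lo <- pmax(ceiling((x - half_width - k_start) / step), 0)
  hi <- pmin(floor((x + half_width - k_start) / step), n_k)
  vapply(seq_along(x), function(i) {
    if (lo[i] > hi[i]) return("")          # no grid point inside the window
    vals <- k_start + step * (lo[i]:hi[i])
    # format() keeps large values such as 100000000 out of scientific notation
    paste(format(vals, scientific = FALSE, trim = TRUE), collapse = ", ")
  }, character(1))
}

# Small check on a regular grid 200, 300, ..., 1000
small <- match_grid(c(215, 502, 345, 1007), k_start = 200, k_end = 1000, step = 100)
small
```

On the real data this could be used as a[[6]] = match_grid(a[[5]]); with a step of 10 and a half-width of 75, each row would get around 15 or 16 matching k values.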