Question

I am trying to create a new dataset by dropping some rows from ds2, based on a comparison with dataset ds1. I wrote a function that is supposed to do this:

compare<-function(ds1,ds2){
for(i in 1:length(ds1$long)){
    for(j in 1:length(ds2$long)){
        if(ds1$long[i]<(ds2$long[j]+500) & ds1$long[i]>(ds2$long[j]-500)){
            if(ds1$lat[i]<(ds2$lat[j]+500) & ds1$lat[i]>(ds2$lat[j]-500)){
                ds3<-data.frame(merge(ds2[j,],ds3))
            }
        }
    }
}
return(ds3) 
}

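ds3 is the dataset I want to return; it should consist of the rows of the original ds2 that satisfy the conditions above. My function gives me an error: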

Error in as.data.frame(y) : 
argument "y"  is not specified and has not a definite value

Is merge() the right function for building such a dataset, i.e. for appending rows to ds3? If not, which function is the right one?

Thanks in advance, everyone.

Edit: Following your hints, I changed the function to use

ds3<-data.frame()
ds3<-rbind(ds3,ds2[j,])

instead of

ds3<-data.frame(merge(ds2[j,],ds3))

Now I get this error:

Errore in rbind(ds3, ds2[j, ]) : 
no method for coercing this S4 class to a vector

Can I use rbind() with SpatialPoints? (The data in my datasets are spatial points.)

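Edit 2: I have two datasets, one with 330 rows (points on an irregular grid, ds1) and one with ~150000 rows (points on a regular grid, ds2). I want to compute the correlation between a variable in the first dataset and a variable in the second. To do that, I want to "reduce" the second dataset to the size of the first, keeping only the points that have the same (or nearly the same) coordinates in both datasets.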
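One way to read the goal in Edit 2 is: for every ds1 point, pick the closest ds2 point inside the ±500 box, then correlate the paired values. The sketch below illustrates that reading on plain data frames. The toy values, the variable columns `x` and `y`, and the names `tol`, `nearest` and `paired` are all assumptions made for illustration, not part of the original data, and nothing here has been checked against Spatial objects.

```r
# Hypothetical toy data: 'x' and 'y' stand in for the two variables to correlate.
ds1 <- data.frame(long = c(1000, 5000, 8000),
                  lat  = c(1000, 5000, 8000),
                  x    = c(1.0, 2.0, 3.0))
ds2 <- data.frame(long = c(1200,  900, 5400, 8100, 20000),
                  lat  = c( 900, 1000, 4800, 8300, 20000),
                  y    = c(1.5, 1.1, 2.2, 2.9, 9.9))

tol <- 500  # same tolerance as in the question, in coordinate units

# For each ds1 point: row index of the closest ds2 point inside the box,
# or NA when no ds2 point falls inside it.
nearest <- vapply(seq_len(nrow(ds1)), function(i) {
  dx <- ds2$long - ds1$long[i]
  dy <- ds2$lat  - ds1$lat[i]
  inside <- which(abs(dx) < tol & abs(dy) < tol)
  if (length(inside) == 0) return(NA_integer_)
  inside[which.min(dx[inside]^2 + dy[inside]^2)]
}, integer(1))

# Keep only the ds1 points that found a partner, so both columns line up.
matched <- !is.na(nearest)
paired  <- data.frame(x = ds1$x[matched], y = ds2$y[nearest[matched]])

print(cor(paired$x, paired$y))
```

Pairing one ds2 row per ds1 point keeps the two vectors the same length, which `cor()` requires. Whether "closest point in the box" is the right matching rule depends on the grids involved.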

Answer 1

Without a small example this is untested, but if you are happy with the performance of for loops, this may be what you are trying to do:

compare<-function(ds1,ds2){
ds3<-NULL   # must exist before the loop; length(NULL) is 0
for(i in 1:length(ds1$long)){
    # I think starting at 1 will give twice as many hits
    # (note: starting at i skips pairs with j < i; if ds1 and ds2 hold
    #  different points, 1:length(ds2$long) is needed to test every pair)
    for(j in i:length(ds2$long)){
        if(ds1$long[i]<(ds2$long[j]+500) & ds1$long[i]>(ds2$long[j]-500)){
            if(ds1$lat[i]<(ds2$lat[j]+500) & ds1$lat[i]>(ds2$lat[j]-500)){
               if( length(ds3) ) { # check whether ds3 already holds rows
                ds3<-rbind( ds3, ds2[j,] ) } else {  # append row j as the next row
                ds3 <- ds2[j,] }   # should only get executed once
            }
        }
    }
}
return(ds3)
}

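I was trying to avoid the extra overhead of re-testing j values where an i, j match had already been found. Again, I can't be sure this is appropriate, because the problem description is still not very clear to me.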
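If the loop turns out to be too slow, the same box test can be written without growing ds3 one row at a time. The following is only a sketch on plain data frames: the toy values and the helper name `compare_vec` are hypothetical, and it has not been checked against Spatial objects. It also differs from the loop in one respect: each ds2 row is kept at most once, even if several ds1 points fall within range of it.

```r
# Hypothetical toy data; the column names long/lat follow the question.
ds1 <- data.frame(long = c(1000, 5000), lat = c(1000, 5000))
ds2 <- data.frame(long  = c(1200, 3000, 5400, 9000),
                  lat   = c( 900, 3000, 4800, 9000),
                  value = c(10, 20, 30, 40))

compare_vec <- function(ds1, ds2, tol = 500) {
  # For each row of ds2: TRUE if at least one ds1 point lies strictly
  # within tol in both long and lat (the same test as the nested loops).
  keep <- vapply(seq_len(nrow(ds2)), function(j) {
    any(abs(ds1$long - ds2$long[j]) < tol &
        abs(ds1$lat  - ds2$lat[j])  < tol)
  }, logical(1))
  ds2[keep, , drop = FALSE]  # one subsetting step instead of repeated rbind()
}

ds3 <- compare_vec(ds1, ds2)
print(ds3)
```

Building a logical index and subsetting once avoids copying ds3 on every match, which is where repeated rbind() calls tend to get expensive on a 150000-row dataset.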

Append rows to a dataset in R
