How do I find points that fall within a buffer in R?

Time: 2015-01-19 15:27:10

Tags: r

I have two data frames imported from txt files: sample points and station locations.

Sample points data frame

X   Y   Z
346449.30   576369.65   86.93
346449.55   576368.24   87.16
346449.29   576368.17   79.08
346449.83   576366.86   88.23
346449.97   576365.42   84.97
346449.91   576362.22   86.59
346449.74   576363.65   88.87
346449.61   576363.59   84.99
346449.50   576363.54   81.33

Station locations data frame

Station x   y
1   346479.720  576349.710
2   346575.380  576361.530
3   346685.540  576303.180
4   346722.820  576412.680
5   346514.780  576406.140
6   346813.130  576435.830
7   346748.880  576304.090
8   346825.830  576402.800

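So I would like to know how to find and label the points (from the sample data frame) that lie within a buffer, for example a buffer with a 3-meter radius around each station in the second data frame.

This is what I would like to get: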

X   Y   Z   Station
346449.30   576369.65   86.93   1
346449.55   576368.24   87.16   1
346449.29   576368.17   79.08   1
346449.83   576366.86   88.23   2
346449.97   576365.42   84.97   2
346449.91   576362.22   86.59   3
346449.74   576363.65   88.87   4
346449.61   576363.59   84.99   5
346449.50   576363.54   81.33   5
346449.51   576365.07   89.38   5
346449.36   576365.01   84.93   5
346449.24   576366.46   88.70   5
346448.93   576367.83   86.75   5

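I'm new to R, so any help is appreciated. Thanks.

1 answer:

Answer 0: (score: 1)

If you just want to add to your sample data the ID of the nearest station within 3 meters of each sample point, here is one solution: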

# get a matrix with the squares of the Euclidean distances
# (rows are sample points, columns are stations)
mx  <-  outer(seq_len(nrow(sampleData)),
              seq_len(nrow(stations)),
              # return the square of the Euclidean distance
              function(i,j){
                  (sampleData[i,'X'] - stations[j,'x'])^2 + 
                  (sampleData[i,'Y'] - stations[j,'y'])^2
              })


# maximum distance to consider
d = 3

# get rid of distances greater than 3 meters away 
mx[mx>d^2] <- NA

index  <-  apply(mx,
                 1,
                 # returns the row number in `stations` of the nearest station
                 # that is less than 3 meters away, or NA if there is none
                 function(x){
                     if(all(is.na(x)))
                         return(NA)
                     # which.min ignores the NA entries
                     which.min(x)
                 })

sampleData$station <- stations$Station[index]

# a comma-delimited list of stations with distance <= 3
sampleData$closeStations  <-  apply(mx,
                 1,
                 # returns the IDs of all stations within 3 meters, or NA if there are none
                 function(x){
                     if(all(is.na(x)))
                         return(NA)
                     # entries beyond 3 meters were set to NA above
                     paste(stations$Station[!is.na(x)], collapse = ',')
                 })

Using outer and apply will probably make the solution run faster, but if you run into problems, a for loop may be easier to debug:

# maximum distance to consider
d <- 3

n <- nrow(sampleData)
# preallocate with NA so that skipped data points stay NA
distanceToNearestStation <- rep(NA_real_, n)
nearestStation <- rep(NA_integer_, n)
nearestStations <- rep(NA_character_, n)
for(i in seq_len(n)){

    # Euclidean distances from this data point to the stations
    distances <- sqrt((sampleData[i,'X'] - stations[,'x'])^2 + 
                  (sampleData[i,'Y'] - stations[,'y'])^2 )

    # optional: get rid of distances greater than 3 meters away 
    # distances[distances>d] <- NA

    # all the stations are too far away or something is wrong with this data point
    if(all(is.na(distances)))
        next

    # record the nearest station to this data point
    distanceToNearestStation[i] <- min(distances,na.rm=TRUE)
    nearestStation[i] <- which.min(distances)

    # comma-delimited list of stations within 3 meters
    distanceIsClose <- distances <= d
    distanceIsClose[is.na(distanceIsClose)] <- FALSE

    nearestStations[i] <- paste(stations$Station[distanceIsClose], collapse = ',')
}

range(distanceToNearestStation)

sampleData$station <- stations$Station[nearestStation]

# number of data points within 3 meters of a station
table(distanceToNearestStation <= 3)

# data points within 3 meters of a station
# (which() drops NA entries; the name `nearby` avoids masking base::subset)
nearby <- sampleData[which(distanceToNearestStation <= 3),]

# save to individual files. 
for(s in unique(nearby$station))
    write.csv(nearby[nearby$station == s,],
              file.path('My/Favorite/Directory', # note there is no trailing slash
                        paste('station',s,'data.csv')))
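
The file-writing step can be tried safely in a temporary directory. The sketch below is only an illustration under assumptions: the data frame `nearby`, the output folder `outDir`, and the file-name pattern are all made up here, and `split` is used as one possible alternative to subsetting inside the loop.

```r
# Toy result (made up): sample points already labelled with a station
nearby <- data.frame(X = c(0, 1, 10),
                     Y = c(0, 1, 10),
                     Z = c(5, 6, 7),
                     station = c(1, 1, 2))

# hypothetical output location inside the session's temporary directory
outDir <- file.path(tempdir(), "stations")
dir.create(outDir, showWarnings = FALSE)

# split() returns one data frame per station, named by the station ID
perStation <- split(nearby, nearby$station)

for (s in names(perStation)) {
    write.csv(perStation[[s]],
              file.path(outDir, paste0("station_", s, "_data.csv")),
              row.names = FALSE)
}

list.files(outDir)
```

Using `paste0` keeps spaces out of the file names, and `row.names = FALSE` avoids writing an extra index column to each file.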