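I need help. I'm trying to create a new column that lists, for each restaurant, the number of other restaurants within 200 meters, using latitude and longitude. I couldn't find anything on Stack Overflow, and I'm no R ninja. Any help would be greatly appreciated!
head()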
business_id restaurantType full_address open city
1 --5jkZ3-nUPZxUvtcbr8Uw Greek 1336 N Scottsdale Rd\nScottsdale, AZ 85257 1 Scottsdale
2 --BlvDO_RG2yElKu9XA1_g Sushi Bars 14870 N Northsight Blvd\nSte 103\nScottsdale, AZ 85260 1 Scottsdale
3 -_Ke8q969OAwEE_-U0qUjw Beer, Wine & Spirits 18555 N 59th Ave\nGlendale, AZ 85308 0 Glendale
4 -_npP9XdyzILAjtFfX8UAQ Vietnamese 6025 N 27th Avenue\nSte 24\nPhoenix, AZ 85073 1 Phoenix
5 -2xCV0XGD9NxfWaVwA1-DQ Pizza 9008 N 99th Ave\nPeoria, AZ 85345 1 Peoria
6 -3WVw1TNQbPBzaKCaQQ1AQ Chinese 302 E Flower St\nPhoenix, AZ 85012 1 Phoenix
review_count name longitude state stars latitude type categories1 categories2
1 11 George's Gyros Greek Grill -111.9269 AZ 4.5 33.46337 business Greek <NA>
2 37 Asian Island -111.8983 AZ 4.0 33.62146 business Sushi Bars Hawaiian
3 6 Jug 'n Barrel Wine Shop -112.1863 AZ 4.5 33.65387 business <NA> Beer, Wine & Spirits
4 15 Thao's Sandwiches -112.0739 AZ 3.0 33.44990 business Vietnamese Sandwiches
5 4 Nino's Pizzeria 2 -112.2766 AZ 4.0 33.56626 business Pizza <NA>
6 145 China Chili -112.0692 AZ 3.5 33.48585 business Chinese <NA>
avgStar duration delta
1 3.694030 381 0
2 3.661017 690 0
3 3.555556 604 1
4 3.577778 1916 0
5 3.482036 226 0
6 3.535928 2190 0
str()
'data.frame': 2833 obs. of 28 variables:
$ business_id : Factor w/ 2833 levels "--5jkZ3-nUPZxUvtcbr8Uw",..: 1 2 3 4 5 6 7 8 9 10 ...
$ restaurantType: Factor w/ 118 levels "Afghan","African",..: 60 106 15 117 89 31 17 7 84 31 ...
$ full_address : Factor w/ 2586 levels "1 E Jackson St\nPhoenix, AZ 85004",..: 274 371 642 1825 2368 1102 1000 1143 2169 1669 ...
$ open : int 1 1 0 1 1 1 1 1 1 1 ...
$ city : Factor w/ 44 levels "Ahwatukee","Anthem",..: 34 34 19 31 30 31 34 4 18 31 ...
$ review_count : int 11 37 6 15 4 145 255 35 7 7 ...
$ name : Factor w/ 2652 levels "#1 Brother's Pizza",..: 885 127 1167 2318 1601 453 591 697 1492 1319 ...
$ longitude : num -112 -112 -112 -112 -112 ...
$ state : Factor w/ 2 levels "AZ","SC": 1 1 1 1 1 1 1 1 1 1 ...
$ stars : num 4.5 4 4.5 3 4 3.5 4.5 4 2.5 4.5 ...
$ latitude : num 33.5 33.6 33.7 33.4 33.6 ...
$ type : Factor w/ 1 level "business": 1 1 1 1 1 1 1 1 1 1 ...
$ categories1 : Factor w/ 103 levels "Afghan","African",..: 50 93 NA 102 78 26 14 7 73 26 ...
$ Freq : int 66 58 8 44 166 166 98 35 45 166 ...
$ avgRev : num 31.3 68.6 34.3 63.2 30.8 ...
$ avgStar : num 3.69 3.66 3.56 3.58 3.48 ...
$ duration : int 381 690 604 1916 226 2190 1968 1338 1606 56 ...
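Answer 0: (score: 1)
One approach is to compute a distance matrix and then find the entries that are close enough (here I demonstrate with 20 km rather than 200 m, so that the counts are not all 0):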
# Load the fields library
library(fields)
# Create a simple data frame to demonstrate (each row is a restaurant). The rdist.earth function
# we're about to call takes as input something where the first column is longitude and the second
# column is latitude.
df = data.frame(longitude=c(-111.9269, -111.8983, -112.1863, -112.0739, -112.2766, -112.0692),
latitude=c(33.46337, 33.62146, 33.65387, 33.44990, 33.56626, 33.48585))
# Let's compute the distance in kilometers between each pair of restaurants.
distances = rdist.earth(df, miles = FALSE)
distances
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0.00000 17.79813 32.07533 1.373515e+01 34.41932 1.344867e+01
# [2,] 17.79813 0.00000 26.93558 2.510519e+01 35.61413 2.189270e+01
# [3,] 32.07533 26.93558 0.00000 2.498676e+01 12.85352 2.162964e+01
# [4,] 13.73515 25.10519 24.98676 1.344145e-04 22.84310 4.025824e+00
# [5,] 34.41932 35.61413 12.85352 2.284310e+01 0.00000 2.122719e+01
# [6,] 13.44867 21.89270 21.62964 4.025824e+00 21.22719 9.504539e-05
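# Compute the number of restaurants within 20 kilometers of the restaurant in each row,
# subtracting 1 so that a restaurant does not count itself.
# For the 200 m asked about in the question, compare against 0.2 instead of 20.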
df$num.close = colSums(distances <= 20) - 1
df$num.close
# [1] 3 1 1 2 1 2
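If installing fields is not an option, the same distance-matrix idea can be sketched in base R with a hand-written haversine formula. This is only an illustration: `haversine_km` and `count_within` are hypothetical helper names, and the mean Earth radius of 6371 km is an assumption (it differs slightly from the radius rdist.earth uses, so distances will differ by a fraction of a percent).

```r
# Great-circle distance in km between points given in degrees.
# Assumption: spherical Earth with a mean radius of 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# For each point, count the OTHER points within max_km.
count_within <- function(lon, lat, max_km) {
  idx <- seq_along(lon)
  d <- outer(idx, idx,
             function(i, j) haversine_km(lon[i], lat[i], lon[j], lat[j]))
  diag(d) <- Inf  # a restaurant never counts itself
  rowSums(d <= max_km)
}

# The six sample restaurants from the question.
lon <- c(-111.9269, -111.8983, -112.1863, -112.0739, -112.2766, -112.0692)
lat <- c(33.46337, 33.62146, 33.65387, 33.44990, 33.56626, 33.48585)

count_within(lon, lat, 20)   # within 20 km, as in the demonstration above
count_within(lon, lat, 0.2)  # within 200 m, as asked in the question
```

Setting the diagonal to `Inf` avoids relying on the self-distance being exactly zero. Like the fields version, this builds an n x n matrix, so for the 2833 rows in the question it holds about 8 million distances in memory.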
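Answer 1: (score: 1)
This is base R and untested code, but you should get the idea.
For each restaurant, I basically test how many rows fall inside the circle (x - x0)^2 + (y - y0)^2 <= R^2 centered on that restaurant, excluding the restaurant itself, and store that count in the column. Note that the radius in my equation is 200, but yours will differ: because your x, y are latitude and longitude, you have to scale the 200 m radius by 2*pi radians / circumference of the Earth, or by 360 degrees / circumference of the Earth.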
df <- data.frame(
latitude = runif(n=10,min=0,max=1000),
longitude = runif(n=10,min=0,max=1000)
)
for (i in seq_len(nrow(df)))
{
# circle's centre
xcentre <- df[i,'latitude']
ycentre <- df[i,'longitude']
# checking how many restaurants lie within 200 m of the above centre, noofcloserest column will contain this value
df[i,'noofcloserest'] <- sum(
(df[,'latitude'] - xcentre)^2 +
(df[,'longitude'] - ycentre)^2
<= 200^2
) - 1
# logging part for deeper analysis
cat(i,': ')
# this prints the true/false vector for which row is within the radius, and which row isn't
cat((df[,'latitude'] - xcentre)^2 +
(df[,'longitude'] - ycentre)^2
<= 200^2)
cat('\n')
}
Output:
1 : TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
2 : FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3 : FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
4 : TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
5 : FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
6 : TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
7 : FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
8 : FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
9 : FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
10 : FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
> df
latitude longitude noofcloserest
1 189.38878 270.25004 2
2 402.36853 879.26657 0
3 747.46417 581.66627 1
4 291.64303 157.75450 2
5 830.10699 736.19586 2
6 299.06803 157.76147 2
7 725.68360 58.53049 1
8 893.31904 772.46217 1
9 45.47875 701.82201 0
10 645.44772 226.95042 1
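The output means that, for the coordinates in row 1, three rows are within 200 meters: row 1 itself, row 4, and row 6. After subtracting row 1 itself, noofcloserest is 2.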
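The scaling mentioned in this answer can be sketched on real coordinates as follows. This is a rough illustration rather than a tested solution for the full data set: `count_nearby` is a hypothetical helper, the Earth circumference of about 40,075 km is an assumption, and the three points are made up. Degree differences are converted to meters, with longitude shrunk by the cosine of the latitude, which is a reasonable approximation over a few hundred meters.

```r
# Assumption: spherical Earth with a circumference of about 40,075 km,
# so one degree of latitude is roughly 111.3 km.
meters_per_degree <- 40075017 / 360

count_nearby <- function(latitude, longitude, radius_m = 200) {
  n <- length(latitude)
  counts <- integer(n)
  for (i in seq_len(n)) {
    # North-south offset of every point from point i, in meters.
    dy <- (latitude - latitude[i]) * meters_per_degree
    # East-west offset in meters; a degree of longitude shrinks with cos(latitude).
    dx <- (longitude - longitude[i]) * meters_per_degree *
      cos(latitude[i] * pi / 180)
    # Points inside the circle, minus point i itself.
    counts[i] <- sum(dx^2 + dy^2 <= radius_m^2) - 1L
  }
  counts
}

# Made-up points: the second is about 111 m north of the first,
# the third is several kilometers away.
lat <- c(33.4500, 33.4510, 33.5000)
lon <- c(-112.0700, -112.0700, -112.0700)
count_nearby(lat, lon)
# [1] 1 1 0
```

With the question's data frame this would be called as `count_nearby(df$latitude, df$longitude)`, assuming the columns are named as in the str() output.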