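I have three tables in a SQL database, as follows:

D: Location of dealers
dealer: zip: affiliate:
AAA 32313 Larry
BBB 32322 John

O: Sales records
customer: affiliate: zip: count:
John's Construction Larry 35331 3
Bill's Sales John 12424 300
Jim's Searching Larry 14422 32

Z: Zip distance database
zip1: zip2: dist:
35235 35235 20
32355 15553 14

I am trying to go through table D (the list of dealers and their locations) and work out their estimated sales. I am doing this with table O, which shows every sale to a customer along with the customer's location. The logic we are using is: for each dealer, look through table O and find the zip that minimizes the distance. We will assume that the dealer closest to a sale is the dealer that made the sale.
I am having a lot of trouble setting up the SQL query, and I am wondering whether SQL is even the right place to do this. I know a little Python and a lot of R. Any help is appreciated.
The query I am currently using: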
Answer 0 (score: 0)
I modified your test data so that I could test the SQL query in R. I used the sqldf library in R.
## Your Modified Test Data
LocationOfDealers <- data.frame(dealer = c("AAA", "BBB", "CCC"), zip = c(32313, 32322, 35235), affiliate = c("Larry", "John", "Larry"))
SalesRecord <- data.frame(customer=c("John's Construction", "Bill's Sales", "Jim's Searching", "Tim's Sales"), affiliate = c("Larry", "John", "Larry", "James"), zip = c(35331, 12424, 14422, 35235), count = c(3, 300, 32, 20))
ZipDistance <- data.frame(zip1=c(35235, 32355), zip2=c(35235, 15553), dist = c(20, 14))
#LocationOfDealers
# dealer zip affiliate
#1 AAA 32313 Larry
#2 BBB 32322 John
#3 CCC 35235 Larry
# SalesRecord
# customer affiliate zip count
# 1 John's Construction Larry 35331 3
# 2 Bill's Sales John 12424 300
# 3 Jim's Searching Larry 14422 32
# 4 Tim's Sales James 35235 20
# ZipDistance
# zip1 zip2 dist
# 1 35235 35235 20
# 2 32355 15553 14
## SQL query in R using sqldf
library(sqldf)
# Link dealers to sales through the zip-distance table, then aggregate per dealer
sqldf("
  SELECT D.dealer, MIN(Z.dist) AS Min_Dist, SUM(O.count) AS dealer_Sold
  FROM LocationOfDealers D
  INNER JOIN ZipDistance Z ON D.zip = Z.zip1
  INNER JOIN SalesRecord O ON O.zip = Z.zip2
  GROUP BY D.dealer
")
### Only one dealer (CCC) is linked to a customer zip through ZipDistance, and its minimum distance is 20
# dealer Min_Dist dealer_Sold
#1 CCC 20 20
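
The query above credits a dealer with every sale it can be linked to through ZipDistance. If each sale should instead go only to the nearest dealer, as the question describes, one possible approach in base R is sketched below. It is only a sketch under assumptions: the column names dealer_zip and customer_zip are hypothetical, the distances are invented for illustration, and the distance table is assumed to hold one row per (dealer zip, customer zip) pair with the dealer zip in zip1.

```r
# Hypothetical inputs: names follow the question, distances are made up
dealers <- data.frame(dealer = c("AAA", "BBB"),
                      dealer_zip = c(32313, 32322))
sales <- data.frame(customer = c("John's Construction", "Bill's Sales", "Jim's Searching"),
                    customer_zip = c(35331, 12424, 14422),
                    count = c(3, 300, 32))
# Assumption: zip1 is the dealer zip, zip2 is the customer zip
zipdist <- data.frame(zip1 = c(32313, 32313, 32313, 32322, 32322, 32322),
                      zip2 = c(35331, 12424, 14422, 35331, 12424, 14422),
                      dist = c(10, 50, 30, 25, 40, 5))

# Build every (dealer, sale) pair that has a known distance
pairs <- merge(dealers, zipdist, by.x = "dealer_zip", by.y = "zip1")
pairs <- merge(pairs, sales, by.x = "zip2", by.y = "customer_zip")

# For each customer keep only the closest dealer
# (if two dealers tie, whichever sorts first is kept)
pairs <- pairs[order(pairs$customer, pairs$dist), ]
nearest <- pairs[!duplicated(pairs$customer), ]

# Estimated sales per dealer: sum the counts of the sales assigned to it
estimated <- aggregate(count ~ dealer, data = nearest, FUN = sum)
print(estimated)
#   dealer count
# 1    AAA     3
# 2    BBB   332
```

With these invented distances, John's Construction is closest to AAA and the other two customers are closest to BBB. A dealer that is never the nearest one does not appear in the result, and a sale whose zip pair is missing from the distance table is silently dropped, so the distance table needs to cover every pair you care about.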