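Mapping: a single address ID can have several tracking IDs. Each tracking ID and each address ID has its own latitude/longitude pair. A tracking ID can have multiple route IDs, although in most cases a tracking ID maps to a single route ID.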
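Update: the tracking IDs I select from T1_2 may or may not exist in the other tables. Also, none of the temporary tables used in the final select statement contains duplicates (based on key values).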
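I have a problem with the results of the query below. The query is supposed to produce metrics for how far delivery points deviate from their addresses. It is performing some kind of cross join on the columns, so there is more data than there should be. I know this is a granularity issue and a basic mistake, but I am struggling to find where it goes wrong. If anyone can give me some pointers, please do. A subset of the results is attached as a link, and I have highlighted one example tracking ID that should appear only once (with just its route ID). Address IDs are expected to repeat in the results, but each time with a different tracking_id; in turn, this should be in sync with the no_pkg column. The query is also attached for reference. (Results subset)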
CREATE OR REPLACE FUNCTION f_stop_distance (Float, Float, Float, Float) /* Calculates the distance in meters between two lat/long points ($1, $2) and ($3, $4) */
RETURNS FLOAT
IMMUTABLE
AS $$
SELECT
2 * 6373000 * ASIN( SQRT( ( SIN( RADIANS(($3 - $1) / 2) ) ) ^ 2 + COS(RADIANS($1)) * COS(RADIANS($3)) * (SIN(RADIANS(($4 - $2) / 2))) ^ 2))
$$ LANGUAGE sql
;
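As a quick sanity check on f_stop_distance, here is a minimal Python sketch of the same haversine formula, using the 6373000 m radius from the UDF (a mean Earth radius of about 6371 km is also common, so absolute distances differ by roughly 0.03%). The name stop_distance is hypothetical. For two points one degree of latitude apart on the same meridian, the result should equal R·π/180, about 111.2 km.

```python
import math

EARTH_RADIUS_M = 6373000  # same radius the SQL UDF uses


def stop_distance(lat1, long1, lat2, long2):
    """Haversine distance in metres, mirroring f_stop_distance($1, $2, $3, $4)."""
    dlat = math.radians((lat2 - lat1) / 2)
    dlong = math.radians((long2 - long1) / 2)
    h = (math.sin(dlat) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dlong) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


if __name__ == "__main__":
    # One degree of latitude along a meridian: roughly 111.2 km with this radius.
    print(stop_distance(40.0, -75.0, 41.0, -75.0))
```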
CREATE TEMPORARY TABLE T1 AS /* Top 1000 address IDs (unique address identifiers) ranked by order frequency, i.e. the number of distinct ordering_order_ids */
SELECT destination_address_id
,COUNT(DISTINCT ordering_order_id)a
,COUNT(DISTINCT tracking_id) no_pkg
FROM lmaa_pm.perfectmile_onroad_events_na
where shipment_status = 'DELIVERED'
AND delivery_station_code = 'DCH1'
AND event_day BETWEEN '2018-12-01' AND '2018-12-31'
AND tracking_id IS NOT NULL
GROUP BY destination_address_id,delivery_station_code
ORDER BY a DESC
LIMIT 1000
;
CREATE TEMPORARY TABLE T1_2 AS /* This is to get tracking ids corresponding to those top 1000 address ids */
SELECT DISTINCT destination_address_id
,tracking_id
FROM lmaa_pm.perfectmile_onroad_events_na
WHERE destination_address_id IN (SELECT destination_address_id FROM T1)
AND event_day BETWEEN '2018-12-01' AND '2018-12-31'
AND shipment_status = 'DELIVERED'
AND delivery_station_code = 'DCH1'
AND tracking_id IS NOT NULL
GROUP BY 1,2
;
CREATE TEMPORARY TABLE T2 AS /* This is to get lat long pairs for addresses and delivery point respectively */
SELECT DISTINCT gdd.lat1
,gdd.long1
,gdd.external_address_id destination_address_id
,gdd.tracking_id
,gdd.actual_lat
,gdd.actual_long
,ROW_NUMBER() OVER(PARTITION BY tracking_id ORDER BY deliverydate DESC) rn /* The source table contains duplicates; rn = 1 keeps the latest row per tracking_id */
FROM gtech.geocoding_data_daily_na gdd
WHERE gdd.shipment_status_id in (51,'DELIVERED')
AND tracking_id IN(SELECT tracking_id FROM T1_2)
AND confidence1 = 'high'
AND gdd.station_code='DCH1'
AND deliverydate BETWEEN '2018-12-01' AND '2018-12-31'
AND actual_lat IS NOT NULL
AND actual_long IS NOT NULL
;
CREATE TEMPORARY TABLE T2_2 AS
SELECT *
FROM T2
WHERE rn = 1
;
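T2 and T2_2 (and likewise T5 and T5_final) use ROW_NUMBER() to keep one row per tracking_id. Below is a minimal sketch of that pattern run through Python's built-in sqlite3 module, assuming a simplified hypothetical table t2_src and made-up sample rows; window functions require SQLite 3.25 or later.

```python
import sqlite3


def latest_per_tracking_id(rows):
    """rows: (tracking_id, deliverydate, actual_lat, actual_long) tuples.

    Returns the latest row per tracking_id, like T2 filtered to rn = 1.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for gtech.geocoding_data_daily_na.
    conn.execute("CREATE TABLE t2_src (tracking_id TEXT, deliverydate TEXT, "
                 "actual_lat REAL, actual_long REAL)")
    conn.executemany("INSERT INTO t2_src VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT tracking_id, deliverydate, actual_lat, actual_long
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY tracking_id
                                      ORDER BY deliverydate DESC) AS rn
            FROM t2_src
        )
        WHERE rn = 1
        ORDER BY tracking_id
    """).fetchall()
    conn.close()
    return result
```

If two rows for the same tracking_id share a deliverydate, which one receives rn = 1 is not deterministic unless the ORDER BY includes a tie-breaker column.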
CREATE TEMPORARY TABLE T3 AS
SELECT T2_2.lat1
,T2_2.long1
,T2_2.actual_lat
,T2_2.actual_long
,T2_2.tracking_id
,T2_2.destination_address_id
,CASE /* Bucket the deviation between address and delivery point: 0-10 m, 10-20 m, 20-50 m, over 50 m */
WHEN f_stop_distance(lat1,long1,actual_lat,actual_long) <=10 THEN '0_to_10'
WHEN f_stop_distance(lat1,long1,actual_lat,actual_long) >10
and f_stop_distance(lat1,long1,actual_lat,actual_long) <=20 THEN '10_to_20'
WHEN f_stop_distance(lat1,long1,actual_lat,actual_long)>20
and f_stop_distance(lat1,long1,actual_lat,actual_long) <=50 THEN '20_to_50'
WHEN f_stop_distance(lat1,long1,actual_lat,actual_long) >50 THEN 'gt_50'
END AS Dev_from_address
FROM T2_2
ORDER BY T2_2.tracking_id
;
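The CASE in T3 spells out f_stop_distance six times. Because a CASE returns the result of the first WHEN that matches, the lower-bound checks (> 10, > 20) are redundant, and the distance could be computed once (for example in a subquery) and then bucketed. The Python sketch below mirrors that bucketing on a precomputed distance; the function name dev_bucket is hypothetical.

```python
def dev_bucket(distance_m):
    """Mirror T3's CASE, evaluated on a distance computed once."""
    if distance_m is None:
        return None  # a CASE with no matching WHEN and no ELSE yields NULL
    if distance_m <= 10:
        return "0_to_10"
    if distance_m <= 20:  # already known to be > 10
        return "10_to_20"
    if distance_m <= 50:  # already known to be > 20
        return "20_to_50"
    return "gt_50"
```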
CREATE TEMPORARY TABLE T4 AS /* Percentage of each deviation bucket (from T3) out of the total */
SELECT SUM(CASE WHEN Dev_from_address = '0_to_10' THEN 1 ELSE 0 END)a
,SUM(CASE WHEN Dev_from_address = '10_to_20' THEN 1 ELSE 0 END)b
,SUM(CASE WHEN Dev_from_address = '20_to_50' THEN 1 ELSE 0 END)c
,SUM(CASE WHEN Dev_from_address = 'gt_50' THEN 1 ELSE 0 END)d
,tracking_id
,(a/(a+b+c+d)::DECIMAL(10,2) * 100) AS e
,(b/(a+b+c+d)::DECIMAL(10,2) * 100) AS f
,(c/(a+b+c+d)::DECIMAL(10,2) * 100) AS g
,(d/(a+b+c+d)::DECIMAL(10,2) * 100) AS h
FROM T3
GROUP BY tracking_id
;
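One thing worth checking in T4: T2_2 keeps a single row per tracking_id (rn = 1), so T3 should also hold one row per tracking_id, and grouping T4 by tracking_id then gives each group one row. In that case e, f, g and h can only be 0 or 100, and a group whose Dev_from_address is NULL makes a+b+c+d zero. If an overall distribution "out of total" was intended, the grouping would need to be coarser. A minimal Python sketch of the share calculation, with a hypothetical function name:

```python
from collections import Counter

BUCKETS = ("0_to_10", "10_to_20", "20_to_50", "gt_50")


def bucket_percentages(buckets):
    """Share (in %) of each bucket among Dev_from_address values, like e, f, g, h in T4."""
    counts = Counter(b for b in buckets if b is not None)
    total = sum(counts[b] for b in BUCKETS)
    if total == 0:
        return None  # T4's a/(a+b+c+d) would divide by zero here
    return tuple(round(100 * counts[b] / total, 2) for b in BUCKETS)
```

With a single row per group, the result is always one 100 and three 0s.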
CREATE TEMPORARY TABLE T5 AS /* adding info for route id to the existing data */
SELECT DISTINCT route_id
,tracking_id
,ROW_NUMBER() OVER (PARTITION BY tracking_id ORDER BY DATE DESC) rnnn /* to avoid duplicates */
FROM omw.route_actuals_na
WHERE tracking_id IN (SELECT tracking_id FROM T1_2)
AND stop_type = 'Dropoff'
AND scan_status = 'DELIVERED'
;
CREATE TEMPORARY TABLE T5_final AS
SELECT *
FROM T5
WHERE rnnn = 1
;
/* final select */
SELECT DISTINCT T1_2.destination_address_id
,T3.lat1
,T3.long1
,T3.actual_lat
,T3.actual_long
,T3.Dev_from_address
,T1_2.tracking_id
,T1.no_pkg
,T4.e
,T4.f
,T4.g
,T4.h
,T5_final.route_id
FROM T3
JOIN T4 ON T4.tracking_id = T3.tracking_id
JOIN T1 ON T1.destination_address_id = T3.destination_address_id
JOIN T1_2 ON T1_2.destination_address_id = T3.destination_address_id
JOIN T5_final ON T5_final.tracking_id = T3.tracking_id
ORDER BY T1_2.destination_address_id
Answer (score: 0)
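Strictly speaking, there is no full cross join there, but you probably have many-to-many joins. To track this down, look at each join and check whether any key value occurs more than once: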
select tracking_id,count(*) from t4 group by 1 having count(*) > 1;
select destination_address_id,count(*) from t1 group by 1 having count(*) > 1;
select tracking_id ,count(*) from t5_final group by 1 having count(*) > 1;
Any values these queries return are likely your cause. This should help you pinpoint where the many-to-many join is.
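The same check can be applied to T1_2, which the final select joins on destination_address_id only. Per the mapping in the question, one address has several tracking IDs, so that key repeats in T1_2, and each T3 row appears to pair with every tracking ID of its address (the select then outputs T1_2.tracking_id, producing mismatched pairs). The sketch below reproduces this with Python's sqlite3 module on a hypothetical toy example (address A1 with three made-up tracking IDs); adding the tracking_id condition to the join brings the row count back to one per T3 row.

```python
import sqlite3


def row_counts():
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample: one address with three packages, one T3 row each.
    conn.executescript("""
        CREATE TABLE t1_2 (destination_address_id TEXT, tracking_id TEXT);
        CREATE TABLE t3 (destination_address_id TEXT, tracking_id TEXT,
                         dev_from_address TEXT);
        INSERT INTO t1_2 VALUES ('A1', 'TBA1'), ('A1', 'TBA2'), ('A1', 'TBA3');
        INSERT INTO t3 VALUES ('A1', 'TBA1', '0_to_10'),
                              ('A1', 'TBA2', 'gt_50'),
                              ('A1', 'TBA3', '10_to_20');
    """)
    # The answer's duplicate-key check, applied to the column T1_2 is joined on.
    dup_keys = conn.execute("""
        SELECT destination_address_id, COUNT(*) FROM t1_2
        GROUP BY 1 HAVING COUNT(*) > 1
    """).fetchall()
    # Join on the address only: every T3 row matches every T1_2 row of that address.
    by_address = conn.execute("""
        SELECT COUNT(*) FROM t3
        JOIN t1_2 ON t1_2.destination_address_id = t3.destination_address_id
    """).fetchone()[0]
    # Join on tracking_id as well: one match per T3 row.
    by_tracking = conn.execute("""
        SELECT COUNT(*) FROM t3
        JOIN t1_2 ON t1_2.destination_address_id = t3.destination_address_id
                 AND t1_2.tracking_id = t3.tracking_id
    """).fetchone()[0]
    conn.close()
    return dup_keys, by_address, by_tracking
```

In this toy data the address-only join returns 9 rows for 3 packages, while the join that also matches tracking_id returns 3.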