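I'm currently working with the MaxMind database. However, it contains hundreds of thousands of duplicate entries. For example:
Newcastle Upon Tyne: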
369137 GB I7 Newcastle Upon Tyne NE20 54.9881 -1.6194
369332 GB I7 Newcastle Upon Tyne NE6 54.9881 -1.6194
369345 GB I7 Newcastle Upon Tyne NE13 54.9881 -1.6194
369355 GB I7 Newcastle Upon Tyne NE3 54.9881 -1.6194
369356 GB I7 Newcastle Upon Tyne NE5 54.9881 -1.6194
369645 GB I7 Newcastle Upon Tyne NE4 54.9881 -1.6194
369706 GB I7 Newcastle Upon Tyne NE15 54.9881 -1.6194
369959 GB I7 Newcastle Upon Tyne NE12 54.9881 -1.6194
370114 GB I7 Newcastle Upon Tyne NE27 54.9881 -1.6194
Newcastle (I've removed some of these, as there were too many to paste here):
382 ZA 2 Newcastle -27.758 29.9318
2279 US OK Newcastle 73065 35.2323 -97.6008
26459 US CA Newcastle 95658 38.873 -121.1543
22382 CA ON Newcastle l1b1j9 43.9167 -78.5833
38995 AU 2 Newcastle -32.9278 151.7845
40025 US ME Newcastle 4553 44.0438 -69.5675
47937 GB I7 Newcastle 54.9881 -1.6194
119830 US ME Newcastle 4553 44.0438 -69.5675
119982 US NE Newcastle 68757 42.6475 -96.9232
115052 US CA Newcastle 95658 38.873 -121.1543
120603 US NE Newcastle 68757 42.6475 -96.9232
127931 US OK Newcastle 73065 35.2323 -97.6008
136726 CA ON Newcastle 43.9167 -78.5833
136915 US TX Newcastle 76372 33.245 -98.9103
137128 US WY Newcastle 82701 43.8396 -104.5681
137130 US WY Newcastle 82701 43.8396 -104.5681
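Given that there are several cities named Newcastle around the world, and that the database returns a separate row for every postcode in Newcastle even when the latitude/longitude is identical, how can we remove the duplicate entries?
I've already looked at "Eliminate duplicate cities from database", which proposes this potential solution: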
delete from climate.maxmind_city mc where id in (
  select
    max(c1.id)
  from
    climate.maxmind_city c1,
    climate.maxmind_city c2
  where
    c1.id <> c2.id and
    c1.country = c2.country and
    c1.name = c2.name and
    earth_distance(
      ll_to_earth( c1.latitude_decimal, c1.longitude_decimal ),
      ll_to_earth( c2.latitude_decimal, c2.longitude_decimal ) ) <= 35
  group by
    c1.country, c1.name
  order by
    c1.country, c1.name
)
However, earth_distance is a PostgreSQL function, and we are using MySQL. How can I replace the earth_distance function with an equivalent approach in MySQL?
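
One direction that might work is to spell out the great-circle distance with the haversine formula, using only MySQL's built-in math functions. The following is an untested sketch, not a confirmed answer. It assumes the table and column names from the PostgreSQL query above, that the 35 threshold is in metres (the default unit of earth_distance), and a mean Earth radius of 6371000 m, which differs slightly from the radius PostgreSQL uses. Note that it is not a line-for-line port: it uses a self-join DELETE and keeps the lowest id in each group of nearby same-name rows, removing all higher ids in one pass.

```sql
-- Sketch only. Assumed schema: climate.maxmind_city(id, country, name,
-- latitude_decimal, longitude_decimal), taken from the PostgreSQL query.
-- Try it on a copy of the table first.
DELETE c2
FROM climate.maxmind_city AS c1
JOIN climate.maxmind_city AS c2
  ON  c1.country = c2.country
  AND c1.name    = c2.name
  AND c1.id      < c2.id          -- c2 is always the higher id, so the lowest id survives
WHERE
  -- Haversine distance in metres; 6371000 is an assumed mean Earth radius.
  -- LEAST(1, ...) guards against rounding pushing the ASIN argument above 1.
  6371000 * 2 * ASIN(LEAST(1, SQRT(
      POWER(SIN(RADIANS(c2.latitude_decimal - c1.latitude_decimal) / 2), 2)
    + COS(RADIANS(c1.latitude_decimal))
    * COS(RADIANS(c2.latitude_decimal))
    * POWER(SIN(RADIANS(c2.longitude_decimal - c1.longitude_decimal) / 2), 2)
  ))) <= 35;
```

Replacing `DELETE c2` with `SELECT c1.id, c2.id` should list the pairs that would be affected before anything is removed. On MySQL 5.7.6 or later, the built-in ST_Distance_Sphere(POINT(longitude, latitude), POINT(longitude, latitude)) may be usable in place of the haversine expression.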