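I need to set up two tables in a database, and I am struggling to decide how to design them in SQLAlchemy.
Table 1 contains raw address data together with the source of each address. A raw address may appear more than once if it comes from different sources.
Table 2 contains the geocoded versions of these addresses. Each address appears only once. An address should appear in this table only if it appears at least once in Table 1.
When new addresses enter the system, they are first inserted into Table 1. I will then have a script that finds records in Table 1 that are not in Table 2, geocodes them, and inserts them into Table 2.
I have the following code: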
class RawAddress(Base):
    __tablename__ = 'rawaddresses'
    id = Column(Integer, primary_key=True)
    source_of_address = Column(String(50))
    # Want something like a foreign key here, but the address may not yet
    # exist in the geocoded address table
    full_address = Column(String(400))

class GeocodedAddress(Base):
    __tablename__ = 'geocodedaddresses'
    full_address = Column(String(400), primary_key=True)
    lat = Column(Float)
    lng = Column(Float)
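Is there a way to establish a relationship between the full_address fields in SQLAlchemy? Or perhaps my design is wrong: maybe whenever I see a new raw address I should add it to the GeocodedAddress table along with a flag indicating whether it has been geocoded yet?
Many thanks for any help with this.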
Answer 0: (score: 1)
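Taking your comments into account, the code below, which supports this kind of data storage as well as the insert/update process, should do the job. A few remarks first: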
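A foreign key column can be nullable, so the foreign key idea from the question still works.
The relationship can be defined on either model, and the other side is named with backref.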
Code:
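# Model definitions
# (Base is the declarative base already used in the question)
from sqlalchemy import Column, Float, ForeignKey, Integer, String
from sqlalchemy.orm import relationship

class RawAddress(Base):
    __tablename__ = 'rawaddresses'
    id = Column(Integer, primary_key=True)
    source_of_address = Column(String(50))
    # Column type is taken from the referenced column; NULL is allowed
    full_address = Column(
        ForeignKey('geocodedaddresses.full_address'),
        nullable=True,
    )

class GeocodedAddress(Base):
    __tablename__ = 'geocodedaddresses'
    full_address = Column(String(400), primary_key=True)
    lat = Column(Float)
    lng = Column(Float)
    # One geocoded address has many raw addresses; the backref adds
    # RawAddress.geocoded_address for the reverse direction
    raw_addresses = relationship(RawAddress, backref="geocoded_address")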
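The models above assume a declarative `Base`, and the functions that follow assume a `session`; the answer shows neither. A minimal setup sketch, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database (both choices are assumptions, not part of the original answer), might look like this:

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

# Assumption: in-memory SQLite, convenient for experiments only.
engine = create_engine("sqlite://")

# Base has to exist before the model classes are declared.
Base = declarative_base()

# ... declare RawAddress and GeocodedAddress here ...

# Create the tables for every model registered on Base so far.
Base.metadata.create_all(engine)

# A single session object, as used by the functions in this answer.
session = sessionmaker(bind=engine)()
```

The call to `create_all` has to come after the model classes, since it only creates tables for models that are already registered on `Base`.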
Now:
# logic
def get_geo(full_address):
    """Dummy geocoder: fakes lat/lng for `full_address` using hash()."""
    hs = hash(full_address)
    return (hs >> 8) & 0xff, hs & 0xff

def add_test_data(addresses):
    with session.begin():
        for fa in addresses:
            session.add(RawAddress(full_address=fa))

def add_geo_info():
    with session.begin():
        # Raw addresses that have no matching geocoded row yet
        q = (session
             .query(RawAddress)
             .filter(~RawAddress.geocoded_address.has())
             )
        for ra in q.all():
            print("Computing geo for: {}".format(ra))
            lat, lng = get_geo(ra.full_address)
            ra.geocoded_address = GeocodedAddress(
                full_address=ra.full_address, lat=lat, lng=lng)
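The filter `~RawAddress.geocoded_address.has()` is the part that implements "raw addresses that are not yet in the geocoded table". To see what it asks the database for, the statement can be printed without connecting to anything. This is a sketch assuming SQLAlchemy 1.4 or later; the models are repeated only to keep it self-contained, and `configure_mappers()` is called because the `backref` attribute is only created once the mappers have been configured.

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, String, select
from sqlalchemy.orm import configure_mappers, declarative_base, relationship

Base = declarative_base()

class RawAddress(Base):
    __tablename__ = "rawaddresses"
    id = Column(Integer, primary_key=True)
    source_of_address = Column(String(50))
    full_address = Column(
        ForeignKey("geocodedaddresses.full_address"), nullable=True
    )

class GeocodedAddress(Base):
    __tablename__ = "geocodedaddresses"
    full_address = Column(String(400), primary_key=True)
    lat = Column(Float)
    lng = Column(Float)
    raw_addresses = relationship(RawAddress, backref="geocoded_address")

# Set up RawAddress.geocoded_address (the backref) right away.
configure_mappers()

# Same condition as in add_geo_info(), written as a standalone statement.
stmt = select(RawAddress).where(~RawAddress.geocoded_address.has())

# Compile with the default dialect and show the resulting SQL text.
sql = str(stmt)
print(sql)
```

The printed SQL should contain a negated `EXISTS` subquery that matches the two `full_address` columns, so the database itself filters out the raw rows that already have a geocoded counterpart.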
And some tests:
# step-1: add some raw addresses
add_test_data(['Paris', 'somewhere in Nevada'])
print("-"*80)
# step-2: get those raw which do not have geo
add_geo_info()
print("-"*80)
# step-1: again with 1 new, 1 same
add_test_data(['Paris', 'somewhere in Chicago'])
print("-"*80)
# step-2: get those raw which do not have geo
add_geo_info()
print("-"*80)
# check: print all data for Paris geo:
gp = session.query(GeocodedAddress).filter(GeocodedAddress.full_address == 'Paris').one()
assert 2 == len(gp.raw_addresses)
print(gp.raw_addresses)
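One case the test above does not exercise is two raw rows that share the same address and are both still ungeocoded when `add_geo_info()` runs; each would get its own `GeocodedAddress` with the same primary key. The sketch below is one possible way to create a single geocoded row per distinct address. It assumes SQLAlchemy 1.4 or later and in-memory SQLite, which does not enforce foreign keys by default (a database that enforces them would reject a raw row whose `full_address` has no parent row yet). `fake_geocode` and `geocode_missing` are hypothetical names, and the coordinates are placeholders.

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import (
    configure_mappers, declarative_base, relationship, sessionmaker,
)

Base = declarative_base()

class RawAddress(Base):
    __tablename__ = "rawaddresses"
    id = Column(Integer, primary_key=True)
    source_of_address = Column(String(50))
    full_address = Column(
        ForeignKey("geocodedaddresses.full_address"), nullable=True
    )

class GeocodedAddress(Base):
    __tablename__ = "geocodedaddresses"
    full_address = Column(String(400), primary_key=True)
    lat = Column(Float)
    lng = Column(Float)
    raw_addresses = relationship(RawAddress, backref="geocoded_address")

configure_mappers()  # make the backref attribute available immediately

def fake_geocode(full_address):
    """Hypothetical stand-in for a real geocoder; returns placeholder values."""
    return 0.0, 0.0

def geocode_missing(session):
    """Create one GeocodedAddress per distinct ungeocoded address."""
    pending = (
        session.query(RawAddress)
        .filter(~RawAddress.geocoded_address.has())
        .all()
    )
    created = {}  # full_address -> GeocodedAddress created in this run
    for ra in pending:
        geo = created.get(ra.full_address)
        if geo is None:
            lat, lng = fake_geocode(ra.full_address)
            geo = GeocodedAddress(full_address=ra.full_address, lat=lat, lng=lng)
            created[ra.full_address] = geo
        # Rows with the same address share the same geocoded object.
        ra.geocoded_address = geo
    session.commit()
    return len(created)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Two sources report the same address before any geocoding has run.
session.add_all([
    RawAddress(full_address="Paris", source_of_address="feed A"),
    RawAddress(full_address="Paris", source_of_address="feed B"),
])
session.commit()

created_count = geocode_missing(session)
print(created_count)
```

Keeping the objects created during the run in a dictionary keyed by `full_address` means the second raw row reuses the pending `GeocodedAddress` instead of constructing a duplicate.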