Question

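I have a SQLAlchemy model and a pandas DataFrame containing a few records that should be loaded into the table represented by that model. Before loading, I need to check whether all rows in the DataFrame satisfy the 'UniqueConstraint'.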

My model and DataFrame are as follows:

Model:

class Flight(Base):
    __tablename__ = 'flight'

    flight_id = Column(Integer)
    from_location = Column(String)
    to_location = Column(String)
    schedule = Column(String)
    __table_args__ = (UniqueConstraint('flight_id', 'schedule', name='flight_schedule'),)

DataFrame:

flight_id | from_location  | to_location |  schedule |  
   1      |   Vancouver    |   Toronto   |   3-Jan   |  
   2      |   Amsterdam    |   Tokyo     |   15-Feb  |  
   4      |   Fairbanks    |   Glasgow   |   12-Jan  |  
   9      |   Halmstad     |   Athens    |   21-Jan  |  
   3      |   Brisbane     |   Lisbon    |   4-Feb   |  
   4      | Johannesburg   |   Venice    |   12-Jan  |

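In this case, the checker function should return False, because the 3rd and 6th records in the DataFrame violate the UniqueConstraint (the same flight cannot be scheduled on two different routes at the same time). Any hints or solutions on how to do this?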

Answer 1

Use DataFrame.duplicated with the columns specified; I think you need any to check whether there is at least one True:

print (df.duplicated(['flight_id', 'schedule']).any())
True

Detail:

print (df.duplicated(['flight_id', 'schedule']))
0    False
1    False
2    False
3    False
4    False
5     True
dtype: bool
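
Since the question asks for a checker that returns False on a violation, the call above can be wrapped and negated. This is only a minimal sketch: the function name satisfies_unique is made up for illustration, and the DataFrame is rebuilt by hand from the table in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "flight_id": [1, 2, 4, 9, 3, 4],
    "from_location": ["Vancouver", "Amsterdam", "Fairbanks",
                      "Halmstad", "Brisbane", "Johannesburg"],
    "to_location": ["Toronto", "Tokyo", "Glasgow",
                    "Athens", "Lisbon", "Venice"],
    "schedule": ["3-Jan", "15-Feb", "12-Jan", "21-Jan", "4-Feb", "12-Jan"],
})


def satisfies_unique(frame, columns):
    """Return True when no two rows share the same values in `columns`.

    Hypothetical helper name, not part of pandas or SQLAlchemy.
    """
    return not frame.duplicated(subset=list(columns)).any()


print(satisfies_unique(df, ["flight_id", "schedule"]))  # False
```

Note that this only looks for duplicates inside the DataFrame itself; rows already stored in the table are not considered.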

If you need to filter the problematic rows, use boolean indexing with the parameter keep=False to return all duplicated rows:

print (df[df.duplicated(['flight_id', 'schedule'], keep=False)])
   flight_id from_location to_location schedule
2          4     Fairbanks     Glasgow   12-Jan
5          4  Johannesburg      Venice   12-Jan

Detail:

print (df.duplicated(['flight_id', 'schedule'], keep=False))
0    False
1    False
2     True
3    False
4    False
5     True
dtype: bool
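
If you would rather not hard-code the column names, they can probably be read off the model's table, since a Table keeps its constraints in the constraints attribute. The sketch below makes a few assumptions: it uses a plain Core Table as a stand-in for Flight.__table__ (with the declarative model you would pass Flight.__table__ instead), and the helper names unique_column_sets and violates_unique_constraints are invented for this example.

```python
import pandas as pd
from sqlalchemy import Column, Integer, MetaData, String, Table, UniqueConstraint

# Core-level stand-in for Flight.__table__ (an assumption for this sketch).
flight_table = Table(
    "flight",
    MetaData(),
    Column("flight_id", Integer),
    Column("from_location", String),
    Column("to_location", String),
    Column("schedule", String),
    UniqueConstraint("flight_id", "schedule", name="flight_schedule"),
)


def unique_column_sets(table):
    """Column-name lists for every UniqueConstraint declared on the table."""
    return [
        [col.name for col in constraint.columns]
        for constraint in table.constraints
        if isinstance(constraint, UniqueConstraint)
    ]


def violates_unique_constraints(frame, table):
    """Return True if any UniqueConstraint on `table` is broken within `frame`."""
    return any(
        frame.duplicated(subset=cols).any() for cols in unique_column_sets(table)
    )


# Reduced version of the question's data: the last two rows share the key.
df = pd.DataFrame({
    "flight_id": [1, 4, 4],
    "schedule": ["3-Jan", "12-Jan", "12-Jan"],
})

print(unique_column_sets(flight_table))
print(violates_unique_constraints(df, flight_table))
```

The checker asked for in the question would then be the negation of violates_unique_constraints.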

Answer 2

IIUC, use duplicated:

df.duplicated('flight_id',keep=False)
Out[473]: 
0    False
1    False
2     True
3    False
4    False
5     True
dtype: bool

Or use groupby:

df.groupby('flight_id').transform('nunique').gt(1).any(axis=1)
Out[482]: 
0    False
1    False
2     True
3    False
4    False
5     True
dtype: bool
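
One caveat, as far as I can tell: both snippets in this answer key on flight_id alone, while the constraint in the question covers flight_id and schedule together. On the question's data the result happens to be the same, but the two checks can disagree. The sketch below uses made-up data (flight 4 on two different dates) to show the difference, plus a groupby variant that uses the full key; the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical data: flight 4 appears twice, but on different dates,
# so the (flight_id, schedule) constraint is NOT violated.
df = pd.DataFrame({
    "flight_id": [1, 4, 4],
    "from_location": ["Vancouver", "Fairbanks", "Johannesburg"],
    "to_location": ["Toronto", "Glasgow", "Venice"],
    "schedule": ["3-Jan", "12-Jan", "13-Jan"],
})

key = ["flight_id", "schedule"]

# Checking flight_id alone flags both flight-4 rows.
by_id_only = df.duplicated("flight_id", keep=False)

# Checking the full key flags nothing.
by_full_key = df.duplicated(key, keep=False)

# groupby equivalent: flag rows whose (flight_id, schedule) group
# contains more than one row.
by_group_size = df.groupby(key)["flight_id"].transform("size").gt(1)

print(by_id_only.tolist())     # [False, True, True]
print(by_full_key.tolist())    # [False, False, False]
print(by_group_size.tolist())  # [False, False, False]
```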

Check DataFrame records against a SQLAlchemy model's unique constraint