Question

I am querying my SQLAlchemy-mapped star schema directly into a pandas DataFrame, and I am getting an annoying SAWarning that I would like to resolve. Here is a simplified version.

# Imports assume SQLAlchemy 1.4 or newer
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class School(Base):
    __tablename__ = 'DimSchool'

    id = Column('SchoolKey', Integer, primary_key=True)
    name = Column('SchoolName', String)
    district = Column('SchoolDistrict', String)


class StudentScore(Base):
    __tablename__ = 'FactStudentScore'

    # Composite primary key made of the two dimension keys
    StudentKey = Column('StudentKey', Integer, ForeignKey('DimStudent.StudentKey'), primary_key=True)
    SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key=True)
    PointsPossible = Column('PointsPossible', Integer)
    PointsReceived = Column('PointsReceived', Integer)

    # The Student class is omitted from this simplified version
    student = relationship("Student", backref='studentscore')
    school = relationship("School", backref='studentscore')

I query the data with a statement like this:

import pandas as pd

standard = session.query(StudentScore, School).\
    join(School).filter(School.name.like('%Dever%'))

testdf = pd.read_sql(standard.statement, standard.session.bind)

And then I get this warning:

SAWarning: Column 'SchoolKey' on table <sqlalchemy.sql.selectable.Select at 0x1ab7abe0; Select object> being replaced by Column('SchoolKey', Integer(), table=<Select object>, primary_key=True, nullable=False), which has the same key.  Consider use_labels for select() statements.

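This warning appears for every additional table (class) included in my join. The message always refers to the foreign key column.

Has anyone else run into this and determined the root cause? Or do you just ignore it as well?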

EDIT / UPDATE:

Handling Duplicate Columns in Pandas DataFrame constructor from SQLAlchemy Join

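These folks seem to be discussing a related problem, but they use a different pandas method to bring the data into a DataFrame, and they want to keep the duplicates rather than discard them. Does anyone have thoughts on how to implement a similarly styled function that drops the duplicates as the query returns?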

Answer 1

For what it's worth, here is my limited answer.

Regarding the following SAWarning:

SAWarning: Column 'SchoolKey' on table <sqlalchemy.sql.selectable.Select at 0x1ab7abe0; Select object> being replaced by Column('SchoolKey', Integer(), table=<Select object>, primary_key=True, nullable=False), which has the same key.  Consider use_labels for select() statements.

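It really is telling you that there are columns with duplicate names, even though those columns live in different tables. In most cases this is harmless, because the duplicated columns are simply the join keys and hold the same values. However, I have run into cases where the tables contain identically named columns populated with different data (e.g. a Teacher table with a name column and a Student table with a name column). In those cases, rename the columns of the pandas DataFrame using an approach like this, or rename the columns in the underlying database tables.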
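To illustrate the pandas-side cleanup, here is a minimal sketch. The frame below is hypothetical sample data standing in for the result of a join, and the new names `student_name` and `school_name` are my own choices, not anything from the question. It renames the clashing columns by position (renaming by label would hit both copies) and then drops the repeated join key.

```python
import pandas as pd

# Hypothetical join result: "SchoolKey" appears twice with the same values
# (the join key), while "name" appears twice with different meanings.
df = pd.DataFrame(
    [
        [1, 10, "Ann", 10, "Denver High"],
        [2, 10, "Bob", 10, "Denver High"],
    ],
    columns=["StudentKey", "SchoolKey", "name", "SchoolKey", "name"],
)

# Step 1: assign new names positionally so the two "name" columns
# become distinguishable.
renamed = df.copy()
renamed.columns = [
    "StudentKey", "SchoolKey", "student_name", "SchoolKey", "school_name",
]

# Step 2: keep only the first occurrence of each remaining column name,
# which drops the redundant second copy of the join key.
result = renamed.loc[:, ~renamed.columns.duplicated()]

print(list(result.columns))
```

Dropping by `columns.duplicated()` is only safe once the columns that really differ have distinct names; otherwise the second `name` column would be silently discarded.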

I'll keep an eye on this question, and if anyone comes up with a better answer I'll gladly accept it.
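The hint at the end of the warning ("Consider use_labels") suggests another option: give every selected column a distinct name in the query itself, so no two result columns share a key. The sketch below does this with explicit `.label()` calls. It assumes SQLAlchemy 1.4 or newer with a compatible pandas version, uses an in-memory SQLite database with made-up rows, and leaves out the Student table. The label names are my own, and I have not confirmed that this silences the warning on every SQLAlchemy version.

```python
import pandas as pd
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class School(Base):
    __tablename__ = "DimSchool"
    id = Column("SchoolKey", Integer, primary_key=True)
    name = Column("SchoolName", String)


class StudentScore(Base):
    __tablename__ = "FactStudentScore"
    StudentKey = Column("StudentKey", Integer, primary_key=True)
    SchoolKey = Column(
        "SchoolKey", Integer, ForeignKey("DimSchool.SchoolKey"), primary_key=True
    )
    PointsReceived = Column("PointsReceived", Integer)


# In-memory SQLite database with hypothetical sample rows.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all(
    [
        School(id=1, name="Denver High"),
        StudentScore(StudentKey=7, SchoolKey=1, PointsReceived=42),
    ]
)
session.commit()

# Select individual columns and label each one, instead of selecting
# whole entities that both carry a "SchoolKey" column.
query = (
    session.query(
        StudentScore.StudentKey.label("student_key"),
        StudentScore.SchoolKey.label("school_key"),
        StudentScore.PointsReceived.label("points_received"),
        School.name.label("school_name"),
    )
    .join(School, StudentScore.SchoolKey == School.id)
    .filter(School.name.like("%Denver%"))
)

# The labels become the DataFrame column names.
df = pd.read_sql(query.statement, engine)
print(list(df.columns))
```

Selecting only the columns you need also keeps the redundant second copy of the join key out of the DataFrame, so no cleanup is needed afterwards.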

Querying into a pandas df with SQLAlchemy