Postgres + SQLAlchemy: double-counted query results

Time: 2016-12-08 23:33:25

Tags: python json postgresql flask sqlalchemy

I am getting duplicated results from a SQLAlchemy + Postgres query. My model is structured as follows:

class Audio(db.Model):
    __tablename__ = "audio"

    id = db.Column(db.Integer, primary_key=True)
    file_location = db.Column(db.String(), unique=True)
    # Pass the callable itself so it runs on each insert, not once when the class is defined.
    upload_time = db.Column(ArrowType, default=arrow.utcnow)
    keyword = db.Column(JSON)
    transcript = db.Column(JSON)
    diarization = db.Column(JSON)

    user_id = db.Column(db.Integer, db.ForeignKey('user.id', onupdate='cascade', ondelete='cascade'))
    company_id = db.Column(db.Integer, db.ForeignKey('company.id', onupdate='cascade', ondelete='cascade'))
    client_id = db.Column(db.Integer, db.ForeignKey('client.id', onupdate='cascade', ondelete='cascade'))

    def __init__(self, file_location, upload_time, keyword, transcript, diarization, user_id, company_id, client_id):
        self.file_location = file_location
        self.upload_time = upload_time
        self.keyword = keyword
        self.transcript = transcript
        self.diarization = diarization
        self.user_id = user_id
        self.company_id = company_id
        self.client_id = client_id

    def __repr__(self):
        return '<Audio ID %r>' % self.id

In each record, the transcript field holds a JSON object like this:

{"transcript": [
    {"p": 0, "s": 0, "e": 320, "c": 0.545, "w": "This"}, 
    {"p": 1, "s": 320, "e": 620, "c": 0.825, "w": "call"}, 
    {"p": 2, "s": 620, "e": 780, "c": 0.909, "w": "is"}, 
    {"p": 3, "s": 780, "e": 1010, "c": 0.853, "w": "being"}
    ...
    ]}

I am trying to filter entries by the value of "w" and query the corresponding Audio.id and "p". I have tried the following:

transcript_subquery = s.query(func.json_array_elements(Audio.transcript['transcript']).label('transcript')).subquery()
temp = transcript_subquery.c.transcript.op('->>')('w').cast(String)
temp1 = transcript_subquery.c.transcript.op('->>')('s').cast(Integer)
query = s.query(temp1, Audio.id).filter(temp.ilike("all"))

The problem is that I currently have only 5 data entries, one of which has an empty transcript field. Yet the results I get include the following:

(665610, 5), (736413, 5), (907230, 5), (942340, 5), (1020852, 5), (1023942, 5), (1037101, 5), (1078521, 5), (1105581, 5), (1117551, 5), (1372730, 5), (1501960, 5), (1508410, 5)

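It is strange that the entry with Audio.id = 5 produces results like these, given that it does not even have a transcript JSON object. On top of that, every entry gets 112 results. I suspect 112 is the total number of occurrences of the word, and that the query is somehow counting everything again for each audio ID. I really don't know how to fix my SQLAlchemy query.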
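
The suspicion above is consistent with how the query is built: the subquery returns only the expanded transcript elements, with no Audio.id attached, and the outer query then selects Audio.id from the audio table without any join condition. That would pair every matching word with every row of audio. The plain-Python sketch below models that behaviour and a correlated alternative. It is an illustration only, not the actual SQL; the rows in `audio_rows` and the helper `is_match` are made up for the example.

```python
from itertools import product

# Made-up stand-in for the audio table: id -> list of transcript words.
# Row 5 has no transcript words, mirroring the empty entry described above.
audio_rows = {
    1: [{"p": 0, "s": 0, "w": "all"}, {"p": 1, "s": 320, "w": "call"}],
    2: [{"p": 0, "s": 100, "w": "All"}],
    5: [],
}


def is_match(word):
    # ilike("all") contains no wildcard, so it behaves as a
    # case-insensitive equality test against "all".
    return word["w"].lower() == "all"


# Uncorrelated version: the "subquery" yields every word of every row with
# no id attached, and each match is then paired with every audio id.
all_words = [w for words in audio_rows.values() for w in words]
cross_joined = [
    (w["s"], audio_id)
    for w, audio_id in product(all_words, audio_rows)
    if is_match(w)
]

# Correlated version: each word keeps the id of the row it came from.
correlated = [
    (w["s"], audio_id)
    for audio_id, words in audio_rows.items()
    for w in words
    if is_match(w)
]

print(len(cross_joined), cross_joined)
print(len(correlated), correlated)
```

In the uncorrelated version, row 5 shows up in the output even though it has no words of its own, and every id receives the same number of results. If this is indeed the cause, one likely fix in SQLAlchemy would be to include `Audio.id` (for example labelled `audio_id`, a hypothetical name) in the subquery's column list and select `transcript_subquery.c.audio_id` in the outer query instead of `Audio.id`.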

0 answers:

No answers