How to group by after joining multiple tables in SQLAlchemy

Time: 2019-07-13 23:55:44

Tags: python sqlite flask flask-sqlalchemy

I am new to Flask/SQLAlchemy, and I am trying to get a summary of the answers to an MTurk survey that looks like this:

Filename    Answered_A    Answered_B    Answered_C    Answered_D    Answered_E
file1.mp3   10            8             5             0             1
file2.mp3   1             26            2             3             7
file3.mp3   4             0             0             3             57
file4.mp3   1             6             1             5             28

Using the following models (irrelevant fields omitted for brevity):

class Survey(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    tasks = db.relationship('Task', backref='survey', lazy='dynamic')

class Task(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    survey_id = db.Column(db.Integer, db.ForeignKey('survey.id'))
    assignments = db.relationship('Assignment', backref='task', lazy='dynamic')

class Assignment(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    task_id = db.Column(db.Integer, db.ForeignKey('task.id'))
    responses = db.relationship('Response', backref='assignment', lazy='dynamic')

class Response(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    assignment_id = db.Column(db.Integer, db.ForeignKey('assignment.id'))
    response_item = db.Column(db.String(255))
    response_value = db.Column(db.String(255))

Here response_item is the file name and response_value is 1-5, represented by Answered_A, Answered_B, and so on. The models shown above are all cascading one-to-many relationships.

I followed the approach tried here (Join multiple tables in SQLAlchemy/Flask), like this:

q = (db.session.query(Survey, Task, Assignment, Response)
    .join(Task, Survey.id==Task.survey_id)
    .join(Assignment, Task.id==Assignment.task_id)
    .join(Response, Assignment.id==Response.assignment_id)).all()

This produces a list of tuples of the form (Survey, Task, Assignment, Response).

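What I am trying to accomplish is a query, for example for Survey.id=4, that uses the right group by and returns the structure listed above. As mentioned, the answers range from Answered_A to Answered_E, or from 1 to 5 if that is easier.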

1 answer:

Answer 0: (score: 1)

I made a GitHub repository for you that shows how to do this:

https://github.com/researcher2/stackoverflow_57023616

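Since I don't have access to your data, I made a mock-up, which can be found in create_db.py.

I create a count for each file name and each of its possible options, starting at 0. I then loop over the responses returned from the database and increment the matching count.

I might come back tomorrow to do this in SQL.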

server.py

from app import app, db
from flask import render_template
from models import Survey, Task, Assignment, Response

@app.route('/')
def index():
    (headers, fields, data) = getSummary()
    return render_template("survey_summary.html", headers=headers, fields=fields, data=data)

def getSummary():
    fields = ["Filename", "A", "B", "C", "D", "E"] # column names for output
    headers = dict() # custom header names for given fieldname (no difference here)
    for field in fields:
        headers[field] = field

    # build data structures
    data = []
    rowMap = dict()    
    fileNames = ["file1.mp3", "file2.mp3", "file3.mp3", "file4.mp3"]    

    for fileName in fileNames:
        row = dict()
        row["Filename"] = fileName
        row["A"] = 0
        row["B"] = 0
        row["C"] = 0
        row["D"] = 0
        row["E"] = 0
        data.append(row)
        rowMap[fileName] = row

    # query
    query = db.session.query(Survey, Task, Assignment, Response) \
                      .join(Task, Survey.id==Task.survey_id) \
                      .join(Assignment, Task.id==Assignment.task_id) \
                      .join(Response, Assignment.id==Response.assignment_id) \
                      .filter(Survey.id == 1)

    results = query.all()

    # summarise counts
    for (_, _, _, response) in results:
        rowMap[response.response_item][response.response_value] += 1

    return (headers, fields, data)
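
The counting step can also be written without hardcoding the file names. The following is only a minimal sketch, not part of the original answer: the `summarise` helper, the `OPTIONS` list and the sample pairs are hypothetical, and it assumes that response_value holds one of the letters "A" to "E", as the row keys in getSummary suggest.

```python
from collections import defaultdict

OPTIONS = ["A", "B", "C", "D", "E"]  # assumed answer labels (hypothetical constant)


def summarise(pairs, options=OPTIONS):
    """Count (response_item, response_value) pairs into one row per file name."""
    # Each new file name starts with a fresh dict of zero counts.
    row_map = defaultdict(lambda: dict.fromkeys(options, 0))
    for item, value in pairs:
        if value in options:  # skip unexpected values instead of raising KeyError
            row_map[item][value] += 1
    # Sort by file name so the output order is stable.
    return [{"Filename": name, **counts} for name, counts in sorted(row_map.items())]


# Made-up sample data standing in for the (item, value) pairs from the query.
sample = [("file1.mp3", "A"), ("file1.mp3", "A"), ("file1.mp3", "C"), ("file2.mp3", "E")]
rows = summarise(sample)
for row in rows:
    print(row)
```

With the query in getSummary, the pairs could be taken from each Response as (response.response_item, response.response_value). The returned list has the same shape as the data list that the template expects.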

templates/survey_summary.html

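These days I use something similar to this template for most table output: I just build the headers, fields and data collections first. I still need to look into pandas; I imagine someone has already done something similar there.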

<html>
<head>
    <title>mturk survey summary</title>
</head>
<body>
    <table>
        <tr>
            {% for field in fields %}
            <th>{{headers[field]}}</th>
            {% endfor %}
        </tr>
        {% for row in data %}
        <tr>
            {% for field in fields %}
            <td>
                {{ row[field] | safe }}
            </td>
            {% endfor %}
        </tr>
        {% endfor %}
    </table>
</body>
</html>

OK, I came back to do the SQL. It can be swapped in for the query and counting loop in getSummary if you prefer:

from sqlalchemy import func  # needed for func.count below

# select response_item, response_value, count(response_value)
# from response
# group by response_item, response_value
query = db.session.query(Response.response_item, Response.response_value, func.count(Response.response_value)) \
                  .join(Assignment, Response.assignment_id == Assignment.id) \
                  .join(Task, Assignment.task_id==Task.id) \
                  .join(Survey, Survey.id==Task.survey_id) \
                  .filter(Survey.id == 1) \
                  .group_by(Response.response_item, Response.response_value)

print(query)
results = query.all()

for (item, value, count) in results:
    rowMap[item][value] = count
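
The grouped query above returns one row per (file name, answer) pair, so the pivot into columns still happens in Python. As an alternative, the pivot can be done in SQL with conditional aggregation. The sketch below is not from the original answer. It uses the standard-library sqlite3 module with raw SQL rather than SQLAlchemy so that it runs on its own. The table names follow the foreign keys in the models, while the sample rows and the assumption that response_value stores the letters 'A' to 'E' are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey (id INTEGER PRIMARY KEY);
    CREATE TABLE task (id INTEGER PRIMARY KEY, survey_id INTEGER REFERENCES survey(id));
    CREATE TABLE assignment (id INTEGER PRIMARY KEY, task_id INTEGER REFERENCES task(id));
    CREATE TABLE response (
        id INTEGER PRIMARY KEY,
        assignment_id INTEGER REFERENCES assignment(id),
        response_item VARCHAR(255),
        response_value VARCHAR(255)
    );
    -- Made-up sample rows: one survey, one task, one assignment
    INSERT INTO survey (id) VALUES (1);
    INSERT INTO task (id, survey_id) VALUES (1, 1);
    INSERT INTO assignment (id, task_id) VALUES (1, 1);
    INSERT INTO response (assignment_id, response_item, response_value) VALUES
        (1, 'file1.mp3', 'A'), (1, 'file1.mp3', 'A'), (1, 'file1.mp3', 'B'),
        (1, 'file2.mp3', 'E');
""")

# One output row per file name; each SUM counts the responses with one answer.
PIVOT_SQL = """
    SELECT r.response_item AS filename,
           SUM(CASE WHEN r.response_value = 'A' THEN 1 ELSE 0 END) AS answered_a,
           SUM(CASE WHEN r.response_value = 'B' THEN 1 ELSE 0 END) AS answered_b,
           SUM(CASE WHEN r.response_value = 'C' THEN 1 ELSE 0 END) AS answered_c,
           SUM(CASE WHEN r.response_value = 'D' THEN 1 ELSE 0 END) AS answered_d,
           SUM(CASE WHEN r.response_value = 'E' THEN 1 ELSE 0 END) AS answered_e
    FROM response r
    JOIN assignment a ON r.assignment_id = a.id
    JOIN task t ON a.task_id = t.id
    WHERE t.survey_id = ?
    GROUP BY r.response_item
    ORDER BY r.response_item
"""

pivot = conn.execute(PIVOT_SQL, (1,)).fetchall()
for row in pivot:
    print(row)
```

If the values are stored as '1' to '5' instead, only the literals in the CASE expressions would need to change. Files with no responses in the survey do not appear in the result, which differs from the pre-filled rowMap approach.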