I'm new to Flask/SQLAlchemy, and I'm trying to get a summary of the answers to an MTurk survey that looks like this:
Filename   Answered_A  Answered_B  Answered_C  Answered_D  Answered_E
file1.mp3  10          8           5           0           1
file2.mp3  1           26          2           3           7
file3.mp3  4           0           0           3           57
file4.mp3  1           6           1           5           28
using the following models (unrelated fields omitted for brevity):
class Survey(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    tasks = db.relationship('Task', backref='survey', lazy='dynamic')

class Task(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    survey_id = db.Column(db.Integer, db.ForeignKey('survey.id'))
    assignments = db.relationship('Assignment', backref='task', lazy='dynamic')

class Assignment(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    task_id = db.Column(db.Integer, db.ForeignKey('task.id'))
    responses = db.relationship('Response', backref='assignment', lazy='dynamic')

class Response(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    assignment_id = db.Column(db.Integer, db.ForeignKey('assignment.id'))
    response_item = db.Column(db.String(255))
    response_value = db.Column(db.String(255))
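Here response_item is the filename and response_value is 1-5, corresponding to Answered_A, Answered_B, and so on. The models above form a chain of cascading one-to-many relationships.
I followed the approach tried here (Join multiple tables in SQLAlchemy/Flask), like so: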
q = (db.session.query(Survey, Task, Assignment, Response)
     .join(Task, Survey.id==Task.survey_id)
     .join(Assignment, Task.id==Assignment.task_id)
     .join(Response, Assignment.id==Response.assignment_id)).all()
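This produces a list of tuples of the form (Survey, Task, Assignment, Response).
What I want is a query that, for Survey.id=4 for example, uses the right group by and returns the structure shown above.
As mentioned, the answers range from Answered_A to Answered_E, or 1-5 if that is easier.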
Answer 0 (score: 1)
I put together a GitHub repo for you that shows how to do this:
https://github.com/researcher2/stackoverflow_57023616
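Since I don't have access to your data, I built a mock dataset, which you can find in create_db.py.
I set up a count for each filename and each of its possible options, starting at 0, then iterate over the responses returned from the database and increment the matching count.
I may come back tomorrow to do it in SQL.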
server.py
from app import app, db
from flask import render_template
from models import Survey, Task, Assignment, Response


@app.route('/')
def index():
    (headers, fields, data) = getSummary()
    return render_template("survey_summary.html", headers=headers, fields=fields, data=data)


def getSummary():
    fields = ["Filename", "A", "B", "C", "D", "E"]  # column names for output
    headers = dict()  # custom header names for given fieldname (no difference here)
    for field in fields:
        headers[field] = field

    # build data structures: one row of zeroed counts per file
    data = []
    rowMap = dict()  # filename -> row, so counts can be updated by name
    fileNames = ["file1.mp3", "file2.mp3", "file3.mp3", "file4.mp3"]
    for fileName in fileNames:
        row = dict()
        row["Filename"] = fileName
        row["A"] = 0
        row["B"] = 0
        row["C"] = 0
        row["D"] = 0
        row["E"] = 0
        data.append(row)
        rowMap[fileName] = row

    # query
    query = db.session.query(Survey, Task, Assignment, Response) \
        .join(Task, Survey.id==Task.survey_id) \
        .join(Assignment, Task.id==Assignment.task_id) \
        .join(Response, Assignment.id==Response.assignment_id) \
        .filter(Survey.id == 1)
    results = query.all()

    # summarise counts
    # response_value must be one of the row keys "A".."E" here;
    # values stored as 1-5 would need mapping to a letter first
    for (_, _, _, response) in results:
        rowMap[response.response_item][response.response_value] += 1

    return (headers, fields, data)
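The counting step can also be tried on its own, without a database. The following is a minimal sketch, not the code from the repo: `summarise`, `LETTERS` and the sample pairs are made up for illustration, and it assumes that answers stored as "1" to "5" should map to columns A to E.

```python
from collections import defaultdict

LETTERS = ["A", "B", "C", "D", "E"]


def summarise(responses):
    """Count answers per file.

    `responses` is an iterable of (response_item, response_value) pairs.
    Values may be the letters "A".."E" or the strings "1".."5" (assumption).
    """
    # Assumed mapping from numeric answers to column letters: "1" -> "A", ...
    to_letter = {str(i + 1): letter for i, letter in enumerate(LETTERS)}
    # Each new filename starts with a fresh dict of zeroed counts.
    counts = defaultdict(lambda: dict.fromkeys(LETTERS, 0))
    for item, value in responses:
        letter = to_letter.get(str(value), str(value))
        if letter not in LETTERS:
            continue  # skip unexpected values instead of raising KeyError
        counts[item][letter] += 1
    # One row per file, in the shape the template expects.
    return [{"Filename": name, **counts[name]} for name in sorted(counts)]


# Hypothetical sample data, mixing both value styles.
sample = [
    ("file1.mp3", "1"),
    ("file1.mp3", "A"),
    ("file2.mp3", "5"),
    ("file1.mp3", "2"),
]
rows = summarise(sample)
```

Unlike the hard-coded fileNames list, this version discovers filenames from the responses, so a file with no responses at all would not appear in the output.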
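templates/survey_summary.html
These days I use something like this template for most table output; I just build the headers, fields and data collections first. I need to look into pandas, as I imagine someone has already done something similar there.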
<html>
  <head>
    <title>mturk survey summary</title>
  </head>
  <body>
    <table>
      <tr>
        {% for field in fields %}
        <th>{{headers[field]}}</th>
        {% endfor %}
      </tr>
      {% for row in data %}
      <tr>
        {% for field in fields %}
        <td>
          {{ row[field] | safe }}
        </td>
        {% endfor %}
      </tr>
      {% endfor %}
    </table>
  </body>
</html>
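The headers/fields/data convention the template relies on can be checked without Flask. Here is a rough sketch that renders the same three collections as plain text; `render_text_table` is a made-up helper, not part of the repo, and the sample values are invented.

```python
def render_text_table(headers, fields, data):
    """Render rows as aligned plain text using the headers/fields/data convention.

    fields  -- ordered list of column keys
    headers -- dict mapping each field to its display name
    data    -- list of dicts, one per row, keyed by field
    """
    # Column width = longest of the header and every cell in that column.
    widths = {
        f: max([len(str(headers[f]))] + [len(str(row[f])) for row in data])
        for f in fields
    }
    lines = ["  ".join(str(headers[f]).ljust(widths[f]) for f in fields)]
    for row in data:
        lines.append("  ".join(str(row[f]).ljust(widths[f]) for f in fields))
    # Drop the padding after the last column.
    return "\n".join(line.rstrip() for line in lines)


# Hypothetical sample input.
fields = ["Filename", "A", "B"]
headers = {"Filename": "Filename", "A": "Answered_A", "B": "Answered_B"}
data = [{"Filename": "file1.mp3", "A": 10, "B": 8}]
table = render_text_table(headers, fields, data)
```

Keeping fields as an ordered list and headers as a separate dict is what lets the display names change without touching the row data.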
OK, I came back to do the SQL. You can swap this in if you like:
from sqlalchemy import func  # needed for func.count

# select response_item, response_value, count(response_value)
# from response
# group by response_item, response_value
query = db.session.query(Response.response_item, Response.response_value, func.count(Response.response_value)) \
    .join(Assignment, Response.assignment_id == Assignment.id) \
    .join(Task, Assignment.task_id==Task.id) \
    .join(Survey, Survey.id==Task.survey_id) \
    .filter(Survey.id == 1) \
    .group_by(Response.response_item, Response.response_value)
print(query)  # prints the generated SQL
results = query.all()

# one result row per (filename, answer) pair, so assign rather than increment
for (item, value, count) in results:
    rowMap[item][value] = count
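If you would rather have the database return one row per file with a column per answer, conditional aggregation is one option. This is only a sketch in plain SQL, run through Python's sqlite3 module rather than SQLAlchemy. The table names mirror the models above, the sample rows are invented, and it assumes response_value is stored as the strings '1' to '5'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey (id INTEGER PRIMARY KEY);
    CREATE TABLE task (id INTEGER PRIMARY KEY, survey_id INTEGER REFERENCES survey(id));
    CREATE TABLE assignment (id INTEGER PRIMARY KEY, task_id INTEGER REFERENCES task(id));
    CREATE TABLE response (
        id INTEGER PRIMARY KEY,
        assignment_id INTEGER REFERENCES assignment(id),
        response_item VARCHAR(255),
        response_value VARCHAR(255)
    );
    -- Made-up sample rows, not real survey data.
    INSERT INTO survey (id) VALUES (1), (2);
    INSERT INTO task (id, survey_id) VALUES (1, 1), (2, 2);
    INSERT INTO assignment (id, task_id) VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO response (assignment_id, response_item, response_value) VALUES
        (1, 'file1.mp3', '1'), (2, 'file1.mp3', '1'), (1, 'file2.mp3', '5'),
        (2, 'file2.mp3', '2'), (3, 'file1.mp3', '3');
""")

# One SUM(CASE ...) per answer turns the grouped counts into columns.
PIVOT_SQL = """
    SELECT r.response_item AS filename,
           SUM(CASE WHEN r.response_value = '1' THEN 1 ELSE 0 END) AS answered_a,
           SUM(CASE WHEN r.response_value = '2' THEN 1 ELSE 0 END) AS answered_b,
           SUM(CASE WHEN r.response_value = '3' THEN 1 ELSE 0 END) AS answered_c,
           SUM(CASE WHEN r.response_value = '4' THEN 1 ELSE 0 END) AS answered_d,
           SUM(CASE WHEN r.response_value = '5' THEN 1 ELSE 0 END) AS answered_e
    FROM response r
    JOIN assignment a ON r.assignment_id = a.id
    JOIN task t ON a.task_id = t.id
    WHERE t.survey_id = ?
    GROUP BY r.response_item
    ORDER BY r.response_item
"""

# Survey 1 only; the response belonging to survey 2 is filtered out.
rows = conn.execute(PIVOT_SQL, (1,)).fetchall()
```

Grouping by response_item alone gives one row per file, so no Python-side rowMap is needed. The trade-off is that the set of answer columns is fixed in the SQL.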