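I am using the command below to retrieve a set of data from an SQLite database. As expected, I get one large result list, which I also export to an HTML document and a text document. I would like to split the table shown in those documents based on the 'messages.conversation_id' column, but I have not been able to figure out a way yet. I tried the groupby function, but it only sorted the result list.
Thanks.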
import os
import sqlite3
import pandas as pd

connect = sqlite3.connect(sqlitedb)
df = pd.read_sql_query("""SELECT messages._id, messages.date, messages.body, messages.conversation_id, participants_info.number, participants_info.display_name, participants_info._id
FROM messages
INNER JOIN participants_info
ON messages.participant_id = participants_info._id;""", connect)
df.to_html(open('messages.html', 'w'))
base_filename = 'test.txt'
with open(os.path.join(base_filename), 'w') as outfile:
    df.to_string(outfile)
print(df)
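A screenshot of the result is shown below. I would like to be able to split the table into smaller tables based on the conversation_id column, so that I have a separate table for each ID.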
Answer 0: (score: 0)
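Tell the database to sort by conversation_id. Then process the data row by row and start a new table whenever the value changes, i.e. differs from the previous row's value.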
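As a rough illustration of the sort-then-split idea, here is a minimal sketch that uses only the standard library. The in-memory table and its sample rows are hypothetical stand-ins for the real messages table, and itertools.groupby is just one possible way to detect the point where the value changes; it is not necessarily what the answer's author had in mind.

```python
import sqlite3
from itertools import groupby

# Hypothetical in-memory data standing in for the real messages table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (_id INTEGER, body TEXT, conversation_id INTEGER)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, "hi", 20), (2, "hello", 10), (3, "bye", 20), (4, "ok", 10)],
)

# Let the database do the sorting so rows with equal IDs arrive consecutively.
rows = conn.execute(
    "SELECT _id, body, conversation_id FROM messages "
    "ORDER BY conversation_id, _id"
)

# groupby starts a new group each time the key differs from the previous row,
# which only yields one group per ID because the rows are already sorted.
tables = {
    convo_id: list(group)
    for convo_id, group in groupby(rows, key=lambda row: row[2])
}
conn.close()

for convo_id, table in tables.items():
    print(f"conversation {convo_id}: {table}")
```

Each value in tables is the list of rows for one conversation, which could then be written out as its own table. Without the ORDER BY clause, groupby would start a new group every time the ID changes between neighbouring rows, so the same ID could appear in several groups.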
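If you cannot process the data row by row, you need one query per table. This requires that you first get a list of all conversation IDs (SELECT DISTINCT conversation_id FROM whatever) and then run the actual query for each value (SELECT ... WHERE conversation_id = ?).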
Answer 1: (score: 0)
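Consider looping over a cursor's list of distinct conversation_ids, as @CL suggested, and iteratively dumping each data frame into growing .html and .txt files, separated by line breaks. As a matter of best practice, also use a parameterized query and table aliases in the SQL.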
import sqlite3
import pandas as pd

conn = sqlite3.connect('/path/to/sqlite/database.db')
cur = conn.cursor()

# Collect every distinct conversation ID that has a matching participant.
cur.execute("SELECT DISTINCT m.conversation_id"
            " FROM messages m"
            " INNER JOIN participants_info p"
            " ON m.participant_id = p._id"
            " WHERE m.conversation_id IS NOT NULL")

# Parameterized query that is run once per conversation ID.
query = ("SELECT m._id, m.date, m.body, m.conversation_id,"
         " p.number, p.display_name, p._id"
         " FROM messages m"
         " INNER JOIN participants_info p"
         " ON m.participant_id = p._id"
         " WHERE m.conversation_id = ?")

with open('messages.html', 'w') as h, open('test.txt', 'w') as t:
    # Each fetched row is a one-element tuple, so it can be passed as params.
    for convo in cur.fetchall():
        df = pd.read_sql_query(query, conn, params=convo)
        # HTML WRITE
        h.write(df.to_html())
        h.write('<br/>')
        # TXT WRITE
        t.write(df.to_string())
        t.write('\n\n')

cur.close()
conn.close()