我正在构建一个脚本,该脚本从json / REST流接收数据,然后将其添加到数据库中。我想建立一个缓冲区,该缓冲区从流中收集数据并将其存储,直到将其成功插入数据库为止。
这个想法是,一个线程将把数据从api流传输到数据帧,而另一个线程将尝试将数据提交到数据库,一旦将它们成功插入数据库,就将其从数据帧中删除。
我编写了以下代码来测试该概念-唯一的问题是,它不起作用!
import threading
from threading import Thread
import pandas as pd
import numpy as np
import time
from itertools import count
# set delay
d=5
# add items to dataframe every few seconds
def appender():
testdf = pd.DataFrame([])
print('starting streamsim')
for i in count():
testdf1 = pd.DataFrame(np.random.randint(0,100,size=(np.random.randint(0,25), 4)), columns=list('ABCD'))
testdf = testdf.append(testdf1)
print('appended')
print('len is now {0}'.format(len(testdf)))
return testdf
time.sleep(np.random.randint(0,5))
# combine the dfs, and operate on them
def dumper():
print('starting dumpsim')
while True:
# check if there are values in the df
if len(testdf.index) > 0:
print('searching for values')
for index, row in testdf.iterrows():
if row['A'] < 10:
testdf.drop(index, inplace=True)
print('val dropped')
else:
print('no vals found')
# try to add rows to csv to simulate sql insert command
for index, row in testdf.iterrows():
# try to append to csv
try:
with open('footest.csv', 'a') as f:
row.to_csv(f, mode= 'a', header=True)
except:
print('append operation failed, skipping')
pass
#if operation succeeds, drop the row
else:
testdf.drop(index)
print('row dropped after success')
if len(testdf.index) == 0:
print('df is empty')
pass
time.sleep(d)
if __name__ == '__main__':
Thread(target = appender).start()
Thread(target = dumper).start()
有没有办法使这项工作有效?还是当一个线程正在工作时DataFrame是“锁定”的?