I'm having a hard time figuring out how to develop phase 3 of this algorithm:
push the structured data to the DB while continuing with step 1 (no waiting for the DB upload to finish before restarting step 1 — the two should run in parallel)
import requests
import time
from sqlalchemy import schema, types
from sqlalchemy.engine import create_engine
import threading
# I usually work on postgres
meta = schema.MetaData(schema="example")
# table one
table_api_one = schema.Table('api_one', meta,
    schema.Column('id', types.Integer, primary_key=True),
    schema.Column('field_one', types.Unicode(255), default=u''),
    schema.Column('field_two', types.BigInteger()),
)
# table two
table_api_two = schema.Table('api_two', meta,
    schema.Column('id', types.Integer, primary_key=True),
    schema.Column('field_one', types.Unicode(255), default=u''),
    schema.Column('field_two', types.BigInteger()),
)
# create tables
engine = create_engine("postgresql://......", echo=False, pool_size=15, max_overflow=15)
meta.bind = engine
meta.create_all(checkfirst=True)
# get the data from the API and return data as JSON
def getdatafrom(url):
    data = requests.get(url)
    structured = data.json()
    return structured
# push the data to the DB
def flush(list_one, list_two):
    connection = engine.connect()
    # both lists are lists of JSON records
    connection.execute(table_api_one.insert(), list_one)
    connection.execute(table_api_two.insert(), list_two)
    connection.close()
# start doing something
def main():
    timesleep = 30
    flush_limit = 10
    threading.Timer(timesleep * flush_limit, main).start()
    data_api_one = []
    data_api_two = []
    # repeat the process 10 times (flush_limit), avoiding keeping the DB too busy
    while len(data_api_one) < flush_limit and len(data_api_two) < flush_limit:
        data_api_one.append(getdatafrom("http://www.apiurlone.com/api...").copy())
        data_api_two.append(getdatafrom("http://www.apiurltwo.com/api...").copy())
        time.sleep(timesleep)
    # push the data when the limit is reached
    flush(data_api_one, data_api_two)
# start the example
main()
In this example script, a thread starts main() every 10 * 30 seconds (to avoid overlapping threads). However, with this algorithm the script stops collecting data from the APIs while flush() is running.
How can I flush and keep fetching data from the APIs at the same time?
Thanks!
Answer 0 (score: 0)
The usual approach is a Queue object (from the module named Queue or queue, depending on the Python version).
Create a producer function, running in one thread, that collects the API data and puts it on the queue, and a consumer function, running in another thread, that waits to get data from the queue and stores it in the database.
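A minimal sketch of that producer/consumer pattern, assuming Python 3 (where the module is `queue`; on Python 2 it is `Queue`). The `fetch` and `store` functions here are hypothetical stand-ins for the question's `getdatafrom` API call and the `flush` database insert; a `None` sentinel is one common way to signal end of data:

```python
import queue
import threading

q = queue.Queue()
flushed_batches = []  # records what the consumer "stored", to make the flow visible

def fetch(i):
    # hypothetical stand-in for requests.get(url).json()
    return {"id": i, "field_one": "value", "field_two": i * 100}

def store(batch):
    # hypothetical stand-in for connection.execute(table.insert(), batch)
    flushed_batches.append(list(batch))

def producer(n_items):
    # runs in one thread: collects API data and puts each record on the queue
    for i in range(n_items):
        q.put(fetch(i))
    q.put(None)  # sentinel: tells the consumer there is no more data

def consumer(flush_limit=10):
    # runs in another thread: waits for data and stores it in batches,
    # so the producer is never blocked by the database insert
    batch = []
    while True:
        item = q.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) >= flush_limit:
            store(batch)
            batch = []
    if batch:
        store(batch)  # flush whatever is left over

t_prod = threading.Thread(target=producer, args=(25,))
t_cons = threading.Thread(target=consumer)
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
print([len(b) for b in flushed_batches])  # → [10, 10, 5]
```

Because `q.get()` blocks until an item is available, the consumer simply sleeps while the producer is fetching, and the producer keeps fetching while the consumer is writing to the database, which is exactly the parallelism the question asks for.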