I'm running a Django management command that uses `multiprocessing` to process the rows of a Django QuerySet as tasks.

Here is my code:
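```python
from multiprocessing import Pool

import pandas as pd
from django.core.management.base import BaseCommand

# DailyPrice is the project's own model (its import was not shown).

df = pd.read_csv("my_stock_data.csv")


def run(p):
    # p is a DailyPrice model instance taken from the queryset.
    v = p.volume
    code = p.symbol.code
    date = p.date
    num_cols = 5  # each symbol occupies 5 consecutive columns (OHLCV)
    index = df.columns.get_loc('A' + code)
    _df = df[df.columns[index:index + num_cols]].dropna(how='all')
    _df.columns = ['open', 'high', 'low', 'close', 'volume']
    try:
        _p = _df.loc[date]
    except Exception as e:
        print(p.symbol, e)
        return
    if v != _p['volume']:
        print(p.symbol, v, _p['volume'])


class Command(BaseCommand):
    def handle(self, *args, **options):
        qs = DailyPrice.objects.all()
        pool = Pool(12)
        pool.map(run, qs)
        pool.close()
        pool.join()
```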
As you can see, the pool maps the `run` function over the queryset.
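To make the mapping step concrete, here is a minimal sketch that mimics how `Pool.map` prepares its input. It is modeled on CPython's private `Pool._map_async` and `Pool._get_tasks` helpers, so treat it as an approximation that may differ between Python versions. `batches_like_pool_map` is a hypothetical name, and `LazyRows` is a hypothetical stand-in for a QuerySet. The sketch shows three things. An iterable without `__len__` is first turned into a list. An object that has `__len__`, such as a QuerySet, is measured and iterated in the parent process. In either case, the workers only ever receive fixed-size batches of items.

```python
import itertools


def batches_like_pool_map(iterable, n_workers):
    """Approximate how CPython's Pool.map chunks its input (internal detail)."""
    # Iterables without __len__ (e.g. generators) are materialized first.
    if not hasattr(iterable, "__len__"):
        iterable = list(iterable)
    # Default chunksize: ceil(len / (4 * workers)), or 0 for empty input.
    chunksize, extra = divmod(len(iterable), n_workers * 4)
    if extra:
        chunksize += 1
    if len(iterable) == 0:
        chunksize = 0
    it = iter(iterable)
    batches = []
    while True:
        batch = tuple(itertools.islice(it, chunksize))
        if not batch:
            break
        # In the real Pool, each batch is pickled and sent to a worker.
        batches.append(batch)
    return batches


class LazyRows:
    """Hypothetical QuerySet stand-in: defines __len__ and yields rows on iteration."""

    def __init__(self, rows):
        self._rows = rows

    def __len__(self):
        return len(self._rows)

    def __iter__(self):
        return iter(self._rows)


rows = [(i, f"{i:06d}") for i in range(10)]
from_list = batches_like_pool_map(rows, n_workers=2)
from_gen = batches_like_pool_map((r for r in rows), n_workers=2)
from_lazy = batches_like_pool_map(LazyRows(rows), n_workers=2)
assert from_list == from_gen == from_lazy
print(len(from_list), len(from_list[0]))  # 5 2
```

With `Pool(12)`, as in the code above, the default chunk size works out to `len(qs) / 48`, rounded up. Each batch is pickled, so a model instance reaches the worker as a copy. Any relation that was not loaded in the parent (such as `p.symbol` without `select_related`) may then have to be fetched from inside the worker process.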
The problem is that it works fine and runs perfectly until it reaches the end of the queryset. There it stops, so the program never finishes.
I assumed the problem was that I was passing a queryset rather than a list or some other sequence.
So I tried the following:
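```python
import os
from multiprocessing import Pool

import pandas as pd
from django.conf import settings
from django.core.management.base import BaseCommand

# DailyPrice is the project's own model (its import was not shown).

df = pd.read_csv(
    os.path.join(settings.PROJECT_ROOT_DIR, "data_0525", "stock_all.csv"),
    index_col=0,
    parse_dates=True,
)


def run(p):
    # p is a plain (volume, code, date) tuple built in the parent process.
    v, code, date = p
    num_cols = 5  # each symbol occupies 5 consecutive columns (OHLCV)
    index = df.columns.get_loc('A' + code)
    _df = df[df.columns[index:index + num_cols]].dropna(how='all')
    _df.columns = ['open', 'high', 'low', 'close', 'volume']
    try:
        _p = _df.loc[date]
    except Exception as e:
        print(code, e)
        return
    if v != _p['volume']:
        print(code, v, _p['volume'])


class Command(BaseCommand):
    def handle(self, *args, **options):
        # `condition` is my actual filter, omitted here.
        qs = [
            (p.volume, p.symbol.code, p.date)
            for p in DailyPrice.objects.select_related('symbol')
            .filter(condition)
            .order_by('symbol__name')
        ]
        pool = Pool(12)
        pool.map(run, qs)
        pool.close()
        pool.join()
```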
Now that I pass tuples instead of the queryset, it works well and the whole process finishes.
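In the tuple version, the per-row check in `run` needs only plain values plus the module-level DataFrame. Below is a minimal sketch of that check written as a pure function, which can be exercised without Django or a pool. The function name `check_volume` and the sample column labels are hypothetical. The layout follows the question's `num_cols = 5`: each symbol is assumed to occupy five consecutive columns starting at `'A' + code`, with a date index. This reflects the question's code, not the real contents of `stock_all.csv`.

```python
import pandas as pd


def check_volume(df, volume, code, date, num_cols=5):
    """Return None if the volumes match, otherwise a short mismatch message."""
    # The symbol's block starts at column 'A' + code and spans num_cols columns.
    start = df.columns.get_loc("A" + code)
    block = df[df.columns[start:start + num_cols]].dropna(how="all")
    block.columns = ["open", "high", "low", "close", "volume"]
    try:
        row = block.loc[date]
    except KeyError as e:
        return f"{code}: missing date {e}"
    if volume != row["volume"]:
        return f"{code}: db={volume} csv={row['volume']}"
    return None


idx = pd.to_datetime(["2020-05-25", "2020-05-26"])
sample = pd.DataFrame(
    [[1.0, 2.0, 0.5, 1.5, 100], [1.5, 2.5, 1.0, 2.0, 200]],
    index=idx,
    # Hypothetical labels: only the first follows the 'A' + code rule.
    columns=["A000001", "h1", "l1", "c1", "v1"],
)
print(check_volume(sample, 200, "000001", pd.Timestamp("2020-05-26")))  # None
print(check_volume(sample, 999, "000001", pd.Timestamp("2020-05-26")))
```

Because every argument is a plain number, string, or date, a worker running this kind of check does not need the ORM or a database connection.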
Why does this happen? Am I not allowed to pass a queryset? And yet, even when I pass the queryset, it runs fine until the very end!