Multiprocessing with output saved to CSV

Time: 2014-04-24 19:38:18

Tags: python multiprocessing

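Here is what I am trying to accomplish: I load a list of stock tickers from a csv file, and for each ticker I load another csv file named stock_prices and compute stock returns at several time intervals (e.g. 1 minute, 2 minutes, and so on). I would like each processor to compute the returns for a given stock and a given time interval (e.g. 1-minute stock returns), hold on to that time series of returns, and wait until all processors have finished working through all the tickers. Then I want to merge all the stock returns into one database and save the output in a csv file. After that I want to repeat the process for a different time interval. I do not know how to write my code so that the output is saved in a csv file. I will be running my code on a cluster computer with roughly 50 processors. Here is my attempt at the code: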

import multiprocessing
import csv
import pandas as pd
from datetime import datetime

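# HERE I LOAD THE TICKER LIST. IT LOOKS LIKE ['AAPL', 'GOOG']
# (READING ONE TICKER PER ROW IS AN ASSUMPTION ABOUT THE FILE LAYOUT)
with open(r'F:\ticker_list.csv', 'r') as f:
    ticker = [row[0] for row in csv.reader(f) if row]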

# THE INTERVAL FOR WHICH I WANT TO COMPUTE THE STOCK RETURNS
interval = ['1', '2', '5']

def worker(TIC, INTER):
    # HERE I LOAD THE STOCK PRICES FOR A GIVEN TICKER
    df = pd.read_csv(r'C:\stock_prices_' + TIC + '.csv', header=0)
    ...

# AFTER SOME COMPUTATION, df IS A TIME-SERIES OF STOCK RETURNS FOR A GIVEN STOCK AND FOR A TIME-INTERVAL

if __name__ == '__main__':
    for inter in interval:
        jobs = []
        for i in range(50):  # 50 FOR THE NUMBER OF PROCESSORS
            # HERE I AM NOT SURE - DO I LOOP THROUGH MY LIST OF TICKERS
            # AND SUBMIT ONE TICKER FOR EACH WORKER AND THE CHOSEN
            # TIME-INTERVAL?
            p = multiprocessing.Process(target=worker, args=(ticker[i], inter))
            jobs.append(p)
            p.start()
        for j in jobs:
            j.join()

        # HERE I SAVE THE OUTPUT OF TIME-SERIES FOR ALL STOCKS FOR A
        # GIVEN TIME INTERVAL ... is it jobs? Here I wrote the code as
        # if jobs was in dataframe since df in worker is in dataframe format.
        jobs.to_csv('C:\\' + inter + '_stockreturns.csv', float_format='%.6f')
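
One possible way to structure this, offered as a minimal sketch and not a definitive answer: `multiprocessing.Process` objects do not carry a return value, so `jobs` cannot be written out directly. A `multiprocessing.Pool` hands each ticker to a worker, `map()` waits for all of them, and the parent process then writes one file per interval. To stay self-contained the sketch uses only the standard `csv` module instead of pandas. The `price` column, the assumption that rows are one minute apart, the `data_dir` and `out_dir` parameters, and the helper names `compute_returns`, `save_results` and `run_interval` are all hypothetical and not taken from the question.

```python
import csv
import multiprocessing
import os
from functools import partial


def compute_returns(prices, step):
    """Simple returns between observations `step` rows apart."""
    return [prices[i] / prices[i - step] - 1.0 for i in range(step, len(prices))]


def worker(tic, inter, data_dir):
    """Load one ticker's prices and return (ticker, list of returns)."""
    # ASSUMPTION: the file has a 'price' column with one row per minute.
    path = os.path.join(data_dir, 'stock_prices_' + tic + '.csv')
    with open(path, newline='') as f:
        prices = [float(row['price']) for row in csv.DictReader(f)]
    return tic, compute_returns(prices, int(inter))


def save_results(results, out_path):
    """Write the returns of all tickers into a single CSV file."""
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['ticker', 'row', 'return'])
        for tic, returns in results:
            for k, r in enumerate(returns):
                writer.writerow([tic, k, '%.6f' % r])
    return out_path


def run_interval(tickers, inter, data_dir, out_dir, processes=None):
    """Compute returns for every ticker in parallel, then save them."""
    with multiprocessing.Pool(processes) as pool:
        # map() blocks until every ticker is done and preserves input order.
        job = partial(worker, inter=inter, data_dir=data_dir)
        results = pool.map(job, tickers)
    return save_results(results, os.path.join(out_dir, inter + '_stockreturns.csv'))


# Example driver, left as a comment because the paths are placeholders:
# if __name__ == '__main__':
#     for inter in ['1', '2', '5']:
#         run_interval(['AAPL', 'GOOG'], inter, data_dir='.', out_dir='.')
```

The pool does not need one process per ticker: with `processes=50` the 50 workers pull tickers from the list until it is exhausted. If `worker` returned a pandas DataFrame instead of a list, the results could be combined in the parent with `pd.concat` before calling `to_csv`.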

0 answers:

No answers