Python parallel processing problem

Date: 2015-05-05 23:12:52

Tags: python parallel-processing

My program is set up to download a Cartesian product of URLs built from state and other variables, save the zip files (from the constructed URLs) to a specified location, check the data inside the zip files (some zip downloads contain no data), write state-specific data files, and then write to a file when a state is finished. This is parallelized by state, i.e. Alabama and Alaska run the steps above in parallel. However, I keep getting the following error:
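The Cartesian-product step described above can be sketched as follows. This is a minimal illustration only: the parameter values and the `example.com` URL pattern are made-up placeholders, not the actual endpoint or codes the script uses.

```python
import itertools

# Hypothetical parameter lists; the real script combines type, geography,
# level and data codes for each state in the same way.
types     = ['sa', 'gs']
geography = ['county', 'wia']
levels    = ['1', '2']

# itertools.product yields every combination of the input iterables,
# so 2 * 2 * 2 = 8 URLs are generated here.
urls = ['http://example.com/{}_{}_{}.zip'.format(t, g, l)
        for t, g, l in itertools.product(types, geography, levels)]

print(len(urls))
```

Each worker then downloads its own slice of such a list, one state at a time.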

An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (179, 0))

The error occurs when I start fresh, i.e. when the process has not been run before. If I run the process partway and then start over, it does not happen. More precisely, it seems to occur at random.

Here is my code:

The functions -

import itertools
import os
import shutil
import urllib

def createURL(state, typ, geography, level, data, dictionary):

    DATALIST = list(itertools.product(typ, geography, level, data))
    TXTLIST  = list(itertools.product(typ, dictionary))
    DEFLIST  = list(itertools.product(typ))

    DATALINKS = []
    for data in DATALIST:
        result = 'URL'
        DATALINKS.append(result)

    TXTLINKS = []
    for txt in TXTLIST:
        links = 'URL'
        TXTLINKS.append(links)

    DEFLINKS = []
    for defl in DEFLIST:
        definitions = 'URL'
        DEFLINKS.append(definitions)

    URLLINKS = DATALINKS + TXTLINKS + DEFLINKS
    return URLLINKS


def downloadData(state, TYPE, GEOGRAPHY, LEVEL, DATA,
                 DICTIONARY, YEAR, QUARTER, completedStates):
    print ('Working on state: ', state)

    URLLINKS = createURL(state, TYPE, GEOGRAPHY, LEVEL, DATA, DICTIONARY)

    DIRECTORY = '/home/justin/QWI/' + YEAR + 'Q' + QUARTER + '/' + state
    if not os.path.exists(DIRECTORY[:-2]):
        os.makedirs(DIRECTORY[:-2])

    if not os.path.exists(DIRECTORY):
        os.makedirs(DIRECTORY)

    downLoadedURLs = DIRECTORY[:-2] + 'downLoadedURLs.txt'
    if not os.path.isfile(downLoadedURLs):
        with open(downLoadedURLs, 'a') as downloaded:
            downloaded.write('')

    with open(downLoadedURLs) as downloaded:
        URLcontent = downloaded.read().splitlines()

    URLLINKS = [x for x in URLLINKS if x not in URLcontent]

    for url in URLLINKS:
        print ('Downloading data: ', url)
        save = DIRECTORY + '/' + os.path.basename(url)

        urllib.urlretrieve(url, save)
        with open(downLoadedURLs, 'a') as downloaded:
            downloaded.write('{}\n'.format(url))

        if os.stat(save).st_size == 0:
            shutil.rmtree(DIRECTORY)
            with open(DIRECTORY[:-2] + '/zeroDataStates.txt', 'a') as zeroData:
                zeroData.write('{}\n'.format(state))
            break

    with open(completedStates, 'a') as completedState:
        completedState.write('{}\n'.format(state))

Here is the parallel code:

from joblib import Parallel, delayed

STATE = ['al', 'ak', etc...]

Parallel(n_jobs = CORES)(delayed(downloadData)\
    (state, TYPE, GEOGRAPHY, LEVEL, DATA, DICTIONARY, YEAR, QUARTER, 
    completedStates) for state in STATE)

I think the error occurs while writing to a file or retrieving the URLs.
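One way to narrow that suspicion down (a diagnostic sketch, not part of the original script; the function name `fetch_and_record` is hypothetical) is to wrap the fetch and the write in separate try blocks inside each worker, so the failing step and URL are printed instead of joblib's mangled traceback:

```python
# Sketch only: the question's script uses Python 2's urllib.urlretrieve;
# urllib.request is used here so the snippet also runs on Python 3.
from urllib.request import urlretrieve

def fetch_and_record(url, save, log_path):
    # Separate try blocks reveal which of the two steps actually raised.
    try:
        urlretrieve(url, save)
    except Exception as e:
        print('fetch failed for {}: {!r}'.format(url, e))
        return False
    try:
        with open(log_path, 'a') as log:
            log.write('{}\n'.format(url))
    except Exception as e:
        print('write failed for {}: {!r}'.format(log_path, e))
        return False
    return True
```

Calling this per URL inside `downloadData` would show whether the failure comes from `urlretrieve` or from the log-file write, and for which state.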

0 Answers:

No answers