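Object of type '_Task' has no len() error

Date: 2016-03-04 21:05:27

Tags: python parallel-processing

I am using Python's parallel programming module (pp). I have a function that returns an array, but when I print the variable that holds the value of the parallelized function, I get "pp._Task object at 0x04696510" instead of the value of the matrix. Here is the code: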

from __future__ import print_function
import scipy, pylab
from scipy.io.wavfile import read
import sys
import peakpicker as pea
import pp
import fingerprint as fhash
import matplotlib
import numpy as np
import tdft
import subprocess
import time
if __name__ == '__main__':
        start=time.time()
        #Peak picking dimensions 
        f_dim1 = 30
        t_dim1 = 80 
        f_dim2 = 10
        t_dim2 = 20
        percentile = 80
        base = 100 # lowest frequency bin used (peaks below are too common/not as useful for identification)
        high_peak_threshold = 75
        low_peak_threshold = 60
        #TDFT parameters
        windowsize = 0.008     #set the window size  (0.008s = 64 samples)
        windowshift = 0.004    #set the window shift (0.004s = 32 samples)
        fftsize = 1024         #set the fft size (if srate = 8000, 1024 --> 513 freq. bins separated by 7.797 Hz from 0 to 4000Hz) 

        #Hash parameters
        delay_time = 250     # 250*0.004 = 1 second#200
        delta_time = 250*3    # 750*0.004 = 3 seconds#300
        delta_freq = 128      # 128*7.797Hz = approx 1000Hz#80
        #Time pair parameters
        TPdelta_freq = 4
        TPdelta_time = 2


        #Load stored data
        database=np.loadtxt('database.dat')
        songnames=np.loadtxt('songnames.dat', dtype=str, delimiter='\t')
        separator = '.'
        print('Please enter an audio sample file to identify: ')
        userinput = raw_input('---> ')
        subprocess.call(['ffmpeg','-y','-i',userinput, '-ac', '1','-ar', '8k', 'filesample.wav'])   
        sample = read('filesample.wav')
        userinput = userinput.split(separator,1)[0]
        print('Analyzing the audio sample: '+str(userinput))
        srate = sample[0]  #sample rate in samples/second
        audio = sample[1]  #audio data      
        spectrogram = tdft.tdft(audio, srate, windowsize, windowshift, fftsize)
        mytime = spectrogram.shape[0]
        freq = spectrogram.shape[1]

        print('The size of the spectrogram is time: '+str(mytime)+' and freq: '+str(freq))

        threshold = pea.find_thres(spectrogram, percentile, base)

        peaks = pea.peak_pick(spectrogram,f_dim1,t_dim1,f_dim2,t_dim2,threshold,base)

        print('The initial number of peaks is:'+str(len(peaks)))
        peaks = pea.reduce_peaks(peaks, fftsize, high_peak_threshold, low_peak_threshold)
        print('The reduced number of peaks is:'+str(len(peaks)))

        #Store information for the spectrogram graph
        samplePeaks = peaks
        sampleSpectro = spectrogram

        hashSample = fhash.hashSamplePeaks(peaks,delay_time,delta_time,delta_freq)
        print('The dimensions of the hash matrix of the sample: '+str(hashSample.shape))


        # tuple of all parallel python servers to connect with
        ppservers = ()
        #ppservers = ("10.0.0.1",)

        if len(sys.argv) > 1:
            ncpus = int(sys.argv[1])
            # Creates jobserver with ncpus workers
            job_server = pp.Server(ncpus, ppservers=ppservers)
        else:
            # Creates jobserver with automatically detected number of workers
            job_server = pp.Server(ppservers=ppservers)

        print ("Starting pp with", job_server.get_ncpus(), "workers")

        print('Attempting to identify the sample audio clip.')

Here I call the function from fingerprint. The commented-out line works, but the parallelized version does not:

        timepairs = job_server.submit(fhash.findTimePairs, (database, hashSample, TPdelta_freq, TPdelta_time, ))
#        timepairs = fhash.findTimePairs(database, hashSample, TPdelta_freq, TPdelta_time)
        print (timepairs)


        #Compute number of matches by song id to determine a match
        numSongs = len(songnames)
        songbins= np.zeros(numSongs)
        numOffsets = len(timepairs)
        offsets = np.zeros(numOffsets)
        index = 0
        for i in timepairs:
                offsets[index]=i[0]-i[1]
                index = index+1
                songbins[i[2]] += 1

        # Identify the song
        #orderarray=np.column_stack((songbins,songnames))
        #orderarray=orderarray[np.lexsort((songnames,songbins))]
        q3=np.percentile(songbins, 75)
        q1=np.percentile(songbins, 25)
        j=0
        for i in songbins:
                if i>(q3+(3*(q3-q1))):
                        print("Result-> "+str(i)+":"+songnames[j])
                j+=1
        end=time.time()
        print('Tiempo: '+str(end-start)+' s')
        print("Time elapsed: ", time.time() - start, "s")
        fig3 = pylab.figure(1003)
        ax = fig3.add_subplot(111)
        ind = np.arange(numSongs)
        width = 0.35
        rects1 = ax.bar(ind,songbins,width,color='blue',align='center')
        ax.set_ylabel('Number of Matches')
        ax.set_xticks(ind)
        xtickNames = ax.set_xticklabels(songnames)
        matplotlib.pyplot.setp(xtickNames)
        pylab.title('Song Identification') 
        fig3.show()

        pylab.show()

        print('The sample song is: '+str(songnames[np.argmax(songbins)]))

The fingerprint function I am trying to parallelize is:

def findTimePairs(hash_database,sample_hash,deltaTime,deltaFreq):
    "Find the matching pairs between sample audio file and the songs in the database"

    timePairs = []

    for i in sample_hash:
        for j in hash_database:
            if(i[0] > (j[0]-deltaFreq) and i[0] < (j[0] + deltaFreq)):
                if(i[1] > (j[1]-deltaFreq) and i[1] < (j[1] + deltaFreq)):
                    if(i[2] > (j[2]-deltaTime) and i[2] < (j[2] + deltaTime)):
                        timePairs.append((j[3],i[3],j[4]))
                    else:
                        continue
                else:
                    continue
            else:
                continue

    return timePairs

The full error is:

Traceback (most recent call last):
  File "analisisPrueba.py", line 93, in <module>
    numOffsets = len(timepairs)
TypeError: object of type '_Task' has no len()

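1 Answer:

Answer 0: (score: 1)

The submit() method submits a task to the server. What you get back is a reference to the task, not its result. (How could it return the result? submit() returns before any work has been done!) You should provide a callback function to receive the result. For example, timepairs.append is a function that will take the result and append it to the list timepairs: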

timepairs = []
job_server.submit(fhash.findTimePairs, (database, hashSample, TPdelta_freq, TPdelta_time, ), callback=timepairs.append)
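
One consequence worth noting: the callback receives each task's whole return value, so timepairs ends up holding one entry per task (each entry being the list that findTimePairs returned), not the individual tuples. The rest of the script iterates over three-element tuples, so the collected results would need to be flattened once the tasks have finished. A minimal sketch, using hard-coded stand-in values rather than real pp results (the name collected and the tuples are made up for illustration):

```python
from itertools import chain

# 'collected' plays the role of the list passed as callback=collected.append.
# Each append below stands in for one finished task handing over its result;
# the tuples are made-up values, not real fingerprint matches.
collected = []
collected.append([(10, 4, 0), (12, 5, 0)])   # result of a first task
collected.append([(30, 7, 2)])               # result of a second task

# Flatten one level: list of per-task lists -> single list of tuples.
timepairs = list(chain.from_iterable(collected))

print(len(collected))   # number of tasks that reported back
print(len(timepairs))   # number of matching pairs overall
```

Alternatively, the object returned by submit() is itself callable in pp, as the examples in its documentation show: job = job_server.submit(...) followed by timepairs = job() blocks until the task has finished and returns its result, which avoids the callback for a single task.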

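(Each call to findTimePairs should compute one result and, in case it is not obvious, you should submit multiple tasks. Otherwise you are invoking all the machinery of Parallel Python for no reason. Make sure you call job_server.wait() to wait for all tasks to finish before you try to do anything with the results. In short, read the documentation and some of the example scripts and make sure you understand how it works.)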
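
To illustrate the "submit multiple tasks" advice, one possible approach is to split the database rows into chunks, submit one task per chunk, and merge the results afterwards. The sketch below shows only the splitting and merging logic and runs the chunks sequentially, so it works without pp installed. The helper split_rows, the snake_case copy find_time_pairs, and the toy rows are all hypothetical; the comments mark where job_server.submit and job_server.wait() would go.

```python
def find_time_pairs(hash_database, sample_hash, delta_time, delta_freq):
    """Matching logic equivalent to findTimePairs in the question."""
    pairs = []
    for i in sample_hash:
        for j in hash_database:
            if (abs(i[0] - j[0]) < delta_freq and
                    abs(i[1] - j[1]) < delta_freq and
                    abs(i[2] - j[2]) < delta_time):
                pairs.append((j[3], i[3], j[4]))
    return pairs


def split_rows(rows, n_chunks):
    """Hypothetical helper: split rows into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[k:k + size] for k in range(0, len(rows), size)]


# Toy rows with 5 columns (database) and 4 columns (sample), matching the
# indices the function reads; the values themselves are made up.
database = [(100, 120, 10, 5, 0), (200, 210, 20, 9, 1),
            (101, 121, 11, 40, 0), (300, 310, 30, 7, 2)]
sample = [(100, 120, 10, 2), (300, 311, 30, 3)]

collected = []
for chunk in split_rows(database, 2):
    # With pp this would be something like:
    #   job_server.submit(find_time_pairs, (chunk, sample, 2, 4),
    #                     callback=collected.append)
    collected.append(find_time_pairs(chunk, sample, 2, 4))
# With pp: call job_server.wait() here, before reading 'collected'.

# Merge the per-chunk lists into one flat list of tuples.
timepairs = [pair for part in collected for pair in part]
print(timepairs)
```

With real parallel tasks the chunks may finish in any order, so the merged list is not guaranteed to match the sequential ordering. The script in the question only counts matches per song, so ordering should not matter there.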