Question

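I have a large list and a pandas DataFrame. For each row in the DataFrame, I search the list for certain values and return another list. My DataFrame is very large (> 50,000 rows) and the list has nearly 500,000 items. A simplified version is below. It takes a very long time to run (> 5 hours). I would like to know how to make it more Pythonic and efficient. I would really appreciate any suggestions. I am using Python 2.7.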

import pandas as pd
import numpy as np
class ODPath(object):    
    def __init__(self,path=None,vol=0):
        # Avoid a mutable default argument: each instance gets its own list.
        self.path = [] if path is None else path
        self.vol = vol
    def setpath(self,newpath):
        self.path = newpath
    def setvol(self,newvol):
        self.vol = newvol

def WritePathFile(allpaths_t):
    # Print each path's volume followed by its nodes.
    for paths_t in allpaths_t:
        print("Volume is " + str(paths_t.vol))
        for node in paths_t.path:
            print("->" + str(node))

df1 = pd.DataFrame(np.random.randn(10, 3),columns=['origin','destination','promise'])
mylist=[[1,2,3,[5,6]],[2,3,1,[1,2,4]],[5,6,1,[4,5,2]],[10,5,1,[1,2,3,4,5]]]
allpaths = []
for index, row in df1.iterrows():
    origin = row['origin']
    dest = row['destination']
    promise = row['promise']
    # Keep list items whose first two fields are >= origin/dest and whose third is <= promise.
    newpathlist = [x for x in mylist if origin <= x[0] and dest <= x[1] and x[2] <= promise]

    if not newpathlist: #list is empty
        path=ODPath([],0)
        newpath = [origin] + [dest]
        path.setpath(newpath)   
        path.setvol(dest) 
        allpaths.append(path)
        #do some other assignments
    else:
        for i in newpathlist:            
            path=ODPath([],0)
            newpath=i[3]  # use the loop variable i, not x left over from the comprehension
            path.setpath(newpath)
            path.setvol(promise)                
            allpaths.append(path)

WritePathFile(allpaths)

Answer 1

Take the list you are searching and make a pd.Series out of it. Then rewrite the search logic using pandas functions instead of Python loops.
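
A minimal sketch of that idea, assuming every list item has the shape `[origin, destination, promise, path]` as in the question's `mylist`. The names `items`, `paths` and `matches_for` are made up for illustration. The three numeric fields go into pandas columns once, so each lookup becomes a vectorized comparison rather than a Python loop over the list:

```python
import numpy as np
import pandas as pd

mylist = [[1, 2, 3, [5, 6]], [2, 3, 1, [1, 2, 4]],
          [5, 6, 1, [4, 5, 2]], [10, 5, 1, [1, 2, 3, 4, 5]]]

# Build the lookup columns once, outside any row loop.
items = pd.DataFrame({
    'origin': [x[0] for x in mylist],
    'destination': [x[1] for x in mylist],
    'promise': [x[2] for x in mylist],
})
paths = [x[3] for x in mylist]  # the nested path lists stay plain Python

def matches_for(origin, dest, promise):
    """Return the paths of all list items matching one (origin, dest, promise) row."""
    mask = ((items['origin'] >= origin)
            & (items['destination'] >= dest)
            & (items['promise'] <= promise))
    # Positions where the mask is True index back into the paths list.
    return [paths[k] for k in np.flatnonzero(mask.values)]

print(matches_for(1, 2, 1))
```

This still calls `matches_for` once per DataFrame row, but the inner search over the list items runs inside NumPy.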

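For the result, one idea is to return an MxN boolean matrix, where the rows correspond to the rows of the DataFrame and the columns indicate whether each item in the list matched. That would help if the number of matching elements per row is fairly large. If it is small and variable, then returning lists is probably still fine.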
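
The boolean-matrix idea could be sketched with NumPy broadcasting as follows. The small integer `df1` and the names `list_arr`, `rows`, `match` and `matched_items` are assumptions made for illustration; the question's own `df1` holds random floats.

```python
import numpy as np
import pandas as pd

mylist = [[1, 2, 3, [5, 6]], [2, 3, 1, [1, 2, 4]],
          [5, 6, 1, [4, 5, 2]], [10, 5, 1, [1, 2, 3, 4, 5]]]
# Illustrative data, not the question's random frame.
df1 = pd.DataFrame({'origin': [1, 6, 0],
                    'destination': [2, 1, 0],
                    'promise': [1, 5, 0]})

list_arr = np.array([x[:3] for x in mylist])             # shape (N, 3)
rows = df1[['origin', 'destination', 'promise']].values  # shape (M, 3)

# Comparing an (M, 1) column against an (N,) vector broadcasts to (M, N).
match = ((rows[:, [0]] <= list_arr[:, 0])
         & (rows[:, [1]] <= list_arr[:, 1])
         & (rows[:, [2]] >= list_arr[:, 2]))

# match[i, j] is True when list item j satisfies the conditions for row i.
matched_items = [np.flatnonzero(r).tolist() for r in match]
print(match.shape)
print(matched_items)
```

At the sizes mentioned in the question (50,000 rows by 500,000 items), the full matrix would hold 2.5e10 booleans, roughly 25 GB at one byte each, so in practice the rows would likely have to be processed in chunks rather than all at once.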
