Looping over a generator function that iterates over two very large lists of lists

Time: 2017-02-21 22:01:01

Tags: python list generator

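I have two very large lists of lists. Their sizes are dynamic and not known in advance, because they come from different sources. Each sublist is 2000 entries long.

I need to iterate over each sublist of the lists of lists, pass it to a SQL query, do some data processing, and then move on to the next sublist.

Using a generator seems like the ideal way to iterate over these huge lists of lists.

To simplify, I have recreated the problem with 2 lists of lists that hold 10 entries each, with 2 entries per sublist.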

    import itertools

    def test():
        Send_list = [['2000000000259140093', '1000000000057967562'],
                     ['4000000000008393617', '3000000000006545639'],
                     ['1000000000080880314', '1000000000119225203'],
                     ['1000000000096861508', '1000000000254915223'],
                     ['2000000000079125911', '1000000000014797506']]
        Pay_list = [['3000000000020597219', '1000000000079442325'],
                    ['1000000000057621671', '3000000000020542928'],
                    ['3000000000020531804', '4000000000010435913'],
                    ['1000000000330634222', '3000000000002353220'],
                    ['1000000000256385361', '2000000000286618770']]
        # Pair up the sublists; the shorter list is padded with None.
        for list1, list2 in itertools.izip_longest(Send_list, Pay_list):
            yield [list1, list2]

Now I can use the next() function to step through one item at a time and pass the sublists to the SQL query.

    In [124]: c = next(test())

    In [125]: c
    Out[125]:
    [['2000000000259140093', '1000000000057967562'],
     ['3000000000020597219', '1000000000079442325']]

    a = c[0]
    b = c[1]
    placeholders1 = ','.join('?' for i in range(len(a)))
    placeholders2 = ','.join('?' for i in range(len(b)))
    sql1 = "select * from Pretty_Txns where Send_Customer in (%s)" % placeholders1
    sql2 = "select * from Pretty_Txns where pay_Customer in (%s)" % placeholders2
    df_send = pd.read_sql(sql1, cnx, params=a)
    df_pay = pd.read_sql(sql2, cnx, params=b)
    # data processing and passing the result frame back to sql
    result.to_sql()
    # then repeating the same steps for the next sublists

Now, when I try to use a for loop together with next():

    for list in test():
        c = next(test())
        a = c[0]
        b = c[1]
        placeholders1 = ','.join('?' for i in range(len(a)))
        placeholders2 = ','.join('?' for i in range(len(b)))
        sql1 = "select * from Pretty_Txns where Send_Customer in (%s)" % placeholders1
        sql2 = "select * from Pretty_Txns where pay_Customer in (%s)" % placeholders2
        df_send = pd.read_sql(sql1, cnx, params=a)
        df_pay = pd.read_sql(sql2, cnx, params=b)
        # lot of data processing steps and passing the final results back to sql
        result.to_sql()

It only iterates over the first two sublists, processes them, and stops.

Now the value of c is:

    In [145]: c
    Out[145]:
    [['2000000000259140093', '1000000000057967562'],
     ['3000000000020597219', '1000000000079442325']]

This is the first sublist of Send_list and of Pay_list:

    In [149]: Send_list
    Out[149]:
    [['2000000000259140093', '1000000000057967562'],
     ['4000000000008393617', '3000000000006545639'],
     ['1000000000080880314', '1000000000119225203'],
     ['1000000000096861508', '1000000000254915223'],
     ['2000000000079125911', '1000000000014797506']]

    In [150]: Pay_list
    Out[150]:
    [['3000000000020597219', '1000000000079442325'],
     ['1000000000057621671', '3000000000020542928'],
     ['3000000000020531804', '4000000000010435913'],
     ['1000000000330634222', '3000000000002353220'],
     ['1000000000256385361', '2000000000286618770']]

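After the data in the result frame has been passed to SQL, control should return to the c = next(test()) step, and the whole process should repeat until the original lists are exhausted.

I am struggling to achieve this and would appreciate some guidance.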

2 Answers:

Answer 0: (score: 2)

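First, I don't understand why you are mixing the for loop with explicit calls to next.

Second, next(test()) calls next on a new generator object at every iteration of the for loop, which means c will always be the first item from the generator object. You probably need to store the same generator object somewhere and then call next on it repeatedly: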

    gen = test()
    c = next(gen)
    ...
    c = next(gen)
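
To make the difference concrete, here is a minimal sketch. It assumes Python 3, where izip_longest is named itertools.zip_longest, and it uses short made-up lists in place of the real data. The function make_pairs is a hypothetical stand-in for test().

```python
import itertools

def make_pairs():
    # Hypothetical stand-in for test(): pairs up sublists from two short lists.
    send_list = [['s1', 's2'], ['s3', 's4'], ['s5', 's6']]
    pay_list = [['p1', 'p2'], ['p3', 'p4'], ['p5', 'p6']]
    for send, pay in itertools.zip_longest(send_list, pay_list):
        yield [send, pay]

# Calling the function again creates a brand-new generator each time,
# so next() always returns the first pair.
fresh_each_time = [next(make_pairs()) for _ in range(3)]

# Keeping one generator object lets next() advance through the pairs.
gen = make_pairs()
same_generator = [next(gen) for _ in range(3)]

print(fresh_each_time)
print(same_generator)
```

The first list contains the same pair three times, while the second walks through all three pairs in order.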

Finally, itertools.izip_longest returns an iterator, so you are probably complicating things by yielding values from it. You can simply return the iterator.

    def test():
        ...
        return itertools.izip_longest(Send_list, Pay_list)
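
A runnable sketch of that idea, again assuming Python 3 (itertools.zip_longest). The function name paired_sublists and its parameters are hypothetical, and passing fillvalue=[] is an assumption of this sketch, chosen because the two real lists may differ in length and the default padding value is None.

```python
import itertools

def paired_sublists(send_list, pay_list):
    # Return the iterator directly instead of re-yielding each pair.
    # fillvalue=[] (an assumption) pads the shorter list with empty
    # sublists rather than the default None.
    return itertools.zip_longest(send_list, pay_list, fillvalue=[])

# The second list is one sublist shorter on purpose.
pairs = list(paired_sublists([['s1', 's2'], ['s3', 's4']], [['p1', 'p2']]))
print(pairs)
```

Note that the iterator yields tuples rather than lists, which makes no difference when each pair is unpacked as a, b.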

Answer 1: (score: 1)

Well, don't keep creating new generators and using only their first element. Create one generator and iterate over it.

    >>> for a, b in test():
    ...     print a, b
    ...
    ['2000000000259140093', '1000000000057967562'] ['3000000000020597219', '1000000000079442325']
    ['4000000000008393617', '3000000000006545639'] ['1000000000057621671', '3000000000020542928']
    ['1000000000080880314', '1000000000119225203'] ['3000000000020531804', '4000000000010435913']
    ['1000000000096861508', '1000000000254915223'] ['1000000000330634222', '3000000000002353220']
    ['2000000000079125911', '1000000000014797506'] ['1000000000256385361', '2000000000286618770']
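
Applied to the loop from the question, this could look roughly like the sketch below. It is not the asker's actual setup: it assumes Python 3, uses the standard-library sqlite3 module with an in-memory table and made-up rows in place of the real database and pd.read_sql, and introduces a hypothetical helper, fetch_matching, to build the placeholders.

```python
import itertools
import sqlite3

# In-memory database with made-up rows, standing in for the real Pretty_Txns table.
cnx = sqlite3.connect(':memory:')
cnx.execute('CREATE TABLE Pretty_Txns (Send_Customer TEXT, pay_Customer TEXT)')
cnx.executemany(
    'INSERT INTO Pretty_Txns VALUES (?, ?)',
    [('s1', 'p9'), ('s3', 'p4'), ('s8', 'p1'), ('s9', 'p9')],
)

def fetch_matching(column, ids):
    # Hypothetical helper: build one "?" per id and run the IN query.
    # The column name comes from our own code, never from user input.
    if not ids:
        return []  # an empty IN () list is not valid SQL, so skip the query
    placeholders = ','.join('?' for _ in ids)
    sql = 'SELECT * FROM Pretty_Txns WHERE %s IN (%s)' % (column, placeholders)
    return cnx.execute(sql, ids).fetchall()

send_list = [['s1', 's2'], ['s3', 's4']]
pay_list = [['p1', 'p2'], ['p3', 'p4']]

results = []
# One iterator, consumed by the for loop; no explicit next() call is needed.
for a, b in itertools.zip_longest(send_list, pay_list, fillvalue=[]):
    rows_send = fetch_matching('Send_Customer', a)
    rows_pay = fetch_matching('pay_Customer', b)
    # Data processing and writing the result back to SQL would go here.
    results.append((rows_send, rows_pay))

print(results)
```

With pandas, the cnx.execute(...).fetchall() call would be replaced by the pd.read_sql(sql, cnx, params=ids) call from the question; the loop structure stays the same.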