I have two nested for loops that process large amounts of data. I want to optimize them and make them run as fast as possible.
```python
import operator

source = [['row1', 'row2', 'row3'], ['Product', 'Cost', 'Quantity'],
          ['Test17', '3216', '17'], ['Test18', '3217', '18'],
          ['Test19', '3218', '19'], ['Test20', '3219', '20']]
variables = ['row2', 'row3']
variables_indices = [1, 2]
key_indices = [0]  # not given in the question; assumed: the first column is the key


# `yield` is only valid inside a function, so the loop is wrapped in a
# generator function (the name melt_rows is hypothetical).
def melt_rows(source, key_indices, variables, variables_indices):
    it = iter(source)  # create an iterator over the rows
    getkey = rowgetter(*key_indices)
    for row in it:
        k = getkey(row)
        for v, i in zip(variables, variables_indices):
            try:
                o = list(k)  # populate with key values initially
                o.append(v)  # add variable
                o.append(row[i])  # add value
                yield tuple(o)
            except IndexError:
                pass


def rowgetter(*indices):
    if len(indices) == 0:
        return lambda row: tuple()
    elif len(indices) == 1:
        # if only one index, we cannot use itemgetter, because we want a
        # singleton sequence to be returned, but itemgetter with a single
        # argument returns the value itself, so let's define a function
        index = indices[0]
        return lambda row: (row[index],)
    else:
        return operator.itemgetter(*indices)
```
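The comment inside `rowgetter` about `operator.itemgetter` can be checked directly. This short snippet uses one sample row from the question and shows why a single index needs its own branch: a bare string would be split into characters by `list(k)`.

```python
import operator

row = ['Test17', '3216', '17']

# With one index, itemgetter returns the bare value, not a 1-tuple.
single = operator.itemgetter(0)(row)
print(single)        # Test17
print(list(single))  # ['T', 'e', 's', 't', '1', '7']  (not a list of key values)

# With two or more indices, itemgetter returns a tuple.
multi = operator.itemgetter(0, 2)(row)
print(multi)         # ('Test17', '17')

# The one-index branch of rowgetter wraps the value in a 1-tuple instead,
# so list(k) always receives a sequence of key values.
wrapped = (lambda r: (r[0],))(row)
print(wrapped)       # ('Test17',)
```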
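This yields tuples, but for 100,000 rows it takes a long time: about 100 seconds on average (the `source` shown above is only a small sample). Can anyone help reduce this time?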
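To measure the loop on its own, a sketch like the following may help. It assumes synthetic rows shaped like the sample (not the asker's real data), a single key column at index 0, and a hypothetical wrapper name `melt_rows`. A generator does no work until it is consumed, so the timer has to wrap the `list(...)` call that drains it, not just the call that creates it.

```python
import operator
import time


def rowgetter(*indices):
    # Same behaviour as the question's helper, condensed.
    if len(indices) == 0:
        return lambda row: tuple()
    if len(indices) == 1:
        index = indices[0]
        return lambda row: (row[index],)
    return operator.itemgetter(*indices)


def melt_rows(source, key_indices, variables, variables_indices):
    # The question's loop, wrapped in a generator function (hypothetical name).
    getkey = rowgetter(*key_indices)
    for row in source:
        k = getkey(row)
        for v, i in zip(variables, variables_indices):
            try:
                o = list(k)
                o.append(v)
                o.append(row[i])
                yield tuple(o)
            except IndexError:
                pass


# Synthetic input: 100,000 rows shaped like the sample (assumed data).
rows = [['Test%d' % n, str(3200 + n), str(n)] for n in range(100_000)]

start = time.perf_counter()
out = list(melt_rows(rows, [0], ['row2', 'row3'], [1, 2]))  # consume the generator
elapsed = time.perf_counter() - start

print(len(out), out[0])
print('%.3f s' % elapsed)
```

If the loop alone runs much faster than the reported 100 seconds, the remaining time is probably spent wherever the rows are produced or the output is consumed, which is worth profiling separately.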
Note: I have also tried inlining the loops and using a list comprehension, but then nothing was yielded on each iteration.
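Inlining the loops does not have to lose the per-item behaviour: a generator expression (parentheses instead of square brackets) still produces items lazily, one per iteration, whereas a list comprehension builds the whole list before returning. Below is a minimal sketch assuming a single key column and a hypothetical function name `melt_expr`. A comprehension cannot contain `try`/`except`, so the `IndexError` handling is replaced by a length check, which is only equivalent for non-negative indices.

```python
source = [['Test17', '3216', '17'], ['Test18', '3217', '18']]
variables = ['row2', 'row3']
variables_indices = [1, 2]


def melt_expr(source, key_index, variables, variables_indices):
    # Hypothetical helper: returns a generator expression, so nothing is
    # computed until the caller starts iterating.
    pairs = list(zip(variables, variables_indices))  # built once, reused per row
    return (
        (row[key_index], v, row[i])
        for row in source
        for v, i in pairs
        if i < len(row)  # stands in for `except IndexError` (non-negative i only)
    )


for item in melt_expr(source, 0, variables, variables_indices):
    print(item)
```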
Answer 0 (score: 2)
Some improvements are marked below, but they do not change the algorithm's complexity:
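```python
# inside the generator function from the question
zipped = list(zip(variables, variables_indices))  # create once and reuse
for row in it:
    k = getkey(row)  # compute the key once per row
    for v, i in zipped:  # unpack both the variable name and its index
        try:
            yield (*k, v, row[i])  # avoid building a list and converting it to a tuple
        except IndexError:
            pass
```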
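A self-contained sketch that checks the unpacking version against the question's list-append version on synthetic data and times both with `timeit`. The data, the single key column at index 0, and the function names are assumptions for illustration; the printed timings will vary by machine and Python version.

```python
import timeit

source = [['Test%d' % n, str(3200 + n), str(n)] for n in range(10_000)]  # synthetic data
variables = ['row2', 'row3']
variables_indices = [1, 2]


def getkey(row):
    return (row[0],)  # assumed: a single key column at index 0


def melt_list_append(rows):
    # The question's version: list -> append -> append -> tuple.
    for row in rows:
        k = getkey(row)
        for v, i in zip(variables, variables_indices):
            try:
                o = list(k)
                o.append(v)
                o.append(row[i])
                yield tuple(o)
            except IndexError:
                pass


def melt_unpack(rows):
    # Answer 0's version: zip built once, tuple built directly by unpacking.
    zipped = list(zip(variables, variables_indices))
    for row in rows:
        k = getkey(row)
        for v, i in zipped:
            try:
                yield (*k, v, row[i])
            except IndexError:
                pass


# Both versions must produce identical output before timing means anything.
assert list(melt_list_append(source)) == list(melt_unpack(source))

for fn in (melt_list_append, melt_unpack):
    t = timeit.timeit(lambda: list(fn(source)), number=10)
    print(fn.__name__, '%.3f s' % t)
```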
Answer 1 (score: 1)
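Creating `k` from a `list`, then appending 2 items, then converting to a `tuple` creates a lot of copies.

I suggest a helper function with a generator that yields from the `k` list and then yields the remaining elements. Wrap it in `tuple` to get a function that is ready to use: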
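```python
k = [1, 2, 3, 4]


def make_tuple(k, a, b):
    # Inner generator: yield every key value, then the two extra items.
    def gen(k, a, b):
        yield from k
        yield a
        yield b
    # Materialize the generator into a tuple in a single pass.
    return tuple(gen(k, a, b))


result = make_tuple(k, 12, 14)
print(result)
```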
Output:

```
(1, 2, 3, 4, 12, 14)
```
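The generator helper avoids the intermediate list, but each call also creates a generator object, so whether it beats building the tuple directly is worth measuring rather than assuming. This sketch confirms that a few equivalent ways of building the same tuple agree and times each with `timeit`. The unpacking, concatenation, and `itertools.chain` variants are alternatives added for comparison, not part of the answer.

```python
import itertools
import timeit

k = [1, 2, 3, 4]


def make_tuple(k, a, b):
    # Answer 1's helper, unchanged.
    def gen(k, a, b):
        yield from k
        yield a
        yield b
    return tuple(gen(k, a, b))


candidates = {
    'make_tuple': lambda: make_tuple(k, 12, 14),
    'unpacking': lambda: (*k, 12, 14),
    'concatenation': lambda: tuple(k) + (12, 14),
    'chain': lambda: tuple(itertools.chain(k, (12, 14))),
}

# All four must build the same tuple before their timings are comparable.
for name, build in candidates.items():
    assert build() == (1, 2, 3, 4, 12, 14), name

for name, build in candidates.items():
    print(name, '%.3f s' % timeit.timeit(build, number=100_000))
```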