I have two large lists that I want to iterate over. Each has roughly 100,000 elements, and one is larger than the other. My loop looks like this:
for i in list1:
    for j in list2:
        function()
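This loop currently takes too long. list1 is the list whose entries need to be checked against list2, and beyond a certain index there are no more matching instances in list2. That means looping from an index might be faster, but I do not know how to do that.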
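In my project, list2 is a list of dictionaries with three keys: value, name, and timestamp. list1 is a list of the timestamps, in order. The function takes the value for a given timestamp and writes it to a CSV file in the column for the corresponding name.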
Here is a sample of the entries in list1:
[1364310855.004000, 1364310855.005000, 1364310855.008000]
This is what list2 looks like:
In my final CSV file, I should have something like this:
Answer 0 (score: 2)
If you want speed, you should restructure the data in list2 so that lookups are faster:
# The following code converts list2 into a multivalue dictionary
from collections import defaultdict
list2_dict = defaultdict(list)
for item in list2:
    list2_dict[item['timestamp']].append((item['name'], item['value']))
This allows much faster lookup by timestamp:
print(list2_dict)
defaultdict(<type 'list'>, {
1364310855.008: [('torque_at_transmission', -3), ('vehicle_speed', 0)],
1364310855.005: [('engine_speed', 0)],
1364310855.004: [('vehicle_speed', 0), ('accelerator_pedal_position', 0)]})
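With list2_dict, the lookup is far more efficient, because each timestamp is found by a dictionary access instead of a scan over all of list2: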
for i in list1:
    for j in list2_dict[i]:
        # here j is a tuple in the form (name, value)
        function()
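The grouping idea above can be carried through to the CSV step. The following is a minimal sketch, not the asker's actual function: the sample records, the helper names build_rows and write_csv, and the choice to leave missing cells empty are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data shaped like the question's description.
list1 = [1364310855.004, 1364310855.005, 1364310855.008]
list2 = [
    {"name": "vehicle_speed", "value": 0, "timestamp": 1364310855.004},
    {"name": "accelerator_pedal_position", "value": 0, "timestamp": 1364310855.004},
    {"name": "engine_speed", "value": 0, "timestamp": 1364310855.005},
    {"name": "torque_at_transmission", "value": -3, "timestamp": 1364310855.008},
    {"name": "vehicle_speed", "value": 0, "timestamp": 1364310855.008},
]


def build_rows(timestamps, records):
    """Return (column_names, rows) with one row dict per timestamp."""
    # Group the records once so each timestamp lookup is a dict access.
    by_timestamp = defaultdict(list)
    for item in records:
        by_timestamp[item["timestamp"]].append((item["name"], item["value"]))

    names = sorted({item["name"] for item in records})
    rows = []
    for ts in timestamps:
        row = {"timestamp": ts}
        # .get() avoids creating empty entries for unknown timestamps.
        for name, value in by_timestamp.get(ts, []):
            row[name] = value
        rows.append(row)
    return ["timestamp"] + names, rows


def write_csv(timestamps, records, out):
    """Write one CSV line per timestamp; missing cells are left empty."""
    columns, rows = build_rows(timestamps, records)
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_csv(list1, list2, buffer)
csv_text = buffer.getvalue()
```

Grouping costs one pass over list2, and each timestamp in list1 is then resolved with a single dictionary lookup, so the total work grows with len(list1) + len(list2) rather than their product.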
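Answer 1 (score: 0)
It looks like you only want to use the elements of list2 that correspond to i*2 and i*2+1, that is, elements 0, 1 and 2, 3, ...
You only need one loop.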
for i in range(len(list1)):
    j = list2[i*2]
    k = list2[i*2+1]
    # Process function using j and k
This way you only process up to the end of the first list.
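The fixed i*2 pairing only holds if every timestamp has exactly two entries in list2. A more general single pass is sketched below under two assumptions that the question does not confirm: list2 is sorted by timestamp, and list1 is strictly increasing. The function name match_sorted and the sample records are hypothetical.

```python
def match_sorted(timestamps, records):
    """Yield (timestamp, matching_records) in one forward pass.

    Assumes both inputs are sorted by timestamp in ascending order.
    """
    pos = 0
    total = len(records)
    for ts in timestamps:
        # Skip records that are older than the current timestamp.
        while pos < total and records[pos]["timestamp"] < ts:
            pos += 1
        start = pos
        # Collect the run of records that share this timestamp.
        while pos < total and records[pos]["timestamp"] == ts:
            pos += 1
        yield ts, records[start:pos]


# Hypothetical sample data; the 1364310855.006 record has no match in list1.
list1 = [1364310855.004, 1364310855.005, 1364310855.008]
list2 = [
    {"name": "vehicle_speed", "value": 0, "timestamp": 1364310855.004},
    {"name": "accelerator_pedal_position", "value": 0, "timestamp": 1364310855.004},
    {"name": "engine_speed", "value": 0, "timestamp": 1364310855.005},
    {"name": "engine_speed", "value": 1, "timestamp": 1364310855.006},
    {"name": "torque_at_transmission", "value": -3, "timestamp": 1364310855.008},
    {"name": "vehicle_speed", "value": 0, "timestamp": 1364310855.008},
]

matches = list(match_sorted(list1, list2))
```

Because the position in list2 only ever moves forward, each record is visited once, and the loop stops doing work as soon as list1 is exhausted.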
Answer 2 (score: 0)
I think the pandas module is a good fit for what you are trying to do...
import ujson # 'ujson' (Ultra fast JSON) is faster than the standard 'json'
import pandas as pd
filter_list = [1364310855.004000, 1364310855.005000, 1364310855.008000]
def file2list(fn):
    # read a file with one JSON object per line into a list of dicts
    with open(fn) as f:
        return [ujson.loads(line) for line in f]
# Use pd.read_json('data.json') instead of pd.DataFrame(file2list('data.json'))
# if you have a proper JSON file
#
# df = pd.read_json('data.json')
df = pd.DataFrame(file2list('data.json'))
# filter DataFrame with 'filter_list'
df = df[df['timestamp'].isin(filter_list)]
# convert UNIX timestamps to readable format
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
# pivot data frame
# fill NaN's with zeroes
df = df.pivot(index='timestamp', columns='name', values='value').fillna(0)
# save data frame to CSV file
df.to_csv('output.csv', sep=',')
#pd.set_option('display.expand_frame_repr', False)
#print(df)
output.csv
timestamp,accelerator_pedal_position,engine_speed,torque_at_transmission,vehicle_speed
2013-03-26 15:14:15.004,4.0,0.0,0.0,2.0
2013-03-26 15:14:15.005,0.0,5.0,0.0,0.0
2013-03-26 15:14:15.008,0.0,0.0,-3.0,1.0
PS: I don't know where your [Latitude, Longitude] columns come from, but adding them to the resulting DataFrame is easy. Just add the following lines before calling df.to_csv():
df.insert(0, 'latitude', 0)
df.insert(1, 'longitude', 0)
which results in:
timestamp,latitude,longitude,accelerator_pedal_position,engine_speed,torque_at_transmission,vehicle_speed
2013-03-26 15:14:15.004,0,0,4.0,0.0,0.0,2.0
2013-03-26 15:14:15.005,0,0,0.0,5.0,0.0,0.0
2013-03-26 15:14:15.008,0,0,0.0,0.0,-3.0,1.0
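One caveat with the pivot step: DataFrame.pivot raises a ValueError when the same (timestamp, name) pair occurs more than once. If that can happen in your data, pivot_table with an aggregation function is one possible alternative. The sketch below uses made-up records, and keeping the last value of each duplicate is an assumption about what you would want, not something the question specifies.

```python
import pandas as pd

# Hypothetical records; vehicle_speed appears twice at the same timestamp.
records = [
    {"name": "vehicle_speed", "value": 0, "timestamp": 1364310855.004},
    {"name": "vehicle_speed", "value": 2, "timestamp": 1364310855.004},
    {"name": "engine_speed", "value": 5, "timestamp": 1364310855.005},
    {"name": "torque_at_transmission", "value": -3, "timestamp": 1364310855.008},
]

df = pd.DataFrame(records)

# pivot_table aggregates duplicates instead of raising; "last" keeps the
# final value seen for each (timestamp, name) pair.
wide = df.pivot_table(
    index="timestamp", columns="name", values="value", aggfunc="last"
).fillna(0)
```

Other aggregations such as "first" or "mean" can be passed as aggfunc if they suit the data better.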