I have a pandas DataFrame named full_df. Its shape is (348204, 18). I am building a dictionary from its data as follows.
wx_data = {}
key_len = range(count)
n = range(len(full_df))
for i in n:
    # create key
    key_len = str("%02d" % (full_df["year"][i])) + \
        str("%02d" % (full_df["month"][i])) + \
        str("%02d" % (full_df["day"][i])) + \
        str("%02d" % (full_df["hour"][i])) + \
        str("%02d" % (full_df["minute"][i]))
    wx_data[key_len] = full_df.iloc[i].values.tolist()
The for loop in my code is very slow. How can I make it more efficient? Thanks!
Answer 0 (score: 0)
You could try joblib:
import pandas as pd
import numpy as np
from joblib import Parallel, delayed
def convert_df(row):
    # row is a single DataFrame row (a Series), as yielded by iterrows()
    # create key
    key_len = str("%02d" % (row["year"])) + \
        str("%02d" % (row["month"])) + \
        str("%02d" % (row["day"])) + \
        str("%02d" % (row["hour"])) + \
        str("%02d" % (row["minute"]))
    return key_len, row.values.tolist()
wx_data = Parallel(n_jobs=-1)(delayed(convert_df)(row) for _, row in full_df.iterrows())
wx_data = dict(wx_data)
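
Because each row involves very little work, the cost of dispatching one task per row may outweigh the benefit of running in parallel. As an alternative, here is a minimal vectorized sketch that builds every key with pandas string operations and converts all rows to lists in a single call. The helper name build_wx_data is hypothetical, and the sketch assumes the year, month, day, hour and minute columns hold integer-like values with no missing entries. As in the original loop, a later row with a duplicate key overwrites an earlier one.

```python
import pandas as pd


def build_wx_data(full_df: pd.DataFrame) -> dict:
    """Map a 'YYYYMMDDHHMM'-style key to each row's values (hypothetical helper)."""
    key_columns = ("year", "month", "day", "hour", "minute")
    # Zero-pad each component to at least two characters, mirroring "%02d".
    # Assumption: these columns are integer-like and contain no NaN.
    parts = [full_df[col].astype(int).astype(str).str.zfill(2) for col in key_columns]
    # Concatenate the five string columns element-wise into one key per row.
    keys = parts[0].str.cat(parts[1:])
    # Convert the whole frame to a list of row lists once, then pair with the keys.
    return dict(zip(keys, full_df.values.tolist()))
```

Usage would then be wx_data = build_wx_data(full_df). Whether this is faster than the joblib version on your machine is worth timing on a slice of the data first.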