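I have a large data frame in Python, and I want to select specific rows based on several nested for loops. Some of the columns contain lists. My end goal is to generate some optimization constraints and pass them to another piece of software: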
T          S       W   Arrived   Departed
[1,2]      [4,2]   1   8         10
[3,4,5]    [3]     1   12        18
[6,7]      [1,2]   2   10        11
.          .       .   .         .
.          .       .   .         .
def Cons(row):
    if row['W'] == w and sum(pd.Series(row['T']).isin([t])) != 0 and sum(pd.Series(row['S']).isin([s])) != 0:
        return 1

for w in range(50):
    for s in range(30):
        for t in range(12):
            df.Situation = df.apply(Cons, axis = 1)
            A = df[ (df.Situation == 1) ]
            A1 = pd.Series(A.Arrived).tolist()
            D1 = pd.Series(A.Departed).tolist()
            Time = tuplelist(zip(A1,D1))
How can I do this efficiently? Running through the nested for loops takes a very long time.
Answer 0 (score: 0)
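Currently, you reassign the data frame's Situation column on every pass of the nested loops and overwrite A each time,
so the results never accumulate and you end up with only the output of the last iteration.
Instead, consider building a cross join of all the ranges and then checking the equality logic against it: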
from functools import reduce
import pandas as pd

# A constant 'key' column lets an ordinary merge act as a cross join
wdf = pd.DataFrame({'w': range(50), 'key': 1})
sdf = pd.DataFrame({'s': range(30), 'key': 1})
tdf = pd.DataFrame({'t': range(12), 'key': 1})
dfs = [wdf, sdf, tdf]

# Data frame of the cross product w x s x t (N = 18,000 rows)
rangedf = reduce(lambda left, right: pd.merge(left, right, on=['key']), dfs)[['w', 's', 't']]
#    w  s  t
# 0  0  0  0
# 1  0  0  1
# 2  0  0  2
# 3  0  0  3
# 4  0  0  4
# ...
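As an aside, the same cross product can probably be built without the helper 'key' column. The sketch below uses `pd.MultiIndex.from_product`; the name `rangedf_alt` is only illustrative and is not part of the original answer.

```python
import pandas as pd

# Cartesian product of the three ranges, with t varying fastest
rangedf_alt = pd.MultiIndex.from_product(
    [range(50), range(30), range(12)], names=['w', 's', 't']
).to_frame(index=False)

print(len(rangedf_alt))  # number of (w, s, t) combinations
```

This should give the same rows, in the same order, as the merge-based version, so either frame could be used in the check that follows.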
def Cons(row):
    # T and S already hold lists, so pass them to isin() directly; W is a scalar
    match = (rangedf['w'].isin([row['W']])
             & rangedf['t'].isin(row['T'])
             & rangedf['s'].isin(row['S']))
    return 1 if match.any() else 0

df['Situation'] = df.apply(Cons, axis=1)
A = df[df['Situation'] == 1].reset_index(drop=True)
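Note that the filter above yields a single frame A covering all ranges at once. If a separate list of (Arrived, Departed) pairs is needed for each (w, s, t) combination, as the loops in the question suggest, one possible approach is to explode the list columns and group by the three keys. The following is only a sketch on the sample rows from the question; the names `long_df` and `times` are illustrative assumptions.

```python
import pandas as pd

# Sample rows copied from the question's table
df = pd.DataFrame({
    'T': [[1, 2], [3, 4, 5], [6, 7]],
    'S': [[4, 2], [3], [1, 2]],
    'W': [1, 1, 2],
    'Arrived': [8, 12, 10],
    'Departed': [10, 18, 11],
})

# One row per (T element, S element) pair, so every key becomes a scalar
long_df = (
    df.explode('T')
      .explode('S')
      .reset_index(drop=True)
      .astype({'T': int, 'S': int})
)

# Keep only combinations inside the ranges the question loops over
long_df = long_df[
    long_df['W'].isin(range(50))
    & long_df['S'].isin(range(30))
    & long_df['T'].isin(range(12))
]

# Map each (w, s, t) key to its list of (Arrived, Departed) pairs
times = {
    key: list(zip(group['Arrived'].tolist(), group['Departed'].tolist()))
    for key, group in long_df.groupby(['W', 'S', 'T'])
}

print(times[(1, 4, 1)])
```

Each value in `times` is a plain list of tuples, which can then be wrapped in whatever container the downstream software expects, such as the `tuplelist` used in the question. Combinations with no matching rows are simply absent from the dictionary, which avoids visiting all 18,000 combinations when most of them are empty.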