Multiple for loops to select specific rows from a dataframe in Python

Time: 2016-09-29 22:46:44

Tags: python for-loop dataframe

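I have a large dataframe in Python, and I want to select specific rows based on several nested for loops. Some of the columns contain lists. My end goal is to generate optimization constraints and pass them to another piece of software: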

  T         S       W   Arrived   Departed
  [1,2]     [4,2]   1   8         10
  [3,4,5]   [3]     1   12        18
  [6,7]     [1,2]   2   10        11
  .         .       .   .         .
  .         .       .   .         .
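A minimal sketch for reproducing the example: the three visible rows rebuilt as a DataFrame, with the list-valued columns stored as plain Python lists (an assumption; the real data may hold other sequence types). The elided rows are not reconstructed.

```python
import pandas as pd

# Only the three rows visible in the table; the elided rows are unknown.
df = pd.DataFrame({
    'T': [[1, 2], [3, 4, 5], [6, 7]],
    'S': [[4, 2], [3], [1, 2]],
    'W': [1, 1, 2],
    'Arrived': [8, 12, 10],
    'Departed': [10, 18, 11],
})

# Columns holding lists are stored with object dtype.
print(df.dtypes)
```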

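import pandas as pd
from gurobipy import tuplelist  # assumed: tuplelist is Gurobi's gurobipy.tuplelist

def Cons(row):
    # Flag rows with W == w whose T list contains t and whose S list contains s
    # (w, s and t are the loop variables below, read as globals)
    if row['W'] == w and sum(pd.Series(row['T']).isin([t])) != 0 and sum(pd.Series(row['S']).isin([s])) != 0:
        return 1

for w in range(50):
    for s in range(30):
        for t in range(12):
            # Assign with [] so pandas creates a real column instead of an attribute
            df['Situation'] = df.apply(Cons, axis=1)
            A = df[df['Situation'] == 1]
            A1 = A['Arrived'].tolist()
            D1 = A['Departed'].tolist()
            Time = tuplelist(zip(A1, D1))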
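The test `sum(pd.Series(row['T']).isin([t])) != 0` is true exactly when `t` is an element of the list `row['T']`, so Python's `in` operator can replace it without creating a new Series for every row. The sketch below selects the `(Arrived, Departed)` pairs for one `(w, s, t)` with a boolean mask instead of `apply`; `select_times` is a hypothetical helper name, and the data is the three sample rows from the table.

```python
import pandas as pd

df = pd.DataFrame({
    'T': [[1, 2], [3, 4, 5], [6, 7]],
    'S': [[4, 2], [3], [1, 2]],
    'W': [1, 1, 2],
    'Arrived': [8, 12, 10],
    'Departed': [10, 18, 11],
})

def select_times(df, w, s, t):
    """Return (Arrived, Departed) pairs for rows with W == w, t in T and s in S."""
    mask = (
        (df['W'] == w)
        & df['T'].map(lambda lst: t in lst)  # membership test on each list
        & df['S'].map(lambda lst: s in lst)
    )
    A = df[mask]
    return list(zip(A['Arrived'], A['Departed']))

# Only row 0 has W == 1, 1 in T and 2 in S.
print(select_times(df, 1, 2, 1))
```

This removes the per-row Series overhead, but it still scans the whole frame once per combination, so 50 × 30 × 12 = 18,000 passes remain.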

How can I do this efficiently? Running it through multiple nested for loops takes a very long time.

1 Answer:

Answer 0 (score: 0)

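Currently, you recompute the whole dataframe in every iteration of the nested loops and overwrite A each time, so you never build up a growing result; you only keep the result of the last iteration.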
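If the loops are kept, each selection has to be stored somewhere instead of overwriting `A`. A minimal sketch, assuming a dictionary keyed by `(w, s, t)` is a convenient shape for building the constraints later (`times` is a hypothetical name), using the three sample rows and smaller ranges so it runs quickly:

```python
import pandas as pd

df = pd.DataFrame({
    'T': [[1, 2], [3, 4, 5], [6, 7]],
    'S': [[4, 2], [3], [1, 2]],
    'W': [1, 1, 2],
    'Arrived': [8, 12, 10],
    'Departed': [10, 18, 11],
})

# Small ranges keep the demo fast; the question uses range(50), range(30), range(12).
times = {}
for w in range(3):
    for s in range(5):
        for t in range(8):
            # map() calls each lambda immediately, so capturing the loop variables is safe.
            mask = (
                (df['W'] == w)
                & df['T'].map(lambda lst: t in lst)
                & df['S'].map(lambda lst: s in lst)
            )
            if mask.any():
                A = df[mask]
                # Store the selection instead of discarding it on the next iteration.
                times[(w, s, t)] = list(zip(A['Arrived'], A['Departed']))

print(len(times))
```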

Instead, consider building a cross join of all the ranges and then checking the equality logic:

from functools import reduce  # reduce is no longer a builtin in Python 3
import pandas as pd

wdf = pd.DataFrame({'w': range(50), 'key': 1})
sdf = pd.DataFrame({'s': range(30), 'key': 1})
tdf = pd.DataFrame({'t': range(12), 'key': 1})

dfs = [wdf, sdf, tdf]

# DATA FRAME OF CROSS PRODUCT w x s x t (N = 50 * 30 * 12 = 18,000)
rangedf = reduce(lambda left, right: pd.merge(left, right, on=['key']), dfs)[['w', 's', 't']]
#    w  s  t
# 0  0  0  0
# 1  0  0  1
# 2  0  0  2
# 3  0  0  3
# 4  0  0  4
# ...
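On pandas 1.2 or later, `merge` also accepts `how='cross'`, which builds the same Cartesian product without the helper `key` column or `functools.reduce`; `itertools.product` gives the same triples without any merge. A sketch of both, not tied to the answer's exact code:

```python
import itertools
import pandas as pd

wdf = pd.DataFrame({'w': range(50)})
sdf = pd.DataFrame({'s': range(30)})
tdf = pd.DataFrame({'t': range(12)})

# Cartesian product w x s x t (how='cross' requires pandas >= 1.2).
rangedf = wdf.merge(sdf, how='cross').merge(tdf, how='cross')

# Equivalent construction without merges.
rangedf2 = pd.DataFrame(
    list(itertools.product(range(50), range(30), range(12))),
    columns=['w', 's', 't'],
)

print(len(rangedf))  # 18000
```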

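def Cons(row):
    # 1 if some (w, s, t) in rangedf matches this row's W, one of its T values and one of its S values.
    # T and S hold lists, so they go to isin() directly; isin([row['T']]) would look for the list itself.
    mask = (rangedf['w'] == row['W']) & rangedf['t'].isin(row['T']) & rangedf['s'].isin(row['S'])
    return 1 if mask.any() else 0

df['Situation'] = df.apply(Cons, axis=1)
A = df[df['Situation'] == 1].reset_index(drop=True)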
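The `Cons` above yields one flag per row of `df`, not one per `(w, s, t)` combination, so on its own it does not reproduce the per-combination `Time` lists from the question. A different technique (not the one used in this answer) gets all of them in a single pass: explode the list columns so each row carries one `t` and one `s`, then group by `(W, S, T)`. A sketch assuming pandas 0.25 or later for `DataFrame.explode`, a default `RangeIndex` on `df`, and the three sample rows; `long` and `times` are hypothetical names:

```python
import pandas as pd

df = pd.DataFrame({
    'T': [[1, 2], [3, 4, 5], [6, 7]],
    'S': [[4, 2], [3], [1, 2]],
    'W': [1, 1, 2],
    'Arrived': [8, 12, 10],
    'Departed': [10, 18, 11],
})

# Keep the original row label as column 'index', then one row per (row, t, s) triple.
long = df.reset_index().explode('T').explode('S')
long = long.dropna(subset=['T', 'S'])           # empty lists explode to NaN
long = long.astype({'T': int, 'S': int})
long = long.drop_duplicates(subset=['index', 'S', 'T'])  # a repeated list value counts once

# Restrict to the ranges the original loops covered.
long = long[long['W'].between(0, 49) & long['S'].between(0, 29) & long['T'].between(0, 11)]

# One list of (Arrived, Departed) pairs per (w, s, t) that actually occurs.
times = {
    key: list(zip(g['Arrived'], g['Departed']))
    for key, g in long.groupby(['W', 'S', 'T'])
}

print(len(times))
```

Each list in `times` could then be wrapped in `tuplelist(...)` as in the question. The cost grows roughly with the total number of (row, t, s) triples rather than with 18,000 full scans of the frame.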