I have the following code:
indices_to_remove = []
counter = 0  # length of the current run of rows with speed <= 15
for i in range(0, len(df)):
    if df.speed.values[i] <= 15:
        counter += 1
        if counter > 600:
            indices_to_remove.append(i)
    else:
        counter = 0
df = df.drop(indices_to_remove, axis=0)
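The main goal of this code is to iterate over all rows in the dataset and, if there are more than 600 consecutive rows with a speed value below 15, add the indices of those rows to indices_to_remove so that all of those rows are then dropped.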
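Answer 0 (score: 1)
You are trying to do two things at once: dropping indices and counting 600 consecutive values below 15. I would split these two ideas into two steps.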
indices_to_remove = []
# Collect the index of every row whose speed is <= 15
for i in range(0, len(df)):
    if df.speed.values[i] <= 15:
        indices_to_remove.append(i)
# Track the longest streak of consecutive indices among the rows collected above
counter = 0
max_counter = -1
for idx in range(len(indices_to_remove) - 1):
    # If the two indices are consecutive, extend the current streak
    if (indices_to_remove[idx + 1] - indices_to_remove[idx]) == 1:
        counter += 1
    # Otherwise record the longest streak so far and reset the counter
    else:
        if counter > max_counter:
            max_counter = counter
        counter = 0
# The last streak never reaches the else branch, so compare it here as well
max_counter = max(max_counter, counter)
# Note: this drops every row with speed <= 15, not only the rows inside the long streak
if max_counter > 600:
    df = df.drop(indices_to_remove, axis=0)
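As a rough illustration of the two-step idea (first find the runs, then drop), here is a minimal sketch in plain Python. The helper name long_run_positions and its parameters threshold and min_run are hypothetical and not part of the original answer; it also assumes the goal stated in the question, namely removing only the rows that sit inside a run of more than 600 slow rows.

```python
from itertools import groupby


def long_run_positions(speeds, threshold=15, min_run=600):
    """Return the positions that belong to runs of more than `min_run`
    consecutive values that are <= `threshold` (hypothetical helper)."""
    positions = []
    start = 0  # position of the first element of the current run
    # groupby splits the sequence into maximal runs of slow / not-slow values
    for is_slow, group in groupby(speeds, key=lambda s: s <= threshold):
        length = sum(1 for _ in group)
        if is_slow and length > min_run:
            positions.extend(range(start, start + length))
        start += length
    return positions
```

With a DataFrame, something like df = df.drop(df.index[long_run_positions(df.speed.values)]) should then remove only the rows inside the long runs, whatever the index labels are.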
Answer 1 (score: 0)
This is not an elegant solution, but it builds on what you already have:
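indices_to_remove = []
indices_counter = []  # indices of the current run of slow rows
counter = 0
commit = False  # True once the current run is longer than 600 rows
for i in range(0, len(df)):
    if df.speed.values[i] <= 15:
        counter += 1
        indices_counter.append(i)
        if counter > 600:
            commit = True
    elif commit:
        # The run just ended and was long enough: schedule it for removal
        indices_to_remove.extend(indices_counter)
        indices_counter = []
        commit = False
        counter = 0
    else:
        # The run ended but was too short: discard it
        indices_counter = []
        commit = False
        counter = 0
# A long run that reaches the last row never hits the elif branch, so commit it here
if commit:
    indices_to_remove.extend(indices_counter)
df = df.drop(indices_to_remove, axis=0)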
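Answer 2 (score: 0)
Build a dictionary of indices in which each entry holds one run of consecutive rows whose speed is less than or equal to 15.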
indices_dict = {}
k = 0
for i in range(0, len(df)):
    if df.speed.values[i] <= 15:
        try:
            indices_dict[k].append(i)
        except KeyError:
            indices_dict[k] = [i]
    else:
        k += 1
lol_to_remove = [v for k, v in indices_dict.items() if len(v) >= 600]  # This is a list of lists (lol)
indices_to_remove = [i for v in lol_to_remove for i in v]  # flatten the list
df = df.drop(indices_to_remove, axis=0)
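The same grouping of consecutive slow rows can probably be done without an explicit Python loop. The following is only a sketch under assumptions: the function name drop_long_slow_runs and its parameters are made up for illustration, and it uses "more than 600" as in the question rather than the >= 600 used above.

```python
import pandas as pd


def drop_long_slow_runs(df, speed_col="speed", threshold=15, min_run=600):
    """Drop rows inside runs of more than `min_run` consecutive rows
    whose speed is <= `threshold` (hypothetical helper)."""
    slow = df[speed_col] <= threshold
    # A new run starts wherever the slow flag differs from the previous row
    run_id = slow.astype(int).diff().ne(0).cumsum()
    # Length of the run each row belongs to
    run_len = run_id.map(run_id.value_counts())
    return df[~(slow & (run_len > min_run))]
```

Because this selects rows with a boolean mask instead of calling drop with positions, it should not depend on the DataFrame having the default 0..n-1 index.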