Question

The following code runs very slowly, probably because of the for loop:

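 import pandas as pd

 d = pd.read_csv(fname, sep=r'\s+')

 # remove groups where an ncyc value occurs exactly 12 times; they are not useful
 for i in range(d.ncyc.max() + 1):  # + 1 so the largest ncyc value is checked too
     d1 = d[d.ncyc == i].copy()
     n1, n2 = d1.shape  # n1 = number of rows with this ncyc value
     if n1 == 12:
         # print(d1.ncyc)
         d.drop(d.loc[d.ncyc == i].index, inplace=True)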

How can I speed up the deletion of these rows?

For example, given the following hypothetical data:

 ncyc     l          A      B
   0       1        0.1     0.22
   0       2        0.1     0.24
   0       3        0.1     0.25
   0       4        0.1     0.26
   0       5        0.1     0.27
   0       6        0.1     0.27
   0       1        0.1     0.28
   0       2        0.1     0.29
   0       3        0.1     0.20
   0       4        0.1     0.25
   0       5        0.1     0.246
   0       6        0.1     0.26
   1       1        0.1     0.28
   1       2        0.1     0.29
   1       3        0.1     0.20
   1       4        0.1     0.25
   1       5        0.1     0.246
   1       6        0.1     0.26

it would be reduced to:

 ncyc     l          A      B
   1       1        0.1     0.28
   1       2        0.1     0.29
   1       3        0.1     0.20
   1       4        0.1     0.25
   1       5        0.1     0.246
   1       6        0.1     0.26

because there are 12 rows with ncyc == 0.
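
One possible way to avoid the Python-level loop is to compute each row's group size with groupby and filter with a boolean mask. This is only a minimal sketch, not a tested solution for the real file: the DataFrame below is rebuilt by hand from the hypothetical data above, and the names group_size and reduced are made up for illustration.

```python
import pandas as pd

# Hand-built copy of the hypothetical data from the question
# (12 rows with ncyc == 0, 6 rows with ncyc == 1).
d = pd.DataFrame({
    "ncyc": [0] * 12 + [1] * 6,
    "l": [1, 2, 3, 4, 5, 6] * 3,
    "A": [0.1] * 18,
    "B": [0.22, 0.24, 0.25, 0.26, 0.27, 0.27,
          0.28, 0.29, 0.20, 0.25, 0.246, 0.26,
          0.28, 0.29, 0.20, 0.25, 0.246, 0.26],
})

# For every row, the number of rows that share its ncyc value.
# The result is aligned with d's index, so it can be used as a mask.
group_size = d.groupby("ncyc")["ncyc"].transform("size")

# Keep only the rows whose ncyc group does not have exactly 12 members.
reduced = d[group_size != 12]

print(reduced)
```

The idea is that the grouping and the filtering each happen in a single pass over the data, instead of one scan plus one drop per ncyc value. Whether that is fast enough for the real data would still need to be measured.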

Speeding up row deletion from a pandas DataFrame

0 answers: