Question

I have a df like this:

     t1      t2     t3
0    a       b      c
1            b      
2 
3    
4    a       b      c
5            b      
6
7

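I want to remove all rows after index 5, since they contain no values, but keep the empty rows at index 2 and 3. I don't know in advance whether every column has data.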

All values are strings.

Answer 1

In [74]: df.iloc[:np.where(df.any(axis=1))[0][-1]+1]
Out[74]: 
   t1 t2 t3
10  a  b  c
11  b      
12         
13         
14  a  b  c
15  b

Explanation: first, find which rows contain something other than empty strings:

In [37]: df.any(axis=1)
Out[37]: 
0     True
1     True
2    False
3    False
4     True
5     True
6    False
7    False
dtype: bool
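As a minimal sketch of this step, the frame from the question can be rebuilt with empty strings in the blank cells (an assumption about how the blanks are stored). The comparison `df.ne("")` is a more explicit variant of `df.any(axis=1)`, swapped in here so the result does not depend on how truthiness is evaluated for string columns; the name `has_data` is illustrative.

```python
import pandas as pd

# Rebuild the example frame; blank cells are assumed to be empty strings.
df = pd.DataFrame(
    {
        "t1": ["a", "", "", "", "a", "", "", ""],
        "t2": ["b", "b", "", "", "b", "b", "", ""],
        "t3": ["c", "", "", "", "c", "", "", ""],
    }
)

# True for every row that has at least one non-empty string.
has_data = df.ne("").any(axis=1)
print(has_data.tolist())
```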

Find the positions of the rows that are True:

In [71]: np.where(df.any(axis=1))
Out[71]: (array([0, 1, 4, 5]),)

Find the largest position (which will also be the last one):

In [72]: np.where(df.any(axis=1))[0][-1]
Out[72]: 5

Then you can use df.iloc to select all rows up to and including the one at position 5.
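The steps above can be put together roughly as follows. This is a sketch under assumptions, not the answer's exact code: the helper name `trim_trailing_empty_rows` is hypothetical, blank cells are assumed to be empty strings, `np.flatnonzero` stands in for `np.where(...)[0]`, and the guard for a frame with no data at all is an addition.

```python
import numpy as np
import pandas as pd


def trim_trailing_empty_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the rows after the last row that holds a non-empty string."""
    # Boolean array: True where a row has at least one non-empty cell.
    has_data = df.ne("").any(axis=1).to_numpy()
    # Integer positions (not labels) of the rows that have data.
    positions = np.flatnonzero(has_data)
    if positions.size == 0:
        # No data anywhere: return an empty frame with the same columns.
        return df.iloc[:0]
    # Position-based slice, so the index labels do not matter.
    return df.iloc[: positions[-1] + 1]


df = pd.DataFrame(
    {
        "t1": ["a", "", "", "", "a", "", "", ""],
        "t2": ["b", "b", "", "", "b", "b", "", ""],
        "t3": ["c", "", "", "", "c", "", "", ""],
    },
    index=range(10, 18),  # non-zero-based labels, to show that iloc ignores them
)

trimmed = trim_trailing_empty_rows(df)
print(trimmed)
```

Because the slice is positional, the empty rows in the middle are kept and only the trailing ones are dropped.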

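Note that the first method I suggested (the df.loc version timed below) is not robust: if your DataFrame's index contains duplicate values, selecting rows with df.loc is problematic.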
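A small illustration of the duplicate-index problem, under stated assumptions: the four-row frame is made up for the example, and `idxmax` is used to get the label because in recent pandas versions `argmax` returns a position rather than a label.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index repeats the label 1.
df = pd.DataFrame({"t1": ["a", "b", "", ""]}, index=[0, 1, 1, 1])
has_data = df.ne("").any(axis=1)

# Label-based approach: find the label of the last row with data ...
last_label = has_data.cumsum().idxmax()
# ... but a label slice keeps every row carrying that label.
by_label = df.loc[:last_label]

# Position-based approach: cut exactly after the last row with data.
last_pos = np.flatnonzero(has_data.to_numpy())[-1]
by_position = df.iloc[: last_pos + 1]

print(len(by_label), len(by_position))
```

Here the label slice cannot separate the row with data from the trailing empty rows that share its label, while the positional slice can.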

The new method is also a bit faster:

In [75]: %timeit df.iloc[:np.where(df.any(axis=1))[0][-1]+1]
1000 loops, best of 3: 203 µs per loop

In [76]: %timeit df.loc[:df.any(axis=1).cumsum().argmax()]
1000 loops, best of 3: 296 µs per loop

python pandas: clean up empty rows after the last row with data
