Question

After reading and filtering an Excel file, I end up with a pandas DataFrame that looks like this.

Col1    Col2
afaf    abc 1
1512        
asda    cdd 2
adsd

I want to end up with

Col1    Col2
afaf    abc1
asda    cdd2

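I tried df['Col2'].replace('', np.nan, inplace=True) followed by dropna, but nothing was replaced. I assume the replacement fails because Col2 contains multiple spaces in those empty rows.

I forgot to mention that I can't use strip, because the Col2 strings contain spaces that I need to keep intact.

Any ideas?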

Answer 1

I think you can use boolean indexing with a condition that calls strip to remove any whitespace and then checks with len that the length is not 0:

print (df[df.Col2.str.strip().str.len() != 0])
   Col1   Col2
0  afaf  abc 1
2  asda  cdd 2

If there is no whitespace:

df[df.Col2.str.len() != 0]
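
For a self-contained check, here is a minimal sketch that rebuilds the sample frame from the question and applies the same mask. The exact whitespace in the empty cells is an assumption. Note that strip is only used to compute the boolean mask, so the values that are kept retain their inner spaces.

```python
import pandas as pd

# Sample data modeled on the question; the whitespace-only cells are assumed.
df = pd.DataFrame({
    "Col1": ["afaf", "1512", "asda", "adsd"],
    "Col2": ["abc 1", "   ", "cdd 2", ""],
})

# strip() is applied only to build the mask, not to the stored values.
mask = df["Col2"].str.strip().str.len() != 0
result = df[mask]
print(result)
```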

Answer 2

You can strip the column with the pandas str.strip() function. This should remove all leading and trailing whitespace.

It looks like this

df['Col2'] = df['Col2'].str.strip().replace('', np.nan)

So, using pipe, you can get the non-NaN rows

df.loc[df.pipe(lambda x: x['Col2'].str.strip().replace('', np.nan)).dropna().index]

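The latter, updated solution also works with your additional whitespace constraint. Note, however, that I used pipe before the constraint was posted.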
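
As a variation on the replace-then-filter idea, and not the approach used above, a regular-expression replace can turn whitespace-only cells into NaN without calling strip at all. This is only a sketch on assumed sample data; the names col2 and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question; the whitespace-only cells are assumed.
df = pd.DataFrame({
    "Col1": ["afaf", "1512", "asda", "adsd"],
    "Col2": ["abc 1", "   ", "cdd 2", ""],
})

# Cells that are empty or contain only whitespace become NaN;
# every other value is left exactly as it was.
col2 = df["Col2"].replace(r"^\s*$", np.nan, regex=True)

# Keep the original rows whose Col2 is not NaN after the replacement.
result = df[col2.notna()]
print(result)
```

Because the replacement is only used to build the filter, df itself is not modified.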

Now I would go with a solution like Jezrael's, but formulated as

df[df['Col2'].str.strip() != '']

I think this is a bit clearer than using the len function.

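I just ran some benchmarks on a very small DataFrame. PirSquared's solution was the fastest, closely followed by Jezrael's, then my solution using the comparison with ''. The pipe variant came last.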
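
A comparison like this could be reproduced with a sketch along the following lines. It assumes that PirSquared's solution is the str.match variant, uses made-up sample data, and picks an arbitrary run count of 200. Timings depend on the machine, the pandas version, and the frame size, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Tiny sample frame modeled on the question; contents are assumed.
df = pd.DataFrame({
    "Col1": ["afaf", "1512", "asda", "adsd"],
    "Col2": ["abc 1", "   ", "cdd 2", ""],
})

# The four variants discussed in this thread, wrapped as zero-argument callables.
candidates = {
    "str.match": lambda: df[~df["Col2"].str.match(r"^\s*$")],
    "strip + len": lambda: df[df["Col2"].str.strip().str.len() != 0],
    "strip + != ''": lambda: df[df["Col2"].str.strip() != ""],
    "pipe + replace + dropna": lambda: df.loc[
        df.pipe(lambda x: x["Col2"].str.strip().replace("", np.nan)).dropna().index
    ],
}

# Time each variant over the same number of runs.
timings = {name: timeit.timeit(fn, number=200) for name, fn in candidates.items()}

# Print the variants from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f} s for 200 runs")
```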

Answer 3

Use str.match

df[~df.Col2.str.match(r'^\s*$')]
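
If Col2 can also hold real missing values (truly empty Excel cells usually arrive as NaN), the mask from str.match would contain NaN and the ~ operator may fail. A hedged sketch, assuming such a NaN cell in the sample data, passes na=True so that missing values are treated as blank:

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question; the NaN cell is an assumption.
df = pd.DataFrame({
    "Col1": ["afaf", "1512", "asda", "adsd"],
    "Col2": ["abc 1", "   ", "cdd 2", np.nan],
})

# ^\s*$ matches strings that are empty or whitespace-only.
# na=True makes missing values count as blank, so they are dropped as well.
is_blank = df["Col2"].str.match(r"^\s*$", na=True)
result = df[~is_blank]
print(result)
```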