How do I delete rows that contain only certain values?

Time: 2020-07-06 05:49:29

Tags: python pandas dataframe rows delete-row

I have a DataFrame like this:

    column_name 
0   OnePlus phones never fail to meet my expectatiion.  
1   received earlier than expected for local set.   
2   \n  
3   good    
4   must buy!
5   \t
6     
7   awesome product!  
8     \n    

I want to delete all rows that contain only \n, \t, spaces, or a mix of these such as ' \n '.

The output should look like this:

    column_name 
0   OnePlus phones never fail to meet my expectatiion.  
1   received earlier than expected for local set.   
2   good    
3   must buy!
4   awesome product!

I tried the following:

  df = df[df.column_name != '\n'].reset_index(drop=True)
  df = df[df.column_name != ''].reset_index(drop=True)
  df = df[df.column_name != ' '].reset_index(drop=True)
  df = df[df.column_name != '   '].reset_index(drop=True)
  df = df[df.column_name != ' \n '].reset_index(drop=True)

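But is there a more elegant or Pythonic way to do this, instead of repeating the same line of code?

3 Answers:

Answer 0: (score: 3)

You can use Series.str.strip and then compare against the empty string only: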

df1 = df[df.column_name.str.strip() != ''].reset_index(drop=True)

Or convert the stripped strings to booleans (an empty string becomes False):

df1 = df[df.column_name.str.strip().astype(bool)].reset_index(drop=True)
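To make the strip-and-compare idea easy to try, here is a minimal, self-contained sketch. The sample frame is modelled on the question; the exact whitespace inside each blank cell is an assumption, since the original data only shows how the cells are displayed.

```python
import pandas as pd

# Sample data modelled on the question; the whitespace in the blank cells is assumed.
df = pd.DataFrame({'column_name': [
    'OnePlus phones never fail to meet my expectatiion.',
    'received earlier than expected for local set.',
    '\n', 'good', 'must buy!', '\t', ' ', 'awesome product!', ' \n ',
]})

# Strip leading/trailing whitespace; whitespace-only cells collapse to ''.
stripped = df['column_name'].str.strip()

# Keep the original (unstripped) rows whose stripped value is non-empty.
by_compare = df[stripped != ''].reset_index(drop=True)
print(by_compare)
```

Note that only the mask is built from the stripped values, so the surviving rows keep whatever surrounding whitespace they had.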

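Or keep only the rows that contain a word character (\w). For me strip was necessary with the sample data; with real data it can probably be removed:

df1 = df[df.column_name.str.strip().str.contains(r'\w', na=False)].reset_index(drop=True)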

If you also need to remove missing values, replace the empty and whitespace-only strings with NaN and then use DataFrame.dropna:

import numpy as np
df.column_name = df.column_name.replace(r'^\s*$', np.nan, regex=True)
df1 = df.dropna(subset=['column_name']).reset_index(drop=True)
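As a rough illustration of the replace-then-dropna route, the sketch below uses a small made-up column that already contains one real NaN next to several blank strings. The sample values and the name `cleaned` are assumptions for the example only.

```python
import numpy as np
import pandas as pd

# Made-up sample: one genuine NaN plus empty and whitespace-only strings.
df = pd.DataFrame({'column_name': ['good', '\n', np.nan, ' \t ', '', 'must buy!']})

# ^\s*$ matches a cell that is empty or consists of whitespace only;
# such cells become NaN, every other value is left untouched.
cleaned = df['column_name'].replace(r'^\s*$', np.nan, regex=True)

# dropna then removes the genuine NaN and the converted blanks in one step.
df1 = df.assign(column_name=cleaned).dropna(subset=['column_name']).reset_index(drop=True)
print(df1)
```

Using assign here leaves the original df unchanged, which can be handy if the blank rows need to be inspected later.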

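Answer 1: (score: 2)

Use Series.str.contains() to keep only the rows that contain at least one lowercase letter. Note that a row made up solely of uppercase letters, digits, or punctuation is dropped as well:

df[df['Column Name'].str.contains(r'[a-z]', na=False, regex=True)]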

With your data (assuming pandas is imported as pd and numpy as np):

df = pd.DataFrame({'A':['OnePlus phones never fail to meet my expectatiion','received earlier than expected for local set.','\n','good','\t', np.nan,'must buy!','','awesome product!','\n' ]})
print(df)

                                               A
0  OnePlus phones never fail to meet my expectatiion
1      received earlier than expected for local set.
2                                                 \n
3                                               good
4                                                 \t
5                                                NaN
6                                          must buy!
7                                                   
8                                   awesome product!
9                                                 \n

Solution:

print(df[df.A.str.contains(r'[a-z]', na=False, regex=True)])



                             A
0  OnePlus phones never fail to meet my expectatiion
1      received earlier than expected for local set.
3                                               good
6                                          must buy!
8                                   awesome product!
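If rows without lowercase letters (for example 'OK' or '42') should survive, one possible variation is to test for any non-whitespace character instead. This is a sketch of an alternative pattern, not what the answer above uses, and the sample values are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample that includes uppercase-only and digit-only entries.
df = pd.DataFrame({'A': ['good', '\n', 'OK', '\t', np.nan, '', '42', ' \n ', 'must buy!']})

# \S matches any single non-whitespace character, so a row is kept as soon as
# it holds one visible character; na=False treats missing values as no match.
mask = df['A'].str.contains(r'\S', na=False, regex=True)
kept = df[mask].reset_index(drop=True)
print(kept)
```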

Answer 2: (score: 1)

Another approach is to drop the rows whose entry matches one of the listed marker values:

df = df[~df['column_name'].isin(['\n','\t'])].dropna()

If the last row (or any other row) has extra surrounding spaces, you can strip them first:

df['column_name'] = df['column_name'].str.strip()
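One thing to keep in mind when combining the two steps: str.strip() also removes \n and \t, so after stripping, every whitespace-only entry is just an empty string. A minimal sketch of the combined version, assuming a made-up sample column, could then match on '' alone:

```python
import pandas as pd

# Made-up sample with padded text, whitespace-only cells, and a missing value.
df = pd.DataFrame({'column_name': ['good', '\n', ' \t', 'awesome product!  ', '  \n  ', None]})

# After stripping, whitespace-only entries collapse to '', so '' is the only
# marker value that still needs to be listed in isin().
df['column_name'] = df['column_name'].str.strip()
result = df[~df['column_name'].isin([''])].dropna().reset_index(drop=True)
print(result)
```

Unlike filtering with a mask built from the stripped values, this version also trims the text that is kept.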