Question

I have a pandas DataFrame containing some empty strings that I want to replace with NaN (np.nan).

I managed to replace most of these empty strings using

df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)

But I still find empty strings. For example, when I run

sub_df = df[df['OBJECT_COL'] == '']
sub_df.replace(r'\s+', np.nan, regex = True)
print(sub_df['OBJECT_COL'] == '')

the output is True for every row.

Should I try a different approach? Is there a way to inspect the encoding of these cells, in case an unusual encoding is what makes my .replace() ineffective?

Answer 1

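By default, pd.Series.replace (like pd.DataFrame.replace) does not modify the object in place; it returns a new object. You need to specify inplace=True explicitly: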

sub_df.replace(r'\s+', np.nan, regex=True, inplace=True)

Alternatively, assign the result back to sub_df:

sub_df = sub_df.replace(r'\s+', np.nan, regex=True)
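
A minimal sketch of why the snippet in the question appears to do nothing, using a small made-up DataFrame (the column name OBJECT_COL comes from the question; the values are assumptions for illustration). It shows that replace returns a new object unless the result is assigned back. It also shows that `\s+` requires at least one whitespace character, so it never matches a truly empty string; a pattern such as `^\s*$` is needed for those.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one empty string, one whitespace-only string, one real value.
df = pd.DataFrame({"OBJECT_COL": ["", "   ", "value"]})

# replace() returns a new DataFrame; the result is discarded here, so df is unchanged.
df.replace(r"^\s*$", np.nan, regex=True)
unchanged = df["OBJECT_COL"].tolist()

# r"\s+" needs at least one whitespace character, so "" survives.
partial = df.replace(r"\s+", np.nan, regex=True)

# Keeping the returned frame, with a pattern that also matches "", handles both.
fixed = df.replace(r"^\s*$", np.nan, regex=True)

print(unchanged)                               # original values, untouched
print(partial["OBJECT_COL"].isna().tolist())   # only the whitespace cell is NaN
print(fixed["OBJECT_COL"].isna().tolist())     # empty and whitespace cells are NaN
```

This suggests that in the question's sub_df, where every cell is exactly '', the `\s+` pattern would leave the cells untouched even with inplace=True.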

Answer 2

Another option:

sub_df.replace(r'^\s+$', np.nan, regex=True)

Or, to replace both empty strings and entries containing only whitespace:

sub_df.replace(r'^\s*$', np.nan, regex=True)

Alternative:

Use apply() with a lambda function.

sub_df.apply(lambda x: x.str.strip()).replace('', np.nan)
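
One caveat with this approach, sketched below under the assumption that the frame also contains a numeric column (the column names `text` and `num` and the helper `strip_if_str` are made up for illustration): the `.str` accessor is only available on string-like columns, so applying it to every column raises AttributeError on numeric ones. Stripping element-wise and only touching str values is one possible way around that.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame: a text column and an integer column.
df = pd.DataFrame({"text": ["", "  ", " a "], "num": [1, 2, 3]})


def strip_if_str(value):
    """Strip surrounding whitespace from strings; leave other values alone."""
    return value.strip() if isinstance(value, str) else value


try:
    df.apply(lambda col: col.str.strip())
    str_accessor_failed = False
except AttributeError:
    # The integer column has no .str accessor.
    str_accessor_failed = True

# Element-wise stripping works for every column; '' is then turned into NaN.
cleaned = df.apply(lambda col: col.map(strip_if_str)).replace("", np.nan)
print(cleaned)
```

Note that stripping also trims the surviving values (" a " becomes "a"), which the regex-based replace does not do.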

A worked example for illustration:

>>> import numpy as np
>>> import pandas as pd

An example DataFrame containing empty strings and whitespace-only strings:

>>> sub_df
        col_A
0
1
2   somevalue
3  othervalue
4

Solutions applied to the different conditions:

Best solution:

1)

>>> sub_df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)
        col_A
0         NaN
1         NaN
2   somevalue
3  othervalue
4         NaN

2) This works only partially: it replaces whitespace-only strings but leaves empty strings untouched:

>>> sub_df.replace(r'^\s+$', np.nan, regex=True)
        col_A
0
1         NaN
2   somevalue
3  othervalue
4         NaN

3) This works for both cases.

>>> sub_df.replace(r'^\s*$', np.nan, regex=True)
        col_A
0         NaN
1         NaN
2   somevalue
3  othervalue
4         NaN

4) This also works for both cases.

>>> sub_df.apply(lambda x: x.str.strip()).replace('', np.nan)
        col_A
0         NaN
1         NaN
2   somevalue
3  othervalue
4         NaN
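
The printed frame does not reveal which blank cells are empty and which contain whitespace. Judging from the outputs above, row 0 is presumably an empty string and rows 1 and 4 hold whitespace. The sketch below rebuilds such a frame under that assumption (the exact number of spaces is a guess) and counts how many cells each anchored pattern converts.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample data: "" in row 0, spaces in rows 1 and 4.
sub_df = pd.DataFrame({"col_A": ["", " ", "somevalue", "othervalue", "  "]})

patterns = {
    "whitespace only": r"^\s+$",       # needs at least one whitespace character
    "empty or whitespace": r"^\s*$",   # also matches the empty string
}

nan_counts = {}
for label, pattern in patterns.items():
    result = sub_df.replace(pattern, np.nan, regex=True)
    nan_counts[label] = int(result["col_A"].isna().sum())

print(nan_counts)  # {'whitespace only': 2, 'empty or whitespace': 3}
```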

Answer 3

Try np.where:

df['OBJECT_COL'] = np.where(df['OBJECT_COL'] == '', np.nan, df['OBJECT_COL'])
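
A small sketch of this approach on made-up data (the values are assumptions; only the column name comes from the question). The exact comparison with '' catches only truly empty strings, so whitespace-only cells would remain; stripping before comparing is one way to cover both.

```python
import numpy as np
import pandas as pd

# Hypothetical data: empty string, whitespace-only string, real value.
df = pd.DataFrame({"OBJECT_COL": ["", "  ", "value"]})

# Exact comparison: only the truly empty string becomes NaN.
exact = np.where(df["OBJECT_COL"] == "", np.nan, df["OBJECT_COL"])

# Stripping first: whitespace-only cells count as blank too.
is_blank = df["OBJECT_COL"].str.strip() == ""
df["OBJECT_COL"] = np.where(is_blank, np.nan, df["OBJECT_COL"])

print(pd.isna(exact).tolist())            # [True, False, False]
print(df["OBJECT_COL"].isna().tolist())   # [True, True, False]
```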

Trouble replacing empty strings with NaN using pandas.DataFrame.replace()
