Question

I have an (n, m) DataFrame whose columns have dtype object and hold strings of varying length. The df looks like this:

      col1    col2    col3    col4    ...   colm
    |---------------------------------------------    
row1| str1,1  str1,2  str1,3  str1,4  ...   str1,m
row2| str2,1  str2,2  str2,3  str2,4  ...   str2,m
.   | .       .       .       .       ...   .
.   | .       .       .       .       ...   . 
.   | .       .       .       .       ...   .
rown| strn,1  strn,2  strn,3  strn,4  ...   strn,m

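I want to replace certain strings with NaN, on the condition that the string is shorter than 10 characters, but only in some of the columns.

Here is my code: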

column_list = ['col1','col3']
df.loc[:,column_list] = df.apply(lambda x: x.str.replace(x,np.NaN) if len(x) < 10 else x)

The code runs without errors, but unfortunately it doesn't actually change any of the values in those columns. I think my problem is in this part:

x.str.replace(x,np.NaN)

I don't think `x` should be passed to the `replace` function.

Any help is appreciated.

Thanks

Answer 1

Compute the string lengths with str.len to build a boolean mask, then apply it with mask:

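# True where the string in a target column is shorter than 10 characters
s = df[column_list].apply(lambda x: x.str.len()) < 10
# mask() replaces the True positions with NaN
df.loc[:, column_list] = df.loc[:, column_list].mask(s)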
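To see the masking approach end to end, here is a minimal, self-contained sketch. The sample data and the name `too_short` are made up for illustration; only `column_list` and the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "col1": ["short", "a much longer string", "tiny"],
    "col2": ["keep me", "keep this one as well", "x"],
    "col3": ["exactly10!", "abc", "another long string"],
})
column_list = ["col1", "col3"]

# Boolean frame: True where the string has fewer than 10 characters.
too_short = df[column_list].apply(lambda col: col.str.len()) < 10

# mask() sets the True positions to NaN; columns outside column_list are untouched.
df[column_list] = df[column_list].mask(too_short)

print(df)
```

As for why the original attempt appeared to do nothing: as far as I can tell, df.apply hands each column to the lambda as a whole Series, so len(x) is the number of rows, not the length of any single string. With 10 or more rows the else branch returns every column unchanged, and the assignment writes the same values back.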

Pandas: replace strings of a specific length in multiple columns
