Question

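I have a pandas DataFrame with a column that contains both string values and boolean values. Because of this mix, the column's dtype is inferred as 'object'. When I run .str.strip() on this column, all of my boolean values are converted to NaN. Does anyone know how I can prevent this? I'd be fine with the booleans becoming strings, but NaN?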
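A minimal reproduction of the behavior described above, assuming a small hypothetical sample column. The .str accessor applies the string method only to str elements and returns NaN for everything else, which is why the booleans disappear:

```python
import pandas as pd

# Hypothetical sample: strings with stray whitespace mixed with real booleans
s = pd.Series([" a", " b ", True, False, "True"])
print(s.dtype)  # object, because the values have mixed types

# The .str accessor returns NaN for every element that is not a str
stripped = s.str.strip()
print(stripped.tolist())
```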

Answer 1

Borrowing the df from piRSquared's answer:

First convert all values to strings, then strip:

df['A'] = df['A'].astype(str).str.strip()
print(df)
       A
0      a
1      b
2   True
3  False
4   True
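A self-contained version of the astype(str) approach, using the same sample data as the Setup in Answer 2, with a check that every value really is a str afterwards (so the booleans are now the strings "True" and "False", not bool objects):

```python
import pandas as pd

df = pd.DataFrame(dict(A=[" a", " b ", True, False, "True"]))

# Convert every value to str first, so .str.strip() has no non-str values to turn into NaN
df["A"] = df["A"].astype(str).str.strip()

# Every value is now a str: the booleans have become "True" / "False"
print(df["A"].map(type).tolist())
```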

If you need to keep the mixed types (booleans alongside strings), use combine_first to replace the NaN values with the original booleans:

df['A'] = df['A'].str.strip().combine_first(df.A)
print (df)
       A
0      a
1      b
2   True
3  False
4   True
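A self-contained sketch of the combine_first variant, again assuming the sample data from the Setup in Answer 2. The strip step produces NaN only where the original value was not a str, and combine_first fills exactly those positions from the untouched column:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[" a", " b ", True, False, "True"]))

# Strip the strings (booleans become NaN), then fill those NaN slots
# back in from the original column, so the booleans survive unchanged
df["A"] = df["A"].str.strip().combine_first(df["A"])

print([type(v).__name__ for v in df["A"]])
```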

If you need to strip all columns:

df = df.astype(str).applymap(lambda x: x.strip())

Or:

df = df.astype(str).apply(lambda x: x.str.strip())
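A runnable sketch of the whole-frame variant on a hypothetical two-column frame. It uses the apply form; the applymap form also works, but in pandas 2.1 and later applymap emits a deprecation warning in favor of DataFrame.map:

```python
import pandas as pd

# Hypothetical two-column frame; column B also mixes strings and booleans
df = pd.DataFrame({"A": [" a", True], "B": [" x ", False]})

# astype(str) turns every cell into a str, then each column is stripped via its .str accessor
out = df.astype(str).apply(lambda col: col.str.strip())
print(out)
```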

Answer 2

Setup

import pandas as pd

df = pd.DataFrame(dict(A=[' a', ' b ', True, False, 'True']))

Option 1
Use the pd.Series.str.strip string accessor method together with fillna

df.A.str.strip().fillna(df.A)

0        a
1        b
2     True
3    False
4     True
Name: A, dtype: object

Note: each value keeps its original type, either str or bool:

df.A.str.strip().fillna(df.A).apply(type)

0     <class 'str'>
1     <class 'str'>
2    <class 'bool'>
3    <class 'bool'>
4     <class 'str'>
Name: A, dtype: object
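A self-contained check, assuming the Setup data, that Option 1's fillna and Answer 1's combine_first produce the same Series here. Both fill the NaN left by the strip step from the original column; with an identical index on both sides there is nothing for them to disagree about:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[" a", " b ", True, False, "True"]))

# Option 1: strip, then fill the NaN left by non-str values from the original column
via_fillna = df.A.str.strip().fillna(df.A)

# The combine_first variant, for comparison
via_combine_first = df.A.str.strip().combine_first(df.A)

# With the same index on both sides, the two approaches agree
print(via_fillna.equals(via_combine_first))
```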

Option 2
Use pd.Series.replace

df.A.replace(r'^\s+|\s+$', '', regex=True)

0        a
1        b
2     True
3    False
4     True
Name: A, dtype: object

This also preserves the mixed types.
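A self-contained sketch of Option 2, assuming the Setup data. The pattern is written as a raw string so that recent Python versions do not warn about the "\s" escape; regex replacement only touches str values, so the booleans pass through as-is:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[" a", " b ", True, False, "True"]))

# Remove leading (^\s+) or trailing (\s+$) whitespace from str values only;
# non-str values such as the booleans are returned unchanged
out = df.A.replace(r"^\s+|\s+$", "", regex=True)
print(out.tolist())
```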

We can use pd.DataFrame.replace to operate on the entire DataFrame:

df.replace(r'^\s+|\s+$', '', regex=True)

       A
0      a
1      b
2   True
3  False
4   True
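A sketch of the whole-frame replace on a hypothetical frame that adds a numeric column. The regex is applied to every column, but only str cells can match, so the boolean value and the integer column come through unchanged:

```python
import pandas as pd

# Hypothetical frame: a mixed str/bool column plus a numeric column
df = pd.DataFrame({"A": [" a", True, " b "], "n": [1, 2, 3]})

# DataFrame.replace applies the regex to every column;
# non-string cells (the boolean, the int column) are left as they are
out = df.replace(r"^\s+|\s+$", "", regex=True)
print(out)
print(out.dtypes)
```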

Stripping whitespace on a pandas DF column converts bool values to NaN
