Question

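This may be a fundamental misunderstanding on my part, but I expected pandas.Series.str to convert the pandas.Series values to strings.

However, when I do the following, the numeric values in the Series are converted to np.nan: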

import pandas as pd

df = pd.DataFrame({'a': ['foo    ', 'bar', 42]})
df = df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)
print(df)

Out:
     a
0  foo
1  bar
2  NaN

If I first apply the str function to each column, the numeric values are converted to strings instead of np.nan:

df = pd.DataFrame({'a': ['foo    ', 'bar', 42]})
df = df.apply(lambda x: x.apply(str) if x.dtype == 'object' else x)
df = df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)
print(df)

Out:
     a
0  foo
1  bar
2   42

The documentation on this topic is rather sparse. What am I missing?

Answer 1

In this line:

df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)

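x.dtype looks at the whole Series (column). The column is not numeric: because it mixes strings and an integer, its dtype is object, so the whole column is treated as if it held strings.

In the second example, the number is not preserved; it has become the string '42'.

The difference in output comes down to the difference between pandas' .str and Python's str.

In pandas, .str is not a conversion. It is an accessor that lets you apply .strip() to each element. That means you are trying to apply .strip() to an integer. An integer has no such method, and pandas responds by returning NaN for that element instead of raising an exception.

With .apply(str), you actually convert each value to a string. When you later apply .strip(), it succeeds because the value is already a string and can therefore be stripped.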
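The two behaviours can be seen side by side in a minimal sketch. It reuses the values from the question; the variable names `stripped` and `converted` are only illustrative, and `astype(str)` stands in for the `.apply(str)` call used in the question.

```python
import pandas as pd

s = pd.Series(['foo    ', 'bar', 42])

# .str is an accessor: it applies the string method element by element
# and yields NaN for any element that is not a string.
stripped = s.str.strip()
print(stripped.tolist())

# Converting first turns 42 into '42', so .strip() has a string to work on.
converted = s.astype(str).str.strip()
print(converted.tolist())
```

In the first result the third element is NaN; in the second it is the string '42'.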

Answer 2

The way you are using .apply, it works by columns, so note:

>>> df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)
     a
0  foo
1  bar
2  NaN

It acts on the column, and x.dtype is always object.

>>> df.apply(lambda x:x.dtype)
a    object
dtype: object

If you do go by rows with axis=1, you still see the same behavior:

>>> df.apply(lambda x:x.dtype, axis=1)
0    object
1    object
2    object
dtype: object

Lo and behold:

>>> df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x, axis=1)
     a
0  foo
1  bar
2  NaN
>>>

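So when it says object dtype, it means Python object: the column stores references to arbitrary Python objects. By contrast, consider a non-object, numeric column: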

>>> S = pd.Series([1,2,3])
>>> S.dtype
dtype('int64')
>>> S[0]
1
>>> S[0].dtype
dtype('int64')
>>> isinstance(S[0], int)
False

Whereas with this object dtype column:

>>> df
         a
0  foo
1      bar
2       42
>>> df['a'][2]
42
>>> isinstance(df['a'][2], int)
True
>>>
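To check which elements of an object column are actually strings, one option is to map each element to its Python type name. This is a small sketch rather than anything from the original answer, and `element_types` is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'a': ['foo    ', 'bar', 42]})

# Each cell of an object column keeps its own Python type,
# so the type has to be inspected element by element.
element_types = df['a'].map(lambda value: type(value).__name__)
print(element_types.tolist())  # ['str', 'str', 'int']
```

Any element reported as something other than str is one that .str methods will turn into NaN.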

What you are effectively doing is this:

>>> s = df.a.astype(str).str.strip()
>>> s
0    foo
1    bar
2     42
Name: a, dtype: object
>>> s[2]
'42'

Note:

>>> df.apply(lambda x: x.apply(str) if x.dtype == 'object' else x).a[2]
'42'
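If the goal is to strip the strings while leaving the number as a number, a possible alternative (not part of the original answers) is to strip only the elements that are strings. The helper name `strip_if_str` below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'a': ['foo    ', 'bar', 42]})

def strip_if_str(value):
    # Strip only real strings; return every other object unchanged.
    return value.strip() if isinstance(value, str) else value

cleaned = df['a'].map(strip_if_str)
print(cleaned.tolist())  # ['foo', 'bar', 42]
```

Here the third element stays the integer 42 instead of becoming NaN or the string '42'.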

Why does pandas Series.str convert numbers to NaN?
