Question

Example

A toy DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': ['the', 'this'], 'b': [5, 2.3], 'c': [8, 11], 'd': ['the', 7]})

which yields:

>>> df

      a    b   c    d
0   the  5.0   8  the
1  this  2.3  11    7

and

>>> df.dtypes

a     object
b    float64
c      int64
d     object
dtype: object

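Problem statement

What I really want to do is run df.apply so that I can perform some operation on the values in a column if that column/Series is of string type.

So I figured I could simply do something like: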

>>> df.apply(lambda x: if x.dtype == 'object' and <the other check I care about>)

But it did not work the way I expected: every column is reported as object. To verify, try:

>>> df.apply(lambda x: x.dtype == 'object')
a    True
b    True
c    True
d    True
dtype: bool

Trying to understand what was going on, I tried the following:

>>> def tmp_fn(val, typ):
...   if val.dtype == typ:
...     print(type(val))
...     print(val.dtype)

Then:

>>> df.apply(lambda x: tmp_fn(x, 'object'))
<class 'pandas.core.series.Series'>
object
<class 'pandas.core.series.Series'>
object
<class 'pandas.core.series.Series'>
object
<class 'pandas.core.series.Series'>
object
a    None
b    None
c    None
d    None
dtype: object

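Attempt at understanding

Now I see what is happening: inside apply, each pandas Series is treated simply as a Series of objects, rather than by the type of the values it holds. That seems easy enough to solve.

But in fact, this is not how a Series normally behaves in other contexts. For example, if I try: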

>>> df.a.dtype
dtype('O')

>>> df.b.dtype
dtype('float64')

Both of these work as I would expect and give me the type of the values in the Series, rather than simply the fact that it is a Series.
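
The same per-column check can be sketched for every column at once by iterating with DataFrame.items(), which hands back each column as its own Series. This is only an illustration of the observation above; the name col_dtypes is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': ['the', 'this'], 'b': [5, 2.3], 'c': [8, 11], 'd': ['the', 7]})

# Each column comes back as a standalone Series, so .dtype is the
# dtype of that column alone (col_dtypes is an illustrative name).
col_dtypes = {name: col.dtype for name, col in df.items()}

for name, dtype in col_dtypes.items():
    print(name, dtype)
```

The dtypes collected this way should match what df.dtypes reports for each column.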

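But try as I might, I cannot find a way to reproduce the same behaviour inside pandas.DataFrame.apply. What is going on here? How can I get the Series to behave as it normally does? In other words, how do I get pandas.DataFrame.apply to treat each column exactly like a standalone pandas.Series? Until now I never knew/realised that the two do not behave the same way.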

Answer 1

You can pass result_type='expand' to .apply() so that list-like results are turned into columns. You can read more in the docs.

df.apply(lambda x: x.dtype, result_type='expand')

Output:

a     object
b    float64
c      int64
d     object
dtype: object

Without result_type='expand':

df.apply(lambda x: print(x))

gives:

0     the
1    this
Name: a, dtype: object
0      5
1    2.3
Name: b, dtype: object
0     8
1    11
Name: c, dtype: object
0    the
1      7
Name: d, dtype: object

With result_type='expand':

df.apply(lambda x: print(x), result_type='expand')

Output:

0     the
1    this
Name: a, dtype: object
0    5.0
1    2.3
Name: b, dtype: float64
0     8
1    11
Name: c, dtype: int64
0    the
1      7
Name: d, dtype: object
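
If the end goal is only to operate on the non-numeric columns, one alternative that avoids apply altogether (a different technique from the one in this answer) is to pick those columns with DataFrame.select_dtypes first. The sketch below assumes the operation of interest is upper-casing string values; text_cols and out are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a': ['the', 'this'], 'b': [5, 2.3], 'c': [8, 11], 'd': ['the', 7]})

# Select the labels of every column that is not numeric.
text_cols = df.select_dtypes(exclude='number').columns

# Work on a copy so the original frame is left untouched.
out = df.copy()
for name in text_cols:
    # Upper-case string values only; leave anything else (e.g. the 7 in 'd') as is.
    out[name] = out[name].map(lambda v: v.upper() if isinstance(v, str) else v)

print(out)
```

Because the numeric columns are never touched, they keep their original dtypes.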

Answer 2

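You can store df.dtypes in a variable and look values up in it later using dictionary-like syntax. This works because pd.DataFrame.apply passes a named Series to the function you supply.

Here is a minimal example:

import pandas as pd

df = pd.DataFrame({'a': ['the', 'this'], 'b': [5, 2.3], 'c': [8, 11], 'd': ['the', 7]})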

type_map = df.dtypes

def tmp_fn(val, type_map, typ):
    if type_map[val.name] == typ:
        print(val.name, type(val))
        print(type_map[val.name])

df['e'] = df.apply(lambda x: tmp_fn(x, type_map, 'object'))

a <class 'pandas.core.series.Series'>
object
d <class 'pandas.core.series.Series'>
object
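
Building on the same idea, the lookup can drive an actual transformation instead of a print. This is a minimal sketch, assuming the operation is upper-casing string values; upper_if_not_numeric and result are hypothetical names, and the check uses pandas.api.types.is_numeric_dtype rather than a comparison with 'object' so that it does not depend on how string columns are labelled.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'a': ['the', 'this'], 'b': [5, 2.3], 'c': [8, 11], 'd': ['the', 7]})

# Capture the real per-column dtypes once, outside of apply.
type_map = df.dtypes

def upper_if_not_numeric(col, type_map):
    # col.name is the column label that apply attaches to each Series.
    if is_numeric_dtype(type_map[col.name]):
        return col
    # Upper-case strings; pass other values through unchanged.
    return col.map(lambda v: v.upper() if isinstance(v, str) else v)

result = df.apply(lambda col: upper_if_not_numeric(col, type_map))
print(result)
```

Since every call returns a Series of the same length, apply assembles the results back into a DataFrame with the original column labels.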

Pandas DataFrame `apply` on `dtype` yields unexpected results
