Question

I'm trying to round the values in one column of a pandas DataFrame to the number of decimal places given in another column, as in the code below.

    import pandas

    df = pandas.DataFrame({
        'price': [14.5732, 145.731, 145.722, 145.021],
        'decimal': [4, 3, 2, 2]
    })
    df['price'] = df.apply(lambda x: round(x.price, x.decimal), axis=1)

However, doing this raises the following error:

>   df['price'] = df.apply(lambda x: round(x.price, x.decimal), axis=1)
E   TypeError: ('integer argument expected, got float', 'occurred at index 0')

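Going by the documentation, round should accept a float here (price at index 0 is a float), but it clearly isn't happy. Changing price to int fixes the error, but that defeats the purpose of the code.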
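For reference, the float that round is complaining about is the ndigits argument, not the number being rounded. A minimal sketch using only the built-in round; note that in Python 3 the error message differs from the Python 2 one quoted above:

```python
# The number being rounded may be a float; ndigits must be an integer.
ok = round(14.5732, 4)
print(ok)  # 14.5732

try:
    # ndigits given as a float, e.g. 4.0 after an int has been upcast
    round(14.5732, 4.0)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```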

Answer 1

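This has been a pain point in pandas for a loooong time. When you access a single row, or call apply with axis=1 (one row at a time), dtype coercion happens quite regularly: each row comes back as a single Series, a Series holds only one dtype, and so the integer decimal values are upcast to float64 to match price. The error message is confusing because the decimal column clearly has an integer dtype, so round should accept it, but the coercion happens behind the scenes.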

You can check this with both iloc and apply:

>>> df.iloc[0]
price      14.5732
decimal     4.0000
Name: 0, dtype: float64

>>> df.apply(lambda x: x, axis=1)
      price  decimal
0   14.5732      4.0
1  145.7310      3.0
2  145.7220      2.0
3  145.0210      2.0
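A minimal sketch that checks the same thing programmatically, assuming the df from the question: the column keeps its integer dtype, but the row Series (and what a row-wise apply sees) is float64.

```python
import pandas as pd

df = pd.DataFrame({
    'price': [14.5732, 145.731, 145.722, 145.021],
    'decimal': [4, 3, 2, 2],
})

# Each column keeps its own dtype: price is float, decimal is an integer dtype.
print(df.dtypes)

# A single row is one Series with one dtype, so decimal is upcast to float64.
row = df.iloc[0]
print(row.dtype)             # float64
print(repr(row['decimal']))  # a float (4.0), not an int

# Row-wise apply receives the same upcast rows.
seen = df.apply(lambda r: r['decimal'], axis=1)
print(seen.dtype)            # float64
```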

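Even more frustrating, if you have an object-dtype column, no coercion happens at all (the row Series becomes object dtype and each value keeps its own type), so the behavior isn't exactly easy to predict!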

>>> df['foo'] = 'bar'
>>> df.iloc[0]
price      14.5732
decimal          4
foo            bar
Name: 0, dtype: object

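Long story short, this is confusing and not at all intuitive. A couple of workarounds are to cast decimal back to int inside the lambda, or to use a list comprehension (probably faster than apply):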

>>> df.apply(lambda x: round(x.price, int(x.decimal)), axis=1)
0     14.5732
1    145.7310
2    145.7200
3    145.0200
dtype: float64

>>> [round(x, y) for x, y in zip(df['price'], df['decimal'])]
[14.5732, 145.731, 145.72, 145.02]

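Note that when the result is displayed as a Series, every value is padded to the same number of decimal places (for example 145.7200), but the values themselves are rounded.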
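Putting the list-comprehension workaround together, a self-contained sketch that writes the rounded values back into price and then prints the stored values with tolist(), which sidesteps the padded Series display:

```python
import pandas as pd

df = pd.DataFrame({
    'price': [14.5732, 145.731, 145.722, 145.021],
    'decimal': [4, 3, 2, 2],
})

# Iterating a column yields plain Python scalars, so round() gets an int ndigits.
df['price'] = [round(p, d) for p, d in zip(df['price'], df['decimal'])]

print(df)                    # display pads values, e.g. 145.7200
print(df['price'].tolist())  # [14.5732, 145.731, 145.72, 145.02]
```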

Answer 2

You can use a generator that hands out the decimal values one at a time, in step with apply walking over price:

>>> gen = (i for i in df.decimal)
>>> df.price = df.price.apply(lambda x: round(x, next(gen))) 
>>> df
      price  decimal
0   14.5732        4
1  145.7310        3
2  145.7200        2
3  145.0200        2
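A related alternative, not part of this answer: Series.combine calls a function once per index label with one value from each Series, so prices and decimals are paired by label instead of relying on a generator advancing in step with apply. A minimal sketch, assuming the same df:

```python
import pandas as pd

df = pd.DataFrame({
    'price': [14.5732, 145.731, 145.722, 145.021],
    'decimal': [4, 3, 2, 2],
})

# combine(other, func) calls func(price_value, decimal_value) per index label;
# float()/int() turn the NumPy scalars into a plain float and an integer ndigits.
df['price'] = df['price'].combine(df['decimal'], lambda p, d: round(float(p), int(d)))
print(df['price'].tolist())  # [14.5732, 145.731, 145.72, 145.02]
```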

Answer 3

This works (it casts decimal back to int inside the lambda):

    df['price'] = df.apply(lambda x: round(x.price, int(x.decimal)), axis=1)

How to apply round across two pandas columns
