Question

I know how to do simple rounding in pandas (link), but my question is how to round and perform a calculation at the same time in pandas.

df['age_new'] = df['age'].apply(lambda x: round(x['age'] * 0.024319744084, 0.000000000001))

TypeError: 'float' object is not subscriptable

Is there a way to do this?

Answer 1

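There are two problems:

- x['age'] does not need the ['age'], because you are already applying the function to the age column, so x is a single value (this is why you get the error).
- round takes an int as its second argument, not a float.

Try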

df['age_new'] = df['age'].apply(lambda x: round(x * 0.024319744084, 5))

(5 is just an example.)
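
As a minimal sketch of both points, the snippet below applies the corrected lambda to a small made-up DataFrame (the three ages are an assumption, since the question does not show its data) and then shows that passing a float as the second argument to round raises a TypeError.

```python
import pandas as pd

# Hypothetical sample data; the question does not show its DataFrame.
df = pd.DataFrame({'age': [18, 35, 62]})

# x is already a single value from the 'age' column, so x['age'] is not needed.
df['age_new'] = df['age'].apply(lambda x: round(x * 0.024319744084, 5))

# The second argument of round must be an int; a float raises a TypeError.
try:
    round(1.23456, 0.001)
except TypeError as exc:
    ndigits_error = str(exc)

print(df)
print("round with a float ndigits failed:", ndigits_error)
```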

Answer 2

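.apply is not vectorized.
- When .apply is used on a pandas.Series such as the 'age' column, the lambda variable x is each individual value of that column, so the correct syntax is round(x * 0.0243, 4).
- The ndigits parameter of round requires an int, not a float.
It is faster to use vectorized methods, such as .mul followed by .round.
- In this case, with 1000 rows, the vectorized method is about 4 times faster than using .apply.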

import pandas as pd
import numpy as np

# test data
np.random.seed(365)
df = pd.DataFrame({'age': np.random.randint(110, size=(1000))})

%%timeit
df.age.mul(0.024319744084).round(5)
[out]:
212 µs ± 3.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
(df['age'] * 0.024319744084).round(5)
[out]:
211 µs ± 9.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
df.age.apply(lambda x: round(x * 0.024319744084, 5))
[out]:
845 µs ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
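
The %%timeit cells above only run in IPython or Jupyter. As a rough sketch for a plain Python script, the same comparison can be made with the standard-library timeit module. The helper names vectorized and with_apply and the repeat count of 200 are assumptions made for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same test data as above.
np.random.seed(365)
df = pd.DataFrame({'age': np.random.randint(110, size=1000)})

factor = 0.024319744084

def vectorized():
    # Multiply the whole column at once, then round the resulting Series.
    return df['age'].mul(factor).round(5)

def with_apply():
    # Call Python's round once per element.
    return df['age'].apply(lambda x: round(x * factor, 5))

# The two approaches should agree to within about one unit
# in the fifth decimal place.
same = np.allclose(vectorized(), with_apply(), atol=1.5e-5)

# Total time in seconds for 200 calls of each function.
t_vec = timeit.timeit(vectorized, number=200)
t_apply = timeit.timeit(with_apply, number=200)

print(f"results match: {same}")
print(f"vectorized: {t_vec:.4f} s, apply: {t_apply:.4f} s")
```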
