How to efficiently compute a linear combination of columns in pandas?

Date: 2017-07-11 13:52:17

Tags: python pandas

I need to do the following calculation:

priors['user_product'] = priors.eval('product_id + user_id*100000')

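where user_product is the new column I want to create. However, because the priors DataFrame is large (3,000,000 rows, to be exact), the calculation takes a long time.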
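The expression effectively packs each (user_id, product_id) pair into a single integer key. A minimal sketch of that encoding and its inverse, assuming product_id is a non-negative integer below 100000 (otherwise two different pairs can collide on the same key). The helpers `encode_key` and `decode_key` and the sample rows are hypothetical, for illustration only:

```python
import numpy as np
import pandas as pd

# Assumption: every product_id is in [0, MULTIPLIER), so keys are unique.
MULTIPLIER = 100000


def encode_key(user_id: pd.Series, product_id: pd.Series) -> pd.Series:
    """Hypothetical helper: combine the two ID columns into one int64 key."""
    return product_id.astype(np.int64) + MULTIPLIER * user_id.astype(np.int64)


def decode_key(key: pd.Series):
    """Hypothetical helper: recover (user_id, product_id) from the key."""
    return key // MULTIPLIER, key % MULTIPLIER


# Made-up sample rows.
priors = pd.DataFrame({"user_id": [1, 1, 2], "product_id": [196, 14084, 196]})
priors["user_product"] = encode_key(priors.user_id, priors.product_id)
user_back, product_back = decode_key(priors["user_product"])
```

Integer division and modulo by the same multiplier undo the encoding, which is a quick way to sanity-check that the chosen multiplier is large enough for the data.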

1 answer:

Answer 0 (score: 3)

If you want it to be fast, you can use numpy, numexpr, or plain pandas arithmetic instead of eval:

pandas

priors['user_product'] = priors.product_id + 100000 * priors.user_id
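One caveat the timings below do not cover: if the ID columns are stored as a narrow integer type such as `int32` (an assumption here; the question does not state the dtypes, but downcasting is common on frames this size), `100000 * user_id` keeps that type and can silently wrap around once the product exceeds 2,147,483,647. A small sketch with a made-up row that upcasts to `int64` before multiplying:

```python
import numpy as np
import pandas as pd

# Made-up example: user_id * 100000 is larger than the int32 maximum.
priors = pd.DataFrame({
    "user_id": np.array([206209], dtype=np.int32),
    "product_id": np.array([196], dtype=np.int32),
})

# With int32 inputs the result stays int32 and may wrap around silently.
narrow = priors.product_id + 100000 * priors.user_id

# Upcasting first makes the arithmetic happen in int64.
priors["user_product"] = (
    priors.product_id.astype(np.int64) + 100000 * priors.user_id.astype(np.int64)
)
print(narrow.iloc[0], priors["user_product"].iloc[0])
```

The same upcast applies to the numpy and numexpr variants, since they operate on the same underlying arrays.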

numpy

priors['user_product'] = priors.product_id.values + 100000 * priors.user_id.values
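Part of the extra work in the pandas version is that Series arithmetic aligns both operands on their index labels, while `.values` returns bare arrays that are combined purely by position. Columns of one DataFrame share an index, so both forms give the same numbers here; a small sketch with two hypothetical Series (same labels, different order) shows where they diverge:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([30, 10, 20], index=["z", "x", "y"])

# pandas: values are matched by label (x with x, y with y, z with z).
aligned = a + 100000 * b

# numpy: values are matched by position; the labels are ignored.
positional = a.values + 100000 * b.values

# Columns of the same DataFrame share one index, so both forms agree.
df = pd.DataFrame({"product_id": [5, 6, 7], "user_id": [1, 2, 3]})
same = np.array_equal(
    (df.product_id + 100000 * df.user_id).to_numpy(),
    df.product_id.values + 100000 * df.user_id.values,
)
```

So dropping to `.values` is safe for this computation, but not as a general replacement when the operands come from differently indexed objects.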

numexpr

import numexpr

pid = priors.product_id.values
uid = priors.user_id.values
# numexpr.evaluate looks up pid and uid in the calling scope
priors['user_product'] = numexpr.evaluate('pid + 100000 * uid')

Timings

import numpy as np
import pandas as pd
n = 3000000
priors = pd.DataFrame(dict(product_id=np.random.rand(n), user_id=np.random.rand(n)))

%timeit priors['user_product'] = priors.eval('product_id + 100000 * user_id')
%timeit priors['user_product'] = priors.product_id.values + 100000 * priors.user_id.values
%timeit priors['user_product'] = priors.product_id + 100000 * priors.user_id

10 loops, best of 3: 31.6 ms per loop
100 loops, best of 3: 17.6 ms per loop
100 loops, best of 3: 18.5 ms per loop

%%timeit
pid = priors.product_id.values
uid = priors.user_id.values
priors['user_product'] = numexpr.evaluate('pid + 100000 * uid')

100 loops, best of 3: 13.6 ms per loop
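`%timeit` and `%%timeit` are IPython magics, so the timing snippets above only run in IPython or Jupyter. A rough plain-Python equivalent using the standard `timeit` module is sketched below. It uses integer IDs instead of the random floats above (the ID ranges are assumptions, not from the question), times only the arithmetic rather than the column assignment, and will report different absolute numbers depending on the machine and library versions. `DataFrame.eval` uses numexpr when it is installed and falls back to plain Python otherwise:

```python
import timeit

import numpy as np
import pandas as pd


def make_priors(n: int, seed: int = 0) -> pd.DataFrame:
    """Build a synthetic frame with integer IDs (assumed ranges)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "product_id": rng.integers(0, 50_000, size=n),
        "user_id": rng.integers(0, 200_000, size=n),
    })


def time_approaches(n: int = 3_000_000, number: int = 10) -> dict:
    """Return the best per-call time in seconds for each approach."""
    priors = make_priors(n)
    candidates = {
        "eval": lambda: priors.eval("product_id + 100000 * user_id"),
        "pandas": lambda: priors.product_id + 100000 * priors.user_id,
        "numpy": lambda: priors.product_id.values + 100000 * priors.user_id.values,
    }
    results = {}
    for name, fn in candidates.items():
        # repeat() returns the total time of `number` calls per run;
        # keep the fastest run, as %timeit's "best of 3" does.
        best = min(timeit.repeat(fn, number=number, repeat=3))
        results[name] = best / number
    return results


if __name__ == "__main__":
    for name, seconds in time_approaches().items():
        print(f"{name:>6}: {seconds * 1000:.1f} ms per call")
```

Taking the minimum over repeats follows the usual `timeit` advice: the fastest run is the one least disturbed by other activity on the machine.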