Batch calculation of DataFrame cells based on values from two other DataFrames

Posted: 2019-05-20 07:08:12

Tags: python pandas

Starting from a first DataFrame:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(1), freq='D')
symbols = ['A','B']
np.random.seed(seed=1111)
dataA = np.random.randint(1, high=100, size=len(days))
dataB = np.random.randint(1, high=100, size=len(days))
df1 = pd.DataFrame({symbols[0]: dataA,symbols[1] :dataB },index=days)
print(df1)
                             A   B
2019-05-20 06:52:21.013198  29  82
2019-05-21 06:52:21.013198  56  13

and a second DataFrame:

df2 = pd.DataFrame({'const1': [1,2],'const2' : [2,3] },index=['A','B'])
print(df2)
   const1  const2
A       1       2
B       2       3

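I want to compute a third DataFrame with the same structure as the first one, where each cell is the result of a calculation that combines values from the first two DataFrames: the cell of df1 in column `symbol` is combined with that symbol's row (const1, const2) in df2.

The following code computes every cell of the third DataFrame correctly: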

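# Work on a copy: a plain `df3 = df1` would alias df1 and overwrite it in place
df3 = df1.copy()
for symbol in symbols:
    # Per-symbol constants from df2
    const1 = df2.at[symbol, 'const1']
    const2 = df2.at[symbol, 'const2']
    # Update every row of this symbol's column, one cell at a time
    for index in df1.index:
        value = df1.at[index, symbol]
        df3.at[index, symbol] = const1*value + const2*value
print(df3)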
                              A    B
2019-05-20 06:58:52.753879   87  410
2019-05-21 06:58:52.753879  168   65

How can I get rid of this ugly loop and do the calculation more efficiently?

2 answers:

Answer 0 (score: 0)

Try:

df3 = df1 * df2.sum(axis=1)
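This works because the loop's per-cell formula factors into a single constant per symbol. Writing $x_{t,s}$ for the value of df1 at row $t$ and column $s$, and $c_1(s)$, $c_2(s)$ for that symbol's const1 and const2 in df2 (notation introduced here only for illustration):

$$
\mathrm{df3}_{t,s} = c_1(s)\,x_{t,s} + c_2(s)\,x_{t,s} = \bigl(c_1(s) + c_2(s)\bigr)\,x_{t,s}
$$

df2.sum(axis=1) computes $c_1(s) + c_2(s)$ as a Series indexed by symbol, and multiplying a DataFrame by a Series aligns the Series index with the DataFrame's columns, so each column $s$ is scaled by its own factor. For column A in the first row: $(1 + 2) \cdot 29 = 87$.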

Now

print(df3)

gives:

                              A    B
2019-05-20 06:58:52.753879   87  410
2019-05-21 06:58:52.753879  168   65
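A minimal sketch to confirm that the one-liner matches the loop. It assumes fixed values copied from the printed df1 in place of the random data (the dates are arbitrary). The last part uses a hypothetical formula, const1*value + const2, that does not reduce to a single factor per symbol, to show that the same column alignment also works term by term.

```python
import pandas as pd

# Fixed data mirroring the printed df1 above (dates chosen arbitrarily)
days = pd.date_range('2019-05-20', periods=2, freq='D')
df1 = pd.DataFrame({'A': [29, 56], 'B': [82, 13]}, index=days)
df2 = pd.DataFrame({'const1': [1, 2], 'const2': [2, 3]}, index=['A', 'B'])

# Loop version from the question, working on a copy so df1 stays unchanged
loop = df1.copy()
for symbol in df1.columns:
    const1 = df2.at[symbol, 'const1']
    const2 = df2.at[symbol, 'const2']
    for index in df1.index:
        value = df1.at[index, symbol]
        loop.at[index, symbol] = const1 * value + const2 * value

# Vectorized version: the Series index (symbols) aligns with df1's columns
vectorized = df1 * df2.sum(axis=1)
print(vectorized.equals(loop))

# Hypothetical non-factorable formula const1*value + const2:
# each Series term is aligned with df1's columns separately
affine = df1 * df2['const1'] + df2['const2']
print(affine)
```

Because both Series are indexed by symbol, no explicit loop over columns is needed for either formula.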

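Answer 1 (score: 0)

If some symbols might not match between the two DataFrames (here df1 has a column C with no row in df2, and df2 has a row B with no column in df1):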

date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(1), freq='D')
symbols = ['A','C']
np.random.seed(seed=1111)
dataA = np.random.randint(1, high=100, size=len(days))
dataB = np.random.randint(1, high=100, size=len(days))
df1 = pd.DataFrame({symbols[0]: dataA,symbols[1] :dataB },index=days)
print(df1)
                             A   C
2019-05-20 09:24:33.383637  29  82
2019-05-21 09:24:33.383637  56  13

df2 = pd.DataFrame({'const1': [1,2],'const2' : [2,3] },index=['A','B'])
print(df2)
   const1  const2
A       1       2
B       2       3

df3 = df1.mul(df2.sum(axis=1).reindex(df1.columns, fill_value=1))
print (df3)
                              A   C
2019-05-20 09:25:48.075084   87  82
2019-05-21 09:25:48.075084  168  13
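A sketch of why the reindex step matters, assuming fixed values in place of the random ones for reproducibility. Without it, the two indexes are joined: B shows up as an all-NaN column and C, which has no factor, becomes NaN. With fill_value=1 the factor for C is 1, so that column passes through unchanged.

```python
import pandas as pd

days = pd.date_range('2019-05-20', periods=2, freq='D')
df1 = pd.DataFrame({'A': [29, 56], 'C': [82, 13]}, index=days)
df2 = pd.DataFrame({'const1': [1, 2], 'const2': [2, 3]}, index=['A', 'B'])

factors = df2.sum(axis=1)  # Series indexed by 'A', 'B'

# Plain multiplication aligns on the union of labels: 'B' appears as an
# all-NaN column, and 'C' is NaN because it has no factor
unaligned = df1 * factors
print(unaligned)

# Reindexing the factors to df1's columns drops 'B' and gives 'C' a factor of 1
aligned_factors = factors.reindex(df1.columns, fill_value=1)
df3 = df1.mul(aligned_factors)
print(df3)
```

Choosing fill_value=1 treats a missing constant as "no scaling"; calling reindex without fill_value would instead leave NaN factors, which makes unmatched symbols easy to spot.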

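The reindex step is needed because passing fill_value directly together with a Series argument:

df3 = df1.mul(df2.sum(axis=1), fill_value=1)

raises:

NotImplementedError: fill_value 1 not supported.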