Question

I have a pandas Series and a pandas DataFrame with a MultiIndex.

Here is a simple example of the situation:

import numpy as np
import pandas as pd

iterables = [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']]
i = pd.MultiIndex.from_product(iterables, names=['good', 'month'])
xf = pd.DataFrame(index = i)
xf['price'] = np.random.randint(1, 25, xf.shape[0])

allocation_vector = pd.Series([0.3, 0.6, 0.1], index = ['milk', 'honey', 'dates'])

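This DataFrame holds the price of each of three goods for every month from January to April. allocation_vector holds the share (weight) of the price assigned to each good.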

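What I want is to multiply the allocation vector against my DataFrame and get a Series indexed by 'jan', 'feb', 'mar', 'apr', where each value is the dot product for that month, i.e. the milk price times 0.3 plus the honey price times 0.6 plus the dates price times 0.1, computed separately for January, February, March and April.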
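Written as a formula (the symbols are introduced here for illustration: p_{g,m} is the price of good g in month m taken from xf, and w_g is allocation_vector[g]), the requested value for month m is:

```latex
s_m = \sum_{g \in \{\text{milk},\,\text{honey},\,\text{dates}\}} w_g \, p_{g,m},
\qquad m \in \{\text{jan},\ \text{feb},\ \text{mar},\ \text{apr}\}
```

For example, with the prices printed in Answer 1, January gives 0.3·9 + 0.6·16 + 0.1·17 = 14.0.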

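So far I have only managed to solve this with an ugly, hacky iterative solution. I think there must be a more Pythonic way to do it, one where I don't have to worry about multiplying the vector's entries against the DataFrame in the wrong order. Of course, the real DataFrame has many more columns that do not take part in the calculation.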
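For reference, the target quantity can be pinned down with an explicit loop. This is a hypothetical sketch, not the asker's actual code: it uses deterministic prices (1 to 12) instead of np.random so the result can be checked by hand, and it looks up each weight by its label, which is the behaviour the vectorised answers need to reproduce.

```python
import numpy as np
import pandas as pd

# Deterministic prices (instead of np.random) so the result can be checked by hand.
iterables = [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']]
i = pd.MultiIndex.from_product(iterables, names=['good', 'month'])
xf = pd.DataFrame({'price': range(1, 13)}, index=i)
allocation_vector = pd.Series([0.3, 0.6, 0.1], index=['milk', 'honey', 'dates'])

# Hypothetical reference loop: for each month, sum weight * price over the goods.
result = {}
for month in xf.index.get_level_values('month').unique():
    total = 0.0
    for good, weight in allocation_vector.items():
        # Look the price up by (good, month) label, never by position.
        total += weight * xf.loc[(good, month), 'price']
    result[month] = total

expected = pd.Series(result)
print(expected)
```

With these prices the loop gives approximately 4.2, 5.2, 6.2 and 7.2 for jan to apr (up to floating-point rounding).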

Answer 1

I believe you need Series.mul to multiply on the first level (good), and then sum per second level (month):

import numpy as np
import pandas as pd

np.random.seed(2019)

iterables = [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']]
i = pd.MultiIndex.from_product(iterables, names=['good', 'month'])
xf = pd.DataFrame(index = i)
xf['price'] = np.random.randint(1, 25, xf.shape[0])
print (xf)
             price
good  month       
milk  jan        9
      feb       19
      mar        6
      apr       23
honey jan       16
      feb       13
      mar       11
      apr       17
dates jan       17
      feb        8
      mar        6
      apr       20

allocation_vector = pd.Series([0.3, 0.6, 0.1], index = ['milk', 'honey', 'dates'])

print (17*0.1+9*0.3+16*0.6)
14.0

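# Series.sum(level=...) was removed in pandas 2.0; groupby(level=..., sort=False) is the equivalent
s = xf['price'].mul(allocation_vector, level=0).groupby(level=1, sort=False).sum()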
print (s)
month
jan    14.0
feb    14.3
mar     9.0
apr    19.1
dtype: float64
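The question worries about multiplying weights against the wrong rows. A quick check suggests that mul(..., level=...) aligns on labels, so listing the weights in a different order does not change the monthly totals. This is a sketch with made-up deterministic prices and a hypothetical 'other' column standing in for the columns that do not take part in the calculation; month_totals is a hypothetical helper name.

```python
import pandas as pd

iterables = [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']]
i = pd.MultiIndex.from_product(iterables, names=['good', 'month'])
# Deterministic prices plus an extra column that must not take part in the calculation.
xf = pd.DataFrame({'price': range(1, 13), 'other': range(100, 112)}, index=i)

weights = pd.Series([0.3, 0.6, 0.1], index=['milk', 'honey', 'dates'])
# The same weights, listed in a different order.
shuffled = weights.loc[['dates', 'milk', 'honey']]


def month_totals(w):
    # mul(level='good') matches weights to rows by the 'good' label, not by position;
    # groupby(sort=False) keeps the months in the order jan, feb, mar, apr.
    return xf['price'].mul(w, level='good').groupby(level='month', sort=False).sum()


a = month_totals(weights)
b = month_totals(shuffled)
print(a)
print(b)
```

Only the price column is selected, so any other columns in the real DataFrame are simply ignored.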

Or reshape with Series.unstack, transpose, and use DataFrame.dot, but the order of the values in the output changes (unstack sorts the month labels):

s = xf['price'].unstack().T.dot(allocation_vector)
print (s)
month
apr    19.1
feb    14.3
jan    14.0
mar     9.0
dtype: float64
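If the original month order matters, one option is to reindex the result by the months in the order they first appear in xf's index. A sketch on made-up deterministic prices:

```python
import pandas as pd

iterables = [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']]
i = pd.MultiIndex.from_product(iterables, names=['good', 'month'])
xf = pd.DataFrame({'price': range(1, 13)}, index=i)
allocation_vector = pd.Series([0.3, 0.6, 0.1], index=['milk', 'honey', 'dates'])

# unstack() moves 'month' into the columns; .T puts months on rows and goods on columns.
# DataFrame.dot aligns the goods columns with allocation_vector's index by label.
s = xf['price'].unstack().T.dot(allocation_vector)

# unstack sorted the months alphabetically, so restore the order they have in xf.
month_order = xf.index.get_level_values('month').unique()
s = s.reindex(month_order)
print(s)
```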

Answer 2

You can combine join and groupby to achieve what you want, as follows:

allocation_vector.name = 'pct'
xf = xf.join(allocation_vector, on='good')
xf['dotproduct'] = xf.price * xf.pct

print(xf)

The resulting DataFrame is (the prices differ from Answer 1 because no random seed is set here):

             price  pct  dotproduct
good  month
milk  jan       19  0.3         5.7
      feb        8  0.3         2.4
      mar        7  0.3         2.1
      apr       15  0.3         4.5
honey jan        9  0.6         5.4
      feb       10  0.6         6.0
      mar        7  0.6         4.2
      apr       11  0.6         6.6
dates jan        2  0.1         0.2
      feb       14  0.1         1.4
      mar       12  0.1         1.2
      apr        7  0.1         0.7

You can then get the desired result with:

print(xf.groupby('month')['dotproduct'].sum())

The output is:

month
apr    11.8
feb     9.8
jan    11.3
mar     7.5
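Note that the column named dotproduct holds the element-wise product of price and pct; the dot product itself only appears after the groupby sum. If you would rather not add pct and dotproduct columns to xf (or rename allocation_vector), a variant of the same idea looks up each row's weight with Index.map and sums a temporary Series. This is a sketch with made-up deterministic prices and a hypothetical 'other' column standing in for the columns that do not take part in the calculation:

```python
import pandas as pd

iterables = [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']]
i = pd.MultiIndex.from_product(iterables, names=['good', 'month'])
xf = pd.DataFrame({'price': range(1, 13), 'other': range(100, 112)}, index=i)
allocation_vector = pd.Series([0.3, 0.6, 0.1], index=['milk', 'honey', 'dates'])

# Look up each row's weight from its 'good' label (label-based, so order does not matter).
weights = xf.index.get_level_values('good').map(allocation_vector)

# Weighted prices as a temporary Series; xf and allocation_vector are left unchanged.
weighted = xf['price'] * weights.to_numpy()

# Sum per month, keeping the months in the order they appear in xf.
totals = weighted.groupby(level='month', sort=False).sum()
print(totals)
```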

Multiply a pd.Series vector with a MultiIndex pd.DataFrame
