Multiply all columns of a MultiIndex DataFrame by the appropriate values in a Series

Asked: 2014-03-21 12:45:28

Tags: python pandas

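I feel like this should be obvious, but I'm a bit stuck.

I have a DataFrame (df) with a 3-level MultiIndex on the rows. One of the MultiIndex levels is ccy, which gives the currency of the information contained in that row. Each row has 3 columns of data.

I want to convert all of the data so that it is denominated in a reference currency (say USD). To do that, I have a Series (forex) containing the exchange rates for the relevant currencies.

So the goal is simple: multiply all the data in each row of df by the value in forex that corresponds to the ccy entry in that row's index.

The mechanical setup looks like this: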

import pandas as pd
import numpy as np
import itertools

np.random.seed(0)

tuples = list(itertools.product(
    list('abd'),
    ['one', 'two', 'three'],
    ['USD', 'EUR', 'GBP'],
))

np.random.shuffle(tuples)

idx = pd.MultiIndex.from_tuples(tuples[:-10], names=['letter', 'number', 'ccy'])

df = pd.DataFrame(np.random.randn(len(idx), 3), index=idx,
                  columns=['val_1', 'val_2', 'val_3'])

forex = pd.Series({'USD': 1.0,
                   'EUR': 1.3,
                   'GBP': 1.7})

I can get what I need by running:

df.apply(lambda col: col.mul(forex, level='ccy'), axis=0)

But it seems strange to me that I need pd.DataFrame.apply in such a simple case. I would have expected the following syntax (or something very similar) to work:

df.mul(forex, level='ccy', axis=0)

But that gives me:

ValueError: cannot reindex from a duplicate axis

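Obviously the apply approach is no disaster. But it seems odd that I can't find the syntax to do this directly across all columns with mul. Is there a more direct way to handle this? If not, is there an intuitive reason why the mul syntax shouldn't be enhanced to work this way?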

1 Answer:

Answer 0: (score: 3)

This now works in master / 0.14. See the issue: https://github.com/pydata/pandas/pull/6682

In [11]: df.mul(forex,level='ccy',axis=0)
Out[11]: 
                      val_1     val_2     val_3
letter number ccy                              
a      one    GBP -2.172854  2.443530 -0.132098
d      three  USD  1.089630  0.096543  1.418667
b      two    GBP  1.986064  1.610216  1.845328
       three  GBP  4.049782 -0.690240  0.452957
a      two    GBP -2.304713 -0.193974 -1.435192
b      one    GBP  1.199589 -0.677936 -1.406234
d      two    GBP -0.706766 -0.891671  1.382272
b      two    EUR -0.298026  2.810233 -1.244011
d      one    EUR  0.087504  0.268448 -0.593946
              GBP -1.801959  1.045427  2.430423
b      three  EUR -0.275538 -0.104438  0.527017
a      one    EUR  0.154189  1.630738  1.844833
b      one    EUR -0.967013 -3.272668 -1.959225
d      three  GBP  1.953429 -2.029083  1.939772
              EUR  1.962279  1.388108 -0.892566
a      three  GBP  0.025285 -0.638632 -0.064980
              USD  0.367974 -0.044724 -0.302375

[17 rows x 3 columns]
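
If you want to confirm this behaviour on your own install, here is a minimal sketch that compares the direct call with the column-wise apply from the question and with a by-hand rate lookup. The small four-row frame and the names direct, via_apply, rates and by_hand are made up for illustration and are not part of the original post. It assumes a pandas release at or after 0.14.

```python
import numpy as np
import pandas as pd

# Small deterministic frame with the same index layout as in the question
# (hypothetical data, chosen only so the result is easy to check).
index = pd.MultiIndex.from_tuples(
    [("a", "one", "GBP"), ("d", "three", "USD"),
     ("b", "two", "EUR"), ("a", "one", "EUR")],
    names=["letter", "number", "ccy"],
)
df = pd.DataFrame(
    np.arange(12, dtype=float).reshape(4, 3),
    index=index,
    columns=["val_1", "val_2", "val_3"],
)
forex = pd.Series({"USD": 1.0, "EUR": 1.3, "GBP": 1.7})

# Direct form: align forex against the 'ccy' level and multiply row-wise.
direct = df.mul(forex, level="ccy", axis=0)

# Column-wise apply from the question, kept for comparison.
via_apply = df.apply(lambda col: col.mul(forex, level="ccy"), axis=0)

# Independent check: look up each row's rate by hand, then broadcast it
# down the rows as a plain array (no level alignment involved).
rates = forex.reindex(df.index.get_level_values("ccy")).to_numpy()
by_hand = df.mul(rates, axis=0)

# Sort before comparing so that row order cannot affect the check.
print(np.allclose(direct.sort_index().to_numpy(),
                  by_hand.sort_index().to_numpy()))
```

The by-hand variant does not rely on the level argument at all, so it may be a usable fallback on versions where the direct call still raises.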

Here is another way (this also requires master / 0.14):

In [127]: df = df.sortlevel()

In [128]: df
Out[128]: 
                      val_1     val_2     val_3
letter number ccy                              
a      one    EUR  0.118607  1.254414  1.419102
              GBP -1.278149  1.437371 -0.077705
       three  GBP  0.014873 -0.375666 -0.038224
              USD  0.367974 -0.044724 -0.302375
       two    GBP -1.355714 -0.114103 -0.844231
b      one    EUR -0.743856 -2.517437 -1.507096
              GBP  0.705641 -0.398786 -0.827197
       three  EUR -0.211952 -0.080337  0.405398
              GBP  2.382224 -0.406024  0.266445
       two    EUR -0.229251  2.161717 -0.956931
              GBP  1.168273  0.947186  1.085487
d      one    EUR  0.067311  0.206499 -0.456881
              GBP -1.059976  0.614957  1.429661
       three  EUR  1.509445  1.067775 -0.686589
              GBP  1.149076 -1.193578  1.141042
              USD  1.089630  0.096543  1.418667
       two    GBP -0.415745 -0.524512  0.813101

[17 rows x 3 columns]

idx = pd.IndexSlice

In [129]: pd.concat([ df.loc[idx[:,:,x],:]*v for x,v in forex.iteritems() ])
Out[129]: 
                      val_1     val_2     val_3
letter number ccy                              
a      one    EUR  0.154189  1.630738  1.844833
b      one    EUR -0.967013 -3.272668 -1.959225
       three  EUR -0.275538 -0.104438  0.527017
       two    EUR -0.298026  2.810233 -1.244011
d      one    EUR  0.087504  0.268448 -0.593946
       three  EUR  1.962279  1.388108 -0.892566
a      one    GBP -2.172854  2.443530 -0.132098
       three  GBP  0.025285 -0.638632 -0.064980
       two    GBP -2.304713 -0.193974 -1.435192
b      one    GBP  1.199589 -0.677936 -1.406234
       three  GBP  4.049782 -0.690240  0.452957
       two    GBP  1.986064  1.610216  1.845328
d      one    GBP -1.801959  1.045427  2.430423
       three  GBP  1.953429 -2.029083  1.939772
       two    GBP -0.706766 -0.891671  1.382272
a      three  USD  0.367974 -0.044724 -0.302375
d      three  USD  1.089630  0.096543  1.418667

[17 rows x 3 columns]
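
A note for newer installs: as far as I know, sortlevel and iteritems are no longer available in recent pandas releases, with sort_index and items as the replacements. Below is a sketch of the same slicing approach under that assumption. The tiny frame and the names df_sorted, ix and converted are hypothetical and only there to keep the example self-contained.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame with the same three index levels.
index = pd.MultiIndex.from_tuples(
    [("a", "one", "GBP"), ("d", "three", "USD"),
     ("b", "two", "EUR"), ("a", "one", "EUR")],
    names=["letter", "number", "ccy"],
)
df = pd.DataFrame(
    np.arange(12, dtype=float).reshape(4, 3),
    index=index,
    columns=["val_1", "val_2", "val_3"],
)
forex = pd.Series({"USD": 1.0, "EUR": 1.3, "GBP": 1.7})

# Slicing a MultiIndex with IndexSlice needs a lexsorted index.
df_sorted = df.sort_index()

ix = pd.IndexSlice

# Select the rows of one currency at a time, scale them by that currency's
# rate, and stitch the pieces back together. Every currency in forex must
# appear in df for the slices to succeed.
converted = pd.concat(
    [df_sorted.loc[ix[:, :, ccy], :] * rate for ccy, rate in forex.items()]
)
print(converted)
```

The rows come back grouped by currency in the order of forex, so call sort_index() on the result if you need the original sorted layout.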

Here is another way, via a merge:

In [36]: f = forex.to_frame('value')

In [37]: f.index.name =  'ccy'

In [38]: pd.merge(df.reset_index(),f.reset_index(),on='ccy')
Out[38]: 
   letter number  ccy     val_1     val_2     val_3  value
0       a    one  GBP -1.278149  1.437371 -0.077705    1.7
1       b    two  GBP  1.168273  0.947186  1.085487    1.7
2       b  three  GBP  2.382224 -0.406024  0.266445    1.7
3       a    two  GBP -1.355714 -0.114103 -0.844231    1.7
4       b    one  GBP  0.705641 -0.398786 -0.827197    1.7
5       d    two  GBP -0.415745 -0.524512  0.813101    1.7
6       d    one  GBP -1.059976  0.614957  1.429661    1.7
7       d  three  GBP  1.149076 -1.193578  1.141042    1.7
8       a  three  GBP  0.014873 -0.375666 -0.038224    1.7
9       d  three  USD  1.089630  0.096543  1.418667    1.0
10      a  three  USD  0.367974 -0.044724 -0.302375    1.0
11      b    two  EUR -0.229251  2.161717 -0.956931    1.3
12      d    one  EUR  0.067311  0.206499 -0.456881    1.3
13      b  three  EUR -0.211952 -0.080337  0.405398    1.3
14      a    one  EUR  0.118607  1.254414  1.419102    1.3
15      b    one  EUR -0.743856 -2.517437 -1.507096    1.3
16      d  three  EUR  1.509445  1.067775 -0.686589    1.3

[17 rows x 7 columns]
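
The merge only attaches the rate to each row as a value column; the multiplication itself still has to be done. One possible way to finish the conversion is sketched below. The small frame and the names value_cols, merged and result are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame with the same three index levels.
index = pd.MultiIndex.from_tuples(
    [("a", "one", "GBP"), ("d", "three", "USD"),
     ("b", "two", "EUR"), ("a", "one", "EUR")],
    names=["letter", "number", "ccy"],
)
value_cols = ["val_1", "val_2", "val_3"]
df = pd.DataFrame(
    np.arange(12, dtype=float).reshape(4, 3),
    index=index,
    columns=value_cols,
)
forex = pd.Series({"USD": 1.0, "EUR": 1.3, "GBP": 1.7})

# Turn the rates into a two-column frame keyed by 'ccy', as in the answer.
f = forex.to_frame("value")
f.index.name = "ccy"

# Attach each row's rate by joining on the currency code.
merged = pd.merge(df.reset_index(), f.reset_index(), on="ccy")

# Scale the data columns by the rate, row by row.
merged[value_cols] = merged[value_cols].mul(merged["value"], axis=0)

# Drop the helper column and restore the original three-level index.
result = merged.drop(columns="value").set_index(["letter", "number", "ccy"])
print(result.sort_index())
```

Compared with the direct mul call this is more verbose, but it keeps the rate visible as an ordinary column, which can help when checking the numbers.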