使用熊猫在数据框上进行按列映射和操作

时间:2019-11-04 11:15:46

标签: python pandas

我有两个数据帧,分别是df1df2。我想对New_Amount_Dollar中的列df2执行操作。基本上在df1中,我有历史货币数据,并且我想对Currency中的Amount_Dollardf2进行按日期的操作,以计算{中的New_Amount_Dollar列的值{1}}。

例如,在df2中,df2的第一个货币为AUD,因此我要计算Date = '01-01-2019'的值,使

New_Amount_Dollar = New_Amount_Dollar中的值Amount_Dollar/AUDdf1

另一个示例,其中在New_Amount_Dollar = 19298/98 = 196.91中,df2的第三种货币为COP,因此我要计算Date = '03-01-2019的值,使得

来自New_Amount_Dollar

New_Amount_Dollar = Amount_Dollar/COP值 即df1

New_Amount_Dollar = 5000/0.043 = 116279.06
import pandas as pd
data1 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019', '04-01-2019','05-01-2019'],
        'AUD':[98, 98.5, 99, 99.5, 97],
        'BWP':[30,31,33,32,31],
        'CAD':[0.02,0.0192,0.0196,0.0196,0.0192],
        'BND':[0.99,0.952,0.970,0.980,0.970],
        'COP':[0.05,0.047,0.043,0.047,0.045]}
df1 = pd.DataFrame(data1)

data2 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019', '04-01-2019','05-01-2019'],
        'Currency':['AUD','AUD','COP','CAD','BND'],
        'Amount_Dollar':[19298, 19210, 5000, 200, 2300],
        'New_Amount_Dollar':[0,0,0,0,0]
        }
df2 = pd.DataFrame(data2) 

df1

         Date   AUD  BWP     CAD    BND    COP
0  01-01-2019  98.0   30  0.0200  0.990  0.050
1  02-01-2019  98.5   31  0.0192  0.952  0.047
2  03-01-2019  99.0   33  0.0196  0.970  0.043
3  04-01-2019  99.5   32  0.0196  0.980  0.047
4  05-01-2019  97.0   31  0.0192  0.970  0.045

预期结果

df2

         Date Currency  Amount_Dollar  New_Amount_Dollar
0  01-01-2019      AUD          19298                  0
1  02-01-2019      AUD          19210                  0
2  03-01-2019      COP           5000                  0
3  04-01-2019      CAD            200                  0
4  05-01-2019      BND           2300                  0

1 个答案:

答案 0 :(得分:3)

DataFrame.lookupDataFrame.set_index一起用于数组,并划分Amount_Dollar列:

arr = df1.set_index('Date').lookup(df2['Date'], df2['Currency'])

df2['New_Amount_Dollar'] = df2['Amount_Dollar'] /  arr
print (df2)
         Date Currency  Amount_Dollar  New_Amount_Dollar
0  01-01-2019      AUD          19298         196.918367
1  02-01-2019      AUD          19210         195.025381
2  03-01-2019      COP           5000      116279.069767
3  04-01-2019      CAD            200       10204.081633
4  05-01-2019      BND           2300        2371.134021

但是如果日期时间不匹配,请使用DataFrame.asfreq

import pandas as pd
data1 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019', 
                 '04-01-2019','05-01-2019','08-01-2019'],
        'AUD':[98, 98.5, 99, 99.5, 97,100],
        'BWP':[30,31,33,32,31,20],
        'CAD':[0.02,0.0192,0.0196,0.0196,0.0192,0.2],
        'BND':[0.99,0.952,0.970,0.980,0.970,.23],
        'COP':[0.05,0.047,0.043,0.047,0.045,0.023]}
df1 = pd.DataFrame(data1)

data2 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019', '04-01-2019','07-01-2019'],
        'Currency':['AUD','AUD','COP','CAD','BND'],
        'Amount_Dollar':[19298, 19210, 5000, 200, 2300],
        'New_Amount_Dollar':[0,0,0,0,0]
        }
df2 = pd.DataFrame(data2) 
print (df1)
         Date    AUD  BWP     CAD    BND    COP
0  01-01-2019   98.0   30  0.0200  0.990  0.050
1  02-01-2019   98.5   31  0.0192  0.952  0.047
2  03-01-2019   99.0   33  0.0196  0.970  0.043
3  04-01-2019   99.5   32  0.0196  0.980  0.047
4  05-01-2019   97.0   31  0.0192  0.970  0.045
5  08-01-2019  100.0   20  0.2000  0.230  0.023

print (df2)
         Date Currency  Amount_Dollar  New_Amount_Dollar
0  01-01-2019      AUD          19298                  0
1  02-01-2019      AUD          19210                  0
2  03-01-2019      COP           5000                  0
3  04-01-2019      CAD            200                  0
4  07-01-2019      BND           2300                  0

df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)
df2['Date'] = pd.to_datetime(df2['Date'], dayfirst=True)

print (df1.set_index('Date').asfreq('D', method='ffill'))
              AUD  BWP     CAD    BND    COP
Date                                        
2019-01-01   98.0   30  0.0200  0.990  0.050
2019-01-02   98.5   31  0.0192  0.952  0.047
2019-01-03   99.0   33  0.0196  0.970  0.043
2019-01-04   99.5   32  0.0196  0.980  0.047
2019-01-05   97.0   31  0.0192  0.970  0.045
2019-01-06   97.0   31  0.0192  0.970  0.045
2019-01-07   97.0   31  0.0192  0.970  0.045
2019-01-08  100.0   20  0.2000  0.230  0.023

arr = df1.set_index('Date').asfreq('D', method='ffill').lookup(df2['Date'], df2['Currency'])

df2['New_Amount_Dollar'] = df2['Amount_Dollar'] /  arr
print (df2)
        Date Currency  Amount_Dollar  New_Amount_Dollar
0 2019-01-01      AUD          19298         196.918367
1 2019-01-02      AUD          19210         195.025381
2 2019-01-03      COP           5000      116279.069767
3 2019-01-04      CAD            200       10204.081633
4 2019-01-07      BND           2300        2371.134021