Map multiple columns of a pandas DataFrame and perform operations on them

Time: 2019-11-03 17:04:03

Tags: python pandas

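I have two data frames, df and df1. I am interested in currency conversion for the data frame df. df has 6 columns: the first is the date and the rest are currency values for each date. I want to convert these currencies into the correct format. df1 has 2 columns: the first is the currency and the second is the operator.

I want to apply the corresponding operator to the currency values in df. For example, the second column of df is "AUD", and I want to convert all "AUD" values into the correct format, meaning multiply or divide according to the corresponding "operator" value in df1. Here "AUD" has the "multiply" operator, so all of its values are multiplied by 1. "CAD" has the "divide" operator, so each value in the "CAD" column should be replaced by 1 / value.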

import pandas as pd    
data = {'Date':['01-01-2019', '01-01-2019', '01-01-2019', '01-01-2019','01-01-2019'],
        'AUD':[98, 98.5, 99, 99.5, 97],
        'BWP':[30,31,33,32,31],
        'CAD':[50,52,51,51,52],
        'BND':[1.01,1.05,1.03,1.02,1.03],
        'COP':[20,21,23,21,22]}    
df = pd.DataFrame(data)

data1 = {'currency':['DZD', 'AUD', 'CNY', 'BND','BRL','BWP','CAD','COP'],
        'operator':['divide', 'multiply', 'divide', 'divide','divide','multiply','divide','divide'],
        }    
df1 = pd.DataFrame(data1)
df

         Date   AUD  BWP  CAD   BND  COP
0  01-01-2019  98.0   30   50  1.01   20
1  02-01-2019  98.5   31   52  1.05   21
2  03-01-2019  99.0   33   51  1.03   23
3  04-01-2019  99.5   32   51  1.02   21
4  05-01-2019  97.0   31   52  1.03   22

df1

  currency  operator
0      DZD    divide
1      AUD  multiply
2      CNY    divide
3      BND    divide
4      BRL    divide
5      BWP  multiply
6      CAD    divide
7      COP    divide

Expected output:

         Date   AUD  BWP     CAD    BND     COP
0  01-01-2019  98.0   30  0.0200  0.990   0.050
1  02-01-2019  98.5   31  0.0192  0.952   0.047
2  03-01-2019  99.0   33  0.0196  0.970   0.043
3  04-01-2019  99.5   32  0.0196  0.980   0.047
4  05-01-2019  97.0   31  0.0192  0.970   0.045

4 Answers:

Answer 0 (score: 0)

You can use:

n=1
#Set Date as the index so that operations are not applied to this column
df=df.set_index('Date')
#Select the currencies whose columns need to be divided
div_code=df1.loc[df1['operator']=='divide','currency']

#Creating a boolean indexing of columns
col_mask=df.columns.isin(div_code)

#Applying operations to data frame columns
df[df.columns[col_mask]]=n/df[df.columns[col_mask]]
df[df.columns[~col_mask]]=n*df[df.columns[~col_mask]]

#putting Date as a column again
df.reset_index(inplace=True)
print(df)

         Date   AUD  BWP       CAD       BND       COP
0  01-01-2019  98.0   30  0.020000  0.990099  0.050000
1  02-01-2019  98.5   31  0.019231  0.952381  0.047619
2  03-01-2019  99.0   33  0.019608  0.970874  0.043478
3  04-01-2019  99.5   32  0.019608  0.980392  0.047619
4  05-01-2019  97.0   31  0.019231  0.970874  0.045455

Answer 1 (score: 0)

It is easier if you convert the data in df1 to a dictionary:

operators = df1.set_index('currency')['operator'].to_dict()
df.apply(lambda col: col if operators.get(col.name, 'multiply') == 'multiply' else 1 / col)
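
The snippet above returns a new data frame rather than modifying df. A self-contained sketch of the same idea follows, assuming the sample data from the question; the names `convert` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['01-01-2019'] * 5,
    'AUD': [98, 98.5, 99, 99.5, 97],
    'BWP': [30, 31, 33, 32, 31],
    'CAD': [50, 52, 51, 51, 52],
    'BND': [1.01, 1.05, 1.03, 1.02, 1.03],
    'COP': [20, 21, 23, 21, 22],
})
df1 = pd.DataFrame({
    'currency': ['DZD', 'AUD', 'CNY', 'BND', 'BRL', 'BWP', 'CAD', 'COP'],
    'operator': ['divide', 'multiply', 'divide', 'divide',
                 'divide', 'multiply', 'divide', 'divide'],
})

# Lookup table: currency code -> 'multiply' or 'divide'.
operators = df1.set_index('currency')['operator'].to_dict()

def convert(col):
    # Columns missing from df1 (such as 'Date') default to 'multiply'
    # and are returned unchanged.
    if operators.get(col.name, 'multiply') == 'multiply':
        return col
    return 1 / col

# apply() passes each column to convert() and assembles a new data frame.
result = df.apply(convert)
print(result)
```

Because 'Date' has no entry in the dictionary, it falls through to the 'multiply' default and needs no special handling.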

Answer 2 (score: 0)

Please find below the code that produces the expected output:

import pandas as pd    
pd.set_option('display.max_colwidth', 100)
data = {'Date':['01-01-2019', '01-01-2019', '01-01-2019', '01-01-2019','01-01-2019'],
    'AUD':[98, 98.5, 99, 99.5, 97],
    'BWP':[30,31,33,32,31],
    'CAD':[50.00,52.00,51.00,51.00,52.00],
    'BND':[1.01,1.05,1.03,1.02,1.03],
    'COP':[20.00,21.00,23.00,21.00,22.00]}    
df = pd.DataFrame(data)

data1 = {'currency':['DZD', 'AUD', 'CNY', 'BND','BRL','BWP','CAD','COP'],
    'operator':['divide', 'multiply', 'divide', 'divide','divide','multiply','divide','divide'],
    }    
df1 = pd.DataFrame(data1)

for dfcurrency in df.columns:
    for df1currency in df1['currency']:
        if dfcurrency == df1currency:
            # operator(s) registered in df1 for this currency
            operator = df1[df1['currency'] == df1currency]['operator']

            for j in operator:
                if j == 'multiply':
                    for k in range(0, df.shape[0]):
                        # .loc avoids chained assignment, which may not write back to df
                        df.loc[k, df1currency] = df.loc[k, df1currency] * 1
                elif j == 'divide':
                    for l in range(0, df.shape[0]):
                        df.loc[l, df1currency] = round(1 / df.loc[l, df1currency], 4)
print(df)


         Date   AUD  BWP     CAD     BND     COP
0  01-01-2019  98.0   30  0.0200  0.9901  0.0500
1  01-01-2019  98.5   31  0.0192  0.9524  0.0476
2  01-01-2019  99.0   33  0.0196  0.9709  0.0435
3  01-01-2019  99.5   32  0.0196  0.9804  0.0476
4  01-01-2019  97.0   31  0.0192  0.9709  0.0455

Answer 3 (score: 0)

You can use the operator module to create a dictionary that maps the text "multiply" and "divide" to operator functions:

import operator as op

operators = { "multiply": op.mul, "divide": op.itruediv }

Take only the column we want to map:

new_op = df1.iloc[1:,1]  # operator column, skipping the first row
new_set = new_op.map(operators)
new_set = pd.Series(new_set)
new_set.index -= 1  # shift the index so that it starts at 0 again

This gives a new set of operators from your list:

new_set

0         <built-in function mul>
1    <built-in function itruediv>
2    <built-in function itruediv>
3    <built-in function itruediv>
4         <built-in function mul>
5    <built-in function itruediv>
6    <built-in function itruediv>
Name: operator, dtype: object

So, to apply the converted text as operators to your data, here is an example for the "AUD" column:

for i in range(0, len(df)):
    df.loc[i,'AUD'] = new_set[i](1,df.loc[i,'AUD'])

which yields:

         Date        AUD  BWP  CAD   BND  COP
0  01-01-2019  98.000000   30   50  1.01   20
1  01-01-2019   0.010152   31   52  1.05   21
2  01-01-2019   0.010101   33   51  1.03   23
3  01-01-2019   0.010050   32   51  1.02   21
4  01-01-2019  97.000000   31   52  1.03   22

You should be able to generalize this to all columns, or add a new line for each country code, e.g.:

for i in range(0, len(df)):
    df.loc[i,'AUD'] = new_set[i](1,df.loc[i,'AUD'])
    df.loc[i,'BWP'] = new_set[i](1,df.loc[i,'BWP'])
    ....
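
One possible way to generalize to all columns is sketched below. It assumes that each operator should apply to a whole column, looked up by currency code as described in the question, rather than by row position. The name `ops_by_currency` is illustrative, and `op.truediv` is swapped in for `op.itruediv` because nothing is modified in place here.

```python
import operator as op
import pandas as pd

df = pd.DataFrame({
    'Date': ['01-01-2019'] * 5,
    'AUD': [98, 98.5, 99, 99.5, 97],
    'BWP': [30, 31, 33, 32, 31],
    'CAD': [50, 52, 51, 51, 52],
    'BND': [1.01, 1.05, 1.03, 1.02, 1.03],
    'COP': [20, 21, 23, 21, 22],
})
df1 = pd.DataFrame({
    'currency': ['DZD', 'AUD', 'CNY', 'BND', 'BRL', 'BWP', 'CAD', 'COP'],
    'operator': ['divide', 'multiply', 'divide', 'divide',
                 'divide', 'multiply', 'divide', 'divide'],
})

operators = {"multiply": op.mul, "divide": op.truediv}

# Series indexed by currency code whose values are operator functions.
ops_by_currency = df1.set_index('currency')['operator'].map(operators)

# Only touch columns that have an entry in df1; 'Date' is skipped.
for col in df.columns.intersection(ops_by_currency.index):
    # op.mul(1, column) leaves the values unchanged;
    # op.truediv(1, column) computes 1 / column element-wise.
    df[col] = ops_by_currency[col](1, df[col])

print(df)
```

Operating on whole columns removes the per-row loop and the need to realign the operator index with the row index.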