Performing arithmetic on value counts with Python pandas

Time: 2018-06-29 10:43:21

Tags: python pandas

I have 2 dataframes, "transactions" and "offsets".

offsets:

    Contact Account Name    
0   TODD HOWARD 
1   TODD HOWARD 
2   JEFF COX
3   JEFF COX    
4   TODD HOWARD 
5   JEFF COX    
6   MIKE BALDWIN    

transactions:

    Contact Account Name    
0   TODD HOWARD 
1   TODD HOWARD     
2   JEFF COX    
3   JEFF COX    
4   TODD HOWARD     
5   JEFF COX    
6   TODD HOWARD     
7   MIKE BALDWIN    
8   MIKE BALDWIN
9   JEFF COX    
10  JC WHITE    

What I want to do: 1) Count each unique value. For this I used:

df1 = offsets.groupby('Contact Account Name').size()
df2 = transactions.groupby('Contact Account Name').size()

and got:

df1:

Contact Account Name
TODD HOWARD               3
JEFF COX                  3
MIKE BALDWIN              1

df2:

Contact Account Name
JC WHITE                  1
TODD HOWARD               4
JEFF COX                  4
MIKE BALDWIN              2

2) I want to merge the two dataframes. I tried merge, but it didn't work.

3) I want to create another dataframe that computes the offsets as a percentage of the total transactions.

The result I want to see at the end:

Contact Account Name      Offset Percentage
TODD HOWARD               75
JEFF COX                  75
MIKE BALDWIN              50
JC WHITE                  100
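To make step 1 reproducible, here is a minimal sketch that rebuilds the two sample frames from the listings above and counts each name. Building them with pd.DataFrame is an assumption, since the question does not show how the data was loaded. Note that groupby sorts the group keys by default, so the names print in alphabetical order rather than in the order shown in the question.

```python
import pandas as pd

# Rebuild the sample data from the listings above (assumed construction;
# the question does not show how the frames were loaded).
offsets = pd.DataFrame({'Contact Account Name': [
    'TODD HOWARD', 'TODD HOWARD', 'JEFF COX', 'JEFF COX',
    'TODD HOWARD', 'JEFF COX', 'MIKE BALDWIN']})
transactions = pd.DataFrame({'Contact Account Name': [
    'TODD HOWARD', 'TODD HOWARD', 'JEFF COX', 'JEFF COX', 'TODD HOWARD',
    'JEFF COX', 'TODD HOWARD', 'MIKE BALDWIN', 'MIKE BALDWIN', 'JEFF COX',
    'JC WHITE']})

# Step 1: count the rows per name; groupby sorts the keys alphabetically.
df1 = offsets.groupby('Contact Account Name').size()
df2 = transactions.groupby('Contact Account Name').size()
print(df1)
print(df2)
```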

Thanks!

1 answer:

Answer 0 (score: 1)

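The output of the aggregation is a Series, so you can divide with div, multiply by 100 with mul, and finally convert the result to a DataFrame with reset_index: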

df = df1.div(df2, fill_value=1).mul(100).reset_index(name='Offset Percentage')
print (df)
  Contact Account Name  Offset Percentage
0             JC WHITE              100.0
1             JEFF COX               75.0
2         MIKE BALDWIN               50.0
3          TODD HOWARD               75.0
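The fill_value=1 argument is what produces 100 for JC WHITE, who has no offsets: the missing offset count is replaced by 1 before dividing, giving 1 / 1. This only yields 100 because JC WHITE has exactly one transaction. The sketch below compares it with the default (no fill) and with fill_value=0; the count Series are typed in by hand from the question's output, and which filling is appropriate depends on what the asker intends.

```python
import pandas as pd

# Counts as shown in the question (typed in by hand for this sketch).
offsets_count = pd.Series({'JEFF COX': 3, 'MIKE BALDWIN': 1, 'TODD HOWARD': 3})
transactions_count = pd.Series({'JC WHITE': 1, 'JEFF COX': 4,
                                'MIKE BALDWIN': 2, 'TODD HOWARD': 4})

# No fill_value: a name missing from one Series gives NaN.
no_fill = offsets_count.div(transactions_count).mul(100)
# fill_value=1: the missing offset count for JC WHITE is treated as 1 -> 1/1.
fill_one = offsets_count.div(transactions_count, fill_value=1).mul(100)
# fill_value=0: the missing offset count is treated as 0 -> 0/1.
fill_zero = offsets_count.div(transactions_count, fill_value=0).mul(100)

print(pd.DataFrame({'no_fill': no_fill, 'fill_1': fill_one, 'fill_0': fill_zero}))
```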

A similar solution with value_counts:

df1 = offsets['Contact Account Name'].value_counts()
df2 = transactions['Contact Account Name'].value_counts()

df = (df1.div(df2, fill_value=1)
         .mul(100)
         .rename_axis('Contact Account Name')
         .reset_index(name='Offset Percentage'))
print (df)
  Contact Account Name  Offset Percentage
0             JC WHITE              100.0
1             JEFF COX               75.0
2         MIKE BALDWIN               50.0
3          TODD HOWARD               75.0
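value_counts and groupby(...).size() return the same counts; value_counts orders them by count (descending) instead of by name, and depending on the pandas version the names attached to the result also differ, which is why rename_axis is used above. A quick check, assuming the sample offsets frame is rebuilt as below:

```python
import pandas as pd

# Sample offsets frame rebuilt from the question (assumed construction).
offsets = pd.DataFrame({'Contact Account Name': [
    'TODD HOWARD', 'TODD HOWARD', 'JEFF COX', 'JEFF COX',
    'TODD HOWARD', 'JEFF COX', 'MIKE BALDWIN']})

by_size = offsets.groupby('Contact Account Name').size()
by_counts = offsets['Contact Account Name'].value_counts()

# Same counts once both are put in the same (alphabetical) order.
same = by_size.sort_index().to_dict() == by_counts.sort_index().to_dict()
print(same)  # True
```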

If you need to join both Series together first, use concat:

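# Put both count Series side by side as columns; sort=True orders the names alphabetically
df = pd.concat([df2, df1], axis=1, keys=['transactions', 'offsets'], sort=True)
# A missing offset count (JC WHITE) is treated as 1, as in the solutions above
df['Offset Percentage'] = df['offsets'].div(df['transactions'], fill_value=1).mul(100)
# drop(columns=...) replaces the positional axis argument, which recent pandas versions reject
df = df.drop(columns=['transactions', 'offsets']).rename_axis('Contact Account Name').reset_index()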
print (df)
  Contact Account Name  Offset Percentage
0             JC WHITE              100.0
1             JEFF COX               75.0
2         MIKE BALDWIN               50.0
3          TODD HOWARD               75.0
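Since the question mentions that merge did not work, here is a sketch of the same result with merge. merge joins on columns (or index levels), so each count Series is first turned into a two-column DataFrame with reset_index. how='outer' keeps JC WHITE, who appears only in transactions, and fillna(1) mirrors the fill_value=1 used above. The column names offsets and transactions are illustrative, and the counts are typed in by hand from the question's output.

```python
import pandas as pd

# Counts as shown in the question (typed in by hand for this sketch).
df1 = pd.Series({'JEFF COX': 3, 'MIKE BALDWIN': 1, 'TODD HOWARD': 3})
df2 = pd.Series({'JC WHITE': 1, 'JEFF COX': 4, 'MIKE BALDWIN': 2, 'TODD HOWARD': 4})
df1.index.name = df2.index.name = 'Contact Account Name'

# Turn each count Series into a DataFrame with a name column and a count column.
left = df1.reset_index(name='offsets')
right = df2.reset_index(name='transactions')

# Outer merge keeps names present in only one frame; their missing count becomes NaN.
merged = left.merge(right, on='Contact Account Name', how='outer')

# Mirror fill_value=1 from the answer: a missing offset count is treated as 1.
merged['offsets'] = merged['offsets'].fillna(1)
merged['Offset Percentage'] = merged['offsets'] / merged['transactions'] * 100

df = (merged[['Contact Account Name', 'Offset Percentage']]
      .sort_values('Contact Account Name', ignore_index=True))
print(df)
```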