Pandas - create a column with the difference in values

Time: 2020-09-03 16:22:15

Tags: python pandas

I have the following dataset. How can I create a new column that shows the difference in Money for each person and each expiry?

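The yellow column is what I want. As you can see, it is the difference for that person at each expiry point. I highlighted the other rows in different colors to make it clearer.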

Thanks a lot.

[Image: example dataset, with the desired column highlighted in yellow]

2 answers:

Answer 0 (score: 1)

import pandas as pd

example = pd.DataFrame( data = {'Day': ['2020-08-30', '2020-08-30','2020-08-30','2020-08-30',
                                        '2020-08-29', '2020-08-29','2020-08-29','2020-08-29'],
                                'Name': ['John', 'Mike', 'John', 'Mike','John', 'Mike', 'John', 'Mike'],
                                'Money': [100, 950, 200, 1000, 50, 50, 250, 1200],
                                'Expiry': ['1Y', '1Y', '2Y','2Y','1Y','1Y','2Y','2Y']})

# split the data by date
example_0830 = example[ example['Day']=='2020-08-30' ].reset_index()
example_0829 = example[ example['Day']=='2020-08-29' ].reset_index()

# build a key that identifies each person/expiry pair, e.g. 'John1Y'
example_0830['key'] = example_0830['Name'] + example_0830['Expiry']
example_0829['key'] = example_0829['Name'] + example_0829['Expiry']
# keep only the key and the previous day's Money
example_0829 = pd.DataFrame( example_0829, columns = ['key','Money'])

# match rows on the key: Money_x is from 08-30, Money_y is from 08-29
example_0830 = pd.merge(example_0830, example_0829, on = 'key')
example_0830['Difference'] = example_0830['Money_x'] - example_0830['Money_y']
# drop the helper columns
example_0830 = example_0830.drop(columns=['key', 'Money_y','index'])

Result:

          Day  Name  Money_x Expiry  Difference
0  2020-08-30  John      100     1Y          50
1  2020-08-30  Mike      950     1Y         900
2  2020-08-30  John      200     2Y         -50
3  2020-08-30  Mike     1000     2Y        -200

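If the difference is always taken against the previous date, simply define date variables at the start for today (t) and the previous day (t-1), and use them to filter the original DataFrame.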
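A minimal sketch of that suggestion, assuming the two most recent dates in the data are the ones to compare. The names `t`, `t_1`, `today`, `previous` and the `_prev` suffix are illustrative choices, not part of the answer above. It also merges on `Name` and `Expiry` directly, which avoids building the concatenated `key` column.

```python
import pandas as pd

# Same sample data as in the answer above
example = pd.DataFrame({
    'Day': ['2020-08-30'] * 4 + ['2020-08-29'] * 4,
    'Name': ['John', 'Mike', 'John', 'Mike'] * 2,
    'Money': [100, 950, 200, 1000, 50, 50, 250, 1200],
    'Expiry': ['1Y', '1Y', '2Y', '2Y', '1Y', '1Y', '2Y', '2Y'],
})

# Parse the dates so ordering is by real dates rather than strings
example['Day'] = pd.to_datetime(example['Day'])

# Assumption: t is the latest date and t_1 is the date just before it in the data
dates = example['Day'].drop_duplicates().sort_values()
t, t_1 = dates.iloc[-1], dates.iloc[-2]

today = example[example['Day'] == t]
previous = example[example['Day'] == t_1][['Name', 'Expiry', 'Money']]

# Merge on both identifying columns; the previous day's Money becomes Money_prev
result = today.merge(previous, on=['Name', 'Expiry'], suffixes=('', '_prev'))
result['Difference'] = result['Money'] - result['Money_prev']
result = result.drop(columns='Money_prev')
print(result)
```

Because `t` and `t_1` are taken from the data, the same code works for any later date without editing hard-coded strings.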

Answer 1 (score: 0)

You can solve it with groupby.diff.

Create the DataFrame:

import pandas as pd

df = pd.DataFrame({
    'Day': [30, 30, 30, 30, 29, 29, 28, 28],
    'Name': ['John', 'Mike', 'John', 'Mike', 'John', 'Mike', 'John', 'Mike'],
    'Money': [100, 950, 200, 1000, 50, 50, 250, 1200],
    'Expiry': [1, 1, 2, 2, 1, 1, 2, 2]
})
print(df)

which looks like:

   Day  Name  Money  Expiry
0   30  John    100       1
1   30  Mike    950       1
2   30  John    200       2
3   30  Mike   1000       2
4   29  John     50       1
5   29  Mike     50       1
6   28  John    250       2
7   28  Mike   1200       2

And the code:

# make sure the dates are in descending order
# (sort_values returns a new DataFrame, so the result must be assigned back)
df = df.sort_values('Day', ascending=False, kind='stable')

# group by person and expiry, then take the difference from the next row in each group
# diff(1) is the difference from the previous row, so diff(-1) uses the next (older) row
df['Difference'] = df.groupby(['Name', 'Expiry']).Money.diff(-1)

Output:

   Day  Name  Money  Expiry  Difference
0   30  John    100       1        50.0
1   30  Mike    950       1       900.0
2   30  John    200       2       -50.0
3   30  Mike   1000       2      -200.0
4   29  John     50       1         NaN
5   29  Mike     50       1         NaN
6   28  John    250       2         NaN
7   28  Mike   1200       2         NaN
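A small sketch of two points behind this answer: `groupby(...).diff(-1)` gives the same values as subtracting the next row's value within each group via `groupby(...).shift(-1)`, and the result depends on row order, which is why the sort has to be assigned back. The shuffle is only a demonstration device to mimic unordered input; `shuffled`, `ordered`, `diff_result` and `shift_result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': [30, 30, 30, 30, 29, 29, 28, 28],
    'Name': ['John', 'Mike', 'John', 'Mike', 'John', 'Mike', 'John', 'Mike'],
    'Money': [100, 950, 200, 1000, 50, 50, 250, 1200],
    'Expiry': [1, 1, 2, 2, 1, 1, 2, 2],
})

# Shuffle the rows to mimic data that arrives in no particular order
shuffled = df.sample(frac=1, random_state=0)

# Sort newest-first so diff(-1) subtracts the next (older) row in each group
ordered = shuffled.sort_values('Day', ascending=False, kind='stable')
diff_result = ordered.groupby(['Name', 'Expiry'])['Money'].diff(-1)

# Explicit equivalent: current value minus the next row's value in the same group
shift_result = ordered['Money'] - ordered.groupby(['Name', 'Expiry'])['Money'].shift(-1)

# Column assignment aligns on the index, so values land on the original rows
df['Difference'] = diff_result
print(df)
```

The last row of each (Name, Expiry) group has no older row to subtract, so it stays NaN, which is also why the column becomes float.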