Select the same transaction occurring in different months

Time: 2019-12-17 11:30:17

Tags: python pandas dataframe

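Using a pandas DataFrame, I want to keep only the rows whose transaction description (TRNDESCR) occurs in at least 3 different months. I tried some code, but it did not work.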

Here is a sample dataset:

    ACNO TIME                   TRNCD   TRNDESCR                                        TRNAMT
0   85   2018-12-19 20:40:00    109     Ib Transfer To Phoutthalom Syh Account No:123   -20000
1   85   2018-12-19 21:15:00    109     Ib Transfer To Phoutthalom Syh Account No:123   -25000
2   85   2018-12-20 15:30:00    109     Ib Transfer To Thongsavath Pra Account No:124   -10000
3   85   2018-12-22 12:30:00    209     Bil Payment                                     -500
4   85   2018-12-25 15:34:00    109     Ib Transfer To Phoutthalom Syh Account No:123   -60000
5   85   2019-01-22 12:30:00    209     Bil Payment                                     -501
6   85   2019-01-23 12:50:00    109     Ib Transfer To Sarah Account No:199             -3000
7   85   2019-01-31 08:59:00    109     Ib Transfer To Thongsavath Pra Account No:124   -650000
8   85   2019-02-02 12:30:00    109     Ib Transfer To Sarah Account No:199             -600
9   85   2019-02-03 15:02:00    109     Ib Transfer To Phoutthalom Syh Account No:123   -60000
10  85   2019-02-04 15:21:00    109     Ib Transfer To Thongsavath Pra Account No:124   -863000
11  85   2019-02-05 15:30:00    209     Bil Payment                                     -600

Here is the expected result:

    ACNO TIME                   TRNCD   TRNDESCR                                        TRNAMT
0   85   2018-12-20 15:30:00    109     Ib Transfer To Thongsavath Pra Account No:124   -10000
1   85   2018-12-22 12:30:00    209     Bil Payment                                     -500
2   85   2019-01-22 12:30:00    209     Bil Payment                                     -501
3   85   2019-01-31 08:59:00    109     Ib Transfer To Thongsavath Pra Account No:124   -650000
4   85   2019-02-04 15:21:00    109     Ib Transfer To Thongsavath Pra Account No:124   -863000
5   85   2019-02-05 15:30:00    209     Bil Payment                                     -600

2 answers:

Answer 0: (score: 0)

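Pick the column that identifies a transaction (TRNDESCR in your example) and use TIME as the filter. Then drop duplicate (month, TRNDESCR) pairs, group by TRNDESCR, and count how many months each description occurs in.

Example: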

import pandas as pd

df = pd.DataFrame()
df['TIME'] = ["2018-12-19", "2018-12-20", "2019-01-20", "2019-02-06",
             "2018-12-18", "2018-12-02", "2019-01-03", "2019-02-06"]
df['TRNDESCR'] = ["ib1", "ib2", "ib2", "ib2",
                 "ib2", "ib3", "ib3", "ib3"]
df['ACNO'] = 85


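# Parse the timestamps and extract the month number.
# Note: dt.month ignores the year, so December 2018 and December 2019
# would be counted as the same month.
df['TIME'] = pd.to_datetime(df['TIME'])
df['MONTH'] = df['TIME'].dt.month

# Keep one row per (MONTH, TRNDESCR) pair, then count the months per description.
count_month = (
    df[['MONTH', 'TRNDESCR']]
    .drop_duplicates(['MONTH', 'TRNDESCR'], keep="last")
    .groupby('TRNDESCR')['MONTH']
    .count()
)

# Keep only the rows whose description occurs in at least 3 months.
df[df['TRNDESCR'].isin(count_month[count_month >= 3].index)]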

        TIME TRNDESCR  ACNO  MONTH
1 2018-12-20      ib2    85     12
2 2019-01-20      ib2    85      1
3 2019-02-06      ib2    85      2
4 2018-12-18      ib2    85     12
5 2018-12-02      ib3    85     12
6 2019-01-03      ib3    85      1
7 2019-02-06      ib3    85      2
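
The example above keys on the month number alone, so data spanning more than one year could merge the same month of different years. A minimal sketch of a year-aware variant follows, assuming the same toy data; the YEARMONTH column name is an illustrative choice, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "TIME": ["2018-12-19", "2018-12-20", "2019-01-20", "2019-02-06",
             "2018-12-18", "2018-12-02", "2019-01-03", "2019-02-06"],
    "TRNDESCR": ["ib1", "ib2", "ib2", "ib2",
                 "ib2", "ib3", "ib3", "ib3"],
})
df["ACNO"] = 85
df["TIME"] = pd.to_datetime(df["TIME"])

# Year-aware month key (a monthly Period such as 2018-12),
# so the same month in different years stays distinct.
df["YEARMONTH"] = df["TIME"].dt.to_period("M")

# Count distinct calendar months per description and broadcast
# that count back onto every row of the group.
months_per_descr = df.groupby("TRNDESCR")["YEARMONTH"].transform("nunique")

# Keep rows whose description occurs in at least 3 distinct months.
result = df[months_per_descr >= 3]
print(result)
```

Using `transform("nunique")` yields a Series aligned with the original index, so the filter is a single boolean mask and no separate `isin` lookup is needed.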

Answer 1: (score: 0)

Here is my solution:


import pandas as pd

df = pd.read_excel("df_85.xlsx")

df_copy = df.copy()

# add a year-month key column (year and month concatenated as strings)
time = pd.DatetimeIndex(df_copy.TIME)
df_copy['yearmonth'] = time.year.astype(str) + time.month.astype(str)

# count the distinct year-months within each TRNDESCR group
new_df = df_copy.groupby(['TRNDESCR']).yearmonth.nunique().to_frame().reset_index()
new_df = new_df[new_df.yearmonth >= 3]

# keep the rows whose TRNDESCR matches one of those in new_df
output_df = df[df.TRNDESCR.isin(new_df.TRNDESCR.values)]

print(output_df)

Output:

    ACNO  YEAR  MONTH                TIME  TRNCD                                       TRNDESCR  TRNAMT
2     85  2018     12 2018-12-20 15:30:00    109  Ib Transfer To Thongsavath Pra Account No:124  -10000
3     85  2018     12 2018-12-22 12:30:00    209                                   Bil Payment     -500
5     85  2018      1 2019-01-22 12:30:00    209                                   Bil Payment     -501
7     85  2019      1 2019-01-31 08:59:00    109  Ib Transfer To Thongsavath Pra Account No:124 -650000
10    85  2019      2 2019-02-04 15:21:00    109  Ib Transfer To Thongsavath Pra Account No:124 -863000
11    85  2019      2 2019-02-05 15:30:00    209                                   Bil Payment     -600

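It works by creating a new column "yearmonth" (the concatenation of year and month), then grouping by TRNDESCR and counting the number of unique year-months in each group.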
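
As a rough check of this approach against the question's sample data, the sketch below wraps the same idea in a helper. The function name `keep_recurring` and its `min_months` parameter are hypothetical, and grouping by ACNO as well as TRNDESCR is an assumption (the sample contains a single account, so it does not change the result here).

```python
import pandas as pd

def keep_recurring(df, min_months=3):
    """Keep rows whose (ACNO, TRNDESCR) pair occurs in at least
    `min_months` distinct calendar months. Hypothetical helper."""
    # Year-aware month key, computed without modifying the input frame.
    yearmonth = pd.to_datetime(df["TIME"]).dt.to_period("M")
    # Distinct months per (account, description), aligned to the rows.
    n_months = yearmonth.groupby([df["ACNO"], df["TRNDESCR"]]).transform("nunique")
    return df[n_months >= min_months]

phout = "Ib Transfer To Phoutthalom Syh Account No:123"
thong = "Ib Transfer To Thongsavath Pra Account No:124"
sarah = "Ib Transfer To Sarah Account No:199"
bill = "Bil Payment"

# The twelve rows from the question's sample dataset.
rows = [
    (85, "2018-12-19 20:40:00", 109, phout, -20000),
    (85, "2018-12-19 21:15:00", 109, phout, -25000),
    (85, "2018-12-20 15:30:00", 109, thong, -10000),
    (85, "2018-12-22 12:30:00", 209, bill, -500),
    (85, "2018-12-25 15:34:00", 109, phout, -60000),
    (85, "2019-01-22 12:30:00", 209, bill, -501),
    (85, "2019-01-23 12:50:00", 109, sarah, -3000),
    (85, "2019-01-31 08:59:00", 109, thong, -650000),
    (85, "2019-02-02 12:30:00", 109, sarah, -600),
    (85, "2019-02-03 15:02:00", 109, phout, -60000),
    (85, "2019-02-04 15:21:00", 109, thong, -863000),
    (85, "2019-02-05 15:30:00", 209, bill, -600),
]
df = pd.DataFrame(rows, columns=["ACNO", "TIME", "TRNCD", "TRNDESCR", "TRNAMT"])

result = keep_recurring(df)
print(result)
```

The returned frame keeps the original row labels; calling `result.reset_index(drop=True)` would renumber them from 0 as in the question's expected result.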