How do I assign values to a new column in a pandas df based on conditions and values from another df?

Time: 2020-10-27 07:08:20

Tags: python-3.x pandas numpy

I have a DataFrame:

recurring_credits = 
    narration   values   month_year
4   bupi upi    True    Sept-2019
5   bupi upi    False   Oct-2019
9   bupi upi    False   December-2019
11  merv visi   True    December-2019
12  neft pad    True    December-2019
17  bupi upi    False   December-2019
22  bupi upi    False   Oct-2019
27  bupi upi    False   December-2019
31  bupi upi    False   December-2019
32  bupi upi    True    Sept-2019
36  neft pad    True    Sept-2019
40  bupi upi    False   December-2019
44  bupi upi    False   December-2019
48  bupi upi    False   December-2019
49  bupi upi    False   December-2019
51  bupi upi    False   December-2019
53  imps bok    True    December-2019
58  imps bok    True    December-2019
60  bupi upi    False   December-2019
67  neft pad    True    January-2020

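I need to create a column holding the number of transactions per month for each unique narration, counting only rows where values is True; every other row should get zero. My output df should look like this: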

df_out = 
    narration   values   month_year    tran/month
4   bupi upi    True    Sept-2019          2
5   bupi upi    False   Oct-2019           0
9   bupi upi    False   December-2019      0
11  merv visi   True    December-2019      1
12  neft pad    True    December-2019      1
17  bupi upi    False   December-2019      0
22  bupi upi    False   Oct-2019           0
27  bupi upi    False   December-2019      0
31  bupi upi    False   December-2019      0
32  bupi upi    True    Sept-2019          2
36  neft pad    True    Sept-2019          1
40  bupi upi    False   December-2019      0 
44  bupi upi    False   December-2019      0  
48  bupi upi    False   December-2019      0
49  bupi upi    False   December-2019      0
51  bupi upi    False   December-2019      0
53  imps bok    True    December-2019      2
58  imps bok    True    December-2019      2
60  bupi upi    False   December-2019      0
67  neft pad    True    January-2020       1

I have tried the following, but I cannot get the correct output:

unique_narration = list(recurring_credits['narration'].unique())
for narration in unique_narration:
    d = recurring_credits.loc[(recurring_credits['narration']==narration)&(recurring_credits['values']==True)]
    rec_pat = d.groupby('month_year', as_index=True).agg({'narration':'nunique'}).reset_index()
    rec_pat.columns = ['month_year','recurrance_number']
    recurring_credits['recurrance_pattern']=np.nan
    for i,j in zip(rec_pat.transaction_month_year,rec_pat.recurrance_number):
        recurring_credits['recurrance_pattern'].loc[(recurring_credits['narration']==narration)&(recurring_credits['month_year']==i)&(recurring_credits['values']==True)]=j

2 answers:

Answer 0 (score: 2)

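You need to replace narration with NaN in the rows where values is False, using Series.where. Then use GroupBy.transform for the new column with GroupBy.count, which counts the non-missing values in each group: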

s = (df.assign(new = df['narration'].where(df['values']))
       .groupby(['month_year','narration'])['new']
       .transform('count'))
df['tran/month'] = s
print (df)
    narration  values     month_year  tran/month
4    bupi upi    True      Sept-2019           2
5    bupi upi   False       Oct-2019           0
9    bupi upi   False  December-2019           0
11  merv visi    True  December-2019           1
12   neft pad    True  December-2019           1
17   bupi upi   False  December-2019           0
22   bupi upi   False       Oct-2019           0
27   bupi upi   False  December-2019           0
31   bupi upi   False  December-2019           0
32   bupi upi    True      Sept-2019           2
36   neft pad    True      Sept-2019           1
40   bupi upi   False  December-2019           0
44   bupi upi   False  December-2019           0
48   bupi upi   False  December-2019           0
49   bupi upi   False  December-2019           0
51   bupi upi   False  December-2019           0
53   imps bok    True  December-2019           2
58   imps bok    True  December-2019           2
60   bupi upi   False  December-2019           0
67   neft pad    True   January-2020           1
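
One point worth checking against your own data: transform('count') broadcasts the group count to every row of the group. In the sample above each (month_year, narration) group is all True or all False, so the False rows come out as 0. If a group mixed both, the False rows would receive the group's count as well. The sketch below illustrates this on a small hypothetical frame (not the asker's data) and shows one possible way to force 0 on False rows; it is a variation, not part of the original answer.

```python
import pandas as pd

# Hypothetical data: the ("Sept-2019", "bupi upi") group mixes True and False.
df = pd.DataFrame({
    "narration": ["bupi upi", "bupi upi", "bupi upi", "neft pad"],
    "values": [True, True, False, True],
    "month_year": ["Sept-2019", "Sept-2019", "Sept-2019", "Sept-2019"],
})

# The answer's approach: the count of non-missing values is broadcast to the whole group.
broadcast = (df.assign(new=df["narration"].where(df["values"]))
               .groupby(["month_year", "narration"])["new"]
               .transform("count"))

# Variation: count the True rows per group (True becomes 1, False becomes 0) ...
true_counts = (df["values"].astype(int)
                 .groupby([df["month_year"], df["narration"]])
                 .transform("sum"))

# ... then keep that count only on True rows and set False rows to 0.
df["tran/month"] = true_counts.where(df["values"], 0).astype(int)

print(broadcast.tolist())         # the False row also gets the group count
print(df["tran/month"].tolist())  # the False row gets 0
```

Which behaviour is wanted depends on the data; for the frame in the question both give the same result.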

Answer 1 (score: 1)

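You can break this down into a few steps:

  1. Subset the dataframe to the rows where values == True
  2. groupby("month_year") and get the count of each unique "narration" value
  3. Take those counts and merge them back onto the original dataframe
  4. Fill the NaN values produced by the merge with 0
  5. Change the dtype of the new column from float to int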

Steps 1-2

usable_counts = (
    df.loc[df["values"]]
    .groupby(["month_year"])["narration"]
    .value_counts()
    .rename("tran/month")
)

print(usable_counts)
month_year     narration
December-2019  imps bok     2
               merv visi    1
               neft pad     1
January-2020   neft pad     1
Sept-2019      bupi upi     2
               neft pad     1
Name: tran/month, dtype: int64

Steps 3-5

Now that we have the counts per month/narration, we can merge them back onto the original dataframe and clean up the final result:

final_df = (
    df.merge(
        usable_counts,
        left_on=["month_year", "narration"],
        right_index=True,
        how="left")
    .fillna(0)
    .astype({"tran/month": int})
)

print(final_df)
    narration  values     month_year  tran/month
4    bupi upi    True      Sept-2019           2
5    bupi upi   False       Oct-2019           0
9    bupi upi   False  December-2019           0
11  merv visi    True  December-2019           1
12   neft pad    True  December-2019           1
17   bupi upi   False  December-2019           0
22   bupi upi   False       Oct-2019           0
27   bupi upi   False  December-2019           0
31   bupi upi   False  December-2019           0
32   bupi upi    True      Sept-2019           2
36   neft pad    True      Sept-2019           1
40   bupi upi   False  December-2019           0
44   bupi upi   False  December-2019           0
48   bupi upi   False  December-2019           0
49   bupi upi   False  December-2019           0
51   bupi upi   False  December-2019           0
53   imps bok    True  December-2019           2
58   imps bok    True  December-2019           2
60   bupi upi   False  December-2019           0
67   neft pad    True   January-2020           1
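
A side note on step 4: calling .fillna(0) on the whole merged frame fills missing values in every column, not only in tran/month. That is harmless for the frame in the question, which has no other missing values. If your real data does, a more conservative sketch is to fill only the new column. The frame below is hypothetical, and the "note" column is an assumption added purely to show the difference.

```python
import pandas as pd

# Hypothetical data with an extra column that legitimately contains a missing value.
df = pd.DataFrame(
    {
        "narration": ["bupi upi", "bupi upi", "neft pad", "bupi upi"],
        "values": [True, False, True, True],
        "month_year": ["Sept-2019", "Oct-2019", "Sept-2019", "Sept-2019"],
        "note": ["a", None, "b", "c"],
    },
    index=[4, 5, 36, 32],
)

# Steps 1-2: count the True rows per (month_year, narration).
usable_counts = (
    df.loc[df["values"]]
    .groupby(["month_year", "narration"])
    .size()
    .rename("tran/month")
)

# Step 3: left-merge the counts back onto the original rows.
final_df = df.merge(
    usable_counts,
    left_on=["month_year", "narration"],
    right_index=True,
    how="left",
)

# Steps 4-5: fill and cast only the new column, leaving "note" untouched.
final_df["tran/month"] = final_df["tran/month"].fillna(0).astype(int)
print(final_df)
```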