I have a table like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'customer_id':[1,1,1,1,1,1,2,2,2,2,2,2],
                   'account_id':[1,1,1,2,2,2,1,1,1,2,2,2],
                   'date':['01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019',
                           '01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019'],
                   'amount':[np.nan,np.nan,100, np.nan,200,np.nan, np.nan,300,np.nan, 400, np.nan,np.nan],
                   'transaction':[10,-20,30,10,-20,30,10,-20,30,10,-20,30]})
print(df.head(5))
   customer_id  account_id        date  amount  transaction
0            1           1  01/01/2019     NaN           10
1            1           1  01/02/2019     NaN          -20
2            1           1  01/03/2019   100.0           30
3            1           2  01/01/2019     NaN           10
4            1           2  01/02/2019   200.0          -20
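amount is the total balance at the end of the given date, and transaction is the amount transacted on that day. The problem is that not every account has a balance or a transaction on every date, so I need a way to work from the transactions alone.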
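To make the link between the two columns concrete, here is a minimal sketch. It assumes that each day's balance equals the previous day's balance plus that day's transaction; that relation is my reading of the data, not something the data source states. The helper name previous_balance is hypothetical, and the numbers are taken from rows 0 to 2 of the table above.

```python
# Assumed relation: amount[t] = amount[t-1] + transaction[t]
# Rearranged:       amount[t-1] = amount[t] - transaction[t]

def previous_balance(balance, transaction):
    """Hypothetical helper: balance at the end of the day before `balance` was recorded."""
    return balance - transaction

# customer_id 1, account_id 1: only the 01/03/2019 balance (100) is known.
balance_0103 = 100
balance_0102 = previous_balance(balance_0103, 30)    # undo the 01/03 transaction
balance_0101 = previous_balance(balance_0102, -20)   # undo the 01/02 transaction

print(balance_0102, balance_0101)
```

Walking backwards like this only needs one known balance per account; every other balance follows from the transactions.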
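I want to fillna in amount_x using the following logic (amount_x and Financial_account_id are the column names in my real data; they correspond to amount and account_id in the sample above):
For each Financial_account_id, if amount_x is NA, check whether the next row's value is NA. If it is not, the missing amount is the next row's amount minus the next row's transaction.
For example, here for customer_id 1 and account_id 1, the amount on 01/02/2019 should be 01/03/2019's amount minus its transaction, 100-30=70, and the amount on 01/01/2019 should be 01/02/2019's amount minus its transaction, 70-(-20)=90.
The ideal output should be: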
Answer 0 (score: 0)
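I don't know whether my solution will help, because it is brute force. But take a look.
The main idea is to split the DataFrame into smaller frames, one per combination of customer_id and account_id. You can then fill in the values within each smaller frame (using the algorithm described above). Finally, concatenate the filled frames back together.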
# imports
import pandas as pd
import numpy as np
# make df, as you have written above
df = pd.DataFrame(
    {'customer_id':[1,1,1,1,1,1,2,2,2,2,2,2],
     'account_id':[1,1,1,2,2,2,1,1,1,2,2,2],
     'date':['01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019',
             '01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019'],
     'amount':[np.nan,np.nan,100, np.nan,200,np.nan, np.nan,300,np.nan, 400, np.nan,np.nan],
     'transaction':[10,-20,30,10,-20,30,10,-20,30,10,-20,30]})
# make a new identifier (combination of customer_id and account_id)
def get_cid_aid_combination(row):
    cid = row['customer_id']
    aid = row['account_id']
    return f'{cid}-{aid}'
df['cid_aid'] = df.apply(get_cid_aid_combination, axis=1)
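# fill it up
# note: this relies on the default integer index, with each account's rows
# adjacent and in date order, so that i-1 / i+1 are the previous / next day
list_with_dfs = []
for cid_aid in df['cid_aid'].unique():
    # copy, so the writes below change the sub-frame and not a view of df
    df_part = df[df['cid_aid'] == cid_aid].copy()
    # one pass only fills rows next to a known amount, so repeat
    # len(df_part) times to let values spread through the whole group
    for _ in range(len(df_part)):
        for i in df_part.index:
            if pd.notnull(df_part.loc[i, 'amount']):
                continue
            # backward: this balance = next balance - next transaction
            if i + 1 in df_part.index and pd.notnull(df_part.loc[i + 1, 'amount']):
                df_part.loc[i, 'amount'] = df_part.loc[i + 1, 'amount'] - df_part.loc[i + 1, 'transaction']
            # forward: this balance = previous balance + this transaction
            elif i - 1 in df_part.index and pd.notnull(df_part.loc[i - 1, 'amount']):
                df_part.loc[i, 'amount'] = df_part.loc[i - 1, 'amount'] + df_part.loc[i, 'transaction']
    list_with_dfs.append(df_part)
# make a df with filled amount feature
df = pd.concat(list_with_dfs)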
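
The same fill can probably also be done without explicit loops. What follows is only a sketch under two assumptions that the question does not state outright: each day's amount equals the previous day's amount plus that day's transaction, and the rows of each account are already in date order, as they are in the sample. Under those assumptions, amount minus the running sum of transaction is constant within an account, so one known amount per account is enough. The function name fill_amount and the temporary column _offset are my own, hypothetical names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customer_id': [1] * 6 + [2] * 6,
    'account_id': [1, 1, 1, 2, 2, 2] * 2,
    'date': ['01/01/2019', '01/02/2019', '01/03/2019'] * 4,
    'amount': [np.nan, np.nan, 100, np.nan, 200, np.nan,
               np.nan, 300, np.nan, 400, np.nan, np.nan],
    'transaction': [10, -20, 30] * 4,
})

def fill_amount(frame):
    """Hypothetical helper: fill missing balances from one known balance per account."""
    keys = ['customer_id', 'account_id']
    out = frame.copy()
    # Running sum of transactions within each account (rows assumed in date order).
    running = out.groupby(keys)['transaction'].cumsum()
    # Assumed relation: amount[t] = amount[t-1] + transaction[t], so
    # amount - running is the same constant on every row of an account.
    out['_offset'] = out['amount'] - running
    # 'first' picks the first non-null offset in each group and broadcasts it.
    offset = out.groupby(keys)['_offset'].transform('first')
    out['amount'] = out['amount'].fillna(running + offset)
    return out.drop(columns='_offset')

filled = fill_amount(df)
print(filled)
```

For customer_id 1 and account_id 1 this gives 90, 70 and 100, which matches the 100-30=70 and 70-(-20)=90 worked out in the question. An account with no known amount at all stays NaN, since there is nothing to anchor the running sum to.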