How to fill NA with values from other rows in Python

Time: 2019-05-30 04:06:25

Tags: python python-3.x pandas loops jupyter-notebook

I have a DataFrame like this:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'customer_id':[1,1,1,1,1,1,2,2,2,2,2,2],
                       'account_id':[1,1,1,2,2,2,1,1,1,2,2,2],
                       'date':['01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019'],
                       'amount':[np.nan,np.nan,100, np.nan,200,np.nan, np.nan,300,np.nan, 400, np.nan,np.nan],
                       'transaction':[10,-20,30,10,-20,30,10,-20,30,10,-20,30]})
    print(df.head(5))
       customer_id  account_id        date  amount  transaction
    0            1           1  01/01/2019     NaN           10
    1            1           1  01/02/2019     NaN          -20
    2            1           1  01/03/2019   100.0           30
    3            1           2  01/01/2019     NaN           10
    4            1           2  01/02/2019   200.0          -20

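amount is the total balance at the end of the given date, and transaction is that day's transaction amount. The problem is that not every account has a balance (or a transaction) recorded on every date. I need to find a way to work it out from the transactions alone.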

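I want to fill the NA values in amount_x (with fillna) using the following logic:

For each Financial_account_id, if amount_x is NA,

then check whether the next row's value is NA. If it is not, the amount is the next row's amount minus the next row's transaction.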

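For example, for customer_id 1 and account_id 1, the amount on 01/02/2019 should be 01/03/2019's amount minus 01/03/2019's transaction, 100-30=70, and the amount on 01/01/2019 should be 01/02/2019's amount minus 01/02/2019's transaction, 70-(-20)=90. The ideal output should be: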
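The backward rule from the example can be replayed for a single account in plain Python. This is a minimal sketch, assuming the account's rows are already in date order. backfill_balances is a hypothetical helper name, not a pandas function.

```python
import math

def backfill_balances(amounts, transactions):
    """Hypothetical helper: fill missing end-of-day balances backwards.

    Applies amount[i] = amount[i+1] - transaction[i+1], i.e. yesterday's
    balance is today's balance minus today's transaction.
    """
    filled = list(amounts)
    # Walk from the second-to-last row back to the first, so a value filled
    # at i+1 is already available when row i is processed.
    for i in range(len(filled) - 2, -1, -1):
        if math.isnan(filled[i]) and not math.isnan(filled[i + 1]):
            filled[i] = filled[i + 1] - transactions[i + 1]
    return filled

nan = float("nan")
# customer_id 1, account_id 1 from the question
print(backfill_balances([nan, nan, 100.0], [10, -20, 30]))
# → [90.0, 70.0, 100.0]
```

With this rule alone, a NaN after the last known balance stays NaN. Filling it needs the forward rule amount[i] = amount[i-1] + transaction[i].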

1 Answer:

Answer 0 (score: 0)

I'm not sure my solution will help, since it is brute force, but take a look.

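The main idea is to split the DataFrame into smaller frames, one per combination of customer_id and account_id. Then you fill the values inside each smaller frame (using the algorithm described above). Finally, you concatenate them back into one filled DataFrame.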
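The same split/fill/combine idea can be written with groupby. Grouping yields each (customer_id, account_id) sub-frame directly, so a string key such as cid_aid is not needed. This is a minimal sketch, assuming each account's rows are already in date order. fill_account is a hypothetical helper that does one forward pass (previous balance + transaction) and one backward pass (next balance - next transaction) instead of repeating a loop many times.

```python
import numpy as np
import pandas as pd

def fill_account(part):
    """Hypothetical helper: fill NaN balances inside one customer/account group."""
    amount = part['amount'].to_numpy(dtype=float, copy=True)
    trans = part['transaction'].to_numpy(dtype=float)
    # Forward pass: balance = previous balance + today's transaction.
    for i in range(1, len(amount)):
        if np.isnan(amount[i]) and not np.isnan(amount[i - 1]):
            amount[i] = amount[i - 1] + trans[i]
    # Backward pass: balance = next balance - next day's transaction.
    for i in range(len(amount) - 2, -1, -1):
        if np.isnan(amount[i]) and not np.isnan(amount[i + 1]):
            amount[i] = amount[i + 1] - trans[i + 1]
    return part.assign(amount=amount)

df = pd.DataFrame(
    {'customer_id': [1,1,1,1,1,1,2,2,2,2,2,2],
     'account_id': [1,1,1,2,2,2,1,1,1,2,2,2],
     'date': ['01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019',
              '01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019'],
     'amount': [np.nan,np.nan,100, np.nan,200,np.nan, np.nan,300,np.nan, 400, np.nan,np.nan],
     'transaction': [10,-20,30,10,-20,30,10,-20,30,10,-20,30]})

# Split by (customer_id, account_id), fill each part, then combine again
# and restore the original row order.
parts = [fill_account(part) for _, part in df.groupby(['customer_id', 'account_id'])]
filled = pd.concat(parts).sort_index()
print(filled['amount'].tolist())
# → [90.0, 70.0, 100.0, 220.0, 200.0, 230.0, 320.0, 300.0, 330.0, 400.0, 380.0, 410.0]
```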

# imports
import pandas as pd
import numpy as np

# make df, as you have written above
df = pd.DataFrame(
    {'customer_id':[1,1,1,1,1,1,2,2,2,2,2,2],
     'account_id':[1,1,1,2,2,2,1,1,1,2,2,2],
     'date':['01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019',
             '01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019'],
     'amount':[np.nan,np.nan,100, np.nan,200,np.nan, np.nan,300,np.nan, 400, np.nan,np.nan],
     'transaction':[10,-20,30,10,-20,30,10,-20,30,10,-20,30]})

# make a new identifier (combination of customer_id and account_id)
def get_cid_aid_combination(row):
    cid = row['customer_id']
    aid = row['account_id']
    return f'{cid}-{aid}'

df['cid_aid'] = df.apply(get_cid_aid_combination, axis=1)

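# fill it up: process each customer/account combination separately
list_with_dfs = []

for cid_aid in df.cid_aid.unique():
    # .copy() so the .loc assignments below modify an independent frame
    # (avoids pandas' SettingWithCopyWarning)
    df_part = df[df['cid_aid'] == cid_aid].copy()
    # repeat the pass len(df_part) times so filled values can propagate
    # through several consecutive NaN rows; i+1 / i-1 rely on the
    # default consecutive integer index of df
    cnt = 0
    while cnt < len(df_part):
        for i, amount in zip(df_part.index, df_part.amount):
            # next row known: amount[i] = amount[i+1] - transaction[i+1]
            if pd.isnull(amount) and i+1 in df_part.index:
                if pd.notnull(df_part.loc[i+1, 'amount']):
                    df_part.loc[i, 'amount'] = df_part.loc[i+1, 'amount'] - df_part.loc[i+1, 'transaction']
            # previous row known: amount[i] = amount[i-1] + transaction[i]
            if pd.isnull(amount) and i-1 in df_part.index:
                if pd.notnull(df_part.loc[i-1, 'amount']):
                    df_part.loc[i, 'amount'] = df_part.loc[i-1, 'amount'] + df_part.loc[i, 'transaction']

        cnt += 1
    list_with_dfs.append(df_part)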

# make a df with filled amount feature
df = pd.concat(list_with_dfs)
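A loop-free alternative uses a different technique from the answer above, a running sum. Each balance equals the previous balance plus that day's transaction, so amount minus the cumulative sum of transaction is constant within an account. One known balance therefore fixes that offset for the whole account, and every gap can be filled at once. This sketch assumes the ledger is consistent, that each account has at least one known amount, and that dates are MM/DD/YYYY. The offset and running names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'customer_id': [1,1,1,1,1,1,2,2,2,2,2,2],
     'account_id': [1,1,1,2,2,2,1,1,1,2,2,2],
     'date': ['01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019',
              '01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019'],
     'amount': [np.nan,np.nan,100, np.nan,200,np.nan, np.nan,300,np.nan, 400, np.nan,np.nan],
     'transaction': [10,-20,30,10,-20,30,10,-20,30,10,-20,30]})

keys = ['customer_id', 'account_id']

# Sort chronologically inside each account (MM/DD/YYYY is an assumption).
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
df = df.sort_values(keys + ['date'])

# Running total of transactions within each account.
running = df.groupby(keys)['transaction'].cumsum()

# amount - running is the same on every row of a consistent account, so
# spread the known offset(s) to the account's other rows.
offset = (df.assign(offset=df['amount'] - running)
            .groupby(keys)['offset']
            .transform(lambda s: s.ffill().bfill()))

# Only fill the gaps; balances that were already known are kept.
df['amount'] = df['amount'].fillna(offset + running)
print(df['amount'].tolist())
# → [90.0, 70.0, 100.0, 220.0, 200.0, 230.0, 320.0, 300.0, 330.0, 400.0, 380.0, 410.0]
```

On this sample it gives the same amounts as the loop. It does a fixed number of vectorized operations instead of len(df_part) passes per account. An account with no known balance at all keeps NaN amounts.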