Question

I have a table like this:

    df = pd.DataFrame({'customer_id':[1,1,1,1,1,1,2,2,2,2,2,2],
                       'account_id':[1,1,1,2,2,2,1,1,1,2,2,2],
                       'date':['01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019'],
                       'amount':[np.nan,np.nan,100, np.nan,200,np.nan, np.nan,300,np.nan, 400, np.nan,np.nan],
                       'transaction':[10,-20,30,10,-20,30,10,-20,30,10,-20,30]})
    print(df.head(5))
       customer_id  account_id        date  amount  transaction
    0            1           1  01/01/2019     NaN           10
    1            1           1  01/02/2019     NaN          -20
    2            1           1  01/03/2019   100.0           30
    3            1           2  01/01/2019     NaN           10
    4            1           2  01/02/2019   200.0          -20

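amount is the total amount at the end of the given date, and transaction is the daily transaction amount. The problem here is that not every account has a balance or a transaction. I need to find a way to fill in the gaps using only the transactions.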

I want to fill the NA values in amount_x using the following logic:

For each Financial_account_id, if amount_x is NA,

then check whether the value in the next row is NA.

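For example, for customer_id 1 and account_id 1, the amount on 01/02/2019 should be the 01/03/2019 amount minus the 01/03/2019 transaction, 100-30=70, and the amount on 01/01/2019 should be the 01/02/2019 amount minus the 01/02/2019 transaction, 70-(-20)=90. The ideal output would have the missing amounts filled in this way.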
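As a sanity check on the arithmetic above, here is a minimal sketch in plain Python that walks backward through the rows for customer_id 1 and account_id 1. The list names `amounts` and `transactions` are illustrative only and are not part of the original data frame.

```python
# Values for customer_id 1, account_id 1, in date order (illustrative names).
amounts = [None, None, 100]
transactions = [10, -20, 30]

# Walk backward: a day's closing amount equals the next day's closing
# amount minus the next day's transaction.
for i in range(len(amounts) - 2, -1, -1):
    if amounts[i] is None and amounts[i + 1] is not None:
        amounts[i] = amounts[i + 1] - transactions[i + 1]

print(amounts)  # [90, 70, 100]
```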

Answer 1

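I don't know whether my solution will help, because it is brute force. But have a look.

The main idea is to split the data frame into smaller frames based on the combination of customer_id and account_id. After that, you can fill in the values in each smaller data frame (using the algorithm described above). Finally, concatenate them into one filled data frame.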

# imports
import pandas as pd
import numpy as np

# make df, as you have written above
df = pd.DataFrame(
    {'customer_id':[1,1,1,1,1,1,2,2,2,2,2,2],
     'account_id':[1,1,1,2,2,2,1,1,1,2,2,2],
     'date':['01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019',
             '01/01/2019','01/02/2019','01/03/2019','01/01/2019','01/02/2019','01/03/2019'],
     'amount':[np.nan,np.nan,100, np.nan,200,np.nan, np.nan,300,np.nan, 400, np.nan,np.nan],
     'transaction':[10,-20,30,10,-20,30,10,-20,30,10,-20,30]})

# make a new identifier (combination of customer_id and account_id)
def get_cid_aid_combination(row):
    cid = row['customer_id']
    aid = row['account_id']
    return f'{cid}-{aid}'

df['cid_aid'] = df.apply(get_cid_aid_combination, axis=1)

# fill it up
list_with_dfs = []

for cid_aid in df.cid_aid.unique():
    # work on a copy so the .loc assignments below do not target a view of df
    df_part = df[df['cid_aid']==cid_aid].copy()
    cnt = 0
    while cnt < len(df_part):
        for i, amount, trans in zip(df_part.index, df_part.amount, df_part.transaction):
            if pd.isnull(amount) and i+1 in df_part.index:
                if pd.notnull(df_part.loc[i+1, 'amount']):
                    df_part.loc[i, 'amount'] = df_part.loc[i+1, 'amount'] - df_part.loc[i+1, 'transaction']
            if pd.isnull(amount) and i-1 in df_part.index:
                if pd.notnull(df_part.loc[i-1, 'amount']):
                    df_part.loc[i, 'amount'] = df_part.loc[i-1, 'amount'] + df_part.loc[i, 'transaction']

        cnt += 1
    list_with_dfs.append(df_part)

# make a df with filled amount feature
df = pd.concat(list_with_dfs)
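
For comparison, the same split, fill and combine idea can be sketched with `groupby`, which avoids building a separate identifier column. This is only a sketch under some assumptions: rows within each account are already in date order, and the helper name `fill_amounts` and the variable `filled` are made up for this example. It does one forward pass and one backward pass per account instead of repeating the scan.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customer_id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'account_id': [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    'date': ['01/01/2019', '01/02/2019', '01/03/2019'] * 4,
    'amount': [np.nan, np.nan, 100, np.nan, 200, np.nan,
               np.nan, 300, np.nan, 400, np.nan, np.nan],
    'transaction': [10, -20, 30] * 4,
})

def fill_amounts(group):
    """Return the filled amounts for one account (hypothetical helper).

    Assumes the rows of `group` are in date order.
    """
    amount = group['amount'].to_numpy(dtype=float, copy=True)
    trans = group['transaction'].to_numpy(dtype=float)
    # Forward pass: amount[i] = amount[i-1] + transaction[i]
    for i in range(1, len(amount)):
        if np.isnan(amount[i]) and not np.isnan(amount[i - 1]):
            amount[i] = amount[i - 1] + trans[i]
    # Backward pass: amount[i] = amount[i+1] - transaction[i+1]
    for i in range(len(amount) - 2, -1, -1):
        if np.isnan(amount[i]) and not np.isnan(amount[i + 1]):
            amount[i] = amount[i + 1] - trans[i + 1]
    return amount

# Fill each (customer_id, account_id) group and write it back by index.
filled = df.copy()
for _, group in df.groupby(['customer_id', 'account_id']):
    filled.loc[group.index, 'amount'] = fill_amounts(group)

print(filled['amount'].tolist())
```

An account with no known amount at all stays NaN under this sketch, since there is nothing to anchor the running balance to.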

How to fill NA with values from other rows in Python
