Question

I have a dataframe with the following format:

Item    Balance    Date
 1       200000    1/1/2020
 1       155000    2/1/2020
 1       100000    3/1/2020
 1        25000    4/1/2020
 1            0    5/1/2020 
 2       100000    1/1/2020 
 2        15000    2/1/2020
 2            0    3/1/2020

I would like to change the dataframe to the following format:

Item   Cycle
 1     4;2#01/01/2020;1000#02/01/2020;775#03/01/2020;500#04/01/2020;125#05/01/2020;0
 2     2;2#01/01/2020;1000#02/01/2020;150#03/01/2020;0

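The Cycle column takes the following form: the count of non-zero values in the Balance field for each item (4 for item 1, 2 for item 2), followed by ; and the constant 2, followed by # and the first date in the item's list of dates with the initial scaled value, which is 1000. After that comes # + (next date) + ; + (item's current balance / item's initial balance) * initial scaled balance (1000), repeated for each observation until the item's balance reaches 0. The cycle variable closes with # (date from the Date column) ;0. Also note that the dates in the cycle variable take the form mm/dd/yyyy.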
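To make the scaling rule concrete, here is a small plain-Python check of the numbers for item 1. It only illustrates the arithmetic described above; the variable names are made up for this sketch and are not part of the dataframe.

```python
# Balances for item 1, in date order (taken from the table above)
balances = [200000, 155000, 100000, 25000, 0]

# The first balance is the reference point that maps to 1000
initial = balances[0]

# current balance / initial balance * 1000, rounded to a whole number
scaled = [round(b / initial * 1000) for b in balances]
print(scaled)  # [1000, 775, 500, 125, 0]

# Leading count in the Cycle string: number of non-zero balances
nonzero = sum(b > 0 for b in balances)
print(nonzero)  # 4
```

These are the values that appear after each date in the desired Cycle string for item 1.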

Thanks in advance for your help.

Answer 1

Assuming your Date column has already been converted to datetime64:

import pandas as pd

def summarize(group):
    # Number of rows in this item where Balance > 0
    count = (group['Balance'] > 0).sum()

    # Scale the balances so that the initial balance maps to 1000,
    # and format the dates as mm/dd/yyyy
    scaled = pd.DataFrame({
        'Balance': group['Balance'] / group['Balance'].iloc[0] * 1000,
        'Date': group['Date'].dt.strftime('%m/%d/%Y')
    })

    # Format one row as a string such as 01/01/2020;1000
    def fmt(row):
        return f'{row["Date"]};{row["Balance"]:.0f}'

    # Join the per-row strings together with '#'
    data = '#'.join(scaled.apply(fmt, axis=1))

    # The final string for each group, e.g. 4;2#01/01/2020;1000#...
    return f'{count};2#{data}'

# df is the dataframe from the question
df.groupby('Item').apply(summarize)
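
For reference, here is a self-contained sketch of the same idea that can be run end to end. It rebuilds the sample data from the question, and it assumes the dates are in month/day/year order (which is what the desired output suggests). The function name build_cycle, the explicit column selection after groupby, and the final rename/reset_index step are choices made for this sketch, not part of the answer above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Item": [1, 1, 1, 1, 1, 2, 2, 2],
    "Balance": [200000, 155000, 100000, 25000, 0, 100000, 15000, 0],
    "Date": ["1/1/2020", "2/1/2020", "3/1/2020", "4/1/2020", "5/1/2020",
             "1/1/2020", "2/1/2020", "3/1/2020"],
})

# Assumption: the dates are month/day/year
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

def build_cycle(group):
    # Count of rows with a non-zero balance
    count = int((group["Balance"] > 0).sum())
    # Scale so that the first balance of the item maps to 1000
    scaled = group["Balance"] / group["Balance"].iloc[0] * 1000
    # Dates as mm/dd/yyyy strings
    dates = group["Date"].dt.strftime("%m/%d/%Y")
    # One "date;value" piece per row, joined with '#'
    parts = [f"{d};{b:.0f}" for d, b in zip(dates, scaled)]
    return f"{count};2#" + "#".join(parts)

result = (
    df.groupby("Item")[["Balance", "Date"]]
      .apply(build_cycle)
      .rename("Cycle")
      .reset_index()
)
print(result.to_string(index=False))
```

With the sample data, result has the two columns Item and Cycle, and the Cycle strings match the desired output shown in the question. Note that this relies on the rows of each item already being sorted by date.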
