Standardizing a value while iterating over a groupby object

Time: 2018-02-20 20:58:10

Tags: python iteration pandas-groupby

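I need some help iterating over a groupby object in Python. My data has people nested under an ID variable, and each ID has 3 to 6 months of closing balances. Printing the groups of the groupby object looks like this: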

(1,    Primary BP  Product  Rpt Month  Closing Balance
0               1    CHECK     201708            10.04
1               1    CHECK     201709            11.10
2               1    CHECK     201710            11.16
3               1    CHECK     201711            11.22
4               1    CHECK     201712            11.28
5               1    CHECK     201801            11.34)
(2,      Primary BP  Product  Rpt Month  Closing Balance
79                2    CHECK     201711            52.42
85                2    CHECK     201712            31.56
136               2    CHECK     201801            99.91)

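I want to create another column that standardizes the closing balance against the first amount in each group, that is, each closing balance minus the group's first closing balance. The ideal output would look like this: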

(1,    Primary BP  Product  Rpt Month  Closing Balance  standardized
0               1    CHECK     201708            10.04          0.00
1               1    CHECK     201709            11.10          1.06
2               1    CHECK     201710            11.16          1.12
3               1    CHECK     201711            11.22          1.18
4               1    CHECK     201712            11.28          1.24
5               1    CHECK     201801            11.34          1.30)
(2,      Primary BP  Product  Rpt Month  Closing Balance  standardized
79                2    CHECK     201711            52.42          0.00
85                2    CHECK     201712            31.56        -20.86
136               2    CHECK     201801            99.91         47.49)

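I just can't figure out how to write a good for loop (or any other approach) that iterates within each group of the groupby object, takes the first closing balance, and subtracts it from every closing balance in that group, essentially creating a difference score.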
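For reference, the loop described above might be sketched roughly as follows. This is only an illustration under assumptions: the frame name `df` and the shortened sample values are rebuilt by hand from the printed groups, and "first" is taken to mean the earliest `Rpt Month` for each ID.

```python
import pandas as pd

# Hypothetical sample rebuilt from the printed groups (three months per ID)
df = pd.DataFrame(
    {
        "Primary BP": [1, 1, 1, 2, 2, 2],
        "Product": ["CHECK"] * 6,
        "Rpt Month": [201708, 201709, 201710, 201711, 201712, 201801],
        "Closing Balance": [10.04, 11.10, 11.16, 52.42, 31.56, 99.91],
    }
)

pieces = []
# Iterating over a groupby yields (key, sub-DataFrame) pairs, one per ID
for bp, group in df.groupby("Primary BP"):
    group = group.sort_values("Rpt Month")      # earliest month first
    first = group["Closing Balance"].iloc[0]    # baseline for this ID
    pieces.append((group["Closing Balance"] - first).round(2))

# Each piece keeps its original row labels, so assignment aligns by index
df["standardized"] = pd.concat(pieces)
print(df)
```

The first row of every group comes out as 0, and every other row holds the difference from that group's first balance.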

1 answer:

Answer 0: (score: 0)

I solved it! It only took two weeks. I didn't end up using a groupby object. Here's how:

import pandas as pd

# `data` is the DataFrame of monthly balances, assumed to be loaded already.
# These two lines were just a bit of cleaning needed to make the values numeric
data['Closing Balance'] = data['Closing Balance'].str.replace(",", "")
data['Closing Balance'] = pd.to_numeric(data['Closing Balance'])

# For each row, compute the change in closing balance relative to the first
# row seen for that ID, so the first month of each ID is 0.
# A dict keeps one baseline per ID, so an ID's rows need not be adjacent.
first_balance = {}  # Primary BP -> first closing balance seen for that ID
diffs = []
for index, row in data.iterrows():
    bp = row['Primary BP']
    bal = row['Closing Balance']
    if bp not in first_balance:
        first_balance[bp] = bal
    diffs.append(round(bal - first_balance[bp], 2))

# Just checking that there is one difference per row of data
print(len(diffs))

# Add the list of differences to data as a new column (assigned by position)
data['balance increase'] = diffs
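
As an alternative to the row loop above, the same difference can probably be computed without explicit iteration by using `groupby(...).transform("first")`, which broadcasts each group's first value back onto every row of that group. This is a minimal sketch, not the method used in this answer: the sample frame is hypothetical, the balances are assumed to be numeric already, and the rows are sorted by `Rpt Month` on the assumption that "first" means the earliest month.

```python
import pandas as pd

# Hypothetical sample; the rows of each ID are deliberately interleaved
data = pd.DataFrame(
    {
        "Primary BP": [1, 2, 1, 2, 1, 2],
        "Rpt Month": [201708, 201711, 201709, 201712, 201710, 201801],
        "Closing Balance": [10.04, 52.42, 11.10, 31.56, 11.16, 99.91],
    }
)

# Sort so that the first row of each ID is its earliest month
data = data.sort_values(["Primary BP", "Rpt Month"])

# transform("first") returns a Series aligned with `data`, holding each
# group's first (non-null) closing balance on every row of that group
first = data.groupby("Primary BP")["Closing Balance"].transform("first")

data["balance increase"] = (data["Closing Balance"] - first).round(2)
print(data)
```

Because `transform` returns a result with the same index as the input, no manual bookkeeping of IDs or lists is needed, and the interleaved row order does not matter.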