Manipulating a DataFrame with pandas (Python)

Time: 2016-07-18 20:17:33

Tags: python pandas dataframe

OK, so it took me a long time to get my DataFrame to look like this so that I could plot it:

data_mth  GROUP
201504    499 and below    0.001806
201505    499 and below    0.007375
201506    499 and below    0.000509
201507    499 and below    0.007344
201504    500 - 599        0.016672
201505    500 - 599        0.011473
201506    500 - 599        0.017733
201507    500 - 599        0.017651
201504    800 - 899        0.472784
201505    800 - 899        0.516837
201506    800 - 899        0.169811
201507    800 - 899        0.293966
201504    900 and above    0.065144
201505    900 and above    0.226626
201506    900 and above    0.251585
201507    900 and above    0.299850

Because of how much space this layout takes up, I had to modify my code, and now I have this DataFrame:

ptnr_cur_vntg_scor_band  499 and below  500 - 599  800 - 899  900 and above
data_mth
201504                   0.001806       0.016672   0.472784   0.065144
201505                   0.007375       0.011473   0.516837   0.226626
201506                   0.000509       0.017733   0.169811   0.251585
201507                   0.007344       0.017651   0.293966   0.299850

What is a good way to reshape the second DataFrame so that it looks like the first one?

My current code is as follows:

df = self.bunch['occ_data.all_data']
df = cpr.filter(df, 'ccm_acct_status', 'Open', 'Open-Inactive', 'Open-Active', 'OpenFraud', 'New')
df = df.groupby(['ptnr_cur_vntg_scor_band', 'data_mth']).sum()['ccm_curr_vntg_cnt']

df = df.unstack(0).fillna(0)

df.loc[:,"499andbelow":"NoVantageScore"] = df.loc[:, "499andbelow":"NoVantageScore"].div(df.sum(axis=1), axis=0)
df = df.fillna(0)

Its output is the second DataFrame shown above.

1 Answer:

Answer 0 (score: 4):

import io
import pandas as pd

data = io.StringIO('''\
499 and below,500 - 599,800 - 899,900 and above
201504,0.001806,0.016672,0.472784,0.065144
201505,0.007375,0.011473,0.516837,0.226626
201506,0.000509,0.017733,0.169811,0.251585
201507,0.007344,0.017651,0.293966,0.299850
''')

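# The header has one fewer field than the data rows, so read_csv uses the
# first column (the month) as the index.
df = pd.read_csv(data)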
df.index.name = 'data_mth'
df.columns.name = 'ptnr_cur_vntg_scor_band'
print(df)

# ptnr_cur_vntg_scor_band  499 and below  500 - 599  800 - 899  900 and above
# data_mth                                                                   
# 201504                        0.001806   0.016672   0.472784       0.065144
# 201505                        0.007375   0.011473   0.516837       0.226626
# 201506                        0.000509   0.017733   0.169811       0.251585
# 201507                        0.007344   0.017651   0.293966       0.299850

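# unstack() moves the column labels into the index, giving a Series indexed by
# (ptnr_cur_vntg_scor_band, data_mth); swaplevel() then puts data_mth first.
s = df.unstack().swaplevel()
s.index.names = 'data_mth', 'GROUP'
print(s)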

Output:

data_mth  GROUP   
201504    499 and below    0.001806
201505    499 and below    0.007375
201506    499 and below    0.000509
201507    499 and below    0.007344
201504    500 - 599        0.016672
201505    500 - 599        0.011473
201506    500 - 599        0.017733
201507    500 - 599        0.017651
201504    800 - 899        0.472784
201505    800 - 899        0.516837
201506    800 - 899        0.169811
201507    800 - 899        0.293966
201504    900 and above    0.065144
201505    900 and above    0.226626
201506    900 and above    0.251585
201507    900 and above    0.299850
dtype: float64
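
As a quick sanity check on the reshape above, the following is a minimal sketch (not part of the original answer) that rebuilds the wide frame by hand from the values shown, applies the same `unstack().swaplevel()` step, and then pivots the `GROUP` level back out to confirm that nothing is lost in the round trip. The variable names `wide`, `stacked`, `round_trip`, and `tidy`, and the column name `share`, are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Wide frame rebuilt by hand from the values shown in the question.
wide = pd.DataFrame(
    {
        "499 and below": [0.001806, 0.007375, 0.000509, 0.007344],
        "500 - 599": [0.016672, 0.011473, 0.017733, 0.017651],
        "800 - 899": [0.472784, 0.516837, 0.169811, 0.293966],
        "900 and above": [0.065144, 0.226626, 0.251585, 0.299850],
    },
    index=pd.Index([201504, 201505, 201506, 201507], name="data_mth"),
)
wide.columns.name = "ptnr_cur_vntg_scor_band"

# Same reshape as in the answer: column labels become the outer index level,
# then the two levels are swapped so that data_mth comes first.
stacked = wide.unstack().swaplevel()
stacked.index.names = ["data_mth", "GROUP"]

# Pivoting the GROUP level back into columns should reproduce the wide layout.
round_trip = stacked.unstack("GROUP")

# A flat long table (one row per month/band pair) can be convenient for plotting.
# "share" is a hypothetical name for the value column.
tidy = stacked.rename("share").reset_index()
print(tidy.head())
```

If a plain DataFrame is easier to feed to a plotting call than a MultiIndex Series, the `tidy` form is one option; whether it suits the original plotting code is an assumption, since that code is not shown.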