pandas: restoring the index column after grouping by 2 columns

Time: 2018-09-17 08:46:31

Tags: python pandas numpy

My CSV data is structured as follows:

value,name,count,sum_conc
0,MS_a,1974,1995
4,MS_a,1,1995
18,MS_a,1,1995
54,MS_a,1,1995
80,MS_a,1,1995...

I have the following code, which sorts my data into bins:

import pandas as pd
import numpy as np

data_dir = '../data/'


def create_transposed_data(filename, min_value, max_value, bin_size):
    df = pd.read_csv(data_dir+filename)
    # define bin edges
    bins = np.arange(min_value - 1, max_value + 1, bin_size)
    # define bin labels
    labels = [f'{i+1}-{j+1}' for i, j in zip(bins[:-1], bins[1:])]
    # adding bin column for later grouping
    df['bin'] = pd.cut(df['value'], bins, labels=labels)
    # get totals per name
    bin_totals = df.groupby(['name'])['count'].sum()
    # bin the data and calculate the percentage
    binned_data = df.groupby(['bin', 'name'])['count'].sum() / bin_totals
    # pivot bins into columns; fill missing bins with a numeric zero
    result = binned_data.unstack(0).fillna(0.0)
    print(result)
    result.to_csv(data_dir + filename + '_result.csv', sep=',', index=False)


create_transposed_data("tmp2.csv", 0, 359, 36)
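For reference, here is a small sketch of what the bin edges and labels work out to for the call above. The `sample` values are made up for illustration. Note that `pd.cut` uses right-closed intervals by default, so the bin labelled `0-36` actually covers the values 0 through 35.

```python
import numpy as np
import pandas as pd

# Same arguments as the call in the question
min_value, max_value, bin_size = 0, 359, 36

# Edges start at -1 so that the value 0 falls inside the first (right-closed) bin
bins = np.arange(min_value - 1, max_value + 1, bin_size)
labels = [f'{i+1}-{j+1}' for i, j in zip(bins[:-1], bins[1:])]

print(bins.tolist())
print(labels)

# Hypothetical sample values, chosen to sit on the bin boundaries
sample = pd.Series([0, 35, 36, 359])
binned = pd.cut(sample, bins, labels=labels)
print(binned.tolist())
```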

When I print the result, I can see both grouping columns (bin and name):

bin       0-36     36-72    72-108   108-144   144-180   180-216   216-252  \
name                                                                         
MS_a  0.990476  0.000501  0.002506  0.001003  0.000501  0.001504  0.000501   
MS_b  0.099487  0.098697  0.103829  0.097513  0.101856  0.116068  0.088038   

bin    252-288   288-324   324-360  
name                                
MS_a  0.001504  0.000501  0.001003  
MS_b  0.094749  0.097118  0.102645  

But when I save the result, the name column is missing:

0-36,36-72,72-108,108-144,144-180,180-216,216-252,252-288,288-324,324-360
0.9904761904761905,0.0005012531328320802,0.002506265664160401,0.0010025062656641604,0.0005012531328320802,0.0015037593984962407,0.0005012531328320802,0.0015037593984962407,0.0005012531328320802,0.0010025062656641604
0.09948677457560205,0.0986971969996052,0.10382945124358468,0.09751283063560995,0.10185550730359258,0.11606790367153573,0.08803789972364785,0.094749309119621,0.09711804184761152,0.10264508487958941

What am I missing?
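A likely cause, sketched here on made-up data rather than the original file: after `unstack(0)`, `name` is no longer a regular column but the index of the DataFrame, and `to_csv(..., index=False)` tells pandas not to write the index. The small frame below is hypothetical and only mimics the shape of the result; either dropping `index=False` or calling `reset_index()` before saving should keep the names in the output.

```python
import pandas as pd

# Hypothetical miniature of the binned data (not the original file)
df = pd.DataFrame({
    'bin': ['0-36', '36-72', '0-36', '36-72'],
    'name': ['MS_a', 'MS_a', 'MS_b', 'MS_b'],
    'count': [3, 1, 2, 2],
})

# Pivot bins into columns: 'name' becomes the index, 'bin' the column labels
counts = df.groupby(['bin', 'name'])['count'].sum().unstack(0)
result = counts.div(counts.sum(axis=1), axis=0)

# index=False skips the index, so the names are lost
without_index = result.to_csv(index=False)

# Option 1: let to_csv write the index (the default)
with_index = result.to_csv()

# Option 2: turn the index back into a regular column first
with_reset = result.reset_index().to_csv(index=False)

print(without_index.splitlines()[0])  # 0-36,36-72
print(with_index.splitlines()[0])     # name,0-36,36-72
print(with_reset.splitlines()[0])     # name,0-36,36-72
```

When writing to a file instead of a string, the same choice applies: pass the path as the first argument and either leave `index` at its default or call `reset_index()` first.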

0 answers:

No answers