The CSV data is structured as follows:
value,name,count,sum_conc
0,MS_a,1974,1995
4,MS_a,1,1995
18,MS_a,1,1995
54,MS_a,1,1995
80,MS_a,1,1995...
I have the following code to group my data into bins:
import pandas as pd
import numpy as np

data_dir = '../data/'

def create_transposed_data(filename, min_value, max_value, bin_size):
    df = pd.read_csv(data_dir+filename)
    # define bin edges
    bins = np.arange(min_value - 1, max_value + 1, bin_size)
    # define bin labels
    labels = [f'{i+1}-{j+1}' for i, j in zip(bins[:-1], bins[1:])]
    # adding bin column for later grouping
    df['bin'] = pd.cut(df['value'], bins, labels=labels)
    # get totals per name
    bin_totals = df.groupby(['name'])['count'].sum()
    # bin the data and calculate the percentage
    binned_data = df.groupby(['bin', 'name'])['count'].sum() / bin_totals
    result = binned_data.unstack(0).fillna("0.0")
    print(result)
    result.to_csv(data_dir + filename + '_result.csv', sep=',', index=False)

create_transposed_data("tmp2.csv", 0, 359, 36)
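As a side check, the bin edges and labels for this call can be reproduced in isolation. This is only a sketch: `bin_edges_and_labels` is a hypothetical helper that repeats the two lines from the function above, and the sample values are the first few `value` entries from the CSV excerpt.

```python
import numpy as np
import pandas as pd

def bin_edges_and_labels(min_value, max_value, bin_size):
    # Hypothetical helper: same edge and label logic as create_transposed_data
    edges = np.arange(min_value - 1, max_value + 1, bin_size)
    labels = [f'{i+1}-{j+1}' for i, j in zip(edges[:-1], edges[1:])]
    return edges, labels

edges, labels = bin_edges_and_labels(0, 359, 36)

# pd.cut uses right-closed intervals by default: (-1, 35], (35, 71], ...
sample_values = pd.Series([0, 4, 18, 54, 80])
binned = pd.cut(sample_values, edges, labels=labels)

print(list(edges))
print(labels)
print(list(binned.astype(str)))
```

Because the intervals are closed on the right, the bin labeled `0-36` holds the values 0 through 35, and `36-72` holds 36 through 71.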
When I print the result, I can see both grouping levels (bin and name):
bin       0-36     36-72    72-108   108-144   144-180   180-216   216-252  \
name
MS_a  0.990476  0.000501  0.002506  0.001003  0.000501  0.001504  0.000501
MS_b  0.099487  0.098697  0.103829  0.097513  0.101856  0.116068  0.088038

bin    252-288   288-324   324-360
name
MS_a  0.001504  0.000501  0.001003
MS_b  0.094749  0.097118  0.102645
But when I save the result, the name column is missing:
0-36,36-72,72-108,108-144,144-180,180-216,216-252,252-288,288-324,324-360
0.9904761904761905,0.0005012531328320802,0.002506265664160401,0.0010025062656641604,0.0005012531328320802,0.0015037593984962407,0.0005012531328320802,0.0015037593984962407,0.0005012531328320802,0.0010025062656641604
0.09948677457560205,0.0986971969996052,0.10382945124358468,0.09751283063560995,0.10185550730359258,0.11606790367153573,0.08803789972364785,0.094749309119621,0.09711804184761152,0.10264508487958941
What am I missing?
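One likely explanation, given as a sketch rather than a confirmed diagnosis: after `unstack(0)`, `name` is the index of `result`, not a regular column, and `index=False` tells `to_csv` to leave the index out. The small frame below is a hypothetical stand-in for `result` with made-up numbers. It only illustrates how the header changes when the index is kept.

```python
import pandas as pd

# Hypothetical stand-in for the unstacked result: 'name' lives in the index
result = pd.DataFrame(
    {'0-36': [0.99, 0.10], '36-72': [0.01, 0.90]},
    index=pd.Index(['MS_a', 'MS_b'], name='name'),
)
result.columns.name = 'bin'

# index=False drops the index, so the 'name' values never reach the file
without_index = result.to_csv(index=False)

# Keeping the index (the default, index=True) writes 'name' as the first column
with_index = result.to_csv()

print(without_index)
print(with_index)
```

Under that assumption, removing `index=False` from the `to_csv` call in `create_transposed_data` should be enough to keep the names in the saved file.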