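This is the hardest problem I have run into so far. I am trying to create plots indexed by ratetype. In effect, I want a matrix of each unique rate type against the average number of customers for that rate type. What I cannot work out in pandas is the lambda expression that selects the rows for each individual rate type, takes the mean customer count for that type, and then builds a Series from the two resulting lists, which must have the same length and line up correctly.
There can be hundreds of different rate types. Reading them into a list with a lambda seems like a better choice than hard-coding every possibility, because the list will only grow and gain new values over time.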
""" a section of the data for example use. Working with column "Ratetype"
and column "number_of_customers", aiming for something like
list1 = unique occurrences of ratetype
list2 = average number of customers for each ratetype
rt = ['fixed', 'variable', ...]
avg_cust_numbers = [45.3, 23.1, ...]
**basically, for each ratetype: get the mean of the number_of_customers column over all matching rows**
ratetype,number_of_customers
fixed,1232
variable, 1100
vec, 199
ind, 1211
alg, 123
bfd, 788
csv, 129
ggg, 1100
aaa, 566
acc, 439
"""
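import pandas as pd
import matplotlib.pyplot as plt
# keep only the two columns of interest
df = df[['ratetype', 'number_of_customers']]
# hard-coded attempt: one filter and one mean per rate type
fixed = df.loc[df['ratetype'] == 'fixed']
avg_fixed_custno = fixed['number_of_customers'].mean()
variable = df.loc[df['ratetype'] == 'variable']
avg_variable_custno = variable['number_of_customers'].mean()
rt_counts = df.ratetype.value_counts()
rt_uniques = df.ratetype.unique()
# avg_cust_nos has to end up the same length as rt_uniques
avg_cust_nos = [avg_fixed_custno, avg_variable_custno]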
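For reference, the per-rate-type mean described above can probably be obtained without a lambda or any hard-coded names by grouping on the rate type. The following is only a minimal sketch: the inline CSV is made-up sample data with repeated rate types, and the column names `ratetype` and `number_of_customers` are assumed to match the real file.

```python
import io
import pandas as pd

# Hypothetical sample data (not the real file), with repeated rate types
csv_text = """ratetype,number_of_customers
fixed,1232
variable,1100
fixed,1000
variable,900
vec,199
"""
df = pd.read_csv(io.StringIO(csv_text))

# One mean per distinct rate type; groupby sorts the group keys by default
avg_by_rate = df.groupby("ratetype")["number_of_customers"].mean()

# The two aligned lists described in the question
rt = avg_by_rate.index.tolist()
avg_cust_numbers = avg_by_rate.tolist()
```

Because both lists come from the same Series, they are always the same length and in the same order, however many rate types the data contains.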
My goal is to create and plot these subplots using matplotlib.pyplot.
data = {'ratetypes': pd.Series(rt_counts, index=rt_uniques),
        'Avg_cust_numbers': pd.Series(avg_cust_nos, index=rt_uniques),
        }
df = pd.DataFrame(data)
df = df.sort_values(by=['ratetypes'], ascending=False)
fig, axes = plt.subplots(nrows=2, ncols=1)
for i, c in enumerate(df.columns):
    df[c].plot(kind='bar', ax=axes[i], figsize=(12, 10), title=c)
plt.savefig('custno_byrate.png', bbox_inches='tight')
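One possible way to build the two-column frame that the plotting loop expects, without assembling the Series by hand, is a single grouped aggregation. This is a sketch under assumptions: the data below is invented for illustration, and the output column names `ratetypes` and `Avg_cust_numbers` are simply copied from the dictionary keys used above.

```python
import pandas as pd

# Hypothetical data standing in for the real file
df = pd.DataFrame({
    "ratetype": ["fixed", "variable", "fixed", "vec", "variable", "fixed"],
    "number_of_customers": [1232, 1100, 1000, 199, 900, 1068],
})

# Named aggregation: one row per rate type, with the row count and the mean
summary = (
    df.groupby("ratetype")["number_of_customers"]
      .agg(ratetypes="count", Avg_cust_numbers="mean")
      .sort_values(by="ratetypes", ascending=False)
)
```

Here `summary` has one column per subplot, so it could be passed through the same `enumerate(summary.columns)` loop. Note that `"count"` skips rows where `number_of_customers` is missing, whereas `value_counts()` on `ratetype` counts every row, so the two can differ if the data has gaps.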