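I am a beginner in Python and I am trying to improve my code, so I would appreciate some suggestions on how to make the following code more efficient.
I have the following dataset: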
petdata = {
'animal' : ['dog', 'cat', 'fish'],
'male_1' : [0.57, 0.72, 0.62],
'female_1' : [0.43, 0.28, 0.38],
'age_01_1': [0.10,0.16,0.15],
'age_15_1':[0.17,0.29,0.26],
'age_510_1':[0.15,0.19,0.19],
'age_1015_1':[0.18,0.16,0.17],
'age_1520_1':[0.20,0.11,0.12],
'age_20+_1':[0.20,0.09,0.10],
'male_2' : [0.57, 0.72, 0.62],
'female_2' : [0.43, 0.28, 0.38],
'age_01_2': [0.10,0.16,0.15],
'age_15_2':[0.17,0.29,0.26],
'age_510_2':[0.15,0.19,0.19],
'age_1015_2':[0.18,0.16,0.17],
'age_1520_2':[0.20,0.11,0.12],
'age_20+_2':[0.20,0.09,0.10],
'weight_1': [10,20,30],
'weight_2':[40,50,60]
}
df = pd.DataFrame(petdata)
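I want to calculate the weighted average across the animals in the dataset, using weight_1 for all variables ending in "_1" and weight_2 for all variables ending in "_2".
I am currently doing it this way: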
df['male_wav_1']=np.nansum((df['male_1']*df['weight_1'])/df['weight_1'].sum())
df['female_wav_1']=np.nansum((df['female_1']*df['weight_1'])/df['weight_1'].sum())
df['male_wav_2']=np.nansum((df['male_2']*df['weight_2'])/df['weight_2'].sum())
df['female_wav_2']=np.nansum((df['female_2']*df['weight_2'])/df['weight_2'].sum())
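I do this for every column in my dataframe (i.e. age_01_1_wav, age_15_1_wav, ...). I realise this is not very tidy, so could someone give me some advice on how to improve the process?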
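I have tried a couple of approaches, but neither was successful. Reshaping is not the problem, I can do that, but it is not clear to me how to apply different weights to different groups in the data.
Thank you very much for your help.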
Answer 0: (score: 1)
First, I assume the 'animal' column is meant to be your index, so to make it look like a table I set it as the index:
import pandas as pd
import numpy as np
petdata = {
# All of your data ^ above
}
df = pd.DataFrame(petdata) # Creates the DF from your dictionary
df.set_index('animal',inplace=True) # Sets the 'animal' column as the index
I start by splitting your DataFrame into two parts, df_1 and df_2:
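# Use list comprehensions to collect the column names that end with each suffix,
# then select those columns. endswith() is used because a plain substring test
# such as '_1' in name would also match 'age_15_2'. copy() makes each
# sub-DataFrame independent so rows can be added to it later.
df_1 = df[[name for name in df.columns if name.endswith('_1')]].copy()
df_2 = df[[name for name in df.columns if name.endswith('_2')]].copy()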
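Rather than creating a new Series (column) in the DataFrame for every Series that already exists, I would add a single new row holding the weighted average (wav) of each column. It is not as pretty, since the new row is not an animal, but the label 'wav' will appear in the animal index.
Use list comprehensions and the equation you were already using to build two lists of weighted averages: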
wav_1 = [np.nansum(df_1[col]*df_1['weight_1'])/np.nansum(df_1['weight_1']) for col in df_1.columns]
wav_2 = [np.nansum(df_2[col]*df_2['weight_2'])/np.nansum(df_2['weight_2']) for col in df_2.columns]
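As an aside, the same numbers can be obtained without an explicit loop. The sketch below is a minimal, self-contained illustration on a cut-down copy of the data (keeping only two value columns is my simplification). It multiplies each column by weight_1 row-wise with DataFrame.mul and divides the column sums by the total weight. Like np.nansum, the pandas sum() skips NaN values by default.

```python
import pandas as pd

# Cut-down copy of the question's data (two value columns only, for brevity).
df = pd.DataFrame({
    'animal': ['dog', 'cat', 'fish'],
    'male_1': [0.57, 0.72, 0.62],
    'female_1': [0.43, 0.28, 0.38],
    'weight_1': [10, 20, 30],
}).set_index('animal')

value_cols = ['male_1', 'female_1']
weights = df['weight_1']

# Multiply every value column by the weights row-wise, sum each column,
# then divide by the total weight.
wav_1 = df[value_cols].mul(weights, axis=0).sum() / weights.sum()
print(wav_1.round(3))
```

The result is a Series indexed by column name, so it can be assigned as a row in the same way as a list.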
Then append this data to both DataFrames under the new 'wav' label:
df_1.loc['wav'] = wav_1
df_2.loc['wav'] = wav_2
Note that the cell at row 'wav', column 'weight_x' contains junk data: it is the weighted average of your weights themselves.
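If that junk value under the weight column is unwanted, one option is to leave the weight columns out of the calculation and collect the results in a separate Series. The following is only a sketch under my own assumptions: the helper name weighted_averages is hypothetical, the data is a cut-down copy of yours, and every value column is assumed to end in _1 or _2 with a matching weight_1 or weight_2 column.

```python
import numpy as np
import pandas as pd

# Cut-down copy of the question's data, for brevity.
df = pd.DataFrame({
    'animal': ['dog', 'cat', 'fish'],
    'male_1': [0.57, 0.72, 0.62],
    'age_15_1': [0.17, 0.29, 0.26],
    'male_2': [0.57, 0.72, 0.62],
    'age_20+_2': [0.20, 0.09, 0.10],
    'weight_1': [10, 20, 30],
    'weight_2': [40, 50, 60],
}).set_index('animal')


def weighted_averages(frame, suffixes=('_1', '_2')):
    """Hypothetical helper: weighted average of each value column,
    using the weight column that shares its suffix."""
    results = {}
    for suffix in suffixes:
        weights = frame['weight' + suffix]
        for col in frame.columns:
            # endswith() avoids false matches such as '_1' inside 'age_15_2';
            # the weight columns themselves are skipped.
            if col.endswith(suffix) and not col.startswith('weight'):
                results[col] = np.nansum(frame[col] * weights) / np.nansum(weights)
    return pd.Series(results, name='wav')


wav = weighted_averages(df)
print(wav.round(3))
```

Keeping the averages in their own Series also leaves the original DataFrame free of a non-animal row.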
Welcome to Python! Hope this helps.
Answer 1: (score: 0)
You can use Python's zip() function to do some quick calculations.
petdata = {
'animal' : ['dog', 'cat', 'fish'],
'male_1' : [0.57, 0.72, 0.62],
'age_20+_2':[0.20,0.09,0.10],
'weight_1': [10,20,30],
'weight_2':[40,50,60]
}
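weights = petdata['weight_1']
values = petdata['male_1']
# Accumulate the weighted sum pairwise, then divide by the total weight once.
# (Dividing each product by its own weight would just return the original value.)
weighted_sum = 0
for weight, value in zip(weights, values):
    weighted_sum += weight * value
male_wav_1 = weighted_sum / sum(weights)
print(f'male_wav_1: {male_wav_1:.3f}')
male_wav_1: 0.645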
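For plain lists like these, NumPy's np.average accepts a weights argument and computes the weighted mean directly, which may be simpler than looping. A minimal sketch using the male_1 and weight_1 values from the question:

```python
import numpy as np

male_1 = [0.57, 0.72, 0.62]
weight_1 = [10, 20, 30]

# np.average computes sum(value * weight) / sum(weight)
male_wav_1 = np.average(male_1, weights=weight_1)
print(round(male_wav_1, 3))
```

Unlike np.nansum, np.average does not skip NaN values, so any missing data would need to be handled before calling it.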