Computing the average price of each product in a pandas DataFrame

Time: 2020-09-16 03:12:17

Tags: python pandas dataframe

I have a data frame that looks like this:

import pandas as pd

Z = pd.DataFrame({'Product': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange'], 'Selling Price': [1.1, 1.2, 1.3, 2.1, 2.2]})

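There are thousands of unique products and hundreds of millions of selling-price records. How can I efficiently report the average selling price of each unique product?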

Result = pd.DataFrame({'Product': ['Apple', 'Orange'], 'Average Selling Price': [1.2, 2.15]})

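The challenge is that the data is stored in hundreds of different .csv files (the file names are stored in the list files), and I cannot load all of them into memory at once. So I would do something like this: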

for i in files:
     X = pd.read_csv(i)
     # add unique products to the data frame Z
     # add the sum of their selling prices to Z
     # add the number of times the product was sold

# for each unique product, divide the sum of selling prices by the number of times that product was sold

Thanks for any help you can provide!

1 answer:

Answer 0 (score: 1)

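final_df = pd.DataFrame()
for i in files:
    X = pd.read_csv(i)
    # Per-file count and sum of selling prices for each product
    X_agg = X.groupby('Product', as_index=False).agg({'Selling Price': ['count', 'sum']})
    X_agg.columns = ['Product', 'sale_count', 'selling_sum']
    # Fold this file's partial totals into the running totals
    final_df = pd.concat([final_df, X_agg])
    final_df = final_df.groupby('Product', as_index=False).agg({'sale_count': 'sum', 'selling_sum': 'sum'})

# Average selling price = total of selling prices / number of sales
final_df['Average Selling Price'] = final_df['selling_sum'] / final_df['sale_count']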
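
The same sum-and-count idea can also be wrapped in a function that gathers the per-file partial tables in a list and combines them once at the end. This is only a minimal sketch, not part of the original answer. The function name average_price_per_product and the variable names are made up for illustration. It assumes that files is non-empty and that every file has columns named exactly Product and Selling Price.

```python
import pandas as pd


def average_price_per_product(files):
    """Return one row per product with its average selling price over all files.

    Hypothetical helper: assumes `files` is a non-empty list of CSV paths and
    that each CSV has 'Product' and 'Selling Price' columns.
    """
    partials = []
    for path in files:
        # Load only the two needed columns, one file at a time.
        frame = pd.read_csv(path, usecols=["Product", "Selling Price"])
        # Reduce the file to one row per product: total price and row count.
        partials.append(
            frame.groupby("Product")["Selling Price"].agg(
                selling_sum="sum", sale_count="count"
            )
        )

    # Each partial table has at most one row per product, so stacking them
    # all and summing by product (the index) is cheap.
    totals = pd.concat(partials).groupby(level=0).sum()
    totals["Average Selling Price"] = totals["selling_sum"] / totals["sale_count"]
    return totals[["Average Selling Price"]].reset_index()
```

Carrying sums and counts, rather than averaging the per-file means, matters when a product appears a different number of times in each file. If one file holds Apple at 1.1 and 1.2 and another holds Apple at 1.3, the mean of the two file means is 1.225, while the true average is 1.2.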