Question

I have a data file named store_data.csv with thousands of rows. The sample data looks like this:

Date       Store1   Store2   Store3   Store4
2018-06-01 2643     1642     2678     3050
2018-07-16 6442     5413     5784     7684
2018-07-24 4587     5743     3948     6124
2018-08-12 3547     8743     7462     8315

How can I calculate in Python which store has the highest total sales in the last month of the data?

Answer 1

First create a DatetimeIndex:

#if necessary (convert the column to datetimes first, then make it the index)
#df['Date'] = pd.to_datetime(df['Date'])
#df = df.set_index('Date')

print (df)
            Store1  Store2  Store3  Store4
Date                                      
2018-06-01    2643    1642    2678    3050
2018-07-16    6442    5413    5784    7684
2018-08-10    4587    5743    3948    6124 <- date changed for a better sample
2018-08-12    3547    8743    7462    8315

print (df.index)
DatetimeIndex(['2018-06-01', '2018-07-16', '2018-08-10', '2018-08-12'], 
              dtype='datetime64[ns]', name='Date', freq=None)
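
If the data still lives in the file, the DatetimeIndex can also be built while reading it. The following is a minimal sketch, assuming the file is whitespace-separated as in the question; the inline string `raw` is only a stand-in for store_data.csv so that the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for store_data.csv (assumption: columns are separated by whitespace).
raw = """Date       Store1   Store2   Store3   Store4
2018-06-01 2643     1642     2678     3050
2018-07-16 6442     5413     5784     7684
2018-08-10 4587     5743     3948     6124
2018-08-12 3547     8743     7462     8315
"""

# parse_dates converts the Date column, index_col makes it the index,
# so the result already has a DatetimeIndex.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    parse_dates=["Date"],
    index_col="Date",
)
print(df.index)
```

For a real comma-separated file, dropping the `sep` argument and passing the filename instead of `io.StringIO(raw)` should be enough.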

Then convert it to month periods with to_period:

df1 = df.set_index(df.index.to_period('M'))
print (df1)
         Store1  Store2  Store3  Store4
Date                                   
2018-06    2643    1642    2678    3050
2018-07    6442    5413    5784    7684
2018-08    4587    5743    3948    6124
2018-08    3547    8743    7462    8315

Filter the rows that belong to the last period, sum them, and finally get the column name of the maximum value with Series.idxmax:

print (df1.loc[df1.index[-1]].sum())
Store1     8134
Store2    14486
Store3    11410
Store4    14439
dtype: int64

out = df1.loc[df1.index[-1]].sum().idxmax()
print (out)
Store2
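
Note that `df1.index[-1]` is the last month only if the rows are sorted by date. Here is a small sketch that does not depend on the row order, assuming the same sample data as above; the function name `top_store_last_month` is made up for this illustration.

```python
import pandas as pd

# Same sample as above, but with the rows deliberately out of order.
df = pd.DataFrame(
    {
        "Store1": [3547, 2643, 4587, 6442],
        "Store2": [8743, 1642, 5743, 5413],
        "Store3": [7462, 2678, 3948, 5784],
        "Store4": [8315, 3050, 6124, 7684],
    },
    index=pd.DatetimeIndex(
        ["2018-08-12", "2018-06-01", "2018-08-10", "2018-07-16"], name="Date"
    ),
)


def top_store_last_month(frame):
    """Return (store name, per-store totals) for the latest month in the index."""
    periods = frame.index.to_period("M")
    last = periods.max()                    # latest month, whatever the row order
    totals = frame[periods == last].sum()   # column sums over that month only
    return totals.idxmax(), totals


best, totals = top_store_last_month(df)
print(totals)
print(best)
```

Taking the maximum period instead of the last row gives the same answer for sorted data and still works if the file is shuffled.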

Thanks to @Jon Clements for another solution:

out = df.last('M').resample('M').sum().T.idxmax()
#if scalar output is needed
out = df.last('M').resample('M').sum().iloc[0].idxmax()
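
As far as I know, `DataFrame.last` is deprecated in newer pandas releases, so the one-liner above may emit a warning there. A possible alternative, sketched under the same sample data, groups the rows by month period and takes the last group; the names `monthly` and `per_month` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Store1": [2643, 6442, 4587, 3547],
        "Store2": [1642, 5413, 5743, 8743],
        "Store3": [2678, 5784, 3948, 7462],
        "Store4": [3050, 7684, 6124, 8315],
    },
    index=pd.DatetimeIndex(
        ["2018-06-01", "2018-07-16", "2018-08-10", "2018-08-12"], name="Date"
    ),
)

# One row of totals per month; groupby sorts the month periods by default.
monthly = df.groupby(df.index.to_period("M")).sum()

# Store with the highest total in the last month of the data.
out = monthly.iloc[-1].idxmax()
print(out)

# As a by-product, the best store for every month.
per_month = monthly.idxmax(axis=1)
print(per_month)
```

This also gives the winner of every other month for free, which the filtering approach does not.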

Answer 2

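This solution is specific to your problem and a bit clunky, but I have tested it and it seems to work for me.

This program finds the store with the highest sales in the last month of the data. It assumes that the months are given in order (no mixed data). If that is a problem, please make the question more specific and I will address it. One possible implementation would be to use a dictionary to keep track of each month, and then access the last month's data to find the maximum.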

def get_highest_sales(filename):
    sales_during_month = [0, 0, 0, 0]  # running totals for the four stores
    with open(filename) as f:
        f.readline()  # skip the header line
        prev_month = ""
        for line in f:
            values = line.split()  # split on any run of whitespace
            if not values:
                continue  # ignore blank lines
            month = values[0][:7]  # 'YYYY-MM', so equal months of different years stay apart
            if month != prev_month:
                # a new month starts: reset the running totals
                prev_month = month
                sales_during_month = [0, 0, 0, 0]
            sales = [float(sale) for sale in values[1:]]
            for store, sale in enumerate(sales):
                sales_during_month[store] += sale

    return "Store: " + str(sales_during_month.index(max(sales_during_month)) + 1)
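
The dictionary-based variant mentioned above could look roughly like this. It is only a sketch: the function name `highest_sales_last_month` is made up, it takes an iterable of text lines (an open file works too), and it assumes whitespace-separated columns with dates in YYYY-MM-DD form.

```python
from collections import defaultdict


def highest_sales_last_month(lines):
    """Return the name of the store with the highest total in the latest month."""
    rows = iter(lines)
    stores = next(rows).split()[1:]  # header line: Date Store1 Store2 ...
    # Maps 'YYYY-MM' to a list of running totals, one entry per store.
    totals = defaultdict(lambda: [0.0] * len(stores))
    for line in rows:
        values = line.split()
        if not values:
            continue  # skip blank lines
        month = values[0][:7]
        for i, sale in enumerate(values[1:]):
            totals[month][i] += float(sale)
    # 'YYYY-MM' strings sort chronologically, so max() picks the latest month.
    last_month = max(totals)
    month_totals = totals[last_month]
    return stores[month_totals.index(max(month_totals))]


# Rows deliberately out of order to show that ordering no longer matters.
sample = [
    "Date       Store1   Store2   Store3   Store4",
    "2018-08-12 3547     8743     7462     8315",
    "2018-06-01 2643     1642     2678     3050",
    "2018-08-10 4587     5743     3948     6124",
    "2018-07-16 6442     5413     5784     7684",
]
print(highest_sales_last_month(sample))
```

Because every month keeps its own totals, the input no longer has to be sorted, and the number of stores is read from the header instead of being fixed at four.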
