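How do I display the average sales volume per year over ten years for a specific city in pandas?

Asked: 2021-06-03 20:24:32

Tags: python pandas dataframe jupyter-notebook

What is the correct way to display the average sales volume per year for the city of Carlisle, for 2010-2020?

Below is an abbreviated form of the large DataFrame, showing only the columns and rows relevant to the question: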

import pandas as pd
df = pd.DataFrame({'Date': ['01/09/2009','01/10/2009','01/11/2009','01/12/2009','01/01/2010','01/02/2010','01/03/2010','01/04/2010','01/05/2010','01/06/2010','01/07/2010','01/08/2010','01/09/2010','01/10/2010','01/11/2010','01/12/2010','01/01/2011','01/02/2011'],
                   'RegionName': ['Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle','Carlisle'],
                    'SalesVolume': [118,137,122,132,83,81,105,114,110,106,137,130,129,121,129,100,84,62]})

Here is what I tried:

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_csv ('C:/Users/user/AppData/Local/Programs/Python/Python39/Scripts/uk_hpi_dataset_2021_01.csv')

df.Date = pd.to_datetime(df.Date)

df['Year'] = pd.to_datetime(df['Date']).apply(lambda x:
                                               '{year}'.format(year=x.year).zfill(2))

carlisle_vol = df[df['RegionName'].str.contains('Carlisle')]
carlisle_vol.groupby('Year')['SalesVolume'].mean()

print(sales_vol)

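When I run this code, it does not filter the 'Date' column so that the average sales volume is computed only for the years from '01/01/2010' through '01/12/2020'. For some reason, it also prints every other column as well. Can anyone help me answer this question correctly?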

2 Answers:

Answer 0: (score: 2)

>>> df.loc[(df["Date"].dt.year.between(2010, 2020))
           & (df["RegionName"] == "Carlisle")] \
  .groupby([pd.Grouper(key="Date", freq="Y")])["SalesVolume"].mean()

Date
2010-01-01    112.083333
2011-01-01     73.000000
Freq: A-DEC, Name: SalesVolume, dtype: float64

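Going further

The only difference from @nocibambi's answer is the argument passed to groupby, specifically the freq parameter of pd.Grouper. Suppose your fiscal year starts on September 1.

Sales every 3 months: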

>>> df
        Date  Sales
0 2010-09-01      1  # 1st group: mean=2.5
1 2010-12-01      2
2 2011-03-01      3
3 2011-06-01      4
4 2011-09-01      5  # 2nd group: mean=6.5
5 2011-12-01      6
6 2012-03-01      7
7 2012-06-01      8

>>> df.groupby(pd.Grouper(key="Date", freq="AS-SEP")).mean()
            Sales
Date
2010-09-01    2.5
2011-09-01    6.5

See the documentation for all the values of the freq aliases and anchoring suffixes.
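
Because the spelling of freq aliases has changed across pandas versions, it can be useful to cross-check the fiscal-year grouping without any alias. The following is only a sketch of an alternative technique, not the method used in the answer above: it derives the starting calendar year of each fiscal year by hand. The names FISCAL_START_MONTH, fiscal_year_start and fiscal_mean are made up for illustration, and the data is the quarterly example from above.

```python
import pandas as pd

# Quarterly example data from the answer above.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2010-09-01", "2010-12-01", "2011-03-01", "2011-06-01",
        "2011-09-01", "2011-12-01", "2012-03-01", "2012-06-01",
    ]),
    "Sales": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Assumption for illustration: the fiscal year starts in September.
FISCAL_START_MONTH = 9

# Dates in September-December belong to the fiscal year that starts in the
# same calendar year; dates in January-August belong to the one that
# started in the previous calendar year.
calendar_year = df["Date"].dt.year
fiscal_year_start = calendar_year.where(
    df["Date"].dt.month >= FISCAL_START_MONTH,
    calendar_year - 1,
).rename("FiscalYearStart")

# Mean sales per fiscal year, keyed by the year in which it starts.
fiscal_mean = df.groupby(fiscal_year_start)["Sales"].mean()
print(fiscal_mean)
```

On this data the two groups should match the ones produced by pd.Grouper, except that the index holds plain years (2010, 2011) instead of timestamps.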

Answer 1: (score: 0)

You can access the year through the datetime accessor:

df[
    (df["RegionName"] == "Carlisle")
    & (df["Date"].dt.year >= 2010)
    & (df["Date"].dt.year <= 2020)
].groupby(df.Date.dt.year)["SalesVolume"].mean()

>>>

Date
2010    112.083333
2011     73.000000
Name: SalesVolume, dtype: float64
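
For completeness, here is a minimal end-to-end sketch that applies the same filter-then-group idea to the sample rows from the question. It assumes the 'Date' strings are day/month/year, which the question's '01/01/2010' to '01/12/2020' range suggests, and parses them with an explicit format. The names carlisle and yearly_mean are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Date": [
        "01/09/2009", "01/10/2009", "01/11/2009", "01/12/2009",
        "01/01/2010", "01/02/2010", "01/03/2010", "01/04/2010",
        "01/05/2010", "01/06/2010", "01/07/2010", "01/08/2010",
        "01/09/2010", "01/10/2010", "01/11/2010", "01/12/2010",
        "01/01/2011", "01/02/2011",
    ],
    "RegionName": ["Carlisle"] * 18,
    "SalesVolume": [118, 137, 122, 132, 83, 81, 105, 114, 110,
                    106, 137, 130, 129, 121, 129, 100, 84, 62],
})

# Assumption: dates are day/month/year, so "01/09/2009" is 1 September 2009.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Keep only Carlisle rows whose year falls in 2010-2020 (inclusive).
mask = (df["RegionName"] == "Carlisle") & df["Date"].dt.year.between(2010, 2020)
carlisle = df.loc[mask]

# Group the filtered rows by calendar year and average the sales volume.
yearly_mean = (
    carlisle.groupby(carlisle["Date"].dt.year.rename("Year"))["SalesVolume"]
    .mean()
)
print(yearly_mean)
```

Selecting the 'SalesVolume' column before calling mean() is what keeps the other columns out of the result, and assigning the result to a variable before printing avoids referring to a name that was never defined.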