Question

I have a Series object with the following content:

    date   price
    dec      12
    may      15
    apr      13
    ..

Problem statement: I want to compute the mean price for each month and display the result sorted in calendar-month order.

Desired output:

 month mean_price
  Jan    XXX
  Feb    XXX
  Mar    XXX

I thought of making a list and passing it to a sort function:

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

But sort_values does not support this for a Series.

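One big problem I ran into is that even though

df = df.sort_values(by='date',ascending=True,inplace=True) works on the initial df, after I did a groupby it did not preserve the order that came from the sorted df.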

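To sum up, I need these two columns from the initial DataFrame. I sorted by the datetime column and then grouped by month (dt.strftime('%B')), and the ordering got messed up. Now I have to sort by month name.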

My code:

df # has 5 columns though I need the column 'date' and 'price'

df.sort_values(by='date',inplace=True) #at this part it is sorted according to date, great
total=(df.groupby(df['date'].dt.strftime('%B'))['price'].mean()) # Though now it is not as it was but instead the months appear alphabetically

Answer 1

You can use categorical data to enable proper sorting:

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", 
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
df['months'] = pd.Categorical(df['months'], categories=months, ordered=True)
df.sort_values(...)  # same as you have now; can use inplace=True

When you specify the categories, pandas remembers their order and uses it as the default sort order.

Documentation: pandas Categorical data > Sorting and order
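As a minimal sketch of how this could be applied to the question's data (the sample values and the `month` column name are assumptions made for illustration), grouping on an ordered categorical returns the means in calendar order rather than alphabetically:

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical sample data shaped like the question's Series
df = pd.DataFrame({"date": ["dec", "may", "apr", "may"],
                   "price": [12, 15, 13, 17]})

# Capitalize so the values match the category labels, then make them ordered
df["month"] = pd.Categorical(df["date"].str.capitalize(),
                             categories=months, ordered=True)

# groupby sorts its keys by category order, not alphabetically;
# observed=True keeps only the months that actually occur
mean_price = df.groupby("month", observed=True)["price"].mean()
print(mean_price)
```

Without `observed=True`, months that are absent from the data may also appear in the result with NaN means, depending on the pandas version.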

Answer 2

Thanks to @Brad Solomon for suggesting a quicker way to capitalize the strings!

Note 1: @Brad Solomon's answer using pd.Categorical should use fewer resources than mine. It shows how to assign an order to your categorical data. Don't miss it :P

Alternatively, you can use the following:

df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                  ["aug", 11], ["jan", 11], ["jan", 1]], 
                   columns=["Month", "Price"])
# Preprocessing: capitalize `jan`, `dec` to `Jan` and `Dec`
df["Month"] = df["Month"].str.capitalize()

# Now the dataset should look like
#   Month Price
#   -----------
#    Dec    XX
#    Jan    XX
#    Apr    XX

# make it a datetime so that we can sort it: 
# use %b because the data use the abbreviated month name
df["Month"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
df = df.sort_values(by="Month")

total = (df.groupby(df["Month"])['Price'].mean())

# total 
Month
1     17.333333
3     11.000000
8     16.000000
12    12.000000

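Note 2: By default, groupby sorts the group keys for you. Be sure to use the same key for sorting and for grouping in df = df.sort_values(by=SAME_KEY) and total = (df.groupby(df[SAME_KEY])['Price'].mean()). Otherwise you may get unexpected behavior. For details, see "Groupby preserve order among groups? In which way?".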
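To illustrate the default key sorting mentioned in Note 2, here is a small sketch on made-up data (the dates and prices are assumptions). It relies on the `sort` parameter of `groupby`: with `sort=False` the groups come out in order of first appearance, so a frame already sorted by date yields months in chronological order.

```python
import pandas as pd

# Hypothetical data: a datetime column and a price column
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-01", "2020-05-10",
                            "2020-04-03", "2020-05-20"]),
    "price": [12, 15, 13, 17],
})
df = df.sort_values("date")
month_name = df["date"].dt.strftime("%B")  # assumes an English locale

# Default (sort=True): group keys are sorted alphabetically
alphabetical = df.groupby(month_name)["price"].mean()

# sort=False: groups keep the order in which they first appear
chronological = df.groupby(month_name, sort=False)["price"].mean()
print(chronological)
```

This only gives calendar order when the data covers a single year; with several years, the order of first appearance need not match the month order.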

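Note 3: A more computationally efficient approach is to compute the means first and then sort by month. That way you only sort 12 items instead of the whole df. If you don't need df itself to be sorted, this reduces the computational cost.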
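A rough sketch of the idea in Note 3, reusing this answer's sample data: aggregate first, then put the (at most 12) results into calendar order. Using `calendar.month_abbr` for the reference order is an assumption here; its values depend on the locale, and English abbreviations are assumed.

```python
import calendar
import pandas as pd

df = pd.DataFrame({"Month": ["Dec", "Jan", "Mar", "Aug", "Aug", "Jan", "Jan"],
                   "Price": [12, 40, 11, 21, 11, 11, 1]})

# 1) Aggregate first: the result has one row per month, ordered alphabetically
mean_by_month = df.groupby("Month")["Price"].mean()

# 2) Reorder only the aggregated rows, following calendar order
calendar_order = list(calendar.month_abbr)[1:]  # skip the empty entry at index 0
present = [m for m in calendar_order if m in mean_by_month.index]
mean_by_month = mean_by_month.loc[present]
print(mean_by_month)
```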

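Note 4: For those who already have the month as the index and wonder how to make it categorical, take a look at pandas.CategoricalIndex. @jezrael has a working example of sorting a categorical index in "Pandas series sort by month index".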
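For the case in Note 4, a minimal sketch might look like the following. The Series values are made up, and this is not @jezrael's example, only an illustration of `pandas.CategoricalIndex` with ordered categories.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical Series that already has month abbreviations as its index
s = pd.Series([12.0, 15.0, 13.0], index=["Dec", "May", "Apr"])

# Replace the index with an ordered categorical index, then sort by it
s.index = pd.CategoricalIndex(s.index, categories=months, ordered=True)
s_sorted = s.sort_index()
print(s_sorted)
```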

Answer 3

I would use the calendar module and reindex:

series.str.capitalize capitalizes the series; we then build a dictionary with the calendar module and map it over the series to get the month numbers.

Once we have the month numbers, we can call sort_values() and take the resulting index, then reindex with it.

import calendar
df.date=df.date.str.capitalize() #capitalizes the series
d={i:e for e,i in enumerate(calendar.month_abbr)} #creates a dictionary
#d={i[:3]:e for e,i in enumerate(calendar.month_name)} 
df.reindex(df.date.map(d).sort_values().index) #map + sort_values + reindex with index

  date  price
2  Apr     13
1  May     15
0  Dec     12
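A related variant, not part of the original answer: recent pandas versions (1.1 and later) accept a `key` argument in `sort_values`, so the same month-number mapping can be applied while sorting. The sketch below assumes such a version and English month abbreviations from `calendar.month_abbr`.

```python
import calendar
import pandas as pd

df = pd.DataFrame({"date": ["dec", "may", "apr"], "price": [12, 15, 13]})
df["date"] = df["date"].str.capitalize()

# Map each abbreviation to its month number: {"Jan": 1, ..., "Dec": 12}
month_number = {name: num for num, name in enumerate(calendar.month_abbr)}

# key receives the column as a Series and must return a Series to sort by
sorted_df = df.sort_values("date", key=lambda col: col.map(month_number))
print(sorted_df)
```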

Answer 4

You should consider reindexing along axis 0 (the index):

new_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

df1 = df.reindex(new_order, axis=0)
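To show the reindex approach end to end, here is a sketch on made-up data. It assumes the grouped result is indexed by full English month names, as produced by `dt.strftime('%B')` in the question; `dropna()` is added to discard months that do not occur in the data.

```python
import pandas as pd

# Hypothetical data with a datetime column, as in the question
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-01", "2020-05-10",
                            "2020-04-03", "2020-05-20"]),
    "price": [12, 15, 13, 17],
})

# The grouped result is indexed by month name, in alphabetical order
total = df.groupby(df["date"].dt.strftime("%B"))["price"].mean()

new_order = ["January", "February", "March", "April", "May", "June", "July",
             "August", "September", "October", "November", "December"]

# reindex puts rows in calendar order; months with no data become NaN
total_ordered = total.reindex(new_order).dropna()
print(total_ordered)
```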

Answer 5

Use the Sort_Dataframeby_Month function to sort month names in chronological order.

The following packages need to be installed:

$ pip install sorted-months-weekdays
$ pip install sort-dataframeby-monthorweek

Example:

import pandas as pd
from sorted_months_weekdays import *

from sort_dataframeby_monthorweek import *

df = pd.DataFrame([['Jan',23],['Jan',16],['Dec',35],['Apr',79],['Mar',53],['Mar',12],['Feb',3]], columns=['Month','Sum'])
df
Out[11]: 
  Month  Sum
0   Jan   23
1   Jan   16
2   Dec   35
3   Apr   79
4   Mar   53
5   Mar   12
6   Feb    3

Sort the data by month using the following function:

Sort_Dataframeby_Month(df=df,monthcolumnname='Month')
Out[14]: 
  Month  Sum
0   Jan   23
1   Jan   16
2   Feb    3
3   Mar   53
4   Mar   12
5   Apr   79
6   Dec   35

Answer 6

You can put the month number together with the name into the index (i.e. "01 January"), sort, and then strip the number:

total=(df.groupby(df['date'].dt.strftime('%m %B'))['price'].mean()).sort_index()

It might look like this:

01 January  xxx
02 February     yyy
03 March    zzz
04 April    ttt

 total.index = [ x.split()[1] for x in total.index ]

January xxx
February yyy
March zzz
April ttt
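Put together as a runnable sketch on made-up data (the dates and prices are assumptions, and an English locale is assumed for `%B`), the steps above might look like this:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-01", "2020-05-10",
                            "2020-04-03", "2020-05-20"]),
    "price": [12, 15, 13, 17],
})

# Keys such as "04 April" sort correctly as plain strings
total = df.groupby(df["date"].dt.strftime("%m %B"))["price"].mean().sort_index()

# Drop the numeric prefix, keeping only the month name
total.index = [label.split()[1] for label in total.index]
print(total)
```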

Sort a pandas DataFrame/Series by month name?
