Question

I have a dataframe that is grouped first by invoice cycle and then holds, within each invoice cycle, the count for each clinic.

I added the counts column with the following code:

df5 = df4.groupby(['Invoice Cycle', 'Clinic']).size().reset_index(name='counts')

I then used this code to set the index and get the dataframe described above:

df5 = df5.set_index(['Invoice Cycle','Clinic'])

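Now I want to reorder the 'Invoice Cycle' level so that the dates appear chronologically: Dec-16, Jan-17, Feb-17, Mar-17, and so on.

Then, within each invoice cycle, I want to reorder the clinics so that the clinic with the highest count is at the top and the one with the lowest count is at the bottom.

Because the values in 'Invoice Cycle' are strings rather than timestamps, I can't seem to do either of these.

Is there a way to reorder the dataframe like this?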

Answer 1

You can write a function that converts the date strings to datetime objects:

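import pandas as pd
import datetime

def str_to_date(string):
    # Parse a label into a datetime on the first day of that month (e.g. 2017-01-01).
    # The format must match the strings: '%y-%b' parses '17-Jan'; use '%b-%y' for 'Jan-17'.
    date = datetime.datetime.strptime(string, '%y-%b')
    return date

# Assumes 'Invoice Cycle' is still a column (call reset_index() first if it is in the index)
df['Invoice Cycle'] = df['Invoice Cycle'].apply(str_to_date)
# now you can sort correctly: cycles oldest first, highest count first within each cycle
df = df.sort_values(['Invoice Cycle', 'counts'], ascending=[True, False])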
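
For a complete picture, here is a minimal sketch of the whole pipeline on made-up data. A few assumptions, since the real data isn't shown: the sample df4 is invented, the cycle labels are assumed to look like 'Dec-16' (hence format='%b-%y'), and cycle_date is a hypothetical helper column used only for sorting. It also swaps the hand-written function for pd.to_datetime, which does the same parsing in one call.

```python
import pandas as pd

# Hypothetical sample data; the real df4 is not shown in the question.
df4 = pd.DataFrame({
    'Invoice Cycle': ['Jan-17', 'Dec-16', 'Jan-17', 'Dec-16', 'Jan-17', 'Dec-16', 'Feb-17'],
    'Clinic':        ['A',      'A',      'B',      'B',      'B',      'B',      'A'],
})

# Count rows per (Invoice Cycle, Clinic), as in the question.
df5 = df4.groupby(['Invoice Cycle', 'Clinic']).size().reset_index(name='counts')

# Parse the labels into a helper column used only for sorting.
# Assumption: labels look like 'Dec-16' (abbreviated month, two-digit year).
df5['cycle_date'] = pd.to_datetime(df5['Invoice Cycle'], format='%b-%y')

# Cycles oldest first; within each cycle, highest count first.
df5 = df5.sort_values(['cycle_date', 'counts'], ascending=[True, False])

# Drop the helper and restore the MultiIndex; set_index keeps the row order.
df5 = df5.drop(columns='cycle_date').set_index(['Invoice Cycle', 'Clinic'])
print(df5)
```

Sorting on a separate helper column means the index keeps the original string labels instead of showing full timestamps, which may or may not be what you want.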

Group dataframe and reorder based on date and counts
