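I am trying to find a way to sort data based on a column. My current code below is very close, but I ultimately want Joe moved to the top, with all of his rows kept together, because his total is the largest.
Update 1: the 'Total' row is not always the maximum value, so it has to be designated with 'Yes' (some dollar amounts may be negative).
Update 2: my code and the desired output have been updated to show that a 'Total' row can be smaller than another 'Dollar' value in its group (because of negative dollars), but it should still be the first row of its 'Dude' group.
My code gets the grouping right, but it does not end up ordering the 'Dude' groups.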
import pandas as pd
headers = ['Date','Dude','Dollar', 'Total']
df = pd.DataFrame({
'Dude':['Bob','Bob','Sam','Bob','Joe','Joe','Joe','Bob','Sam','Sam','Joe','Sam'],
'Dollar':[4,1,-2,1,5,12,3,2,7,1,4,8],
'Total':['Yes','No','No','No','No','Yes','No','No','Yes','No','No','No'],
'Date':['1/1/2016','1/1/2016','1/1/2016','3/1/2016','3/1/2016','1/1/2016','1/1/2016','5/1/2016','1/1/2016','3/1/2016','5/1/2016','5/1/2016']
}, columns = headers)
df['Date'] = pd.to_datetime(df['Date'])
df.sort_values(by = ['Dude','Total','Date'], ascending = [True, False, True], inplace = True)
Output:
Date Dude Dollar Total
0 2016-01-01 Bob 4 Yes
1 2016-01-01 Bob 1 No
3 2016-03-01 Bob 1 No
7 2016-05-01 Bob 2 No
5 2016-01-01 Joe 12 Yes
6 2016-01-01 Joe 3 No
4 2016-03-01 Joe 5 No
10 2016-05-01 Joe 4 No
8 2016-01-01 Sam 7 Yes
2 2016-01-01 Sam -2 No
9 2016-03-01 Sam 1 No
11 2016-05-01 Sam 8 No
Desired output:
Date Dude Dollar Total
5 2016-01-01 Joe 12 Yes
6 2016-01-01 Joe 3 No
4 2016-03-01 Joe 5 No
10 2016-05-01 Joe 4 No
8 2016-01-01 Sam 7 Yes
2 2016-01-01 Sam -2 No
9 2016-03-01 Sam 1 No
11 2016-05-01 Sam 8 No
0 2016-01-01 Bob 4 Yes
1 2016-01-01 Bob 1 No
3 2016-03-01 Bob 1 No
7 2016-05-01 Bob 2 No
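Answer 0 (score: 3)
You can set the 'Dude' column to a categorical dtype with the desired ordering, then sort exactly as before. This also gives you the other benefits of having the 'Dude' column as a categorical.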
# Get the ordering of Dudes based on the Dollar value of each 'Yes' row.
dude_order = df[df['Total'] == 'Yes'].sort_values(by='Dollar', ascending=False)
# Set dude as categorical with the previously determined ordering.
df['Dude'] = df['Dude'].astype(pd.CategoricalDtype(categories=dude_order['Dude'].tolist(), ordered=True))
# Sort the dataframe.
df = df.sort_values(by=['Dude', 'Total', 'Date'], ascending=[True, False, True])
Resulting output:
Date Dude Dollar Total
5 2016-01-01 Joe 12 Yes
6 2016-01-01 Joe 3 No
4 2016-03-01 Joe 5 No
10 2016-05-01 Joe 4 No
8 2016-01-01 Sam 7 Yes
2 2016-01-01 Sam -2 No
9 2016-03-01 Sam 1 No
11 2016-05-01 Sam 8 No
0 2016-01-01 Bob 4 Yes
1 2016-01-01 Bob 1 No
3 2016-03-01 Bob 1 No
7 2016-05-01 Bob 2 No
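To see why the categorical approach reorders the groups, here is a minimal sketch on toy data (the `names` Series and the hard-coded category list are assumptions for illustration, not part of the question's DataFrame). A plain string column sorts alphabetically, while an ordered categorical sorts by the position of each value in its category list.

```python
import pandas as pd

# Hypothetical toy data, not the question's DataFrame.
names = pd.Series(['Bob', 'Joe', 'Sam', 'Joe', 'Bob'])

# Plain strings sort alphabetically.
alphabetical = names.sort_values().tolist()

# An ordered categorical sorts by category position instead.
custom = pd.CategoricalDtype(categories=['Joe', 'Sam', 'Bob'], ordered=True)
by_category = names.astype(custom).sort_values().tolist()

print(alphabetical)  # ['Bob', 'Bob', 'Joe', 'Joe', 'Sam']
print(by_category)   # ['Joe', 'Joe', 'Sam', 'Bob', 'Bob']
```

Note that the category list must contain each name exactly once, so this relies on every 'Dude' having a single 'Yes' row.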
Answer 1 (score: 2)
UPDATE:
In [162]: m = df.loc[df.Total=='Yes'].set_index('Dude')['Dollar']
In [163]: m
Out[163]:
Dude
Bob 4
Joe 12
Sam 7
Name: Dollar, dtype: int64
In [164]: df.assign(x=df.Dude.map(m)) \
...: .sort_values(['x','Dude','Total','Date'], ascending=[False, True, False, True]) \
...: .drop(columns='x')
Out[164]:
Date Dude Dollar Total
5 2016-01-01 Joe 12 Yes
6 2016-01-01 Joe 3 No
4 2016-03-01 Joe 5 No
10 2016-05-01 Joe 4 No
8 2016-01-01 Sam 7 Yes
2 2016-01-01 Sam -2 No
9 2016-03-01 Sam 1 No
11 2016-05-01 Sam 8 No
0 2016-01-01 Bob 4 Yes
1 2016-01-01 Bob 1 No
3 2016-03-01 Bob 1 No
7 2016-05-01 Bob 2 No
Old answer:
In [96]: df.assign(x=df.groupby('Dude').Dollar.transform('max')) \
...: .sort_values(['x','Dude','Dollar','Date'], ascending=[False, True, False, True]) \
...: .drop(columns='x')
Out[96]:
Date Dude Dollar Total
5 2016-01-01 Joe 12 Yes
4 2016-03-01 Joe 5 No
10 2016-05-01 Joe 4 No
6 2016-01-01 Joe 3 No
8 2016-01-01 Sam 8 Yes
11 2016-05-01 Sam 5 No
2 2016-01-01 Sam 2 No
9 2016-03-01 Sam 1 No
0 2016-01-01 Bob 4 Yes
7 2016-05-01 Bob 2 No
1 2016-01-01 Bob 1 No
3 2016-03-01 Bob 1 No
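The old answer ranks groups by their largest 'Dollar' value, which only matches the question's requirement when the 'Total' row happens to be the maximum. A small sketch on hypothetical data (not the question's DataFrame) shows how the two sort keys can disagree:

```python
import pandas as pd

# Hypothetical data: Sam's 'Total' row (7) is smaller than another Sam row (9).
df = pd.DataFrame({
    'Dude': ['Bob', 'Bob', 'Sam', 'Sam'],
    'Dollar': [8, 1, 7, 9],
    'Total': ['Yes', 'No', 'Yes', 'No'],
})

# Key used by the old answer: the largest Dollar in each group.
max_key = df.groupby('Dude')['Dollar'].transform('max')

# Key used by the update: the Dollar on each group's 'Yes' row.
totals = df.loc[df['Total'] == 'Yes'].set_index('Dude')['Dollar']
total_key = df['Dude'].map(totals)

print(max_key.tolist())    # [8, 8, 9, 9] -> Sam would sort first
print(total_key.tolist())  # [8, 8, 7, 7] -> Bob sorts first
```

The `map` version assumes exactly one 'Yes' row per 'Dude'; a group without one would get a missing key.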
Answer 2 (score: 2)
My solution: it first finds all of the 'Yes' rows, merges them back onto the original DataFrame, and then sorts by them first.
import pandas as pd
headers = ['Date','Dude','Dollar', 'Total']
df = pd.DataFrame({
'Dude':['Bob','Bob','Sam','Bob','Joe','Joe','Joe','Bob','Sam','Sam','Joe','Sam'],
'Dollar':[4,1,-2,1,5,12,3,2,7,1,4,8],
'Total':['Yes','No','No','No','No','Yes','No','No','Yes','No','No','No'],
'Date':['1/1/2016','1/1/2016','1/1/2016','3/1/2016','3/1/2016','1/1/2016','1/1/2016','5/1/2016','1/1/2016','3/1/2016','5/1/2016','5/1/2016']
}, columns = headers)
df['Date'] = pd.to_datetime(df['Date'])
# Just the Total = Yes row for each dude, with dollar renamed to total_dollar
totals = df.loc[df['Total'] == 'Yes', ['Dude', 'Dollar']]
totals.columns = ['Dude', 'Total_Dollar']
# Merge back on dude, sort by total dollars before sorting by everything else
df = df.merge(totals, on='Dude').sort_values(by = ['Total_Dollar', 'Dude', 'Total', 'Date'], ascending = [False, True, False, True])
del df['Total_Dollar']
Output:
Date Dude Dollar Total
9 2016-01-01 Joe 12 Yes
10 2016-01-01 Joe 3 No
8 2016-03-01 Joe 5 No
11 2016-05-01 Joe 4 No
5 2016-01-01 Sam 7 Yes
4 2016-01-01 Sam -2 No
6 2016-03-01 Sam 1 No
7 2016-05-01 Sam 8 No
0 2016-01-01 Bob 4 Yes
1 2016-01-01 Bob 1 No
2 2016-03-01 Bob 1 No
3 2016-05-01 Bob 2 No
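The index labels in this output differ from the other answers because `merge` returns a freshly numbered index. If the original labels matter, one possible workaround is to carry them through the merge as a column. This is a sketch on hypothetical data with made-up index labels, not part of the original answer:

```python
import pandas as pd

# Hypothetical data with non-default index labels.
df = pd.DataFrame({
    'Dude': ['Bob', 'Sam', 'Bob', 'Sam'],
    'Dollar': [4, 7, 1, -2],
    'Total': ['Yes', 'Yes', 'No', 'No'],
}, index=[10, 11, 12, 13])

# One row per Dude holding the Dollar value of its 'Yes' row.
totals = (df.loc[df['Total'] == 'Yes', ['Dude', 'Dollar']]
            .rename(columns={'Dollar': 'Total_Dollar'}))

# reset_index() moves the labels into a column named 'index' so they
# survive the merge; set_index('index') restores them afterwards.
merged = (df.reset_index()
            .merge(totals, on='Dude')
            .set_index('index')
            .sort_values(by=['Total_Dollar', 'Dude', 'Total'],
                         ascending=[False, True, False])
            .drop(columns='Total_Dollar'))
merged.index.name = None

print(merged.index.tolist())  # [11, 13, 10, 12]
```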