Pandas DataFrame: sort by a categorical column, but in a specific class order

Date: 2016-08-30 09:10:13

Tags: python-2.7 sorting pandas dataframe categorical-data

I want to select the top entries in a Pandas DataFrame, based on the entries of a specific column, by using df_selected = df_targets.head(N).

Each entry has a target value (listed in order of importance):

Likely Supporter, GOTV, Persuasion, Persuasion+GOTV  

Unfortunately, if I do

df_targets = df_targets.sort("target")

the sorting is alphabetical (GOTV, Likely Supporter, ...).

I was hoping to find a keyword like list_ordering, as in:

my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"] 
df_targets = df_targets.sort("target", list_ordering=my_list)
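pandas has no list_ordering keyword, but since pandas 1.1 DataFrame.sort_values accepts a key callable that is applied to the sort column before sorting, which gets close to what is asked here. A minimal sketch, assuming pandas >= 1.1 (so not the Python 2.7 era setup in the question); the sample rows are made up for illustration:

```python
import pandas as pd

my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Hypothetical sample data, for illustration only
df_targets = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "target": ["GOTV", "Persuasion+GOTV", "Likely Supporter", "Persuasion", "GOTV"],
})

# Map each label to its position in my_list; `key` receives the column as a Series
rank = {label: i for i, label in enumerate(my_list)}
df_targets = df_targets.sort_values("target", key=lambda s: s.map(rank))

print(df_targets["target"].tolist())
```

Labels missing from my_list map to NaN and therefore end up last (the default na_position="last").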

To work around this, I created a dictionary:

from collections import OrderedDict

dict_targets = OrderedDict()
dict_targets["Likely Supporter"] = "0 Likely Supporter"
dict_targets["GOTV"] = "1 GOTV"
dict_targets["Persuasion"] = "2 Persuasion"
dict_targets["Persuasion+GOTV"] = "3 Persuasion+GOTV"

but it seems like a non-Pythonic approach.

Any suggestions would be appreciated!

4 answers:

Answer 0 (score: 8)

I think you need Categorical with the parameter ordered=True:

import pandas as pd

df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter', 'GOTV', 'Persuasion', 'Persuasion+GOTV']})

df.a = pd.Categorical(df.a,
                      categories=["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"],
                      ordered=True)

print (df)
                  a
0              GOTV
1        Persuasion
2  Likely Supporter
3              GOTV
4        Persuasion
5   Persuasion+GOTV

print (df.a)
0                GOTV
1          Persuasion
2    Likely Supporter
3                GOTV
4          Persuasion
5     Persuasion+GOTV
Name: a, dtype: category
Categories (4, object): [Likely Supporter < GOTV < Persuasion < Persuasion+GOTV]

Then sorting with sort_values works very nicely:

df.sort_values('a', inplace=True)
print (df)
                  a
2  Likely Supporter
0              GOTV
3              GOTV
1        Persuasion
4        Persuasion
5   Persuasion+GOTV
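Since the question's actual goal is to keep the top N rows, the ordered categorical can be chained with head(N). A sketch reusing the sample column a from this answer; N = 3 is an illustrative value, and kind='mergesort' is chosen here only because it is a stable sort, so rows with the same label keep their original relative order:

```python
import pandas as pd

order = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]
df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter',
                         'GOTV', 'Persuasion', 'Persuasion+GOTV']})
df['a'] = pd.Categorical(df['a'], categories=order, ordered=True)

N = 3  # illustrative value
# Stable sort by category position, then keep the first N rows
df_selected = df.sort_values('a', kind='mergesort').head(N)
print(df_selected)
```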

Answer 1 (score: 0)

Thanks, jerzrael, for the input and the reference.

I like this solution:

list_ordering = ["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"]  

df["target"] = df["target"].astype("category", categories=list_ordering, ordered=True)
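Passing categories= and ordered= directly to astype was deprecated in pandas 0.21 and later removed, so the line above fails on current pandas. One way to stay on the astype route is to build a CategoricalDtype (from pandas.api.types, available since 0.21) and cast to it. A sketch with hypothetical sample data:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Hypothetical sample data
df = pd.DataFrame({"target": ["Persuasion", "GOTV", "Persuasion+GOTV", "Likely Supporter"]})

# Describe the category set and its order once, then cast the column to it
target_dtype = CategoricalDtype(categories=list_ordering, ordered=True)
df["target"] = df["target"].astype(target_dtype)

df = df.sort_values("target")
print(df["target"].tolist())
```

Keeping the dtype in a variable also makes it easy to apply the same ordering to several columns or DataFrames.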

Answer 2 (score: 0)

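The method shown in my previous answer (passing categories and ordered to astype) is now deprecated.

It is better to use pandas.Categorical, as shown here.

So: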

list_ordering = ["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"]  
df["target"] = pd.Categorical(df["target"], categories=list_ordering) 
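This line leaves out ordered=True. A sketch (sample data hypothetical) illustrating the difference: sort_values still follows the order of the categories list for an unordered categorical, but order-dependent reductions such as min() raise TypeError until the categorical is marked as ordered:

```python
import pandas as pd

list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]
df = pd.DataFrame({"target": ["Persuasion+GOTV", "GOTV", "Likely Supporter", "Persuasion"]})

# Unordered categorical: sorting uses the position in `categories`, not alphabetical order
df["target"] = pd.Categorical(df["target"], categories=list_ordering)
sorted_labels = df.sort_values("target")["target"].tolist()
print(sorted_labels)

# min()/max() on an unordered categorical raise TypeError
unordered_min_failed = False
try:
    df["target"].min()
except TypeError as exc:
    unordered_min_failed = True
    print("unordered:", exc)

# Marking the categorical as ordered makes the same reduction work
df["target"] = df["target"].cat.as_ordered()
ordered_min = df["target"].min()
print(ordered_min)
```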

Answer 3 (score: 0)

I think this is the most suitable approach, and in some situations the preferable one. Here is your preferred ordering:

my_order = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

Then do:

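# inplace=True for reorder_categories was removed in pandas 2.0, so assign the result back
df['Column_to_update'] = df['Column_to_update'].cat.reorder_categories(my_order)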

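It is flexible and does not require assigning new categories. However, your column must already have dtype = 'category', otherwise it will not work, and my_order must contain exactly the column's existing categories.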
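A sketch of the full sequence, with hypothetical data: the column is first converted to category dtype (the .cat accessor raises AttributeError on an object column), and its categories are then reordered. Passing ordered=True here is an addition, not part of the answer above. The result is assigned back rather than using inplace=True, which newer pandas versions no longer accept. The last step shows that reorder_categories raises ValueError when the list does not match the existing categories:

```python
import pandas as pd

my_order = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]
df = pd.DataFrame({"Column_to_update": ["GOTV", "Persuasion+GOTV", "Likely Supporter", "Persuasion"]})

# Step 1: the .cat accessor only exists for category dtype
df["Column_to_update"] = df["Column_to_update"].astype("category")
initial_categories = df["Column_to_update"].cat.categories.tolist()  # alphabetical by default
print(initial_categories)

# Step 2: reorder the existing categories (same set, new order)
df["Column_to_update"] = df["Column_to_update"].cat.reorder_categories(my_order, ordered=True)
df = df.sort_values("Column_to_update")
sorted_values = df["Column_to_update"].tolist()
print(sorted_values)

# A list that does not match the existing categories is rejected
rejected = False
try:
    df["Column_to_update"].cat.reorder_categories(my_order[:3])
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```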

Read more here (Pandas documentation)