pandas - Create a Categorical object with categories of equal priority

Time: 2019-04-01 13:13:07

Tags: python pandas

Suppose I want to impose an order on certain categories. For example, colors:

Green = Yellow > Red

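Here, Green and Yellow have the same priority, which is higher than Red's. Is it possible to create a Categorical object like this? Can I do something like the following?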

df['Color'] = pd.Categorical(df['Color'], categories=[('Green', 'Yellow'), 'Red'], ordered=True)

The tuple ('Green', 'Yellow') indicates that Green and Yellow have the same priority.

Example input DataFrame:

ID    Color
1     Red
2     Yellow
1     Yellow
3     Red
1     Green
2     Red

The expected output is a DataFrame with no duplicate IDs, where the color kept for each ID is chosen by priority:

ID    Color
1     Yellow
2     Yellow
3     Red

3 answers:

Answer 0: (score: 2)

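The problem can be solved as follows.

Based on the information given, Yellow and Green take priority over Red.

We will use the following methods:

DataFrame.apply, which applies a function along an axis (the code below calls apply on a groupby object, so the function runs once per ID group). Docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

sorted, which sorts a list into the desired order according to a key. list.sort would also work, but sorted returns a new list rather than sorting in place. Docs: https://docs.python.org/3/library/functions.html#sorted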
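As a quick illustration of the difference between sorted and list.sort, here is a minimal sketch on a plain list. The names priority, colors, in_place and returned are made up for this example and do not come from the answer.

```python
# Hypothetical priority order, highest priority first
priority = ["Yellow", "Green", "Red"]
colors = ["Red", "Green", "Yellow"]

# sorted returns a new list and leaves the input untouched
ordered = sorted(colors, key=priority.index)

# list.sort reorders the list in place and returns None
in_place = list(colors)
returned = in_place.sort(key=priority.index)

print(ordered)   # new list, sorted by position in `priority`
print(colors)    # original list, unchanged
print(returned)  # None
```

Using priority.index as the key means each color is ranked by its position in the priority list, which is the same idea the answer's code relies on.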

import pandas as pd

# Create the example DataFrame
df = pd.DataFrame({'ID': [1, 2, 1, 3, 1, 2],
                   'Color': ['Red', 'Yellow', 'Yellow', 'Red', 'Green', 'Red']})

# Priority list: highest priority first, lowest (or no) priority last.
# Each color's position in this list is used as the sort key below.
# Yellow is listed before Green, so Yellow wins when an ID has both.
set_priority = ['Yellow', 'Green', 'Red']

# Group by ID, sort each group's colors by priority,
# and keep the first (highest-priority) color, as the question asks.
result = (df.groupby('ID')['Color']
            .apply(lambda x: sorted(x, key=set_priority.index)[0])
            .reset_index())

Result

   ID   Color
0   1  Yellow
1   2  Yellow
2   3     Red
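The same result can probably be reached without a Python-level sort per group. The following is a sketch of an alternative, not the answer author's method: it assumes the tie between Yellow and Green may be broken by listing Yellow first, converts the column to an ordered Categorical, then sorts and drops duplicate IDs.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 1, 3, 1, 2],
                   'Color': ['Red', 'Yellow', 'Yellow', 'Red', 'Green', 'Red']})

# Assumption: Yellow is listed before Green to break the tie.
# In an ordered Categorical the first category sorts first.
df['Color'] = pd.Categorical(df['Color'],
                             categories=['Yellow', 'Green', 'Red'],
                             ordered=True)

# Sort by ID, then by category order, and keep the first row per ID
result = (df.sort_values(['ID', 'Color'])
            .drop_duplicates('ID')
            .reset_index(drop=True))
print(result)
```

The Color column stays categorical in the result; call astype(str) on it if plain strings are needed.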

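Answer 1: (score: 0)

When using a Categorical, you can specify a custom sort order. This does not give the equal-priority relation asked for in the question, but perhaps sort_dict can be used to model something like it.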

import pandas as pd

colors = ["Green", "Red", "Yellow", "Yellow", "Red", "Green"]
df = pd.DataFrame({"Color":colors})
sort_dict = {"Yellow":-1, "Green":1, "Red":6}
df["colorcat"] = pd.Categorical(df['Color'], categories=sorted(sort_dict, key=sort_dict.get), ordered=True)
print(df.sort_values("colorcat"))

    Color colorcat
2  Yellow   Yellow
3  Yellow   Yellow
0   Green    Green
5   Green    Green
1     Red      Red
4     Red      Red

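Putting a tuple in the categories does not seem to work: the tuple is treated as a single category, so Green and Yellow match nothing and become NaN.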

import pandas as pd

colors = ["Green", "Red", "Yellow", "Yellow", "Red", "Green"]
df = pd.DataFrame({"Color":colors})
df["colorcat"] = pd.Categorical(df['Color'], categories=[("Green", "Yellow"), "Red"], ordered=True)
print(df.sort_values("colorcat"))

    Color colorcat
1     Red      Red
4     Red      Red
0   Green      NaN
2  Yellow      NaN
3  Yellow      NaN
5   Green      NaN
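Since a Categorical cannot hold two labels at one position, one way to model a genuine tie is to keep the ranks outside the Categorical. The sketch below is an assumption-laden illustration rather than a pandas feature: the rank dictionary and the helper column name are hypothetical, and Green and Yellow are simply given the same number.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 1, 3, 1, 2],
                   "Color": ["Red", "Yellow", "Yellow", "Red", "Green", "Red"]})

# Hypothetical rank table: a lower number means higher priority.
# Green and Yellow share rank 0, which encodes Green = Yellow > Red.
rank = {"Green": 0, "Yellow": 0, "Red": 1}
df["rank"] = df["Color"].map(rank)

# mergesort is stable, so rows with tied ranks keep their original order
# and the first Green/Yellow row seen for an ID is the one that survives.
best = (df.sort_values("rank", kind="mergesort")
          .drop_duplicates("ID")
          .sort_values("ID")
          .reset_index(drop=True))
print(best[["ID", "Color"]])
```

With the question's sample data, ID 1 resolves to Yellow only because its Yellow row appears before its Green row; with a true tie, which of the two is kept depends on row order.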

Answer 2: (score: 0)

import pandas as pd
# Create an example dataframe
data = {'ID': ['1' , '2', '1', '3', '1', '2'], 
        'Color': ['Red' , 'Yellow' , 'Yellow' , 'Red', 'Green', 'Red']}
df1 = pd.DataFrame(data)

a = df1.join(df1.groupby(['ID'])['Color'].apply(set).rename('m'),
             on=['ID'])['m']

m1 = (a == {'Green', 'Yellow', 'Red'}) | (a == {'Green', 'Yellow'}) | (a == {'Red', 'Yellow'})
m2 = a == {'Red'}

m4 = df1['Color'] == 'Yellow'
m5 = df1['Color'] == 'Red'

df1 = df1[(m1 & m4) | (m2 & m5) ]

print(df1)

    Color ID
1  Yellow  2
2  Yellow  1
3     Red  3