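Suppose I want to define an order over some categories. For example, colors:
Green = Yellow > Red
Here, Green and Yellow have the same priority, which is higher than Red's. Is it possible to create such a categorical object? Can I do something like this?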
df['Color'] = pd.Categorical(df['Color'], categories=[('Green', 'Yellow'), 'Red'], ordered=True)
The tuple ('Green', 'Yellow') would mean that Green and Yellow have the same priority.
Example input DataFrame:
ID Color
1 Red
2 Yellow
1 Yellow
3 Red
1 Green
2 Red
The expected output is a DataFrame with no duplicate IDs, chosen according to the color priority:
ID Color
1 Yellow
2 Yellow
3 Red
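Answer 0 (score: 2)
This can be solved as follows.
Based on the information provided, we treat Yellow and Green as the top priority.
We will use the following here:
DataFrame.apply, which lets us apply a function along an axis (the code below calls apply on a groupby object, which works the same way per group). Docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
sorted, which lets us sort a list in the desired order by specifying a key. Docs: https://docs.python.org/3/library/functions.html#sorted You could do the same with list.sort, but sorted returns a new list object instead of sorting in place.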
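As a small illustration of the difference between sorted and list.sort mentioned above, here is a minimal sketch. The names priority and colors are made up for this example and are not part of the answer's code.

```python
# Hypothetical priority list: earlier position means higher priority.
priority = ['Yellow', 'Green', 'Red']
colors = ['Red', 'Green', 'Yellow']

# sorted() returns a new list and leaves the input untouched.
ordered = sorted(colors, key=priority.index)
unchanged = list(colors)

# list.sort() reorders the list in place and returns None.
returned = colors.sort(key=priority.index)

print(ordered)
print(unchanged)
print(returned)
```

Passing priority.index as the key sorts each color by its position in the priority list, which is the same idea the answer's lambda uses.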
import pandas as pd
# Create the DataFrame
df = pd.DataFrame({'ID': [1, 2, 1, 3, 1, 2],
                   'Color': ['Red', 'Yellow', 'Yellow', 'Red', 'Green', 'Red']})
# Priority list: highest priority first, lowest (or no) priority last.
# It is used to build the sort key below. Note that a list cannot express a
# tie: Yellow is placed before Green, so Yellow wins when both are present.
set_priority = ['Yellow', 'Green', 'Red']
# Group by ID, sort each group's colors by their position in set_priority,
# and keep the first (highest-priority) color of each group.
result = (df.groupby('ID')['Color']
            .apply(lambda colors: sorted(colors, key=set_priority.index)[0])
            .reset_index())
Result:
ID Color
0 1 Yellow
1 2 Yellow
2 3 Red
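If the tie between Green and Yellow should be modelled explicitly rather than broken by list position, one possible sketch is to map each color to a numeric rank and let a stable sort preserve the original row order among equal ranks. The rank dictionary and the temporary _rank column are assumptions introduced for this example.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 1, 3, 1, 2],
                   'Color': ['Red', 'Yellow', 'Yellow', 'Red', 'Green', 'Red']})

# Hypothetical rank table: equal numbers mean equal priority, lower wins.
rank = {'Green': 0, 'Yellow': 0, 'Red': 1}

result = (
    df.assign(_rank=df['Color'].map(rank))
      # mergesort is stable, so rows with equal rank keep their input order
      .sort_values('_rank', kind='mergesort')
      # keep the first (best-ranked, earliest) row for each ID
      .drop_duplicates('ID')
      .drop(columns='_rank')
      .sort_values('ID')
      .reset_index(drop=True)
)
print(result)
```

With this approach, when an ID has both Green and Yellow, whichever appears first in the input is kept. For the sample data that is Yellow for ID 1.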
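Answer 1 (score: 0)
When using a Categorical, you can specify a custom sort order. This does not give the tie the question asks for, but perhaps sort_dict can be used to model something like it.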
import pandas as pd
colors = ["Green", "Red", "Yellow", "Yellow", "Red", "Green"]
df = pd.DataFrame({"Color":colors})
sort_dict = {"Yellow":-1, "Green":1, "Red":6}
df["colorcat"] = pd.Categorical(df['Color'], categories=sorted(sort_dict, key=sort_dict.get), ordered=True)
print(df.sort_values("colorcat"))
Color colorcat
2 Yellow Yellow
3 Yellow Yellow
0 Green Green
5 Green Green
1 Red Red
4 Red Red
Putting a tuple in the categories does not seem to work:
import pandas as pd
colors = ["Green", "Red", "Yellow", "Yellow", "Red", "Green"]
df = pd.DataFrame({"Color":colors})
df["colorcat"] = pd.Categorical(df['Color'], categories=[("Green", "Yellow"), "Red"], ordered=True)
print(df.sort_values("colorcat"))
Color colorcat
1 Red Red
4 Red Red
0 Green NaN
2 Yellow NaN
3 Yellow NaN
5 Green NaN
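One possible workaround, sketched here under the assumption that it is acceptable to replace tied colors with a shared label, is to map Green and Yellow to the same group before building the ordered Categorical. The group_of mapping and the "Green/Yellow" label are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({"Color": ["Green", "Red", "Yellow", "Yellow", "Red", "Green"]})

# Hypothetical mapping: tied colors share one group label.
group_of = {"Green": "Green/Yellow", "Yellow": "Green/Yellow", "Red": "Red"}

# Categories are listed from lowest to highest priority.
df["group"] = pd.Categorical(
    df["Color"].map(group_of),
    categories=["Red", "Green/Yellow"],
    ordered=True,
)

# Green and Yellow rows now compare equal, and both compare greater than Red.
is_top = df["group"] > "Red"
print(df[is_top])
```

The original Color column is kept, so the specific color is still available after comparing or sorting on the group column.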
Answer 2 (score: 0)
import pandas as pd
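# Create an example DataFrame
data = {'ID': ['1', '2', '1', '3', '1', '2'],
        'Color': ['Red', 'Yellow', 'Yellow', 'Red', 'Green', 'Red']}
df1 = pd.DataFrame(data)
# For every row, look up the set of all colors that share its ID
a = df1.join(df1.groupby(['ID'])['Color'].apply(set).rename('m'),
             on=['ID'])['m']
# m1: the ID's color set is one of three combinations that contain Yellow
m1 = (a == {'Green', 'Yellow', 'Red'}) | (a == {'Green', 'Yellow'}) | (a == {'Red', 'Yellow'})
# m2: the ID has only Red
m2 = a == {'Red'}
# m4 / m5: the row itself is Yellow / Red
m4 = df1['Color'] == 'Yellow'
m5 = df1['Color'] == 'Red'
# Keep the Yellow row for IDs matched by m1 and the Red row for Red-only IDs
df1 = df1[(m1 & m4) | (m2 & m5)]
print(df1)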
Color ID
1 Yellow 2
2 Yellow 1
3 Red 3
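The masks above only cover the color combinations that occur in the sample data. A more general mask can be sketched by comparing each row's rank with the best rank in its ID group. The rank mapping below is an assumption made for this example, with equal numbers standing for equal priority.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': ['1', '2', '1', '3', '1', '2'],
                    'Color': ['Red', 'Yellow', 'Yellow', 'Red', 'Green', 'Red']})

# Hypothetical rank table: lower number means higher priority.
rank = df1['Color'].map({'Green': 0, 'Yellow': 0, 'Red': 1})

# Best (lowest) rank within each ID, broadcast back to every row.
best = rank.groupby(df1['ID']).transform('min')

# Rows whose color is tied for the best priority in their ID.
kept = df1[rank == best]

# One row per ID: the first of the tied rows in input order.
one_per_id = kept.drop_duplicates('ID')
print(one_per_id)
```

Here kept still contains both the Yellow and the Green row for ID '1', which makes the tie visible before drop_duplicates picks one of them.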