Here is my DataFrame:
Fruits Person Eat
Banana Peter Yes
Banana Ashley Yes
Strawberry Peter No
Strawberry Ashley Yes
Cherry Peter Yes
Orange Peter No
Orange Ashley No
Grape Ashley Yes
Pear Ashley Yes
Pear Peter Yes
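My DataFrame contains duplicated fruits, and I need to remove the duplicates according to the following logic. If a fruit is duplicated and both Peter and Ashley eat it, keep Peter's row and drop Ashley's. If a fruit is duplicated and Peter does not eat it but Ashley does, drop Peter's row and keep Ashley's. If a fruit is duplicated and neither Peter nor Ashley eats it, drop both rows.
With this logic, the DataFrame should come out as: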
Fruits Person Eat
Banana Peter Yes
Strawberry Ashley Yes
Cherry Peter Yes
Grape Ashley Yes
Pear Peter Yes
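I'm not sure how to iterate over a pandas DataFrame to remove duplicates under these conditions. For the first condition, I would normally do something like this: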
data = [
    {
        "fruit": "Apple",
        "person": "Ashley",
        "eats": True
    },
    {
        "fruit": "Apple",
        "person": "Peter",
        "eats": True
    }
]
# Map each person to the fruits they eat and the row index of each fruit.
eats = {}
for i, row in enumerate(data):
    # Record the row number only if the person eats the fruit.
    if row["eats"]:
        eats.setdefault(row["person"], {})[row["fruit"]] = i

# Dedup: if Peter eats a fruit that Ashley also eats, drop Ashley's row.
# Indices are collected first because popping during iteration skips rows.
peter_eats = eats.get("Peter", {})
drop = {i for fruit, i in eats.get("Ashley", {}).items() if fruit in peter_eats}
data = [row for i, row in enumerate(data) if i not in drop]
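Any help or tips on how to do this with a DataFrame would be greatly appreciated.
Answer 0 (score: 0)
Try this. It keeps only the rows where Eat is 'Yes', sorts by Person so that Peter comes after Ashley, and then keeps the last row for each fruit: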
df1 = (df[df.Eat.eq('Yes')].sort_values('Person')
.drop_duplicates(subset='Fruits', keep='last'))
Out[14]:
Fruits Person Eat
3 Strawberry Ashley Yes
7 Grape Ashley Yes
0 Banana Peter Yes
4 Cherry Peter Yes
9 Pear Peter Yes
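The snippet above relies on "Peter" sorting after "Ashley" alphabetically. Here is a minimal sketch of the same idea that spells out the preference and restores the original row order. The `priority` mapping and the temporary `_rank` column are hypothetical names introduced here for illustration, not part of the original answer. Like the answer, it assumes every row with Eat equal to 'No' should be dropped, which also removes a 'No' row for a fruit that appears only once; the question does not cover that case.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Fruits": ["Banana", "Banana", "Strawberry", "Strawberry", "Cherry",
               "Orange", "Orange", "Grape", "Pear", "Pear"],
    "Person": ["Peter", "Ashley", "Peter", "Ashley", "Peter",
               "Peter", "Ashley", "Ashley", "Ashley", "Peter"],
    "Eat": ["Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes"],
})

# Hypothetical priority table: the lower number wins when both people eat a fruit.
priority = {"Peter": 0, "Ashley": 1}

result = (
    df[df["Eat"].eq("Yes")]                          # drop every 'No' row
    .assign(_rank=lambda d: d["Person"].map(priority))
    .sort_values("_rank", kind="stable")             # Peter's rows come first
    .drop_duplicates(subset="Fruits", keep="first")  # one row per fruit
    .drop(columns="_rank")
    .sort_index()                                    # back to original row order
)
print(result)
```

With the sample data this leaves the rows at index 0, 3, 4, 7 and 9, in the same order as the expected output in the question. To prefer a different person, only the `priority` mapping would need to change.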