I'm new to pandas and am having trouble selecting rows from a DataFrame based on the values in other columns. Here is my DataFrame:
C1 C2 C3 C4
0 1234 1002 Operational ABC
1 5678 2001 Closed ABC
2 7896 1002 Operational DEF
3 4321 4005 Closed CDE
4 7781 4005 Operational ABC
Q1. I want to select the rows whose C2 value is duplicated and whose C3 value is 'Operational'. The output DataFrame should be:
C1 C2 C3 C4
1234 1002 Operational ABC
7896 1002 Operational DEF
I have tried df[df.duplicated(['C2'], keep=False) & (df['C3'] == 'Operational')], but it selects the rows with index 0, 2, and 4. I only want the rows with index 0 and 2 as output.
Q2. How can I select one row for each distinct value in C4? The output DataFrame should be:
C1 C2 C3 C4
7896 1002 Operational DEF
4321 4005 Closed CDE
7781 4005 Operational ABC
Could I get any suggestions for these two custom selections?
Answer 0 (score: 1)
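For the first question, it is easier to filter twice: first keep only the Operational rows, then keep the rows whose C2 value is duplicated among them: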
df1 = df[df['C3'] == 'Operational']
df1 = df1[df1.duplicated(['C2'], keep=False)]
print (df1)
C1 C2 C3 C4
0 1234 1002 Operational ABC
2 7896 1002 Operational DEF
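To try this yourself, here is a minimal self-contained sketch. The DataFrame construction is my assumption about the dtypes (integers for C1 and C2, strings for C3 and C4), and the names attempt, operational and result are only illustrative. It also shows why the mask from the question returns row 4: there, duplicated is evaluated on the whole frame, where 4005 appears twice (rows 3 and 4).

```python
import pandas as pd

# Rebuild the frame from the question (dtypes are assumed)
df = pd.DataFrame({
    "C1": [1234, 5678, 7896, 4321, 7781],
    "C2": [1002, 2001, 1002, 4005, 4005],
    "C3": ["Operational", "Closed", "Operational", "Closed", "Operational"],
    "C4": ["ABC", "ABC", "DEF", "CDE", "ABC"],
})

# Attempt from the question: duplicates are computed on ALL rows,
# so 4005 counts as duplicated even though one of its rows is Closed
dup_all = df.duplicated(["C2"], keep=False)
attempt = df[dup_all & (df["C3"] == "Operational")]

# Two-step filter: restrict to Operational first, then look for duplicates
operational = df[df["C3"] == "Operational"]
result = operational[operational.duplicated(["C2"], keep=False)]

print(attempt.index.tolist())
print(result.index.tolist())
```

The order of the two filters matters: filtering on C3 first means duplicated only compares C2 values among the Operational rows.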
The same, but less readable:
m1 = df['C3'] == 'Operational'
df1 = df[df[m1].duplicated(['C2'], keep=False) & m1]
print (df1)
C1 C2 C3 C4
0 1234 1002 Operational ABC
2 7896 1002 Operational DEF
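The one-liner relies on pandas aligning the shorter mask (index 0, 2, 4) with m1 on the index. If you would rather make that alignment explicit, one possible sketch is to reindex the duplicate mask back onto the full index first. The names dup_in_operational and full_mask are hypothetical, and the DataFrame is rebuilt from the question's table assuming integer C1/C2 and string C3/C4 so that the snippet runs on its own:

```python
import pandas as pd

# Rebuild the frame from the question (dtypes are assumed)
df = pd.DataFrame({
    "C1": [1234, 5678, 7896, 4321, 7781],
    "C2": [1002, 2001, 1002, 4005, 4005],
    "C3": ["Operational", "Closed", "Operational", "Closed", "Operational"],
    "C4": ["ABC", "ABC", "DEF", "CDE", "ABC"],
})

m1 = df["C3"] == "Operational"

# duplicated() only sees the Operational rows, so its index is 0, 2, 4
dup_in_operational = df[m1].duplicated(["C2"], keep=False)

# Put the mask back on the full index; rows that were filtered out become False
full_mask = dup_in_operational.reindex(df.index, fill_value=False)

df1 = df[full_mask]
print(df1)
```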
For the second question, use drop_duplicates with the parameter keep='last' to get the last row for each duplicated value in C4:
df2 = df.drop_duplicates('C4', keep='last')
print (df2)
C1 C2 C3 C4
2 7896 1002 Operational DEF
3 4321 4005 Closed CDE
4 7781 4005 Operational ABC
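Which row survives for each C4 value depends on keep. As a small sketch (the DataFrame is rebuilt from the question with assumed dtypes, and first_rows / last_rows are illustrative names), compare keep='first' with keep='last':

```python
import pandas as pd

# Rebuild the frame from the question (dtypes are assumed)
df = pd.DataFrame({
    "C1": [1234, 5678, 7896, 4321, 7781],
    "C2": [1002, 2001, 1002, 4005, 4005],
    "C3": ["Operational", "Closed", "Operational", "Closed", "Operational"],
    "C4": ["ABC", "ABC", "DEF", "CDE", "ABC"],
})

# keep='first' keeps the first occurrence of each C4 value
first_rows = df.drop_duplicates("C4", keep="first")

# keep='last' keeps the last occurrence, matching the output asked for in Q2
last_rows = df.drop_duplicates("C4", keep="last")

print(first_rows.index.tolist())
print(last_rows.index.tolist())
```

Only keep='last' reproduces the expected output here, because the desired ABC row is the one at index 4.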
Or, if necessary, remove only the rows that belong to a run of consecutive duplicates in C4:
mask = df['C4'].ne(df['C4'].shift()).cumsum().duplicated(keep=False)
df2 = df[~mask]
print (df2)
C1 C2 C3 C4
2 7896 1002 Operational DEF
3 4321 4005 Closed CDE
4 7781 4005 Operational ABC
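To see what the mask does, it may help to look at the intermediate run labels. The sketch below (DataFrame rebuilt from the question with assumed dtypes; run_id, in_run, dropped_runs and first_of_run are hypothetical names) labels each run of equal consecutive C4 values, then drops every row whose run is longer than one. For comparison, it also shows a variant that keeps the first row of each run instead, in case that is what you need:

```python
import pandas as pd

# Rebuild the frame from the question (dtypes are assumed)
df = pd.DataFrame({
    "C1": [1234, 5678, 7896, 4321, 7781],
    "C2": [1002, 2001, 1002, 4005, 4005],
    "C3": ["Operational", "Closed", "Operational", "Closed", "Operational"],
    "C4": ["ABC", "ABC", "DEF", "CDE", "ABC"],
})

# A new run starts wherever C4 differs from the previous row;
# the cumulative sum turns those starts into a run label per row
starts_run = df["C4"].ne(df["C4"].shift())
run_id = starts_run.cumsum()

# Rows whose run label occurs more than once are consecutive duplicates
in_run = run_id.duplicated(keep=False)
dropped_runs = df[~in_run]

# Variant: keep the first row of every run rather than dropping the whole run
first_of_run = df[starts_run]

print(run_id.tolist())
print(dropped_runs.index.tolist())
print(first_of_run.index.tolist())
```

Note that the last ABC (index 4) is kept in both variants, since it is not adjacent to the other ABC rows.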