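I want to find the unique combinations of the person1 and person2 columns, even when the same two values appear in reversed order in the DataFrame. Below is a sample of the initial DataFrame, in which I want to find the unique persons: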
import numpy as np
import pandas as pd

df = pd.DataFrame({"person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
                   "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"]})
person1 person2
0 AL AL
1 IN AN
2 AN NaN
3 DL AL
4 IN AN
5 AL AL
6 AL DL
7 IN IN
8 AN IN
The output I want is as follows:
person1 person2 person
0 AL AL AL
1 IN AN IN/AN
2 AN NaN AN
3 DL AL DL/AL
4 IN AN IN/AN
5 AL AL AL
6 AL DL DL/AL # Since it has been added as DL/AL NOT AL/DL
7 IN IN IN
8 AN IN IN/AN # Since it has been added as IN/AN NOT AN/IN
I used the following code:
df['person'] = np.where(df.person1 != df.person2,
df.person1 + "/" + df.person2, df.person1)
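But in the example above it returns AL/DL and AN/IN at indices 6 and 8 respectively. As usual, I could not find a suitable way to get DL/AL and IN/AN there instead.
Pandas gurus, please let me know :)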
Answer 0: (score: 0)
If sorting the two values within each row is acceptable, sort both columns first:
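# Replace NaN with '' so all values are comparable strings, then sort each row
df1 = pd.DataFrame(np.sort(df[['person1','person2']].fillna('')),
                   index=df.index,
                   columns=['person1','person2'])
# Join the sorted pair with '/'; strip the '/' left over where a value was missing
df['person'] = np.where(df1.person1 != df1.person2,
                        df1.person1.str.cat(df1.person2, sep="/").str.strip('/'),
                        df1.person1)
print(df)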
person1 person2 person
0 AL AL AL
1 IN AN AN/IN
2 AN NaN AN
3 DL AL AL/DL
4 IN AN AN/IN
5 AL AL AL
6 AL DL AL/DL
7 IN IN IN
8 AN IN AN/IN
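To see why sorting makes reversed pairs collapse to one label, here is a minimal, self-contained sketch of the row-wise sort on its own. The variable names sorted_pairs and labels are made up for illustration, and the join step is a plain-Python variation rather than the np.where expression used above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
    "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"],
})

# Fill NaN with '' so every cell is a string, then sort the values
# inside each row (axis=1). (IN, AN) and (AN, IN) both become (AN, IN).
sorted_pairs = np.sort(df[["person1", "person2"]].fillna("").to_numpy(), axis=1)

# Build one label per row: skip the '' placeholder, drop a repeated value
# (dict.fromkeys keeps order and removes duplicates), join with '/'.
labels = ["/".join(dict.fromkeys(v for v in row if v)) for row in sorted_pairs]

print(sorted_pairs)
print(labels)
```

Because the sort is alphabetical, the label for a pair is always the same regardless of which column each value came from, but it does not depend on which ordering appeared first in the data.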
Answer 1: (score: 0)
You can use the apply() method:
# Per row: drop the repeated value, sort (NaN goes last), join with '/' (str.cat skips NaN)
df['person'] = df.apply(lambda r: r.drop_duplicates().sort_values().str.cat(sep='/'), axis=1)
print(df)
Output:
person1 person2 person
0 AL AL AL
1 IN AN AN/IN
2 AN NaN AN
3 DL AL AL/DL
4 IN AN AN/IN
5 AL AL AL
6 AL DL AL/DL
7 IN IN IN
8 AN IN AN/IN
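Both outputs above write each pair in alphabetical order (AN/IN, AL/DL), whereas the question asks for the order in which a pair first appears (IN/AN, DL/AL). The following is a minimal sketch of one way to get that, not taken from either answer: it uses an order-independent key (a frozenset) to recognise a pair and remembers the label from its first occurrence. The helper name first_seen_labels and its cols parameter are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
    "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"],
})

def first_seen_labels(frame, cols=("person1", "person2")):
    """Label each row with the '/'-joined pair in the order it first appeared."""
    seen = {}    # frozenset of the pair -> label used at its first occurrence
    labels = []
    for a, b in frame[list(cols)].itertuples(index=False, name=None):
        # Ignore missing values so (AN, NaN) is treated as just AN.
        members = [v for v in (a, b) if pd.notna(v)]
        key = frozenset(members)                  # same key for (AL, DL) and (DL, AL)
        label = "/".join(dict.fromkeys(members))  # (AL, AL) collapses to 'AL'
        # Reuse the stored label if the pair was seen before, else store this one.
        labels.append(seen.setdefault(key, label))
    return labels

df["person"] = first_seen_labels(df)
print(df)
```

This loops over the rows in Python, so it trades speed for preserving first-seen order; on the sample data it yields DL/AL at index 6 and IN/AN at index 8.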