Unique strings from two columns

Time: 2019-05-07 11:03:03

Tags: python pandas dataframe pandas-groupby

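I want to find the unique combinations of the person1 and person2 columns, even when the values appear in reversed order in the DataFrame. Below is a sample of the initial DataFrame, from which I want to derive the unique persons: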

import numpy as np
import pandas as pd
df = pd.DataFrame({"person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
                   "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"]})

  person1  person2
0     AL      AL
1     IN      AN
2     AN      NaN
3     DL      AL
4     IN      AN
5     AL      AL
6     AL      DL
7     IN      IN
8     AN      IN

The output I want is:

  person1  person2  person
0     AL      AL     AL
1     IN      AN    IN/AN
2     AN      NaN    AN
3     DL      AL    DL/AL
4     IN      AN    IN/AN
5     AL      AL     AL
6     AL      DL    DL/AL  # Since it has been added as DL/AL NOT AL/DL
7     IN      IN     IN
8     AN      IN    IN/AN  # Since it has been added as IN/AN NOT AN/IN

I used the following code:

df['person'] = np.where(df.person1 != df.person2,
                        df.person1 + "/" + df.person2, df.person1)

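But in the example above it returns AL/DL and AN/IN at indexes 6 and 8 respectively. As usual, I can't find a proper way to get a single consistent order, i.e. DL/AL and IN/AN.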

Pandas gurus, please let me know :)

2 answers:

Answer 0: (score: 0)

If sorting each pair alphabetically is acceptable, sort the two columns row-wise:

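# Sort the two values within each row; NaN is replaced by '' so the strings are comparable
df1 = pd.DataFrame(np.sort(df[['person1','person2']].fillna('')),
                   index=df.index,
                   columns=['person1','person2'])
# Join differing values with '/'; strip('/') removes the separator left by an empty value
df['person'] = np.where(df1.person1 != df1.person2,
                        df1.person1.str.cat(df1.person2, sep="/").str.strip('/'),
                        df1.person1)
print(df)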
  person1 person2 person
0      AL      AL     AL
1      IN      AN  AN/IN
2      AN     NaN     AN
3      DL      AL  AL/DL
4      IN      AN  AN/IN
5      AL      AL     AL
6      AL      DL  AL/DL
7      IN      IN     IN
8      AN      IN  AN/IN
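
The sorted result above puts each pair in alphabetical order (AL/DL, AN/IN), whereas the desired output in the question keeps the orientation in which a pair first appears (DL/AL, IN/AN). If that first-seen order matters, one possible approach is a plain Python loop that remembers the first label used for each unordered pair. This is only a minimal sketch: the helper name `first_seen_labels` and its parameters are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
    "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"],
})

def first_seen_labels(frame, left="person1", right="person2", sep="/"):
    """Hypothetical helper: label each row with the first-seen spelling of its pair."""
    seen = {}      # maps an unordered pair (frozenset) to the label first used for it
    labels = []
    for a, b in zip(frame[left], frame[right]):
        # Drop missing values, then drop a repeated name while keeping order
        names = [x for x in (a, b) if pd.notna(x)]
        names = list(dict.fromkeys(names))
        key = frozenset(names)
        if key not in seen:
            seen[key] = sep.join(names)
        labels.append(seen[key])
    return pd.Series(labels, index=frame.index)

df["person"] = first_seen_labels(df)
print(df)
```

The loop is not vectorized, so it will likely be slower than the np.sort version on large frames; the trade-off is that it keeps the first-seen orientation of each pair.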

Answer 1: (score: 0)

You can use the apply() method:

df['person']=df.apply(lambda r: r.drop_duplicates().sort_values().str.cat(sep='/'), axis=1)

print(df)

Output:

  person1 person2 person
0      AL      AL     AL
1      IN      AN  AN/IN
2      AN     NaN     AN
3      DL      AL  AL/DL
4      IN      AN  AN/IN
5      AL      AL     AL
6      AL      DL  AL/DL
7      IN      IN     IN
8      AN      IN  AN/IN
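
One thing to watch with the apply() call above: it runs over every column of df, so if a person column already exists (for example from an earlier attempt), its value is fed into the join as well. A self-contained sketch that restricts the row-wise apply to the two source columns, assuming the column names from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
    "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"],
})

cols = ["person1", "person2"]  # only these columns take part in the label

# For each row: drop a repeated name, sort alphabetically (NaN goes last),
# then join with '/'; str.cat omits missing values when na_rep is not given.
df["person"] = df[cols].apply(
    lambda r: r.drop_duplicates().sort_values().str.cat(sep="/"),
    axis=1,
)
print(df)
```

Selecting the columns explicitly makes the statement safe to re-run on the same frame.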