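I have two columns in a DataFrame with overlapping values. How can I find the values that are duplicated in the first column and also appear in the second column, and return the corresponding row numbers from the second column in a new column?
Answer 0 (score: 0)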
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in newer pandas versions
print(pd.__version__)
csvdata = StringIO("""a,b
111,122
122,3
111,9
254,395
265,245
111,395
220,111
395,305
395,8""")
df1 = pd.read_csv(csvdata, sep=",")
# find unique duplicate values in first column
col_a_dups = df1['a'][df1['a'].duplicated()].unique()
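# keep the entries of column b that match one of those values (the row index is preserved)
corresponding_value = df1['b'][df1['b'].isin(col_a_dups)]
# join on the index; b_r is NaN in rows where column b has no match
print(df1.join(corresponding_value, lsuffix="_l", rsuffix="_r"))
#print(corresponding_value.index)  # row labels of the matches in column b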
This produces:
0.24.2
     a  b_l    b_r
0  111  122    NaN
1  122    3    NaN
2  111    9    NaN
3  254  395  395.0
4  265  245    NaN
5  111  395  395.0
6  220  111  111.0
7  395  305    NaN
8  395    8    NaN
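The output above shows the matching values from column b rather than their row numbers. If the row numbers themselves are wanted, the following is a minimal sketch of one possible reading of the question: for every row whose value in a is duplicated within a, collect the row labels where that value occurs in b. The names rows_in_b and b_rows are made up for this illustration, and storing a list per row is an assumption about the desired result, not something the question specifies.

```python
from io import StringIO

import pandas as pd

csvdata = StringIO("""a,b
111,122
122,3
111,9
254,395
265,245
111,395
220,111
395,305
395,8""")
df1 = pd.read_csv(csvdata)

# values that occur more than once in column a
dup_values = df1['a'][df1['a'].duplicated()].unique().tolist()

# for each duplicated value, the row labels where it appears in column b
# (hypothetical helper mapping, named for this sketch only)
rows_in_b = {
    value: df1.index[df1['b'] == value].tolist()
    for value in dup_values
}
# rows_in_b is {111: [6], 395: [3, 5]}

# new column (hypothetical name): list of matching row labels in b,
# or None when the value in a is not duplicated
df1['b_rows'] = [rows_in_b.get(value) for value in df1['a']]
print(df1)
```

A list is used because a value can appear in b more than once (395 is found in rows 3 and 5). If only the first match matters, taking the first element of each list would be enough.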