My DataFrame looks like this:
>>> df
    ID   first    last
0  123     Joe  Thomas
1  456   James   Jonas
2  675   James   Jonas
3  457   James  Thomas
4  676  Joseph  Thomas
5  678    Joey  Thomas
6  670     Jim   Jonas
7  671    Katy   Perry
I also have a dictionary whose keys are nicknames and whose values are lists of all the proper names that go by that nickname:
nicknames = {'KATY': ['KATHERINE', 'KATHLEEN'], 'CHET': ['CHESTER'], 'PENNY': ['PENELOPE'], 'PAT': ['PATRICIA', 'PATRICK'], 'BART': ['BARTHOLOMEW'], 'BELLE': ['ARABELLA', 'BELINDA', 'ISABEL', 'ISABELLE', 'ROSABEL'], 'JOE': ['JOSEPH', 'JOSHUA'], 'JOEY': ['JOSEPH', 'JOSOPHINE'], 'JIM': ['JAMES']}
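From the DataFrame, I want to find every row whose first name is a nickname and for which the corresponding proper name exists in another row, and get this output: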
output = [[123, 678], [670]]
How can I do this? Thanks!

Answer:
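from collections import Counter

def count_nickname_matches(df, nicknames):
    final = []
    # Materialize the rows: in Python 3, zip() returns an iterator that cannot be indexed.
    tuplist = list(zip(df['ID'], df['first'], df['last']))
    for row_id, first, last in tuplist:
        # Only rows whose first name is a known nickname have candidate proper names.
        for item in nicknames.get(first.upper(), []):
            # Last names of every row whose first name is this proper name.
            l2 = [j[2] for j in tuplist if j[1].upper() == item]
            if last in l2:
                final.append((row_id, item))
                break
    # Count how many nickname rows resolved to each proper name.
    c = Counter(y[1] for y in final)
    # Map each matching ID to the number of nickname rows sharing its proper name.
    # For the sample data this gives {123: 2, 678: 2, 670: 1}.
    return {t[0]: c[t[1]] for t in final}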
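
The code above yields a dictionary of counts per ID, not the list of ID groups shown in the question. One possible way to get that grouped list is sketched below. It is not part of the original answer. It assumes rows are passed as (ID, first, last) tuples, for example from zip(df['ID'], df['first'], df['last']), and that a nickname row belongs with a proper-name row only when the last names are identical. The helper name group_nickname_ids is hypothetical.

```python
from collections import defaultdict


def group_nickname_ids(rows, nicknames):
    """Group the IDs of nickname rows by the (proper name, last name) they resolve to.

    `rows` is an iterable of (id, first, last) tuples. This helper is a
    hypothetical illustration, not code from the original answer.
    """
    rows = list(rows)
    # Every (FIRST, last) pair present in the data, for fast existence checks.
    present = {(first.upper(), last) for _, first, last in rows}

    groups = defaultdict(list)
    for row_id, first, last in rows:
        # Rows whose first name is not a known nickname yield no candidates.
        for proper in nicknames.get(first.upper(), []):
            if (proper, last) in present:
                groups[(proper, last)].append(row_id)
                break  # stop at the first proper name found, like the loop above
    # Dicts keep insertion order (Python 3.7+), so groups follow row order.
    return list(groups.values())


# Sample data copied from the question.
rows = [
    (123, 'Joe', 'Thomas'), (456, 'James', 'Jonas'), (675, 'James', 'Jonas'),
    (457, 'James', 'Thomas'), (676, 'Joseph', 'Thomas'), (678, 'Joey', 'Thomas'),
    (670, 'Jim', 'Jonas'), (671, 'Katy', 'Perry'),
]
nicknames = {
    'KATY': ['KATHERINE', 'KATHLEEN'], 'CHET': ['CHESTER'], 'PENNY': ['PENELOPE'],
    'PAT': ['PATRICIA', 'PATRICK'], 'BART': ['BARTHOLOMEW'],
    'BELLE': ['ARABELLA', 'BELINDA', 'ISABEL', 'ISABELLE', 'ROSABEL'],
    'JOE': ['JOSEPH', 'JOSHUA'], 'JOEY': ['JOSEPH', 'JOSOPHINE'], 'JIM': ['JAMES'],
}

output = group_nickname_ids(rows, nicknames)
print(output)  # [[123, 678], [670]]
```

Keying the groups on the (proper name, last name) pair keeps a hypothetical "Joe Smith" out of the Thomas group. Building the set of present pairs once also avoids rescanning every row for each candidate name.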