Question

My dataframe looks like this:

>>> df
     ID   first    last
0   123     Joe  Thomas
1   456   James   Jonas
2   675   James   Jonas
3   457   James  Thomas
4   676  Joseph  Thomas
5   678    Joey  Thomas
6   670     Jim   Jonas
7   671    Katy   Perry
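
To reproduce this, the frame can be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the printout above, and integer IDs and string names are assumed.

```python
import pandas as pd

# Values copied from the printed frame; the dtypes are an assumption.
df = pd.DataFrame({
    "ID": [123, 456, 675, 457, 676, 678, 670, 671],
    "first": ["Joe", "James", "James", "James", "Joseph", "Joey", "Jim", "Katy"],
    "last": ["Thomas", "Jonas", "Jonas", "Thomas", "Thomas", "Thomas", "Jonas", "Perry"],
})
print(df)
```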

Then I have a dictionary whose keys are "nicknames" and whose values are lists of all the names that go by that particular nickname, like this:

nicknames =  {'KATY': ['KATHERINE', 'KATHLEEN'], 'CHET': ['CHESTER'], 'PENNY': ['PENELOPE'], 'PAT': ['PATRICIA', 'PATRICK'], 'BART': ['BARTHOLOMEW'], 'BELLE': ['ARABELLA', 'BELINDA', 'ISABEL', 'ISABELLE', 'ROSABEL'], 'JOE': ['JOSEPH', 'JOSHUA'], 'JOEY': ['JOSEPH', 'JOSOPHINE'], 'JIM': ['JAMES']}

From the dataframe, I want to check all rows that have a nickname and for which the proper name exists in another row, and get this output:

output = [[123, 678], [670]]

How can I do this? Thanks!

Solution:

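    from collections import Counter

    # Wrapped in a function (the name is illustrative) so that `return` is valid.
    def count_nickname_matches(df, nicknames):
        final1 = {}
        final = []
        # zip() is a one-shot iterator in Python 3, so materialize it as a list.
        tuplist = list(zip(df['ID'], df['first'], df['last']))
        # Upper-cased first names, computed once instead of per candidate.
        l1 = [j[1].upper() for j in tuplist]
        for row_id, first, last in tuplist:
            # .get() yields an empty list when the first name is not a nickname.
            for item in nicknames.get(first.upper(), []):
                # Last names of the rows whose first name is this proper name.
                l2 = [j[2] for j in tuplist if j[1].upper() == item]
                if item in l1 and last in l2:
                    final.append((row_id, item))
                    break

        # Count how many nickname rows resolved to each proper name.
        c = Counter(y[1] for y in final)
        for t in final:
            final1[t[0]] = c.get(t[1])
        return final1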
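
The code above builds a dict of counts keyed by ID rather than the nested list asked for in the question. Below is a minimal sketch of one way to get that shape. It assumes that a match means "same last name, first name equal to one of the nickname's proper names", and that last names should be compared case-insensitively. `group_nickname_ids` is a hypothetical helper name, and the rows are plain `(ID, first, last)` tuples so the example runs without pandas.

```python
from collections import defaultdict

# Rows copied from the question as (ID, first, last) tuples.
rows = [
    (123, "Joe", "Thomas"),
    (456, "James", "Jonas"),
    (675, "James", "Jonas"),
    (457, "James", "Thomas"),
    (676, "Joseph", "Thomas"),
    (678, "Joey", "Thomas"),
    (670, "Jim", "Jonas"),
    (671, "Katy", "Perry"),
]

# Dictionary copied from the question.
nicknames = {
    "KATY": ["KATHERINE", "KATHLEEN"],
    "CHET": ["CHESTER"],
    "PENNY": ["PENELOPE"],
    "PAT": ["PATRICIA", "PATRICK"],
    "BART": ["BARTHOLOMEW"],
    "BELLE": ["ARABELLA", "BELINDA", "ISABEL", "ISABELLE", "ROSABEL"],
    "JOE": ["JOSEPH", "JOSHUA"],
    "JOEY": ["JOSEPH", "JOSOPHINE"],
    "JIM": ["JAMES"],
}


def group_nickname_ids(rows, nicknames):
    """Hypothetical helper: group the IDs of nickname rows by the proper-name row they match."""
    # Every (FIRST, LAST) pair present in the data, for constant-time lookups.
    present = {(first.upper(), last.upper()) for _, first, last in rows}
    groups = defaultdict(list)
    for row_id, first, last in rows:
        # Names that are not nicknames yield an empty list and are skipped.
        for proper in nicknames.get(first.upper(), []):
            key = (proper, last.upper())
            if key in present:
                groups[key].append(row_id)
                break  # stop at the first proper name that matches
    return list(groups.values())


output = group_nickname_ids(rows, nicknames)
print(output)  # [[123, 678], [670]]
```

With a dataframe, the same rows can be obtained with `list(zip(df['ID'], df['first'], df['last']))`. Groups come out in the order their first member appears, because dicts preserve insertion order in Python 3.7+.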
