Adding a numeric value next to column elements in pandas

Time: 2018-07-30 07:35:18

Tags: python python-3.x pandas

This is a follow-up to a question I asked here, so I decided to post it as a separate question.

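Is there a way to add a relevance value next to each matched list name in the column matched_list_names? The relevance formula is (number of matched words from list / total number of words in that list) * 100, so that I can find the most relevant list name. For the first row, the relevance for politics is (1/3)*100, roughly 30%, because 1 of the 3 words in the politics list is matched. For sports it is likewise (1/3)*100, roughly 30%. For miscellaneous it is 100 - (sum of the other values), i.e. 100 - (30 + 30) = 40. So the output would look like this: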

word_list                                          matched_list_names
['nuclear','election','usa','baseball']            politics 30,sports 30,miscellaneous 40
['football','united','thriller']                   sports 30,movies 30,miscellaneous 40
['marvels','spiderman','hockey']                   movies 60,sports 30

....................                               .....................
....................                               .....................
....................                               ....................
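
The formula described above can be sketched for a single row as follows. This is only an illustration under stated assumptions: the question does not show the contents of the lists, so the `lists` dictionary here is hypothetical, and `relevance` is a helper name introduced for this sketch. Percentages are rounded for display rather than truncated to 30.

```python
def relevance(words, lists):
    """Score each list as matched words / total words in that list * 100."""
    scores = {}
    for name, members in lists.items():
        matched = len(set(words) & set(members))
        if matched:
            scores[name] = matched / len(members) * 100
    # Whatever is left of 100 goes to 'miscellaneous' (small tolerance
    # guards against floating-point residue when the scores sum to 100).
    remainder = 100 - sum(scores.values())
    if remainder > 1e-9:
        scores['miscellaneous'] = remainder
    return scores

# Hypothetical lists: the question does not show the real ones.
lists = {
    'politics': ['election', 'china', 'russia'],
    'sports': ['baseball', 'hockey', 'football'],
}
row = ['nuclear', 'election', 'usa', 'baseball']
result = relevance(row, lists)
print(', '.join('{} {:.0f}'.format(k, v) for k, v in result.items()))
# politics 33, sports 33, miscellaneous 33
```

Note that with this formula the per-list scores can add up to more than 100 when a row matches many words from several lists, in which case no miscellaneous share is left.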

1 answer:

Answer 0: (score: 0)

Use:

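from collections import Counter
import pandas as pd

# Sample data reconstructed from the printed output below
df = pd.DataFrame({'word_list': [['nuclear', 'election', 'usa', 'baseball'],
                                 ['football', 'united', 'thriller'],
                                 ['marvels', 'hollywood', 'spiderman']]})

movies=['spiderman','marvels','thriller']
sports=['baseball','hockey','football']
politics=['election','china','usa']
d = {'movies':movies, 'sports':sports, 'politics':politics}
# Invert the mapping: each word points to the name of the list containing it
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

def f(x):
    # Count words per list name; words found in no list count as 'miscellaneous'
    a = Counter([d1.get(y, 'miscellaneous') for y in x])
    return ', '.join(['{} {}'.format(k, v / sum(a.values())* 100 ) for k, v in a.items()])

df['matched_list_names'] = df['word_list'].apply(f)
print (df)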
                            word_list  \
0  [nuclear, election, usa, baseball]   
1        [football, united, thriller]   
2     [marvels, hollywood, spiderman]   

                                  matched_list_names  
0     miscellaneous 25.0, politics 50.0, sports 25.0  
1  sports 33.33333333333333, miscellaneous 33.333...  
2  movies 66.66666666666666, miscellaneous 33.333...
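
The percentages above are each list's share of the words in the row (the denominator is `sum(a.values())`), and they print with full floating-point precision. If shorter values are wanted, one option is to round before formatting. The sketch below is a variant of `f`, not part of the original answer. The name `f_rounded` and the `ndigits` parameter are introduced here for illustration, and it is called on plain lists so that it runs without pandas.

```python
from collections import Counter

movies = ['spiderman', 'marvels', 'thriller']
sports = ['baseball', 'hockey', 'football']
politics = ['election', 'china', 'usa']
d = {'movies': movies, 'sports': sports, 'politics': politics}
# Map each word to the name of the list that contains it
d1 = {word: name for name, words in d.items() for word in words}

def f_rounded(x, ndigits=1):
    # Count words per list name; unknown words count as 'miscellaneous'
    counts = Counter(d1.get(w, 'miscellaneous') for w in x)
    total = sum(counts.values())
    # Round each share of the row's words to `ndigits` decimal places
    return ', '.join('{} {}'.format(k, round(v / total * 100, ndigits))
                     for k, v in counts.items())

print(f_rounded(['nuclear', 'election', 'usa', 'baseball']))
# miscellaneous 25.0, politics 50.0, sports 25.0
print(f_rounded(['football', 'united', 'thriller']))
# sports 33.3, miscellaneous 33.3, movies 33.3
```

The same function can be passed to `df['word_list'].apply(f_rounded)` in place of `f`.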