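From index to field name in a pandas DataFrame

Date: 2020-06-25 23:25:42

Tags: python pandas fuzzywuzzy

I need to get the field values that correspond to a list of index positions. My dataset is as follows: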

import pandas as pd

try_test = pd.DataFrame({'word': ['apple', 'orange', 'diet', 'energy', 'fire', 'cake'],
                         'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})

    word    name
0   apple   dog
1   orange  cat
2   diet    mad cat
3   energy  good dog
4   fire    bad dog
5   cake    chicken

Using this function:

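from fuzzywuzzy import fuzz

def func(name):
    # Flag every row whose 'name' partially matches the given name (score >= 85)
    matches = try_test.apply(lambda row: (fuzz.partial_ratio(row['name'], name) >= 85), axis=1)
    # Return the positions of the matching rows
    return [i for i, x in enumerate(matches) if x]

try_test.apply(lambda row: func(row['name']), axis=1)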

I get the following values:

0    [0, 3, 4]
1       [1, 2]
2       [1, 2]
3       [0, 3]
4       [0, 4]
5          [5]

I would like to get the values of the word field instead of the index positions.

Expected output:

0    [apple, energy, fire]
1       [orange, diet]
2       [orange, diet]
3       [apple, energy]
4       [apple, fire]
5          [cake]

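Any suggestions would be appreciated.

2 answers:

Answer 0: (score: 0)

Once you have the indices, simply indexing the df again solves your problem. IMO you can do this either outside or inside the function: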

In [2]: import pandas as pd

In [3]: try_test = pd.DataFrame({'word': ['apple', 'orange', 'diet', 'energy', 'fire', 'cake'],
   ...:                          'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})

In [4]: try_test
Out[4]: 
     word      name
0   apple       dog
1  orange       cat
2    diet   mad cat
3  energy  good dog
4    fire   bad dog
5    cake   chicken

In [5]: rows = [0, 3, 4]

In [6]: try_test.loc[rows, 'word']
Out[6]: 
0     apple
3    energy
4      fire
Name: word, dtype: object

In [7]: try_test.loc[rows, 'word'].values.tolist()
Out[7]: ['apple', 'energy', 'fire']
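
To apply the same lookup to every row at once, the step above can be wrapped in `Series.apply`. This is only a sketch: the `matches` Series is typed in by hand from the output shown in the question rather than computed with fuzzywuzzy, and the names `matches` and `words` are illustrative.

```python
import pandas as pd

try_test = pd.DataFrame({'word': ['apple', 'orange', 'diet', 'energy', 'fire', 'cake'],
                         'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})

# Index lists copied from the question's output (assumption: not recomputed here)
matches = pd.Series([[0, 3, 4], [1, 2], [1, 2], [0, 3], [0, 4], [5]])

# For each list of row labels, look up the corresponding 'word' values
words = matches.apply(lambda rows: try_test.loc[rows, 'word'].tolist())
print(words)
```

Because `.loc` selects by label, this relies on the lists holding index labels, which is the case for the default 0..n-1 index used here.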

Answer 1: (score: 0)

Change the function so that it returns try_test.word[i] instead of i:

def func(name):
    matches = try_test.apply(lambda row: (fuzz.partial_ratio(row['name'], name) >= 85), axis=1)
    return [try_test.word[i] for i, x in enumerate(matches) if x]
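
The list comprehension above pairs positions from `enumerate` with a lookup by index label, which lines up here because the frame uses the default 0..n-1 index. A possible variant that avoids this coupling is to use the boolean result directly as a mask. This is a sketch under assumptions: `is_match` is a hypothetical substring-based stand-in for `fuzz.partial_ratio(...) >= 85` so the example runs without fuzzywuzzy, and the letter index is invented purely to show that positions are not needed.

```python
import pandas as pd

def is_match(a, b):
    # Hypothetical stand-in for fuzz.partial_ratio(a, b) >= 85:
    # treat two names as matching when the shorter one occurs inside the longer one
    shorter, longer = sorted((a, b), key=len)
    return shorter in longer

# Same data as the question, but with a non-default index (assumption, for illustration)
try_test = pd.DataFrame({'word': ['apple', 'orange', 'diet', 'energy', 'fire', 'cake'],
                         'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']},
                        index=list('abcdef'))

def func(name):
    # Boolean Series: True for every row whose 'name' matches the given name
    mask = try_test['name'].apply(lambda other: is_match(other, name))
    # Select the 'word' values of the matching rows directly with the mask
    return try_test.loc[mask, 'word'].tolist()

result = try_test['name'].apply(func)
print(result)
```

With fuzzywuzzy installed, `is_match` could instead wrap `fuzz.partial_ratio(a, b) >= 85`, as in the question.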