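I am trying to iterate over a DataFrame column to extract a certain set of words. I have mapped them as key-value pairs in a dictionary and, with some help, have so far managed to set one key per row.
What I want now is to return multiple keys in the same row when their values are present in the string, separated by a | (pipe).
Code: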
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'Name': ['Red and Blue Lace Midi Dress', 'Long Armed Sweater Azure and Ruby',
                                'High Top Ruby Sneakers', 'Tight Indigo Jeans',
                                'T-Shirt Navy and Rose']})
    colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

    def fetchColours(x):
        for key, values in colour.items():
            for value in values:
                if value in x.lower():
                    return key
        else:
            return np.nan

    df['Colour'] = df['Name'].apply(fetchColours)
Output:
       Name                               Colour
    0  Red and Blue Lace Midi Dress       red
    1  Long Armed Sweater Azure and Ruby  blue
    2  High Top Ruby Sneakers             red
    3  Tight Indigo Jeans                 blue
    4  T-Shirt Navy and Rose              blue
Expected result:
       Name                               Colour
    0  Red and Blue Lace Midi Dress       red
    1  Long Armed Sweater Azure and Ruby  blue|red
    2  High Top Ruby Sneakers             red
    3  Tight Indigo Jeans                 blue
    4  T-Shirt Navy and Rose              blue|red
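Answer 0 (score: 2)
The problem is that you return as soon as the first key is found, whereas you should keep searching until all matches have been collected: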
    def fetchColours(x):
        keys = []
        for key, values in colour.items():
            for value in values:
                if value in x.lower():
                    keys.append(key)
                    break  # one hit per key is enough; avoids duplicates such as 'red|red'
        if len(keys) != 0:
            return '|'.join(keys)
        else:
            return np.nan
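For this to work, you have to change:
    colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}
to
    colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'blue', 'indigo', 'navy')}
Otherwise the word "blue" is never searched for in any of the names, which means the 'blue' key cannot be added to the list for the first example.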
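For reference, here is a minimal end-to-end sketch of the approach described above, run against the DataFrame from the question with 'blue' added to the dictionary. The function name `fetch_colours` and the `sorted` call are choices made for this illustration and are not part of the original answer. Sorting only makes the key order deterministic (alphabetical) instead of depending on dictionary order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Red and Blue Lace Midi Dress',
                            'Long Armed Sweater Azure and Ruby',
                            'High Top Ruby Sneakers',
                            'Tight Indigo Jeans',
                            'T-Shirt Navy and Rose']})

# Dictionary from the question, with 'blue' added as a search term.
colour = {'red': ('red', 'rose', 'ruby'),
          'blue': ('azure', 'blue', 'indigo', 'navy')}

def fetch_colours(x):
    """Return every matching key joined by '|', or NaN if none match."""
    name = x.lower()
    # any() records each key at most once, however many of its values match.
    keys = sorted(key for key, values in colour.items()
                  if any(value in name for value in values))
    return '|'.join(keys) if keys else np.nan

df['Colour'] = df['Name'].apply(fetch_colours)
print(df)
```

With this dictionary the first row also picks up 'blue', so its `Colour` becomes `blue|red` rather than `red`.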
Answer 1 (score: 0)
How about this:
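    def fetchColors(x):
        color_keys = []
        for key, values in colour.items():  # `colour` is the dictionary defined in the question
            for value in values:
                if value in x.lower():
                    color_keys.append(key)
                    break  # stop after the first match so a key is not added twice
        if color_keys:
            return '|'.join(color_keys)
        else:
            return np.nan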
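One caveat applies to the `value in x.lower()` test used in both answers: it is a substring check, so 'red' would also match inside a word such as "Tired". Below is a hedged sketch of a whole-word variant, assuming that behaviour is wanted. The names `patterns` and `fetch_colours_whole_word` are hypothetical. It returns `None` rather than `np.nan` to stay dependency-free, and pandas treats `None` as missing in an object column.

```python
import re

# Same mapping as in the answers, with 'blue' included as a search term.
colour = {'red': ('red', 'rose', 'ruby'),
          'blue': ('azure', 'blue', 'indigo', 'navy')}

# One compiled, case-insensitive whole-word pattern per key.
patterns = {
    key: re.compile(r'\b(?:' + '|'.join(map(re.escape, values)) + r')\b',
                    re.IGNORECASE)
    for key, values in colour.items()
}

def fetch_colours_whole_word(text):
    """Return matching keys joined by '|', or None if nothing matches."""
    keys = [key for key, pattern in patterns.items() if pattern.search(text)]
    return '|'.join(keys) if keys else None

print(fetch_colours_whole_word('Tired Old Boots'))        # None
print(fetch_colours_whole_word('T-Shirt Navy and Rose'))  # red|blue
```

It can be applied in the same way as the functions above, e.g. `df['Name'].apply(fetch_colours_whole_word)`.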