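I have a DataFrame of company names and a dictionary that groups every variant of a name under its official name.
I want to create a new column holding the official name, based on that dictionary. Is there a cleaner way than looping over the dictionary's keys and values?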
import pandas as pd

df = pd.DataFrame({'name': ['company a', 'company a inc', 'a electronics', 'company a ltd', 'the company a', 'b enterprises', 'company b']})
name_dict = {'company a': ['company a', 'company a inc', 'a electronics', 'company a ltd', 'the company a'],
             'company b': ['b enterprises', 'company b']}

def get_company_name(name):
    # Return the official name whose list of variants contains `name`
    for k, v in name_dict.items():
        if name in v:
            return k

df['official_name'] = df.name.apply(get_company_name)
Answer 0 (score: 2)
I would build a forward lookup dictionary (from each variant to its official name) and use replace:
forward_names = {v:k for k, val in name_dict.items() for v in val }
df['official_name'] = df['name'].replace(forward_names)
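One behavior worth knowing about the inverted dictionary: `Series.replace` leaves values that are not keys unchanged, whereas `Series.map` turns them into NaN, which makes unmatched names easy to find. The sketch below compares the two; `'unknown co'` is a hypothetical name added only to show the unmatched case.

```python
import pandas as pd

name_dict = {'company a': ['company a', 'company a inc', 'a electronics', 'company a ltd', 'the company a'],
             'company b': ['b enterprises', 'company b']}

# Invert the mapping: each variant spelling -> its official name
forward_names = {variant: official for official, variants in name_dict.items() for variant in variants}

# 'unknown co' is hypothetical and appears in no variant list
df = pd.DataFrame({'name': ['company a inc', 'b enterprises', 'unknown co']})

# .replace keeps unmatched values as they are
df['via_replace'] = df['name'].replace(forward_names)
# .map produces NaN for unmatched values
df['via_map'] = df['name'].map(forward_names)
print(df)
```

Which one to prefer depends on whether an unknown name should pass through unchanged or be flagged as missing.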
Answer 1 (score: 0)
I would just go through name_dict to build the rows of the DataFrame directly:
df = pd.DataFrame([[v, k] for k in name_dict for v in name_dict[k]],
                  columns=['name', 'official_name'])
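That approach replaces the original frame, so it only fits when every variant appears exactly once. If the existing `df` has repeated names or its own row order to keep, a minimal sketch (assuming variant names are unique across `name_dict`) is to use the same construction as a lookup table and left-merge it onto the data:

```python
import pandas as pd

name_dict = {'company a': ['company a', 'company a inc', 'a electronics', 'company a ltd', 'the company a'],
             'company b': ['b enterprises', 'company b']}

# Existing data with a repeated name, in arbitrary order
df = pd.DataFrame({'name': ['company b', 'the company a', 'company b']})

# One (variant, official name) row per entry, built as in the answer above
lookup = pd.DataFrame([[v, k] for k in name_dict for v in name_dict[k]],
                      columns=['name', 'official_name'])

# A left merge keeps every row of df, in df's order
result = df.merge(lookup, on='name', how='left')
print(result)
```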
Answer 2 (score: 0)
Solution 1:
def get_company_name(name):
    # First key whose variant list contains `name`
    return [k for k, v in name_dict.items() if name in v][0]

df['official_name'] = df.name.apply(get_company_name)
print(df)
Solution 2:
df['official_name'] = df.name.apply(lambda name: list(k for k, v in name_dict.items() if name in v)[0])
print(df)
Output:
            name official_name
0      company a     company a
1  company a inc     company a
2  a electronics     company a
3  company a ltd     company a
4  the company a     company a
5  b enterprises     company b
6      company b     company b
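Both solutions index the result with `[0]`, which raises `IndexError` for a name that appears in no list. A small variation, sketched here with a hypothetical `default` parameter and a hypothetical `'unknown co'` row, uses `next()` with a fallback value instead:

```python
import pandas as pd

name_dict = {'company a': ['company a', 'company a inc', 'a electronics', 'company a ltd', 'the company a'],
             'company b': ['b enterprises', 'company b']}

def get_company_name(name, default=None):
    # next() returns the first matching key, or `default` when no list contains `name`
    return next((k for k, v in name_dict.items() if name in v), default)

# 'unknown co' is hypothetical and appears in no variant list
df = pd.DataFrame({'name': ['a electronics', 'company b', 'unknown co']})
df['official_name'] = df['name'].apply(get_company_name)
print(df)
```

This still scans every list for every row, so the inverted-dictionary approach scales better on large data.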
Answer 3 (score: 0)
I would put name_dict into a DataFrame, melt it, and then merge:
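# One row per official name; shorter variant lists are padded with missing values
df2 = pd.DataFrame.from_dict(name_dict, orient='index')
df2 = df2.transpose()
# Long format: 'variable' holds the official name, 'value' the variant
df2 = df2.melt()
df3 = df.merge(df2, how='left', left_on='name', right_on='value', sort=False)
# Rename to the requested column and drop the duplicate 'value' key
df3 = df3.drop(columns='value').rename(columns={'variable': 'official_name'})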
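A shorter way to get the same long-format lookup, sketched here as an alternative to the transpose/melt steps (it assumes pandas 0.25 or later, where `Series.explode` exists, and that each variant belongs to only one official name), is to explode the dictionary into one row per variant and map with it:

```python
import pandas as pd

name_dict = {'company a': ['company a', 'company a inc', 'a electronics', 'company a ltd', 'the company a'],
             'company b': ['b enterprises', 'company b']}
df = pd.DataFrame({'name': ['company a', 'company a inc', 'a electronics', 'company a ltd',
                            'the company a', 'b enterprises', 'company b']})

# One row per variant; the index repeats the official name
exploded = pd.Series(name_dict).explode()

# Swap index and values so the lookup goes variant -> official name
lookup = pd.Series(exploded.index.to_numpy(), index=exploded.to_numpy())

df['official_name'] = df['name'].map(lookup)
print(df)
```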