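I have a dictionary that maps words to their closest related words.
I want to replace the related words in a string with the original word. Currently I can only do the replacement when each key has a single value; I cannot replace words when a key has multiple values. How can this be done?
Sample input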
North Indian Restaurant
South India Hotel
Mexican Restrant
Italian Hotpot
Cafe Bar
Irish Pub
Maggiee Baar
Jacky Craft Beer
Bristo 1889
Bristo 188
Bristo 188.
How the dictionary is built
y= list(word)
words = y
similar = [[item[0] for item in model.wv.most_similar(word) if item[1] > 0.7] for word in words]
similarity_matrix = pd.DataFrame({'Orginal_Word': words, 'Related_Words': similar})
similarity_matrix = similarity_matrix[['Orginal_Word', 'Related_Words']]
The DataFrame has two columns, each holding lists:
Orginal_Word Related_Words
[Indian] [India,Ind,ind.]
[Restaurant] [Hotel,Restrant,Hotpot]
[Pub] [Bar,Baar, Beer]
[1888] [188, 188., 18]
Dictionary
similarity_matrix.set_index('Orginal_Word')['Related_Words'].to_dict()
{'Indian': 'India, Ind, ind.',
 'Restaurant': 'Hotel, Restrant, Hotpot',
 'Pub': 'Bar, Baar, Beer',
 '1888': '188, 188., 18'}
Expected output
North Indian Restaurant
South Indian Restaurant
Mexican Restaurant
Italian Restaurant
Cafe Pub
Irish Pub
Maggiee Pub
Jacky Craft Pub
Bristo 1888
Bristo 1888
Bristo 1888
Any help is appreciated.
Answer 0 (score: 2)
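I think you can use replace with regex=True and a new dictionary whose keys are regex patterns for the related words and whose values are the original words. The lookarounds (?<!\S) and (?!\S) restrict each match to a whole whitespace-delimited word: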
d = {'Indian': 'India, Ind, ind.',
'Restaurant': 'Hotel, Restrant, Hotpot',
'Pub': 'Bar, Baar, Beer',
'1888': '188, 188., 18'}
d1 = {r'(?<!\S)'+ k.strip() + r'(?!\S)':k1 for k1, v1 in d.items() for k in v1.split(',')}
df['col'] = df['col'].replace(d1, regex=True)
print (df)
col
0 North Indian Restaurant
1 South Indian Restaurant
2 Mexican Restaurant
3 Italian Restaurant
4 Cafe Pub
5 Irish Pub
6 Maggiee Pub
7 Jacky Craft Pub
8 Bristo 1888
9 Bristo 1888
10 Bristo 1888
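To illustrate why the patterns are wrapped in (?<!\S) and (?!\S), here is a minimal sketch using only the standard re module. The sample strings 'Barista Bar' and 'Cafe Bar, open late' are made up for illustration and do not come from the question.

```python
import re

# Plain substring replacement also rewrites the start of "Barista".
naive = re.sub(r'Bar', 'Pub', 'Barista Bar')

# With the lookarounds, only a whitespace-delimited "Bar" is replaced.
bounded = re.sub(r'(?<!\S)Bar(?!\S)', 'Pub', 'Barista Bar')

# Caveat: a word followed directly by punctuation is not whitespace-delimited,
# so it is left untouched by this pattern.
punctuated = re.sub(r'(?<!\S)Bar(?!\S)', 'Pub', 'Cafe Bar, open late')

print(naive)       # Pubista Pub
print(bounded)     # Barista Pub
print(punctuated)  # Cafe Bar, open late
```

If related words can be followed by commas or other punctuation in your data, the trailing lookahead would probably need to be relaxed.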
EDIT (a function for the code above):
def replace_words(d, col):
    # Map each related word (as a whole-word regex) to its original word.
    d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]
df['col'] = replace_words(d, 'col')
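The code above assumes each dictionary value is a single comma-separated string. If the values are lists instead, as the Related_Words column in the question suggests, the split(',') step can be dropped. The following is a rough sketch of that variant using plain re; the names related, patterns and normalize are hypothetical, and only part of the mapping is shown.

```python
import re

# Assumption: values are lists of related words, not comma-joined strings.
related = {'Indian': ['India', 'Ind', 'ind.'],
           'Pub': ['Bar', 'Baar', 'Beer']}

# One whole-word pattern per related word, pointing back to the original word.
# re.escape guards against regex metacharacters such as '.'.
patterns = {r'(?<!\S)' + re.escape(word) + r'(?!\S)': original
            for original, words in related.items()
            for word in words}

def normalize(text):
    """Apply every pattern in turn to a single string."""
    for pattern, original in patterns.items():
        text = re.sub(pattern, original, text)
    return text

print(normalize('South India Bar'))  # South Indian Pub
```

A dictionary built this way should also be usable with df[col].replace(patterns, regex=True), in the same way as d1 above.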
EDIT1:
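If you get the following error:
regex error- missing ), unterminated subpattern at position 7
it is necessary to escape the regex special characters in the keys: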
import re
def replace_words(d, col):
    d1 = {r'(?<!\S)' + re.escape(k.strip()) + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]
df['col'] = replace_words(d, 'col')
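The effect of re.escape can be checked in isolation. The sketch below uses plain re; the helper whole_word and the related word 'Bar (' are hypothetical and only serve to trigger the kind of error quoted above.

```python
import re

def whole_word(word, escape):
    """Build the whole-word pattern, optionally escaping regex metacharacters."""
    body = re.escape(word) if escape else word
    return r'(?<!\S)' + body + r'(?!\S)'

# Unescaped, the '.' in '188.' matches any character, so '1889' is rewritten too.
raw = re.sub(whole_word('188.', False), '1888', 'Bristo 1889')
# Escaped, only a literal '188.' matches.
safe = re.sub(whole_word('188.', True), '1888', 'Bristo 1889')
dotted = re.sub(whole_word('188.', True), '1888', 'Bristo 188.')

# An unbalanced '(' in a related word makes the unescaped pattern invalid.
try:
    re.compile(whole_word('Bar (', False))
    compiled = True
except re.error:
    compiled = False
# Escaped, the same word is matched literally.
literal = re.sub(whole_word('Bar (', True), 'Pub', 'Cafe Bar (')

print(raw)       # Bristo 1888
print(safe)      # Bristo 1889
print(dotted)    # Bristo 1888
print(compiled)  # False
print(literal)   # Cafe Pub
```

Note that with escaping, Bristo 1889 is no longer rewritten, because it only matched through the unescaped '.'. If that row should still become Bristo 1888, '1889' would presumably have to be listed as a related word explicitly.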