Question

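The goal is to replace SMS abbreviations in a text with their expansions. I do this by comparing each word against column values stored in a pandas DataFrame, which is read in Python from an xlsx file.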

word    expansion
fyi     for your information
gtg     got to go
brb     be right back
gtg2    got to go too
fyii    sample test

Efforts so far:

Courtesy of:

Replace words by checking from pandas dataframe

import re
import pandas as pd
sdf = pd.read_excel('expansion.xlsx')
rep = dict(zip(sdf.word, sdf.expansion)) #convert into dictionary
words = "fyi gtg gtg2 fyii really "
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile("|".join(rep.keys()))  # plain alternation, no word boundaries
rep = pattern.sub(lambda m: rep[re.escape(m.group(0))], words)
print(rep)

Output:

for your information got to go got to go2 for your informationi really

Expected output:

for your information got to go got to go too sample test really

How can I match whole words only, rather than substrings?

Answer 1

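I don't know whether this matches your requirements exactly, but you can try putting a word boundary (\b) at the end of each word in the pattern, so that only whole words are considered: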

import re
import pandas as pd
sdf = pd.read_excel('expansion.xlsx')
rep = dict(zip(sdf.word, sdf.expansion)) #convert into dictionary
words = "fyi gtg gtg2 fyii really "
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile(r"\b|".join(rep.keys())+r"\b") # This line changes
rep = pattern.sub(lambda m: rep[re.escape(m.group(0))], words)
print(rep)

Output:

for your information got to go got to go too sample test really
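
Note that the pattern above only anchors the end of each abbreviation, so a word such as "xfyi" would presumably still have its "fyi" tail replaced. A minimal sketch of a stricter variant follows, with \b on both sides of the alternation. The function name expand_sms and the hard-coded dictionary are illustrative assumptions standing in for the spreadsheet from the question, and the table is assumed to be non-empty.

```python
import re

# Lookup table mirroring the question's spreadsheet (hard-coded here so the
# sketch runs without expansion.xlsx; in practice it could be built with
# dict(zip(sdf.word, sdf.expansion)) as in the question).
expansions = {
    "fyi": "for your information",
    "gtg": "got to go",
    "brb": "be right back",
    "gtg2": "got to go too",
    "fyii": "sample test",
}


def expand_sms(text, table):
    """Replace each whole-word abbreviation in text with its expansion.

    Hypothetical helper; assumes table is a non-empty dict of str -> str.
    """
    # Longest keys first. With the trailing \b the regex engine would
    # backtrack to a longer alternative anyway; sorting just makes the
    # intent explicit.
    keys = sorted(table, key=len, reverse=True)
    # \b on both sides: a match must start and end at a word boundary.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    # m.group(0) is the raw matched text, so the unescaped dict can be
    # indexed directly and no second re.escape call is needed.
    return pattern.sub(lambda m: table[m.group(0)], text)


result = expand_sms("fyi gtg gtg2 fyii really ", expansions)
print(result)
```

Escaping the keys only while building the pattern keeps the dictionary itself unchanged, which avoids the escaped-key lookup used in the snippets above.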
