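The goal is to replace SMS abbreviations in a text with their expansions. I do this by comparing each word against column values held in a pandas DataFrame, which is read from an xlsx file in Python.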
word expansion
fyi for your information
gtg got to go
brb be right back
gtg2 got to go too
fyii sample test
What I have tried so far:
Courtesy of:
Replace words by checking from pandas dataframe
import re
import pandas as pd
sdf = pd.read_excel('expansion.xlsx')
rep = dict(zip(sdf.word, sdf.expansion)) #convert into dictionary
words = "fyi gtg gtg2 fyii really "
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile("|".join(rep.keys()))
rep = pattern.sub(lambda m: rep[re.escape(m.group(0))], words)
print(rep)
Output:
for your information got to go got to go2 for your informationi really
Expected output:
for your information got to go got to go too sample test really
How can I match whole words only?
Answer 0 (score: 1)
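I don't know whether this matches your requirement exactly, but you could try putting a word boundary (\b) at the end of each word in the pattern, so that only whole words are considered. Without it, "gtg" matches the first three characters of "gtg2" and "fyi" matches the start of "fyii", which is why the leftover "2" and "i" appear in your output: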
import re
import pandas as pd
sdf = pd.read_excel('expansion.xlsx')
rep = dict(zip(sdf.word, sdf.expansion)) #convert into dictionary
words = "fyi gtg gtg2 fyii really "
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile(r"\b|".join(rep.keys())+r"\b") # This line changes
rep = pattern.sub(lambda m: rep[re.escape(m.group(0))], words)
print(rep)
Output:
for your information got to go got to go too sample test really
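Note that a boundary only at the end of each word still lets an abbreviation match at the tail of a longer word (for example "fyi" inside "xfyi"). A possible variant, sketched here under a few assumptions, wraps the whole alternation in a non-capturing group with \b on both sides. The function name expand_abbreviations is hypothetical, and the plain dict below stands in for dict(zip(sdf.word, sdf.expansion)) so that the example runs without the Excel file. Sorting the keys longest-first is not strictly needed once both boundaries are in place, but it makes the preference for "gtg2" over "gtg" explicit.

```python
import re


def expand_abbreviations(text, mapping):
    """Replace whole-word abbreviations in text with their expansions."""
    # Try longer keys first, so "gtg2" is attempted before "gtg".
    keys = sorted(mapping, key=len, reverse=True)
    # \b on both sides; the non-capturing group keeps the whole
    # alternation inside the two boundaries.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b"
    )
    # The match text is the original key, so no re.escape is needed here.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Assumption: stand-in for dict(zip(sdf.word, sdf.expansion)).
mapping = {
    "fyi": "for your information",
    "gtg": "got to go",
    "brb": "be right back",
    "gtg2": "got to go too",
    "fyii": "sample test",
}

result = expand_abbreviations("fyi gtg gtg2 fyii really ", mapping)
print(result)
```

Because \b only marks a transition between word and non-word characters, this sketch assumes every abbreviation starts and ends with a letter, digit, or underscore.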