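I have a text file that lists word frequencies like this: "read 1 dick 1 john 1 book 1 read 1 different 1 a 1 different 1". I have also defined a dictionary for some of these words: dict = {'a':1,'book':2}
I want to replace the words with their dictionary values. Can anyone tell me how to do this?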
Answer 0 (score: 4)
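text = "read 1 dick 1 john 1 book 1 read 1 different 1 a 1 different 1 "  # your text here
dictionary = {'a': 1, 'book': 2}  # your dictionary here (don't call it dict!)
# Replace each space-separated token with its dictionary value, or keep it unchanged
result = ' '.join(str(dictionary.get(word, word)) for word in text.split(' '))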
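A quick check of this approach on the sample line from the question (the wrapper name `replace_words` is only for illustration). Because it compares whole space-separated tokens, 'a' does not match inside 'read', and `split(' ')` keeps an empty string for the trailing space, so `join` restores the original spacing.

```python
def replace_words(text, dictionary):
    # Compare whole space-separated tokens; tokens not in the dictionary are kept as-is
    return ' '.join(str(dictionary.get(word, word)) for word in text.split(' '))

sample = "read 1 dick 1 john 1 book 1 read 1 different 1 a 1 different 1 "
result = replace_words(sample, {'a': 1, 'book': 2})
print(repr(result))
# 'read 1 dick 1 john 1 2 1 read 1 different 1 1 1 different 1 '
```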
Answer 1 (score: 1)
It's simple:
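text = "read 1 dick 1 john 1 book 1 read 1 different 1 a 1 different 1 "  # your text here
dictionary = {'a': 1, 'book': 2}  # your dictionary here
for word in dictionary:
    # str.replace substitutes every occurrence, including inside other words
    text = text.replace(word, str(dictionary[word]))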
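The catch with `str.replace` is that it also matches inside longer words. Running the loop on the sample line shows the key 'a' turning 'read' into 're1d':

```python
text = "read 1 dick 1 john 1 book 1 read 1 different 1 a 1 different 1 "
dictionary = {'a': 1, 'book': 2}
for word in dictionary:
    # 'a' is replaced everywhere, including the 'a' inside 'read'
    text = text.replace(word, str(dictionary[word]))
print(text)
# re1d 1 dick 1 john 1 2 1 re1d 1 different 1 1 1 different 1
```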
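Edit
To avoid replacing substrings (for example, the 'a' inside 'read'), you can use a regular expression that only matches whole words: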
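import re
text = "read 1 dick 1 john 1 book 1 read 1 different 1 a 1 different 1 "  # your text here
dictionary = {'a': 1, 'book': 2}  # your dictionary here
for word in dictionary:
    # \b anchors the match to whole words, so 'a' no longer matches inside 'read'
    text = re.sub(r'\b' + re.escape(word) + r'\b', str(dictionary[word]), text)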
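A variant of the same idea (a sketch, not part of the original answer; the function name `replace_whole_words` is hypothetical, and it assumes the keys are plain words) combines all keys into one alternation and substitutes in a single pass. This avoids rescanning the text once per key, and a value written by one replacement can never be matched again by a later key.

```python
import re

def replace_whole_words(text, dictionary):
    if not dictionary:
        return text  # an empty alternation would match empty strings
    # Builds one pattern such as \b(a|book)\b; re.escape guards against regex metacharacters
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, dictionary)) + r')\b')
    # Each whole-word match is looked up exactly once
    return pattern.sub(lambda m: str(dictionary[m.group(1)]), text)

sample = "read 1 dick 1 john 1 book 1 read 1 different 1 a 1 different 1 "
print(replace_whole_words(sample, {'a': 1, 'book': 2}))
```

With `{'a': 'b', 'b': 'c'}`, the text "a b" becomes "b c" here, whereas a key-by-key loop would produce "c c".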
Answer 2 (score: 1)
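import re
text = "read 1 dick 1 john 1 book 1 read 1 different 1 a 1 different 1 "  # your text here
dictionary = {'a': 1, 'book': 2}  # your dictionary here (don't call it dict!)
# Each match is a word or the separator between words; look it up, defaulting to itself
result = re.sub(r"\b.+?\b", lambda m: str(dictionary.get(m.group(), m.group())), text)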
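To see what the lambda receives, `re.findall` with the same pattern lists the pieces: words and the single spaces between them come out as separate matches, so only a whole word can equal a dictionary key. A trailing space at the very end is not matched at all (there is no word boundary after it) and is left unchanged.

```python
import re

# Lazily match from one word boundary to the next
tokens = re.findall(r"\b.+?\b", "read 1 book 1 ")
print(tokens)
# ['read', ' ', '1', ' ', 'book', ' ', '1']
```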
Answer 3 (score: 0)
You can also use re.sub, but pass a function as the replacement argument:
import re
frequencies = {'a': 1, 'book': 2}
input_string = "read 1 dick 1 john 1 book 1 read 1 different 1 a 1 different 1 "
def replace_if_found(m):
    # group(1) is the word; group(2) is the space and count that follow it
    word = m.group(1)
    return str(frequencies.get(word, word)) + m.group(2)
print(re.sub(r'(\w+)( \d+)', replace_if_found, input_string))
...which gives you the output:
read 1 dick 1 john 1 2 1 read 1 different 1 1 1 different 1
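The advantage of this is that it only replaces at positions where one or more word characters are followed by a space and one or more digits, that is, a word followed by its count.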
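If every line strictly alternates word and count, the same restriction can also be expressed without a regex by pairing tokens. This is a sketch under that assumption (the name `replace_in_pairs` is hypothetical). Unlike the regex version, it normalizes whitespace, so the trailing space is dropped, and `zip` silently ignores a final word that has no count.

```python
def replace_in_pairs(text, dictionary):
    tokens = text.split()  # splits on any whitespace and drops empty strings
    words, counts = tokens[0::2], tokens[1::2]  # even positions are words, odd are counts
    # Only words are looked up; counts are passed through unchanged
    return ' '.join('%s %s' % (dictionary.get(w, w), c) for w, c in zip(words, counts))

print(replace_in_pairs("read 1 dick 1 john 1 book 1 read 1 different 1 a 1 different 1 ", {'a': 1, 'book': 2}))
# read 1 dick 1 john 1 2 1 read 1 different 1 1 1 different 1
```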