text = "One sentence with one (two) three, but mostly one. And twos."
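Desired result: A sentence with A (B) C, but mostly A. And twos.
Words should be replaced only on an exact match with an entry in lookup_dict. So the "two" in "twos" should not be replaced, because that word has an extra letter. However, words next to spaces, commas, parentheses, and periods should be replaced.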
import re
lookup_dict = {'var': ["one", "two", "three"]}
match_dict = {'var': ["A", "B", "C"]}
var_dict = {}
for i, v in enumerate(lookup_dict['var']):
    var_dict[v] = match_dict['var'][i]
xpattern = re.compile('|'.join(var_dict.keys()))
result = xpattern.sub(lambda x: var_dict[x.group()], text.lower())
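Result: A sentence with A (B) C, but mostly A. and Bs.
Can I get the desired output without adding every combination of dictionary word and adjacent character to the dictionary? That seems needlessly complicated: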
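The extra replacement happens because the alternation one|two|three has no anchors, so it also matches "two" as a substring of "twos". A quick check with the standard-library re.findall (the sample string is made up for illustration) shows the difference that word boundaries make:

```python
import re

sample = "and twos. (two) three, one."

# Unanchored alternation: "two" also matches inside "twos".
naive = re.findall(r'one|two|three', sample)

# \b on both sides: only whole words match, and punctuation counts as a boundary.
bounded = re.findall(r'\b(?:one|two|three)\b', sample)

print(naive)    # ['two', 'two', 'three', 'one']
print(bounded)  # ['two', 'three', 'one']
```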
lookup_dict = {'var': ['one ', 'one,', '(one)', 'one.', 'two ', 'two,', '(two)', 'two.', 'three ', 'three,', '(three)', 'three.']}
...
result = xpattern.sub(lambda x: var_dict[x.group()] if x.group() in lookup_dict['var'] else x.group(), text.lower())
Answer 0 (score: 4)
import re
w = "Where are we one today two twos them"
lookup_dict = {"one":"1", "two":"2", "three":"3"}
pattern = re.compile(r'\b(' + '|'.join(lookup_dict.keys()) + r')\b')
output = pattern.sub(lambda x: lookup_dict[x.group()],w)
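After this, output holds 'Where are we 1 today 2 twos them'.
Basically:
I restructured your dictionary so that each word to replace is a key and its replacement is the value.
I built a regex that matches any key in the dictionary, of the form \b(each|key|in|your|dictionary)\b, which here is \b(one|two|three)\b. The word boundaries \b on either side mean a key only matches when it is not directly next to another letter, digit, or underscore, so spaces, parentheses, commas, and periods around it are fine, but the "two" inside "twos" is not matched.
Then I used that pattern to replace every match, looking up the replacement in the dictionary.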
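Applied to the text from the question, the same approach can also handle the capitalized "One" without lowercasing the whole string. This is a sketch that assumes case-insensitive matching is wanted; the use of re.IGNORECASE, re.escape, and the .lower() lookup are additions, not part of the answer above:

```python
import re

text = "One sentence with one (two) three, but mostly one. And twos."
lookup_dict = {"one": "A", "two": "B", "three": "C"}

# Escape each key in case it contains regex metacharacters, and wrap the
# alternation in \b so only whole words match ("twos" is left alone).
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, lookup_dict)) + r")\b",
    re.IGNORECASE,
)

# The matched text may be "One", so normalize it before the dictionary lookup.
result = pattern.sub(lambda m: lookup_dict[m.group(1).lower()], text)
print(result)  # A sentence with A (B) C, but mostly A. And twos.
```

Because only the matched word is lowercased for the lookup, the rest of the sentence (such as "And") keeps its original case.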
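Answer 1 (score: 0)
OK, finally got a solution working! It's very verbose and I wouldn't trust it to babysit my kids, but here it is anyway. The other answer is probably the better solution :)
First, there is a better way to represent the replacements you want: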
lookup_dict = {"one": "A", "two": "B", "three": "C"}
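If the words are still stored as the question's two parallel lists, a sketch using the built-in zip can build this flat mapping in one step instead of the enumerate loop:

```python
# Parallel lists in the question's original layout.
lookup_dict = {'var': ["one", "two", "three"]}
match_dict = {'var': ["A", "B", "C"]}

# Pair each word with the replacement at the same position.
var_dict = dict(zip(lookup_dict['var'], match_dict['var']))
print(var_dict)  # {'one': 'A', 'two': 'B', 'three': 'C'}
```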
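It looks like what you actually want is to match whole words while ignoring punctuation and case. To do that, we can strip the punctuation from each word before trying to match it, then rebuild the original word with the letter "A" in place of "one", and so on.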
import re
text = "One sentence with one (two) three, but mostly one. And twos."
lookup_dict = {"one": "A", "two": "B", "three": "C"}
# Matches every non-letter character, so punctuation can be stripped out.
regex = re.compile('[^a-zA-Z]')
textSplit = text.split()
for i in range(0, len(textSplit)):
    # Get rid of punctuation.
    word = regex.sub('', textSplit[i]).lower()
    if word in lookup_dict:
        # Fetch the right letter from the lookup_dict.
        letter = lookup_dict[word]
        # Find where the word is in the punctuated string (super flaky, I know).
        # Search the lowercased token so capitalized words like "One" are found too.
        wInd = textSplit[i].lower().find(word)
        # Just making sure the word needs to be reconstructed at all.
        if wInd != -1:
            # Rebuild the token with its punctuation around the new letter.
            newWord = textSplit[i][0:wInd] + letter + textSplit[i][wInd+len(word):]
            textSplit[i] = newWord
print(" ".join(textSplit))
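The find step is the fragile part, as the comment admits. A variant of the same token-by-token idea (a sketch, not the answerer's code) splits each token into leading punctuation, the word, and trailing punctuation with re.fullmatch, so no search is needed; tokens that don't fit that shape are left unchanged:

```python
import re

text = "One sentence with one (two) three, but mostly one. And twos."
lookup_dict = {"one": "A", "two": "B", "three": "C"}

# Leading non-letters, a run of letters, trailing non-letters.
token_re = re.compile(r"([^a-zA-Z]*)([a-zA-Z]+)([^a-zA-Z]*)")

out = []
for token in text.split():
    m = token_re.fullmatch(token)
    # Replace only when the letters in the middle are exactly a known word.
    if m and m.group(2).lower() in lookup_dict:
        token = m.group(1) + lookup_dict[m.group(2).lower()] + m.group(3)
    out.append(token)

result = " ".join(out)
print(result)  # A sentence with A (B) C, but mostly A. And twos.
```

Like the original, this rejoins tokens with single spaces, so runs of whitespace in the input are collapsed.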
I know this isn't a great solution, but I got it done. I treated it as a bit of fun, so please no downvotes haha.