Question

text = "One sentence with one (two) three, but mostly one. And twos."

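Expected result: A sentence with A (B) C, but mostly A. And twos.

Words should be replaced only on an exact match with an entry in lookup_dict. So the "two" in "twos" should not be replaced, because that word has an extra letter. However, words next to spaces, commas, parentheses, and periods should be replaced.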

import re

lookup_dict = {'var': ["one", "two", "three"]}
match_dict = {'var': ["A", "B", "C"]}

var_dict = {}

# Map each lookup word to its replacement.
for i, v in enumerate(lookup_dict['var']):
    var_dict[v] = match_dict['var'][i]

# Plain alternation with no word boundaries, so "two" also matches inside "twos".
xpattern = re.compile('|'.join(var_dict.keys()))
result = xpattern.sub(lambda x: var_dict[x.group()], text.lower())

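Actual result: A sentence with A (B) C, but mostly A. and Bs.

Can I get the desired output without adding every combination of word + adjacent character to the dictionary? That seems unnecessarily complicated: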

lookup_dict = {'var': ['one ', 'one,', '(one)', 'one.', 'two ', 'two,', '(two)', 'two.', 'three ', 'three,', '(three)', 'three.']}
...
result = xpattern.sub(lambda x: var_dict[x.group()] if x.group() in lookup_dict['var'] else x.group(), text.lower())

Answer 1

import re

w = "Where are we one today two twos them"
lookup_dict = {"one": "1", "two": "2", "three": "3"}
# \b on both sides makes each key match only as a whole word.
pattern = re.compile(r'\b(' + '|'.join(lookup_dict.keys()) + r')\b')
output = pattern.sub(lambda x: lookup_dict[x.group()], w)
print(output)

This prints 'Where are we 1 today 2 twos them'.

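Basically,

I changed your dictionary so that each word is its own key, mapped to its replacement.

I created a regex that matches any key in the dictionary, of the form \b(each|key|in|your|dictionary)\b, which matches items a, b, c. The word boundaries (\b) around the group mean a key only matches when it is not part of a longer word, i.e. when it is next to a space, punctuation, or the start or end of the string.

Then, using that pattern, every match that occurs is replaced.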
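For reference, here is a minimal sketch of the same word-boundary approach applied to the text from the question. It makes two assumptions that are not part of the answer above: keys are passed through re.escape in case one ever contains a regex metacharacter, and re.IGNORECASE is used so that the capitalised "One" is matched without lower-casing the whole string.

```python
import re

text = "One sentence with one (two) three, but mostly one. And twos."
lookup_dict = {"one": "A", "two": "B", "three": "C"}

# Assumption: escape each key and match case-insensitively.
# \b ... \b ensures only whole words match, so "twos" is left alone.
pattern = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in lookup_dict) + r")\b",
    re.IGNORECASE,
)

# The keys are lower-case, so lower-case the matched word before the lookup.
result = pattern.sub(lambda m: lookup_dict[m.group().lower()], text)
print(result)  # A sentence with A (B) C, but mostly A. And twos.
```

Because only the matched words are touched, the rest of the string keeps its original case ("And" is not turned into "and").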

Answer 2

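OK, finally finished the solution! It's very verbose and I wouldn't let it babysit my kids, but here it is anyway. The other answer is probably a better solution :)

First, there is a better way to represent the words you want to replace and their replacements: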

lookup_dict = {"one": "A", "two": "B", "three": "C"}

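It looks like what you really want is to match whole words while ignoring punctuation and case. To do that, we can strip the punctuation from each whitespace-separated token before trying to match it, and then rebuild the original token with the letter "A" in place of "one", and so on.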

import re

text = "One sentence with one (two) three, but mostly one. And twos."

lookup_dict = {"one": "A", "two": "B", "three": "C"}

# Regex matching any character that is not a letter.
regex = re.compile('[^a-zA-Z]')

textSplit = text.split()

for i in range(0, len(textSplit)):
    # Get rid of punctuation.
    word = regex.sub('', textSplit[i]).lower()
    if word in lookup_dict:
        # Fetch the right letter from the lookup_dict.
        letter = lookup_dict[word]
        # Find the word in the punctuated token, ignoring case so "One" is found too (super flakey I know).
        wInd = textSplit[i].lower().find(word)
        # Just making sure the word needs to be reconstructed at all.
        if wInd != -1:
            # Rebuilding the string with punctuation.
            newWord = textSplit[i][0:wInd] + letter + textSplit[i][wInd+len(word):]
            textSplit[i] = newWord

print(" ".join(textSplit))

I know this is not a great solution, but I've finished it. Treat it as a bit of fun, so please no downvotes haha.

Replace words in a string with exact matches from a dictionary

2 answers: