Remove only the last occurrence of a word from a string

Time: 2018-02-26 15:51:26

Tags: python string

I have a string and a list of phrases.

input_string = 'alice is a character from a fairy tale that lived in a wonder land. A character about whome no one knows much about'

phrases_to_remove = ['wonderland', 'character', 'no one']

What I want to do is remove from input_string the last occurrence of each phrase in phrases_to_remove.

output_string = 'alice is a character from a fairy tale that lived in a. A about whome knows much about'

I have written a method that takes the input string and either a list of strings or a single string to remove, and uses rsplit() to cut out the phrases.

def remove_words_from_end(actual_string: str, to_replace, occurrence: int):
    if isinstance(to_replace, list):
        output_string = actual_string
        for string in to_replace:
            output_string = ' '.join(output_string.rsplit(string, maxsplit=occurrence))
        return output_string.strip()
    elif isinstance(to_replace, str):
        return ' '.join(actual_string.rsplit(to_replace, maxsplit=occurrence)).strip()
    else:
        raise TypeError('the value "to_replace" must be a string or a list of strings')

The problem with this code is that I cannot remove phrases whose spacing does not match, for example 'wonder land' versus 'wonderland'.
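To make the mismatch concrete, here is a minimal sketch on a shortened sample string (the variable names are only for illustration). It shows that rsplit() splits on exact substrings only, so a phrase written without the space is never found:

```python
text = 'lived in a wonder land. A character'

# rsplit only splits on an exact substring, so 'wonderland' is never found
# and the text comes back unchanged.
unchanged = ' '.join(text.rsplit('wonderland', maxsplit=1))

# An exact phrase is found, and only its last occurrence is cut out.
removed = ' '.join(text.rsplit('character', maxsplit=1))

print(repr(unchanged))
print(repr(removed))
```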

Is there a way to do this without hurting performance?

2 Answers:

Answer 0: (score: 3)

You can use re to handle the possible spaces:

import re

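def remove_last(word, string):
    # Allow one optional space between consecutive characters of the word.
    # re.escape keeps characters such as '.' or '+' from acting as regex syntax.
    pattern = ' ?'.join(re.escape(ch) for ch in word)
    matches = list(re.finditer(pattern, string))
    if not matches:
        return string
    last_m = matches[-1]
    # Keep everything before and after the last match.
    return string[:last_m.start()] + string[last_m.end():]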

def remove_words_from_end(words, string):
    # Drop spaces from each phrase so 'no one' and 'noone' build the same pattern.
    words_whole = [word.replace(' ', '') for word in words]
    string_out = string
    for word in words_whole:
        string_out = remove_last(word, string_out)
    return string_out

Running some tests:

>>> input_string = 'alice is a character from a fairy tale that lived in a wonder land. A character about whome no one knows much about'
>>> phrases_to_remove = ['wonderland', 'character', 'no one']
>>> remove_words_from_end(phrases_to_remove, input_string)
'alice is a character from a fairy tale that lived in a . A  about whome  knows much about'
>>> phrases_to_remove = ['wonder land', 'character', 'noone']
>>> remove_words_from_end(phrases_to_remove, input_string)
'alice is a character from a fairy tale that lived in a . A  about whome  knows much about'

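In this example, the regex search pattern is simply the word with an optional space ' ?' allowed between each pair of adjacent characters.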
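To see what that pattern looks like in practice, here is a small sketch. The helper name spaced_pattern is hypothetical, and the use of re.escape is an added assumption so that punctuation in a phrase is matched literally:

```python
import re

def spaced_pattern(word):
    # Hypothetical helper: escape each character, then allow one optional
    # space between neighbouring characters.
    return ' ?'.join(re.escape(ch) for ch in word)

pattern = spaced_pattern('wonderland')
print(pattern)

# Both spellings are found; the last span is the one that would be removed.
text = 'a wonder land and a wonderland'
spans = [m.span() for m in re.finditer(pattern, text)]
print(spans)
```

Taking the last element of spans gives the start and end offsets of the final occurrence, whichever spelling it uses.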

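Answer 1: (score: 0)

In general, when comparing two strings s1 and s2, you can check whether they are equal (same length and every character identical, which is what the standard comparison does) or, and this is the part you need to implement, whether their lengths differ by 1 and the only difference is a space. A sample function that does this is shown below. In terms of performance this is an O(n) check, where n is the length of the strings, but the ordinary equality check is O(n) anyway.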

def almost_match(s1, s2):
  # The strings can only differ by a single space if their lengths differ by exactly 1
  if len(s1) != len(s2) + 1 and len(s2) != len(s1) + 1:
    return False
  i = 0 # counter for s1 characters
  j = 0 # counter for s2 characters

  while i < len(s1) and j < len(s2):
    if s1[i] != s2[j]:
      # On a mismatch, skip a space in whichever string has one
      if s1[i] == ' ':
        i = i + 1
        continue
      elif s2[j] == ' ':
        j = j + 1
        continue
      else:
        return False
    i = i + 1
    j = j + 1

  if j < len(s2) and s2[j] == ' ':
    j = j + 1

  if i < len(s1) and s1[i] == ' ':
    i = i + 1

  return i == len(s1) and j == len(s2) # require that both strings matched fully

Regarding the last line, note that it prevents "abc" from matching "abcd". This can be optimized, but this is the general idea.
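As a cross-check on the same idea, here is a shorter formulation. It is not faster and is only a sketch under the same assumption, that the two strings differ by exactly one space. The name differs_by_one_space is hypothetical and not part of the answer above:

```python
def differs_by_one_space(s1, s2):
    # Hypothetical helper: order the strings so that `longer` has the extra character.
    longer, shorter = (s1, s2) if len(s1) > len(s2) else (s2, s1)
    if len(longer) != len(shorter) + 1:
        return False
    # Try deleting each space in turn and compare with the shorter string.
    return any(
        ch == ' ' and longer[:k] + longer[k + 1:] == shorter
        for k, ch in enumerate(longer)
    )

print(differs_by_one_space('wonder land', 'wonderland'))
print(differs_by_one_space('abc', 'abcd'))
```

Each candidate deletion builds a new string, so this version trades the single O(n) pass for simplicity. It is mainly useful for testing a hand-written loop against.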