Question

I have a list of strings called txtFreeForm:

['Add roth Sweep non vested money after 5 years of termination',
 'Add roth in-plan to the 401k plan.']

I need to check whether just 'add roth' is present in a sentence. For that, I used this:

import re

for each_line in txtFreeForm:
    match = re.search('add roth', each_line.lower())
    if match is not None:
        print(each_line)

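But this obviously returns both strings in the list, since both of them contain 'add roth'. Is there a way to search specifically for 'add roth' in a sentence? I have a bunch of these patterns to search the strings for.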

Thanks for your help!

Answer 1

Could you solve this using the length of the string (len() in Python)? I'm not an experienced Python programmer, but here's how I think it should work:

for each_line in txtFreeForm:
    match = re.search('add roth', each_line.lower())
    if (match is not None) and (len(each_line) == len("Add Roth")):
        print(each_line)

Basically, if the text is found in the string and the string is exactly as long as "Add Roth", then the string must contain only "Add Roth".
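Put differently, a line that contains "add roth" and is exactly as long as "Add Roth" can only be that phrase, so the check amounts to a case-insensitive equality test. A minimal sketch against the question's sample list (the names `target` and `exact` are just for illustration) suggests this selects neither sentence, since both continue past the phrase:

```python
txtFreeForm = ['Add roth Sweep non vested money after 5 years of termination',
               'Add roth in-plan to the 401k plan.']

target = 'add roth'

# Containing the target and having the same length means the line *is* the target,
# so a direct case-insensitive comparison expresses the same condition.
exact = [line for line in txtFreeForm if line.lower() == target]
print(exact)  # []
```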

I hope this helps.

Edit

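I misunderstood what you were asking. You want to print the sentences that contain "Add Roth", but not the ones that contain "Add Roth in-plan". Is that right?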

How about this code?

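for each_line in txtFreeForm:
    match_AR = re.search('add roth', each_line.lower())
    # The sample text is hyphenated ("in-plan"), so search for that exact form.
    match_ARIP = re.search('add roth in-plan', each_line.lower())
    # re.search returns a match object or None, never True.
    if (match_AR is not None) and (match_ARIP is None):
        print(each_line)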

This should do the trick. You can exclude any string (for example, "in-plan") by searching for it and adding it to the comparison.
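The same idea scales to several exclusions by keeping them in a list. A sketch, assuming a hypothetical `excluded` list of longer phrases (both entries are only examples) and collecting matches into `selected` instead of printing them one at a time:

```python
import re

txtFreeForm = ['Add roth Sweep non vested money after 5 years of termination',
               'Add roth in-plan to the 401k plan.']

# Hypothetical list of longer phrases that should disqualify a line.
excluded = ['add roth in-plan', 'add roth in plan']

selected = []
for each_line in txtFreeForm:
    lowered = each_line.lower()
    # Keep the line only if it has "add roth" and none of the excluded phrases.
    if re.search('add roth', lowered) and not any(re.search(p, lowered) for p in excluded):
        selected.append(each_line)

print(selected)
```

Adding another exclusion then means adding one entry to the list rather than another variable and another condition.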

Answer 2

You're close :) Give this a try:

for each_line in txtFreeForm:
    match = re.search('add roth (?!in[-]plan)',each_line.lower())
    if match is not None:
        print(each_line[match.end():])
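Here `(?!...)` is a negative lookahead: the pattern matches only when the text right after "add roth " is not "in-plan", and the lookahead itself consumes no characters, so `match.end()` points just past "add roth ". With several follow-up words to reject, one option is to join them into a single lookahead. A sketch, where `bad_words` is a hypothetical list and `re.escape` keeps each word literal:

```python
import re

txtFreeForm = ['Add roth Sweep non vested money after 5 years of termination',
               'Add roth in-plan to the 401k plan.']

# Hypothetical follow-up words that should disqualify a match.
bad_words = ['in-plan', 'in plan']

# One alternation inside one lookahead: add roth (?!(?:word1|word2|...))
pattern = re.compile(r'add roth (?!(?:' + '|'.join(map(re.escape, bad_words)) + '))')

matches = [line for line in txtFreeForm if pattern.search(line.lower())]
print(matches)
```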

Edit: Ah, I misread... you have a lot of these. That calls for some more aggressive magic.

import re
from functools import partial, reduce  # reduce is not a builtin in Python 3

txtFreeForm = ['Add roth Sweep non vested money after 5 years of termination',
               'Add roth in-plan to the 401k plan.']

def roths(rows):
    # Yield (row, rest) for each row containing "add roth",
    # where rest is whatever text follows the phrase.
    for row in rows:
        match = re.search(r'add roth\s*', row.lower())
        if match:
            yield row, row[match.end():]

def filter_pattern(pattern):
    # Bind the pattern now; the rows are supplied later by the pipeline.
    return partial(lazy_filter_out, pattern)

def lazy_filter_out(pattern, rows):
    # Drop rows whose remaining text starts with the unwanted pattern.
    for row, rest in rows:
        if not re.match(pattern, rest, re.IGNORECASE):
            yield row, rest

def magical_transducer(bad_words, nice_rows):
    # Chain roths and one filter per bad word into a single lazy pipeline.
    steps = [roths] + list(map(filter_pattern, bad_words))
    magical_sentences = reduce(lambda x, y: y(x), steps, nice_rows)
    for row, _ in magical_sentences:
        yield row

def main():
    magic = magical_transducer(['in[-]plan'], txtFreeForm)
    print(list(magic))

if __name__ == '__main__':
    main()

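To explain what's going on: you mentioned that you have a lot of these words to handle. The traditional way to compare two sets of items is with nested for loops, like this: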

results = []
for word in words:
    for pattern in patterns:
        data = do_something(word, pattern)
        results.append(data)
        for item in data:
            for thing in item:
                ...  # and so on, and so forth
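As an aside, the standard library's `itertools.product` is one common way to flatten two nested loops into one: it yields every (word, pattern) pair in a single sequence. This is not the technique this answer goes on to use; it's a minimal sketch with hypothetical `words` and `patterns` lists:

```python
from itertools import product

# Hypothetical inputs, for illustration only.
words = ['add roth sweep', 'add roth in-plan']
patterns = ['sweep', 'in-plan']

# One flat loop over every (word, pattern) pair instead of two nested loops.
results = [(word, pattern) for word, pattern in product(words, patterns) if pattern in word]
print(results)  # [('add roth sweep', 'sweep'), ('add roth in-plan', 'in-plan')]
```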

I'm using a few different techniques to get a "flatter" implementation and avoid nested loops. I'll do my best to describe them.

**Function composition**

# You will often see patterns that look like this:
x = foo(a)
y = bar(x)
z = baz(y)

# You may also see patterns that look like this:
z = baz(bar(foo(a)))

# An alternative is function composition, which works like this
# (in Python 3, reduce has to be imported from functools):
from functools import reduce
z = reduce(lambda x, y: y(x), [foo, bar, baz], a)
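To make the composition pattern concrete, here is a runnable sketch in which a hypothetical `compose_apply` helper wraps the same `reduce` call, with `str.strip`, `str.lower` and `str.split` playing the roles of `foo`, `bar` and `baz`:

```python
from functools import reduce

def compose_apply(functions, value):
    # Feed the value through each function in order: the first one runs first.
    return reduce(lambda acc, fn: fn(acc), functions, value)

text = '  Add Roth In-Plan  '
words = compose_apply([str.strip, str.lower, str.split], text)
nested = str.split(str.lower(str.strip(text)))  # the same thing, written as nested calls

print(words)            # ['add', 'roth', 'in-plan']
print(words == nested)  # True
```

Adding a step means appending a function to the list rather than adding another level of nesting.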
