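I have a list of strings named txtFreeForm:

    ['Add roth Sweep non vested money after 5 years of termination',
     'Add roth in-plan to the 401k plan.']

I need to check whether only 'Add roth' is present in a sentence. To do that, I used this:

    for each_line in txtFreeForm:
        match = re.search('add roth', each_line.lower())
        if match is not None:
            print(each_line)

But this obviously returns both strings in the list, since they both contain 'add roth'. Is there a way to search exclusively for 'Add roth' in a sentence? I have a bunch of these patterns to search for in the strings.

Thanks for your help!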
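Answer 0 (score: 0)

Could you use the length of the string (len() in Python) to solve this? I'm not an experienced Python programmer, but here is how I think it should work:

    for each_line in txtFreeForm:
        match = re.search('add roth', each_line.lower())
        # Compare the length of the current line, not of the whole list.
        if (match is not None) and (len(each_line) == len("Add Roth")):
            print(each_line)

Basically, if the text is in the string, and the string is exactly as long as "Add Roth", then it must contain only "Add Roth".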
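
To make the effect of the length check concrete, here is a minimal sketch. It assumes the check is applied to each line, and it adds a third list entry ("Add Roth") purely for illustration; the helper name only_add_roth is hypothetical.

```python
import re

txtFreeForm = [
    "Add roth Sweep non vested money after 5 years of termination",
    "Add roth in-plan to the 401k plan.",
    "Add Roth",  # hypothetical extra entry, added only for illustration
]

def only_add_roth(lines, phrase="Add Roth"):
    """Return lines that contain the phrase and are exactly as long as it."""
    hits = []
    for each_line in lines:
        # re.escape keeps the phrase literal inside the regular expression.
        match = re.search(re.escape(phrase.lower()), each_line.lower())
        if match is not None and len(each_line) == len(phrase):
            hits.append(each_line)
    return hits

print(only_add_roth(txtFreeForm))
```

Only the entry that is exactly "Add Roth" passes, so longer sentences that merely start with the phrase are all rejected.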
I hope this helps.
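**Edit**

I misunderstood your question. You want to print the sentences that contain "Add Roth", but not the ones that contain "Add Roth in plan". Is that right?

How about this code?

    for each_line in txtFreeForm:
        match_AR = re.search('add roth', each_line.lower())
        # [- ] accepts both "in plan" and "in-plan", as written in the list.
        match_ARIP = re.search('add roth in[- ]plan', each_line.lower())
        # re.search returns a match object or None, never True.
        if (match_AR is not None) and (match_ARIP is None):
            print(each_line)

This looks like it should solve the problem. You can exclude any string (for example "in plan") by searching for it and adding it to the comparison.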
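
Since there are many patterns to exclude, the comparison can be driven by a list instead of one variable per pattern. This is only a sketch: the names EXCLUDED and matching_lines are hypothetical, and the exclusion list holds just the one phrase from the question.

```python
import re

txtFreeForm = [
    "Add roth Sweep non vested money after 5 years of termination",
    "Add roth in-plan to the 401k plan.",
]

# Hypothetical list of patterns that disqualify a line; extend as needed.
EXCLUDED = [r"add roth in[- ]plan"]

def matching_lines(lines, wanted=r"add roth", excluded=EXCLUDED):
    """Keep lines that match `wanted` and match none of the `excluded` patterns."""
    result = []
    for each_line in lines:
        lowered = each_line.lower()
        has_wanted = re.search(wanted, lowered) is not None
        has_excluded = any(re.search(p, lowered) for p in excluded)
        if has_wanted and not has_excluded:
            result.append(each_line)
    return result

print(matching_lines(txtFreeForm))
```

Adding another phrase to exclude is then a one-line change to the list rather than a new condition in the if statement.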
Answer 1 (score: 0)

You're close :) Give this a shot:

    for each_line in txtFreeForm:
        match = re.search(r'add roth (?!in[-]plan)', each_line.lower())
        if match is not None:
            # Print only the text that follows "add roth ".
            print(each_line[match.end():])
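
The (?!...) part of the pattern above is a negative lookahead: the match succeeds only if the text that follows does not start with the enclosed expression. A lookahead can hold several alternatives, so one pattern can cover a whole exclusion list. The sketch below assumes such a list; "conversion" is an invented second entry used only for illustration.

```python
import re

txtFreeForm = [
    "Add roth Sweep non vested money after 5 years of termination",
    "Add roth in-plan to the 401k plan.",
]

# Hypothetical exclusion list; "conversion" is made up for this example.
excluded = ["in-plan", "conversion"]

# Join the escaped entries into one negative lookahead after "add roth ".
pattern = re.compile(
    r"add roth (?!" + "|".join(re.escape(word) for word in excluded) + ")",
    re.IGNORECASE,
)

kept = [line for line in txtFreeForm if pattern.search(line)]
print(kept)
```

Using re.IGNORECASE here avoids lowercasing each line by hand, which is a design choice of this sketch rather than part of the answer above.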
**Edit** Ah, I misread... you have a LOT of these. This calls for some more aggressive magic.

    import re
    from functools import partial, reduce  # reduce is not a builtin in Python 3

    txtFreeForm = ['Add roth Sweep non vested money after 5 years of termination',
                   'Add roth in-plan to the 401k plan.']

    def roths(rows):
        # Yield (row, text after "add roth") for every row that mentions it.
        for row in rows:
            match = re.search(r'add roth\s*', row.lower())
            if match:
                yield row, row[match.end():]

    def filter_pattern(pattern):
        return partial(lazy_filter_out, pattern)

    def lazy_filter_out(pattern, rows):
        # Drop rows whose remaining text starts with the unwanted pattern.
        for row, rest in rows:
            if not re.match(pattern, rest):
                yield row, rest

    def magical_transducer(bad_words, nice_rows):
        # list() is needed because map returns an iterator in Python 3.
        magical_sentences = reduce(lambda x, y: y(x), [roths] + list(map(filter_pattern, bad_words)), nice_rows)
        for row, _ in magical_sentences:
            yield row

    def main():
        magic = magical_transducer(['in[-]plan'], txtFreeForm)
        print(list(magic))

    if __name__ == '__main__':
        main()
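To explain a bit about what's going on: you mentioned you have a LOT of these words to process. The traditional way to compare two sets of items is with nested for loops. So,

    results = []
    for word in words:
        for pattern in patterns:
            data = do_something(word, pattern)
            results.append(data)
            for item in data:
                for thing in item:
                    ...  # and so on, and so forth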
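
For comparison, here is a small runnable version of the nested-loop approach. It is a sketch only: the patterns list is hypothetical, and do_something is replaced by a plain re.search call.

```python
import re

words = [
    "Add roth Sweep non vested money after 5 years of termination",
    "Add roth in-plan to the 401k plan.",
]

# Hypothetical patterns, chosen only to illustrate the loop structure.
patterns = [r"add roth", r"in[- ]plan"]

results = []
for word in words:
    for pattern in patterns:
        # Record every (sentence, pattern) pair where the pattern occurs.
        if re.search(pattern, word.lower()):
            results.append((word, pattern))

for word, pattern in results:
    print(pattern, "->", word)
```

Every additional dimension of the comparison adds another level of nesting, which is what the techniques described next try to avoid.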
I'm using a few different techniques to try to get a "flatter" implementation and avoid the nested loops. I'll do my best to describe them.
**Function compositions**

    # You will often see patterns that look like this:
    x = foo(a)
    y = bar(x)
    z = baz(y)

    # You may also see patterns that look like this:
    z = baz(bar(foo(a)))

    # An alternative way to do this is to use a functional composition.
    # The technique works like this (in Python 3: from functools import reduce):
    z = reduce(lambda x, y: y(x), [foo, bar, baz], a)
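
To see that the reduce form gives the same result as the nested calls, here is a small runnable sketch. The bodies of foo, bar and baz are made up for illustration, and compose_apply is a hypothetical helper name.

```python
from functools import reduce

# Placeholder implementations, invented only so the example can run.
def foo(text):
    return text.strip()

def bar(text):
    return text.lower()

def baz(text):
    return text.split()

def compose_apply(functions, value):
    """Feed value through each function in order, left to right."""
    return reduce(lambda acc, func: func(acc), functions, value)

a = "  Add Roth In-Plan  "
nested = baz(bar(foo(a)))
composed = compose_apply([foo, bar, baz], a)

print(nested)
print(composed)
```

The accumulator starts as a, and each step replaces it with the next function applied to it, so the list order [foo, bar, baz] is the order of execution.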