I have a long list of strings containing substrings of interest that appear in a given order; here is a small example using sentences in a text file:
This is a long drawn out sentence needed to emphasize a topic I am trying to learn.
It is new idea for me and I need your help with it please!
Thank you so much in advance, I really appreciate it.
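I want to find all sentences in this text file that contain both "I" and "need", but they must appear in that order.
So in this example, 'I' and 'need' both appear in sentence 1 and sentence 2, but in sentence 1 they are in the wrong order, so I don't want it returned. I only want the second sentence returned, because it has the order 'I need'.
I have used this example to identify the substrings, but I can't figure out how to find them only in order: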
id1 = "I"
id2 = "need"
with open('fun.txt') as f:
    for line in f:
        # each substring has to be tested against the line separately
        if id1 in line and id2 in line:
            print(line[:-1])
This returns:
This is a long drawn out sentence needed to emphasize a topic I am trying to learn.
It is new idea for me and I need your help with it please!
But I only want:
It is new idea for me and I need your help with it please!
Thanks!
Answer 0 (score: 1)
You need to identify id2 in the part of the line after id1:
infile = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]
id1 = "I"
id2 = "need"
for line in infile:
    if id1 in line:
        # position of the first occurrence of id1
        pos1 = line.index(id1)
        # look for id2 only in the text that follows id1
        if id2 in line[pos1 + len(id1):]:
            print(line)
Output:
It is new idea for me and I need your help with it please!
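The same idea extends to any number of substrings by searching for each one starting from where the previous one ended. The following is only a minimal sketch: the helper name `contains_in_order` is hypothetical and not part of the answer above, and it relies on `str.find` with a start offset.

```python
def contains_in_order(line, parts):
    """Return True if every string in parts occurs in line, each after the previous one."""
    pos = 0
    for part in parts:
        # search only from the end of the previous match onwards
        found = line.find(part, pos)
        if found == -1:
            return False
        pos = found + len(part)
    return True


sentences = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]
matches = [s for s in sentences if contains_in_order(s, ["I", "need"])]
print(matches)
```

Like the answer's code, this works on plain substrings, so the "I" in "It" counts as a match for "I".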
Answer 1 (score: 1)
You can use a regular expression to check this. One possible solution is:
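import re

id1 = "I"
id2 = "need"
# re.escape keeps special characters in the ids from being read as regex syntax
regex = re.compile(r'^.*{}.*{}.*$'.format(re.escape(id1), re.escape(id2)))
with open('fun.txt') as f:
    for line in f:
        if regex.search(line):
            print(line[:-1])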
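Note that this pattern treats id1 and id2 as plain substrings, so "I" also matches the "I" in "It" and "need" also matches "needed". If whole words are what is wanted (an assumption about the intent, not something the question states), one possible variant wraps each term in `\b` word boundaries. The helper name `build_ordered_regex` is hypothetical:

```python
import re


def build_ordered_regex(words):
    """Compile a pattern that matches the given whole words in order."""
    # \b anchors each term at word boundaries; re.escape guards special characters
    parts = [r"\b" + re.escape(w) + r"\b" for w in words]
    return re.compile(".*".join(parts))


regex = build_ordered_regex(["I", "need"])
sentences = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]
whole_word_matches = [s for s in sentences if regex.search(s)]
print(whole_word_matches)
```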
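Answer 2 (score: 0)
Simply:
import re
# re.search scans the whole string; re.match only matches at the start
match = re.search('pattern', 'yourString')
https://developers.google.com/edu/python/regular-expressions
So the pattern you are looking for is 'I(.*)need' (see "Regex Match all characters between two strings"). You may have to build the pattern differently, because I don't know whether there are exceptions. If there are, you can run the regex twice: once to get a subset of the original string, and then again to get the exact match you need.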
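As a rough illustration of the 'I(.*)need' pattern, the sketch below uses `re.search`, since `re.match` would only look at the start of the string, and reads the capture group to see what lies between the two terms. The helper name `between` and the short sample strings are made up for the example:

```python
import re

pattern = re.compile(r"I(.*)need")


def between(text):
    """Return the text captured between 'I' and 'need', or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(repr(between("I really need help")))
print(repr(between("need it, I do")))
```

The second call returns None because "need" comes before "I" in that string.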
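Answer 3 (score: 0)
You can define a function that computes the intersection of two sets (each sentence and I need), and use sorted with a key so that the result is ordered the same way the words appear in the sentence. That way you can check whether the order of the resulting list matches the order in I need: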
a = ['I','need']
l = ['This is a long drawn out sentence needed to emphasize a topic I am trying to learn.',
'It is new idea for me and I need your help with it please!',
'Thank you so much in advance, I really appreciate it.']
Custom function. Returns True if the sentence contains the strings in the same order:
def same_order(l1, l2):
    words = l2.split(' ')
    # keep only the words shared with l1, ordered by first appearance in the sentence
    inters = sorted(set(l1) & set(words), key=words.index)
    return inters == l1
Return the strings in the list l for which the function returned True:
[s for s in l if same_order(a, s)]
# ['It is new idea for me and I need your help with it please!']
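Two caveats with the set-based approach: splitting on spaces leaves punctuation attached to words ("need," is not "need"), and `index` only sees the first occurrence of a repeated word. A possible alternative, offered only as a sketch, tokenizes with `re.findall` and uses the iterator-based subsequence idiom. The name `words_in_order` is hypothetical:

```python
import re


def words_in_order(target, sentence):
    """Return True if the words in target appear in sentence in that order."""
    # \w+ drops punctuation, so "need," is read as "need"
    words = iter(re.findall(r"\w+", sentence))
    # "word in words" consumes the iterator up to the match, which enforces order
    return all(word in words for word in target)


a = ['I', 'need']
l = ['This is a long drawn out sentence needed to emphasize a topic I am trying to learn.',
     'It is new idea for me and I need your help with it please!',
     'Thank you so much in advance, I really appreciate it.']
in_order = [s for s in l if words_in_order(a, s)]
print(in_order)
```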