I am trying to extract the text between items, where the start and end markers come from two separate lists.
For example
start = ['intro','Intro','[intro','Introduction','(intro)']
end = ['P1','P2','[P1','[P2']
Input:
intro
L1
L2
P1
L3
L4
[intro]
L5
L6
Expected Output:
L1
L2
L5
L6
I tried the following. How can I achieve this?
text = 'I want to find a string between two substrings'
start = 'find a '
end = 'between two'
print(text[text.index(start)+len(start):text.index(end)])
I want output like the one in the first example.
Answer (score: 2)
A quick-and-dirty example based on your second example:
text = 'I want to find a string between two substrings'
start = 'find a '
end = 'substrings'
s_idx = text.index(start) + len(start) if start in text else -1
e_idx = text.index(end) if end in text else -1
if s_idx > -1 and e_idx > -1:
    print(text[s_idx:e_idx])
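You have to check that the substring is actually part of the string; otherwise str.index() raises a ValueError.
Edit: based on your first example, with lists of start and end strings: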
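A minimal sketch of the same check using str.find(), which returns -1 instead of raising ValueError, so no separate "in" test is needed. The helper name between is hypothetical, and unlike the snippet above it only looks for the end marker after the start marker:

```python
def between(text, start, end):
    """Return the text between `start` and `end`, or None if either is missing."""
    s = text.find(start)          # -1 if `start` is absent (no ValueError)
    if s == -1:
        return None
    s += len(start)               # skip past the start marker itself
    e = text.find(end, s)         # search for `end` only after the start marker
    if e == -1:
        return None
    return text[s:e]


text = 'I want to find a string between two substrings'
print(repr(between(text, 'find a ', 'substrings')))
print(repr(between(text, 'find a ', 'between two')))
print(between(text, 'missing', 'substrings'))
```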
start_list = ["work", "start", "also"]
end_list = ["of", "end", "substrings"]
text = "This can also work on a list of start and end substrings"
print("* Example with a list of start and end strings, stops on a first match")
print("- Text: {0}".format(text))
print("- Start: {0}".format(start_list))
print("- End: {0}".format(end_list))
s_idx = -1
for string in start_list:
    if string in text:
        s_idx = text.index(string) + len(string)
        # we're breaking on the first list entry found in the text.
        break
e_idx = -1
for string in end_list:
    if string in text:
        e_idx = text.index(string)
        # we're breaking on the first list entry found in the text.
        break
if e_idx > -1 and s_idx > -1:
    print(text[s_idx:e_idx])
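Note that the loops above pick the first entry of each list that occurs somewhere in the text, not the marker that appears earliest in the text (here "work" wins over "also", even though "also" comes first in the sentence). If the earliest position is what you want, one possible sketch swaps in the standard re module: join the escaped markers into an alternation and let search() find the leftmost one. The function name between_earliest is hypothetical:

```python
import re


def between_earliest(text, start_list, end_list):
    # re.escape keeps markers such as "[intro" or "(intro)" literal.
    start_re = re.compile("|".join(map(re.escape, start_list)))
    end_re = re.compile("|".join(map(re.escape, end_list)))
    s = start_re.search(text)            # leftmost start marker in the text
    if s is None:
        return None
    e = end_re.search(text, s.end())     # first end marker after that start
    if e is None:
        return None
    return text[s.end():e.start()]


text = "This can also work on a list of start and end substrings"
print(repr(between_earliest(text, ["work", "start", "also"], ["of", "end", "substrings"])))
```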
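Or, if you want to go even further and get the substrings between every start/end pair (note that text.index() only locates the first occurrence of each marker string):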
print("* Example with a list of start and end strings, finds all matches")
print("- Text: {0}".format(text))
print("- Start: {0}".format(start_list))
print("- End: {0}".format(end_list))
s_idxs = []
e_idxs = []
for string in start_list:
    if string in text:
        s_idxs.append(text.index(string) + len(string))
for string in end_list:
    if string in text:
        e_idxs.append(text.index(string))
for s_idx in s_idxs:
    for e_idx in e_idxs:
        if e_idx <= s_idx:
            print("ignoring end index {0}, it's before our start at {1}!".format(e_idx, s_idx))
            # end index is lower than start index, ignoring it.
            continue
        print("{0}:{1} => {2}".format(s_idx, e_idx, text[s_idx:e_idx]))
You can "shorten" and improve this code further; it was just written quick and dirty.
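The snippets above search inside a single string. For the line-oriented input in the question, a minimal sketch could walk the lines and switch capturing on at a start marker and off at an end marker. This rests on assumptions not stated in the question: a line counts as a marker if, after stripping whitespace, it starts with one of the list entries (so "[intro]" matches "[intro"), marker lines themselves are not kept, blank lines are skipped, and a section without an end marker runs to the end of the input. The name extract_sections is hypothetical:

```python
def extract_sections(lines, start_markers, end_markers):
    """Collect lines that follow a start marker, up to the next end marker."""
    collected = []
    capturing = False
    for raw in lines:
        line = raw.strip()
        # Assumption: a marker line *starts with* one of the marker strings.
        if any(line.startswith(m) for m in start_markers):
            capturing = True      # a new section begins; drop the marker line
            continue
        if any(line.startswith(m) for m in end_markers):
            capturing = False     # the section ends; drop the marker line
            continue
        if capturing and line:    # keep non-blank lines inside a section
            collected.append(line)
    return collected


start = ['intro', 'Intro', '[intro', 'Introduction', '(intro)']
end = ['P1', 'P2', '[P1', '[P2']
data = "intro\nL1\nL2\nP1\nL3\nL4\n[intro]\nL5\nL6"
print("\n".join(extract_sections(data.splitlines(), start, end)))
```

Checking start markers before end markers means a line matching both lists would open a section; adjust the order if your data needs the opposite.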