Question

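I have a list of strings (see below). I want to pick out elements of the list by looking for two specific tokens (a start and an end) and saving all the strings that lie between them.

For example, given the list below, I want all the strings that occur between 'RATED' and 'Like'. These subsequences can also occur more than once.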

['RATED',
 '  Awesome food at a good price .',
 'Delivery was very quick even on New Year\xe2\x80\x99s Eve .',
 'Please try crispy corn and veg noodles From this place .',
 'Taste maintained .',
 'Like',
 '1',
 'Comment',
 '0',
 'Share',
 'Divyansh Agarwal',
 '1 Review',
 'Follow',
 '3 days ago',
 'RATED',
 '  I have tried schezwan noodles and the momos with kitkat shake',
 "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone",
 'Like']

I have tried other approaches, such as regular expressions, but none of them solved the problem.

Answer 1

You can use a regular expression. First, join the list using a delimiter that does not occur anywhere in the text:

import re

delimiter = "#$#"  # must not occur inside any list item
bigString = delimiter + delimiter.join(yourList) + delimiter

After that, you can apply the regular expression:

results = re.findall(r'#\$#RATED#\$#(.*?)#\$#Like#\$#', bigString)

Now you only need to iterate over the results and split each string on the delimiter:

for s in results:
    print(s.split(delimiter))
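
The steps above can be collected into one function. This is only a sketch, under the assumption that the delimiter never appears inside an item; the function name `groups_via_regex` and its parameters are made up for illustration. It builds the pattern with `re.escape` instead of hand-escaping, and it uses lookarounds for the outer delimiters so that a 'Like' immediately followed by another 'RATED' is still matched (the pattern above consumes the delimiter after 'Like', which the next match would need).

```python
import re

def groups_via_regex(items, start="RATED", end="Like", delimiter="#$#"):
    # Assumption: `delimiter` does not occur inside any item.
    d = re.escape(delimiter)
    joined = delimiter + delimiter.join(items) + delimiter
    # Lookbehind/lookahead check the outer delimiters without consuming them.
    pattern = re.compile(
        f"(?<={d}){re.escape(start)}{d}(.*?){d}{re.escape(end)}(?={d})",
        re.DOTALL,  # let items contain newlines
    )
    return [match.split(delimiter) for match in pattern.findall(joined)]

sample = ["RATED", "a", "b", "Like", "RATED", "c", "Like", "x"]
print(groups_via_regex(sample))  # [['a', 'b'], ['c']]
```

Like the original pattern, this does not report an empty group when 'RATED' is directly followed by 'Like'.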

Answer 2

I suggest reading up on index lookup and slicing for sequence types:

https://docs.python.org/3.7/library/stdtypes.html#common-sequence-operations

Example:

def group_between(lst, start_token, end_token):
    while lst:
        try:
            # find opening token
            start_idx = lst.index(start_token) + 1
            # find closing token
            end_idx = lst.index(end_token, start_idx)
            # output sublist
            yield lst[start_idx:end_idx]
            # continue with the remaining items
            lst = lst[end_idx+1:]
        except ValueError:
            # begin or end not found, just skip the rest
            break

l = ['RATED','  Awesome food at a good price .', 'Delivery was very quick even on New Year’s Eve .', 'Please try crispy corn and veg noodles From this place .', 'Taste maintained .', 'Like', 
     '1', 'Comment', '0', 'Share', 'Divyansh Agarwal', '1 Review', 'Follow', '3 days ago',
     'RATED', '  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone", 'Like'
]

for i in group_between(l, 'RATED', 'Like'):
    print(i)

Output:

['  Awesome food at a good price .', 'Delivery was very quick even on New Year’s Eve .', 'Please try crispy corn and veg noodles From this place .', 'Taste maintained .']
['  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone"]
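
A possible variation, sketched here rather than taken from the answer: `list.index` accepts a start position, so the search can resume from where the previous group ended instead of re-slicing the remaining list on every pass. The name `iter_groups` is hypothetical.

```python
def iter_groups(items, start_token, end_token):
    pos = 0  # where the next search begins
    while True:
        try:
            start_idx = items.index(start_token, pos) + 1
            end_idx = items.index(end_token, start_idx)
        except ValueError:
            # no further start or end token: stop iterating
            return
        yield items[start_idx:end_idx]
        pos = end_idx + 1

sample = ["RATED", "a", "Like", "x", "RATED", "b", "c", "Like"]
print(list(iter_groups(sample, "RATED", "Like")))  # [['a'], ['b', 'c']]
```

For short lists the difference is negligible; the position-based form just avoids copying the tail of the list after each group.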

Answer 3

You could try, for example:

rec = False
result = []
for s in lst:
    if s == 'Like':
        rec = False
    if rec:
        result.append(s)
    if s == 'RATED':
        rec = True

Result:

#['  Awesome food at a good price .',
# 'Delivery was very quick even on New Year’s Eve .',
# 'Please try crispy corn and veg noodles From this place .',
# 'Taste maintained .',
# '  I have tried schezwan noodles and the momos with kitkat shake',
# "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone"]

Answer 4

def find_between(old_list, first_word, last_word):
    new_list = []
    flag = False
    for i in old_list:
        if i == last_word:  # compare by value; `is` tests object identity
            break
        if i == first_word:
            flag = True
            continue
        if flag:
            new_list.append(i)
    return new_list
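
One note on comparing the tokens: matching should rely on equality (`==`), not identity (`is`). The snippet below is a small illustration with made-up variable names; two strings can hold the same characters and still be distinct objects, for instance when one of them is assembled at runtime, as scraped text usually is.

```python
literal = "RATED"
built = "".join(["RA", "TED"])  # same text, assembled at runtime

print(built == literal)  # True: the characters are equal
# Whether `built is literal` is True depends on interpreter details
# (string interning), so it should not be used for token matching.
print(built is literal)
```

Also note that this function stops at the first end token, so it returns only the first group.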

Answer 5

This can be done with a regular expression.

a= ['RATED','  Awesome food at a good price .', 
 'Delivery was very quick even on New Year’s Eve .', 
 'Please try crispy corn and veg noodles From this place .', 
 'Taste maintained .', 'Like', '1', 'Comment', '0', 
 'Share', 'Divyansh Agarwal', '1 Review', 'Follow', 
 '3 days ago', 'RATED', 
 '  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone", 
 'Like']


import re
string = ' '.join(a)
b = re.compile(r'(?<=RATED).*?(?=Like)').findall(string)
print(b)

Output:

['   Awesome food at a good price . Delivery was very quick even on New Year’s Eve . Please try crispy corn and veg noodles From this place . Taste maintained . ',
 "   I have tried schezwan noodles and the momos with kitkat shake And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone "]
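
A caveat worth checking before using this approach: joining with a plain space discards the boundaries between items, so the pattern can no longer tell the token 'Like' from the word "Like" inside a review. The review text below is hypothetical and only serves to show the effect.

```python
import re

# Hypothetical review whose text happens to contain the end token.
items = ["RATED", "I Like it", "Like"]
joined = " ".join(items)

matches = re.findall(r"(?<=RATED).*?(?=Like)", joined)
print(matches)  # [' I ']
```

The match is cut off at the first "Like" in the text, so approaches that compare whole list items are safer when reviews may contain the token words.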

Answer 6

An option without a flag:

new_list = []
group = []  # not needed if the list starts with 'RATED'

for i in your_list:
    if i == 'RATED':
        group = []
    elif i == 'Like':
        new_list.append(group[:])
    else:
        group.append(i)
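
If the input may contain a 'Like' that is not preceded by a 'RATED' (for example two 'Like' entries in a row), the loop above would append a group containing unrelated items. A guarded variant could look like the following sketch; the function name `collect_groups` is an assumption for illustration, and `None` marks "not currently between tokens".

```python
def collect_groups(items, start_token="RATED", end_token="Like"):
    groups = []
    group = None  # None means we are outside a start/end pair
    for item in items:
        if item == start_token:
            group = []
        elif item == end_token:
            if group is not None:
                groups.append(group)
            group = None  # ignore everything until the next start token
        elif group is not None:
            group.append(item)
    return groups

sample = ["x", "Like", "RATED", "a", "Like", "1", "Like", "RATED", "b", "Like"]
print(collect_groups(sample))  # [['a'], ['b']]
```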

Answer 7

You can use the code below, which uses a simple `for` loop:

l = ['RATED','  Awesome food at a good price .', 'Delivery was very quick even on New Year’s Eve .', 'Please try crispy corn and veg noodles From this place .', 'Taste maintained .', 'Like', 
     '1', 'Comment', '0', 'Share', 'Divyansh Agarwal', '1 Review', 'Follow', '3 days ago',
     'RATED', '  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone", 'Like'
]

st, ed, aa = None, None, []
for k, v in enumerate(l):
    if v == "RATED":
        st = k
    if v == "Like":
        ed = k
    if st is not None and ed is not None:
        aa.extend(l[st+1: ed])
        st = None
        ed = None

print(aa)

# ['  Awesome food at a good price .', 'Delivery was very quick even on New Year’s Eve .', 'Please try crispy corn and veg noodles From this place .', 'Taste maintained .', '  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone"]

Split a list into sublists of the elements between opening and closing tokens
