Say I have a list of values like this:
["Started with no key words",
PCC WITH NOTHING,
ABB,CAI null V00011 11/06/18,
ANDERS,SAND null V000103 07/10/17,
"",
PSP SECONDARY,
MUNCH,TORY null V000113 04/08/19 ,
"There is no key words here",
PCC WITH SOEMTHING,
BEC,RUMA null V00011 04/17/19 ,
"There is no keyword here too",
ASP HAS IT,
XON,ANDREA null V00011 03/27/19]
And I have a list of keywords like this:
key_word_list = ['PCC', 'PSP', 'ASP']
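Now, while iterating over each keyword in key_word_list, if a keyword is found, I want to add that keyword to the front of the records on the lines that follow the line where it was found, until the next keyword. The output should look like this: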
["Started with no key words",
PCC WITH NOTHING,
PCC ABB,CAI null V00011 11/06/18,
PCC ANDERS,SAND null V000103 07/10/17,
"",
PSP SECONDARY,
PSP MUNCH,TORY null V000113 04/08/19 ,
"There is no key words here",
PCC WITH SOEMTHING,
PCC BEC,RUMA null V00011 04/17/19 ,
"There is no keyword here too",
ASP HAS IT,
ASP XON,ANDREA null V00011 03/27/19]
How can I do this in Python? Is it possible, and what is the best way? I started with something like this:
for ind, j in enumerate(key_word_list):
    # intermediate_index = []  # Was thinking of saving the index, but no idea how to use it to proceed line by line until the next keyword
    for index, i in enumerate(biglist):
        stripped_line = i.strip()
        if j in stripped_line:
            # do something; not sure how to check until the next keyword
            pass
Answer 0 (score: 3)
You can create a generator function that keeps track of the current keyword and yields each line as it passes through the list:
def append_keys(l, kw):
    current_kw = None
    for line in l:
        # deal with initial lines with no kw
        if current_kw is None and not any(line.startswith(k) for k in kw):
            yield line
            continue
        try:
            k = next(k for k in kw if line.startswith(k))
            current_kw = k
            yield line
        except StopIteration:
            yield current_kw + " " + line

new_list = list(append_keys(biglist, key_word_list))
The new list:
['PCC WITH NOTHING',
'PCC ABB,CAI null V00011 11/06/18',
'PCC ANDERS,SAND null V000103 07/10/17',
'PSP SECONDARY',
'PSP MUNCH,TORY null V000113 04/08/19',
'PCC WITH SOEMTHING',
'PCC BEC,RUMA null V00011 04/17/19',
'ASP HAS IT',
'ASP XON,ANDREA null V00011 03/27/19']
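Because this is a generator, you can iterate over the result one item at a time without creating another list in memory, which is handy if the list is really big.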
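To illustrate the lazy behaviour, here is a minimal sketch that pulls only the first few results from the generator. It restates append_keys with a default value for next() instead of try/except, which should behave the same way; the `sample` list and the use of itertools.islice are assumptions added for the illustration, not part of the original answer.

```python
from itertools import islice


def append_keys(lines, keywords):
    """Yield each line, prefixing non-keyword lines with the last keyword seen."""
    current_kw = None
    for line in lines:
        # Keyword this line starts with, or None if it starts with none of them.
        found = next((k for k in keywords if line.startswith(k)), None)
        if found is not None:
            current_kw = found
            yield line
        elif current_kw is None:
            # Leading lines before any keyword pass through unchanged.
            yield line
        else:
            yield current_kw + " " + line


# Hypothetical shortened input, just for the demonstration.
sample = [
    "Started with no key words",
    "PCC WITH NOTHING",
    "ABB,CAI null V00011 11/06/18",
    "PSP SECONDARY",
    "MUNCH,TORY null V000113 04/08/19",
]

gen = append_keys(iter(sample), ["PCC", "PSP", "ASP"])
# Only three lines are consumed; the rest of the input is never touched.
first_three = list(islice(gen, 3))
print(first_three)
```

The same pattern should work with any iterable of lines, such as an open file, as long as the lines are stripped of trailing newlines before being joined or compared.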
Answer 1 (score: 1)
You can use itertools.groupby:
from itertools import groupby
d = ['PCC WITH NOTHING', 'ABB,CAI null V00011 11/06/18', 'ANDERS,SAND null V000103 07/10/17', 'PSP SECONDARY', 'MUNCH,TORY null V000113 04/08/19', 'PCC WITH SOEMTHING', 'BEC,RUMA null V00011 04/17/19', 'ASP HAS IT', 'XON,ANDREA null V00011 03/27/19']
l = ['PCC', 'PSP', 'ASP']
new_d = [(a, list(b)) for a, b in groupby(d, key=lambda x:any(x.startswith(i) for i in l))]
_d = [[b[0], [i for i in l if b[0].startswith(i)][0]] if a else b for a, b in new_d]
final_result = [[_d[i][0], *[f'{_d[i][-1]} {j}' for j in _d[i+1]]] for i in range(0, len(_d), 2)]
Output:
[['PCC WITH NOTHING', 'PCC ABB,CAI null V00011 11/06/18', 'PCC ANDERS,SAND null V000103 07/10/17'],
['PSP SECONDARY', 'PSP MUNCH,TORY null V000113 04/08/19'],
['PCC WITH SOEMTHING', 'PCC BEC,RUMA null V00011 04/17/19'],
['ASP HAS IT', 'ASP XON,ANDREA null V00011 03/27/19']]