Prepend a keyword to the following lines until the next keyword is found - Python

Time: 2019-07-18 19:58:31

Tags: python python-3.x

Say I have a list of values like this:

biglist = [
    "Started with no key words",
    "PCC WITH NOTHING",
    "ABB,CAI null V00011 11/06/18",
    "ANDERS,SAND null V000103 07/10/17",
    "",
    "PSP SECONDARY",
    "MUNCH,TORY null V000113 04/08/19 ",
    "There is no key words here",
    "PCC WITH SOEMTHING",
    "BEC,RUMA null V00011 04/17/19 ",
    "There is no keyword here too",
    "ASP HAS IT",
    "XON,ANDREA null V00011 03/27/19",
]

And I have a list of keywords like this:

key_word_list = ['PCC', 'PSP', 'ASP']

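Now I go through each keyword in key_word_list, and whenever a line contains a keyword, I want to prepend that keyword to the records on the following lines, until the next keyword is found. The output should look like this: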

[
    "Started with no key words",
    "PCC WITH NOTHING",
    "PCC ABB,CAI null V00011 11/06/18",
    "PCC ANDERS,SAND null V000103 07/10/17",
    "",
    "PSP SECONDARY",
    "PSP MUNCH,TORY null V000113 04/08/19 ",
    "There is no key words here",
    "PCC WITH SOEMTHING",
    "PCC BEC,RUMA null V00011 04/17/19 ",
    "There is no keyword here too",
    "ASP HAS IT",
    "ASP XON,ANDREA null V00011 03/27/19",
]

How can I do this in Python? Is it possible, and what is the best way? I started with something like this:

for ind, j in enumerate(key_word_list):
    # intermediate_index = []  # Was thinking to save index, but no idea what to do with this either to proceed to next line until next key word
    for index,i in enumerate(biglist):
        stripped_line = i.strip()
        if j in stripped_line:
            #do something not sure how to check until next keyword

2 answers:

Answer 0: (score: 3)

You can create a generator function that keeps track of the current keyword and yields each line as it passes through:

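def append_keys(l, kw):
    current_kw = None  # most recent keyword seen so far

    for line in l:
        # deal with initial lines with no kw
        if current_kw is None and not any(line.startswith(k) for k in kw):
            yield line
            continue
        try:
            # the line starts with a keyword: remember it, yield the line unchanged
            k = next(k for k in kw if line.startswith(k))
            current_kw = k
            yield line
        except StopIteration:
            # no keyword on this line: prefix it with the current keyword
            yield current_kw + " " + line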

new_list = list(append_keys(biglist, key_word_list))

New list:

['PCC WITH NOTHING',
 'PCC ABB,CAI null V00011 11/06/18',
 'PCC ANDERS,SAND null V000103 07/10/17',
 'PSP SECONDARY',
 'PSP MUNCH,TORY null V000113 04/08/19',
 'PCC WITH SOEMTHING',
 'PCC BEC,RUMA null V00011 04/17/19',
 'ASP HAS IT',
 'ASP XON,ANDREA null V00011 03/27/19']
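
Note that the list above corresponds to an input containing only keyword lines and record lines. With the full list from the question, the generator would also prefix filler lines such as the empty string and "There is no key words here", which the expected output leaves untouched. The question does not say how records differ from filler, so the following is only a sketch under one assumption: a record line ends with an MM/DD/YY date. The names `RECORD_RE` and `prefix_records` are hypothetical and not part of the original answer.

```python
import re

# Assumption (not stated in the question): a record line ends with an
# MM/DD/YY date, optionally followed by whitespace.
RECORD_RE = re.compile(r"\d{2}/\d{2}/\d{2}\s*$")


def prefix_records(lines, keywords):
    """Yield each line, prefixing record lines with the most recent keyword."""
    current_kw = None
    for line in lines:
        # A keyword line updates the current keyword and passes through unchanged.
        kw = next((k for k in keywords if line.startswith(k)), None)
        if kw is not None:
            current_kw = kw
            yield line
        elif current_kw is not None and RECORD_RE.search(line):
            # A record line following a keyword gets that keyword prepended.
            yield f"{current_kw} {line}"
        else:
            # Filler lines, and anything before the first keyword, stay as they are.
            yield line


biglist = [
    "Started with no key words",
    "PCC WITH NOTHING",
    "ABB,CAI null V00011 11/06/18",
    "ANDERS,SAND null V000103 07/10/17",
    "",
    "PSP SECONDARY",
    "MUNCH,TORY null V000113 04/08/19 ",
    "There is no key words here",
    "PCC WITH SOEMTHING",
    "BEC,RUMA null V00011 04/17/19 ",
    "There is no keyword here too",
    "ASP HAS IT",
    "XON,ANDREA null V00011 03/27/19",
]
key_word_list = ['PCC', 'PSP', 'ASP']

result = list(prefix_records(biglist, key_word_list))
```

If real records are recognisable some other way (for example by a fixed number of fields), only the `RECORD_RE` check would need to change.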

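Being a generator means you can iterate over the result one item at a time without creating another list in memory, which is nice if the list is really large.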
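
To illustrate the point about memory, here is a small sketch that feeds the generator from an endless source and takes only the first five results with `itertools.islice`. The generator is restated in an equivalent form so the snippet runs on its own; `numbered_source` is a hypothetical stand-in for a large input such as a file.

```python
from itertools import islice


def append_keys(lines, keywords):
    """Same behaviour as the generator above, restated to be self-contained."""
    current_kw = None
    for line in lines:
        kw = next((k for k in keywords if line.startswith(k)), None)
        if kw is not None:
            current_kw = kw          # keyword line: remember it
            yield line
        elif current_kw is None:
            yield line               # no keyword seen yet: pass through
        else:
            yield current_kw + " " + line


def numbered_source():
    """Hypothetical endless input: a keyword line followed by two records, repeated."""
    n = 0
    while True:
        yield "PCC HEADER" if n % 3 == 0 else f"record {n}"
        n += 1


# Only five lines are ever pulled from the source; nothing else is computed.
first_five = list(islice(append_keys(numbered_source(), ["PCC"]), 5))
print(first_five)
# ['PCC HEADER', 'PCC record 1', 'PCC record 2', 'PCC HEADER', 'PCC record 4']
```

The same pattern applies when `lines` is an open file object, since files are iterated line by line.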

Answer 1: (score: 1)

You can use itertools.groupby:

from itertools import groupby
d = ['PCC WITH NOTHING', 'ABB,CAI null V00011 11/06/18', 'ANDERS,SAND null V000103 07/10/17', 'PSP SECONDARY', 'MUNCH,TORY null V000113 04/08/19', 'PCC WITH SOEMTHING', 'BEC,RUMA null V00011 04/17/19', 'ASP HAS IT', 'XON,ANDREA null V00011 03/27/19']
l = ['PCC', 'PSP', 'ASP']

new_d = [(a, list(b)) for a, b in groupby(d, key=lambda x:any(x.startswith(i) for i in l))]
_d = [[b[0], [i for i in l if b[0].startswith(i)][0]] if a else b for a, b in new_d]
final_result = [[_d[i][0], *[f'{_d[i][-1]} {j}' for j in _d[i+1]]] for i in range(0, len(_d), 2)]

Output:

[['PCC WITH NOTHING', 'PCC ABB,CAI null V00011 11/06/18', 'PCC ANDERS,SAND null V000103 07/10/17'], 
 ['PSP SECONDARY', 'PSP MUNCH,TORY null V000113 04/08/19'], 
 ['PCC WITH SOEMTHING', 'PCC BEC,RUMA null V00011 04/17/19'], 
 ['ASP HAS IT', 'ASP XON,ANDREA null V00011 03/27/19']]
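
The result above is a list of groups, whereas the question asks for a single flat list. One way to get there, shown here as a sketch that starts from the printed output, is `itertools.chain.from_iterable`; the name `flat_result` is introduced only for this example.

```python
from itertools import chain

# The grouped output shown above, copied verbatim.
final_result = [
    ['PCC WITH NOTHING', 'PCC ABB,CAI null V00011 11/06/18', 'PCC ANDERS,SAND null V000103 07/10/17'],
    ['PSP SECONDARY', 'PSP MUNCH,TORY null V000113 04/08/19'],
    ['PCC WITH SOEMTHING', 'PCC BEC,RUMA null V00011 04/17/19'],
    ['ASP HAS IT', 'ASP XON,ANDREA null V00011 03/27/19'],
]

# Concatenate the groups in order into one flat list.
flat_result = list(chain.from_iterable(final_result))
```

Keep in mind that the groupby version indexes `_d[i+1]` for every keyword group, so it appears to rely on the input starting with a keyword line and on every keyword line being followed by at least one record.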