My data is very messy, and I noticed a pattern: any element that ends with '\n' needs to be merged with the single element before it.
Sample list:
ls = ['hello','world \n','my name','is john \n','How are you?','I am \n doing well']
ls
What I tried:
print([s for s in ls if "\n" in s[-1]])
>>> ['world \n', 'is john \n']  # returns the elements that end with \n
How do I merge each element that ends with '\n' with the one element before it? I am looking for output like this:
['hello world \n', 'my name is john \n', 'How are you?','I am \n doing well']
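Answer 0 (score: 2)
I wrote this out so that it is easy to follow, rather than making it more complicated with something like a list comprehension.
This will work for any number of words until you hit a \n character, and it also picks up whatever is left over at the end of the input.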
ls_out = []  # your outgoing ls
out = ''  # keeps your words to use
for i in range(0, len(ls)):
    if '\n' in ls[i]:  # check for the ending word, if so, add it to output and reset
        out += ls[i]
        ls_out.append(out)
        out = ''
    else:  # otherwise add to your current word list
        out += ls[i]
if out:  # check for remaining words in out if total ls doesn't end with \n
    ls_out.append(out)
You may need to add spaces when concatenating the strings, but I am guessing that is just down to your example. If you do, make this edit:
out += ' ' + ls[i]
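Note that `out += ' ' + ls[i]` also puts a space in front of the first word of every group. One way around that is to collect the pieces in a list and join them at the end of each group. The following is only a sketch under a couple of assumptions: the function name `merge_until_newline` is made up for illustration, and it tests the end of each string with `endswith` rather than using the `in` test from the loop above.

```python
def merge_until_newline(items):
    """Join consecutive strings with single spaces, closing a group
    at each string that ends with a newline."""
    merged = []  # finished groups
    pieces = []  # strings collected for the current group
    for item in items:
        pieces.append(item)
        if item.endswith('\n'):
            merged.append(' '.join(pieces))
            pieces = []
    if pieces:  # leftovers when the list does not end with a newline-terminated string
        merged.append(' '.join(pieces))
    return merged


ls = ['hello', 'world \n', 'my name', 'is john \n', 'How are you?', 'I am \n doing well']
print(merge_until_newline(ls))
# ['hello world \n', 'my name is john \n', 'How are you? I am \n doing well']
```

Because every string up to the next terminator goes into the same group, the last two items of the sample end up merged.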
Edit:
If you only want to grab the one element before, rather than several, you can do the following:
ls_out = []
for i in range(len(ls)):
    if ls[i].endswith('\n'):  # check ending only
        if i > 0 and not ls[i-1].endswith('\n'):  # check previous string
            out = ls[i-1] + ' ' + ls[i]  # concatenate together
        else:
            out = ls[i]  # no plain previous string to merge with
    elif i + 1 < len(ls) and ls[i+1].endswith('\n'):  # next one will grab this so skip
        continue
    else:
        out = ls[i]  # next one won't so add this one in
    ls_out.append(out)
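Answer 1 (score: 2)
If you want to reduce a list, perhaps one readable approach is to use the reduce function.
functools.reduce(func, iter, [initial_value]) applies an operation cumulatively to all the elements of the iterable, so it cannot be used on infinite iterables.
First, you need some kind of accumulator to build up the result. I use a tuple with two elements: a buffer holding the strings concatenated so far, until a '\n' is found, and the list of results. See the initial structure (1).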
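from functools import reduce

ls = ['hello','world \n','my name','is john \n','How are you?','I am \n doing well']

def combine(x, y):
    joined = x[0] + " " + y if x[0] else y  # no leading space when the buffer is empty
    if y.endswith('\n'):
        return ("", x[1] + [joined])  #<-- buffer to list
    else:
        return (joined, x[1])  #<-- on buffer

t = reduce(combine, ls, ("", []))  #<-- see initial struct (1)
result = t[1] + [t[0]] if t[0] else t[1]  #<-- add buffer if not empty
print(result)
Result (the buffer collects every string up to the next one ending in \n, so 'How are you?' is merged with the last element):
['hello world \n', 'my name is john \n', 'How are you? I am \n doing well']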
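(1) The initial structure explained: you use a tuple to hold the buffer string (built up until a \n is found) and the list of already finished strings:
("",[])
This means: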
("__ buffer string not yet added to list __", [ __result list ___ ] )
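To see how the (buffer, results) tuple changes from one element to the next, the intermediate states can be printed with `itertools.accumulate`, which yields every partial result that `reduce` would only use internally. This is a sketch rather than part of the answer: it assumes Python 3.8+ (for the `initial` keyword), and its `combine` is a self-contained variant written for this illustration that skips the separator when the buffer is empty.

```python
from itertools import accumulate


def combine(state, item):
    """Fold one string into the (buffer, results) accumulator."""
    buffer, results = state
    joined = f"{buffer} {item}" if buffer else item
    if item.endswith('\n'):
        return ("", results + [joined])  # flush the buffer into the results
    return (joined, results)             # keep buffering


ls = ['hello', 'world \n', 'my name']
states = list(accumulate(ls, combine, initial=("", [])))
for state in states:
    print(state)
# ('', [])
# ('hello', [])
# ('', ['hello world \n'])
# ('my name', ['hello world \n'])
```

The last state still has 'my name' sitting in the buffer, which is why the answer appends the buffer to the results when it is not empty.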
Answer 2 (score: 1)
You can solve this with the re module, using regular expressions.
import re

ls = ['hello','world \n','my name','is john \n','How are you?','I am \n doing well']
new_ls = []
for i in range(len(ls)):
    if re.search(r"\n$", str(ls[i])):  # matching the \n at the end of the word
        if i > 0:
            concat_word = str(ls[i-1]) + ' ' + str(ls[i])  # appending to the previous word
        else:
            concat_word = str(ls[i])  # in case the first word in the list has \n
        new_ls.append(concat_word)
    elif re.search(r'\n', str(ls[i])):  # matching the \n anywhere in the word
        # keeps the word before the "anywhere" match separate (no previous word when i == 0)
        new_ls.extend(str(s) for s in ls[max(i-1, 0):i+1])
print(new_ls)
This returns the output
['hello world \n', 'my name is john \n', 'How are you?', 'I am \n doing well']
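The two patterns in this answer behave differently on the sample strings, and that difference is what keeps 'I am \n doing well' from being merged. The check below is a small illustrative sketch; the variable names are made up here, and `str.endswith` is shown alongside only as a comparison.

```python
import re

samples = ['world \n', 'I am \n doing well', 'hello']

# For each sample: (newline at the very end?, newline anywhere?)
results = {
    s: (bool(re.search(r"\n$", s)), bool(re.search(r"\n", s)))
    for s in samples
}

for s, (at_end, anywhere) in results.items():
    print(repr(s), at_end, anywhere, s.endswith('\n'))
# 'world \n' True True True
# 'I am \n doing well' False True False
# 'hello' False False False
```

For these inputs `re.search(r"\n$", s)` and `s.endswith('\n')` agree, so the string method is a possible substitute when no other pattern matching is needed.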
Answer 3 (score: 0)
Assuming the first element does not end with \n:
res = []
for el in ls:
    if el.endswith("\n"):
        res[-1] = res[-1] + " " + el  # merge into the previous element
    else:
        res.append(el)
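The same idea can be wrapped in a function that no longer relies on the assumption about the first element. This is a sketch only: the name `merge_into_previous` is hypothetical, joining with a single space is an assumption taken from the output requested in the question, and an element is merged only when the previous result does not itself end with \n.

```python
def merge_into_previous(items):
    """Merge each newline-terminated string into the single element before it."""
    res = []
    for el in items:
        # Merge only if there is a previous element that has not already been closed by a newline.
        if el.endswith('\n') and res and not res[-1].endswith('\n'):
            res[-1] = res[-1] + ' ' + el
        else:
            res.append(el)
    return res


ls = ['hello', 'world \n', 'my name', 'is john \n', 'How are you?', 'I am \n doing well']
print(merge_into_previous(ls))
# ['hello world \n', 'my name is john \n', 'How are you?', 'I am \n doing well']
```

A list whose first element ends with \n, such as ['a \n', 'b'], is returned unchanged instead of raising an IndexError.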
Answer 4 (score: 0)
Try this:
lst = []
for i in range(len(ls)):
    if lst and ls[i].endswith("\n"):
        # replace the previously appended element with the merged string
        lst.append(lst.pop() + ' ' + ls[i])
    else:
        lst.append(ls[i])
lst
Result:
['hello world \n', 'my name is john \n', 'How are you?', 'I am \n doing well']