I have a list r like this:
[["", 1], ["this is a text line", 2], ["this is a text line", 3], ["this is a text line", 4], ["", 5], ["", 6], ["this is a text line", 7],["this is a text line", 8], ["this is a text line", 9], ["this is a text line", 10], ["", 11], ["this is a text line", 12], ["this is a text line", 13], ["this is a text line", 14], ["", 15], ["this is a text line", 16], ["this is a text line", 17], ["this is a text line", 18], ["", 19]]
To find out where my empty lines and my text lines are, I filter the list:
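import regex  # third-party module; the standard-library re module works the same way for these patterns
empty = [x[1] for x in r if regex.search(r"^\s*$", x[0])]
text = [x[1] for x in r if regex.search(r"\S", x[0])]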
Output:
empty = [1, 5, 6, 11, 15, 19]
text= [2, 3, 4, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18]
What I want to do is group the numbers in text into runs of consecutive values (text[i+1] - text[i] == 1), in order to define paragraphs:
finaltext = [[2, 3, 4], [7, 8, 9, 10], [12, 13, 14], [16, 17, 18]]
finaltext including empty = [[2, 3, 4, 5, 6], [7, 8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19]]
How can I group the elements of a list based on a condition?
Answer 0 (score: 3)
from itertools import groupby, zip_longest
grp_list = [list(g) for k,g in groupby(r, lambda x:x[0]=='')]
grp_list = grp_list[1:] if r[0][0] == '' else grp_list
text = [[j[1] for j in i] for i in grp_list]
finaltext = text[::2]
print (finaltext)
#[[2, 3, 4], [7, 8, 9, 10], [12, 13, 14], [16, 17, 18]]
finaltext_including_empty = [i+j for i,j in zip_longest(text[::2], text[1::2], fillvalue=[])]
print (finaltext_including_empty)
#[[2, 3, 4, 5, 6], [7, 8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19]]
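groupby splits the list into chunks of consecutive items that share the same key, here lambda x: x[0] == ''. A new chunk starts each time that result changes (empty string or not), and this continues to the end of the list, producing the following: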
[[['', 1]], [['this is a text line', 2], ['this is a text line', 3], ['this is a text line', 4]], [['', 5], ['', 6]],........]
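To make the chunking behaviour easier to see, here is a minimal sketch on a shortened list. The names rows and runs and the sample values are illustrative assumptions, not part of the original data; only itertools.groupby itself is taken from the answer.

```python
from itertools import groupby

# Small sample in the same [text, line_number] shape as r (values are illustrative)
rows = [["", 1], ["a", 2], ["b", 3], ["", 4], ["", 5], ["c", 6]]

# groupby starts a new group each time the key changes between neighbouring items
runs = [(is_empty, [num for _, num in group])
        for is_empty, group in groupby(rows, key=lambda x: x[0] == "")]
print(runs)
# [(True, [1]), (False, [2, 3]), (True, [4, 5]), (False, [6])]
```

Because the key alternates between True and False, taking every second run (as text[::2] does above) selects only the text blocks, provided a leading empty block has been dropped first.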
Answer 1 (score: 1)
pip install more_itertools
from more_itertools import chunked
empty = [1, 5, 6, 11, 15, 19]
text= [2, 3, 4, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18]
finaltext_ = sorted(empty + text)
list(chunked(finaltext_,4))
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16], [17, 18, 19]]
Answer 2 (score: 1)
A pure Python solution without any modules:
This can be done with modules, for example numpy or groupby, but I think it can be done without them, using just plain Python. Here is my solution:
text = [2, 3, 4, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18]
s = 0
finaltext = []
for i in range(len(text)-1):
    if text[i] + 1 != text[i+1]:
        finaltext.append(text[s:i+1])
        s = i+1
finaltext.append(text[s:])
This gives finaltext as:
[[2, 3, 4], [7, 8, 9, 10], [12, 13, 14], [16, 17, 18]]
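The same idea can be wrapped in a reusable function. This is only a sketch: the name group_consecutive is hypothetical, and it assumes the input is a sorted list of integers. Unlike the loop above, which ends up with one empty sublist when text is empty, this version returns an empty list in that case.

```python
def group_consecutive(numbers):
    """Split a sorted list of ints into runs where each value is the previous one + 1."""
    groups = []
    for n in numbers:
        if groups and n == groups[-1][-1] + 1:
            groups[-1].append(n)   # continues the current run
        else:
            groups.append([n])     # gap found (or first item): start a new run
    return groups

text = [2, 3, 4, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18]
print(group_consecutive(text))
# [[2, 3, 4], [7, 8, 9, 10], [12, 13, 14], [16, 17, 18]]
print(group_consecutive([]))
# []
```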
Update
To get both lists at the same time (not sure why you would want that), you can use the following:
empty = [1, 5, 6, 11, 15, 19]
text = [2, 3, 4, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18]
s = 0
finaltext = []
finaltext_including_empty = []
for i in range(len(text)-1):
    if text[i] + 1 != text[i+1]:
        finaltext.append(text[s:i+1])
        finaltext_including_empty.append(list(range(text[s], text[i+1])))
        s = i+1
finaltext.append(text[s:])
finaltext_including_empty.append(list(range(text[s],max(empty[-1]+1, text[-1]+1))))
This leaves finaltext the same as before, and gives finaltext_including_empty as:
[[2, 3, 4, 5, 6], [7, 8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19]]
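A more compact way to express the "including empty" grouping is to find where each run of text lines starts and extend it up to the start of the next run. This is a sketch under the assumption that line numbers are contiguous, so every number between two text runs is an empty line; the function name and the last_line parameter are hypothetical.

```python
def paragraphs_with_trailing_empty(text, last_line):
    """Group each run of text lines with the empty lines that follow it."""
    # A run starts at the first item, or wherever the previous number is not n - 1
    starts = [n for i, n in enumerate(text) if i == 0 or n != text[i - 1] + 1]
    # Each paragraph ends just before the next run starts; the last one ends at last_line
    ends = starts[1:] + [last_line + 1]
    return [list(range(a, b)) for a, b in zip(starts, ends)]

empty = [1, 5, 6, 11, 15, 19]
text = [2, 3, 4, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18]
last_line = max(empty[-1], text[-1])
print(paragraphs_with_trailing_empty(text, last_line))
# [[2, 3, 4, 5, 6], [7, 8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19]]
```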