根据python中的特殊字符将动态列表拆分为子列表

时间:2016-12-07 03:51:43

标签: python

我在python中有一个基本问题,我试图找到解决方案很长一段时间,但我无法得到正确的输出。

textvalues=[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company', '', '2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']]

这里我需要根据''特殊字符将上面的列表拆分为子列表。以上列表是样本列表,主列表是动态的,其中列表的长度可能不同。在任何情况下,列表都需要用''字符分隔。

我尝试过的解决方案:

MainText = str(textvalues)
split_index = MainText.index( '',)
l2 = MainText[:split_index]
print(l2)

预期解决方案:

[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company'] ,['2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']]

请帮我解决这个问题。感谢

3 个答案:

答案 0 :(得分:1)

import itertools

textvalues=[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company', '', '2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']]
groups = []
for a,b in itertools.groupby(textvalues[0], lambda x: x is not ''):
    if a:
        groups.append(list(b))
print groups

输出:

[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company'], ['2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']]

答案 1 :(得分:0)

基本上,您可以遍历内容,将子字符串存储在缓冲区中,并在遇到''分隔符时将缓冲区转储到主列表:

result = list()
line = list()
for element in textvalues[0]:
    if element != '':
        line.append(element)
    else:
        result.append(line)
        line = list()

答案 2 :(得分:0)

textvalues=[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company', '', '2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']]

textvalues2 = []

for i in ','.join(i for i in textvalues[0]).split(',,') :
    textvalues2.append( i.split(',') )