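Merge all lists (lists of strings) between two empty lists into one list in Python

Time: 2019-11-15 11:41:03

Tags: python regex list concatenation grouping

I want to merge all the lists that fall between two empty lists into a single list. Example: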

    []
    ['The', 'tablets', 'are', 'filled', 'into', 'cylindrically', 'shaped', 'bottles', 'made', 'of', 'white', 'coloured']
    ['polyethylene.', 'The', 'volumes', 'of', 'the', 'bottles', 'depend', 'on', 'the', 'tablet', 'strength', 'and', 'amount', 'of']
    ['tablets,', 'ranging', 'from', '20', 'to', '175', 'ml.', 'The', 'screw', 'type', 'cap', 'is', 'made', 'of', 'white', 'coloured']
    ['polypropylene', 'and', 'is', 'equipped', 'with', 'a', 'tamper', 'proof', 'ring.']
    []
    ['PVC/PVDC', 'blister', 'pack']
    []
    ['Blisters', 'are', 'made', 'in', 'a', 'thermo-forming', 'process', 'from', 'a', 'PVC/PVDC', 'base', 'web.', 'Each', 'tablet']
    ['is', 'filled', 'into', 'a', 'separate', 'blister', 'and', 'a', 'lidding', 'foil', 'of', 'aluminium', 'is', 'welded', 'on.', 'The', 'blisters']
    ['are', 'opened', 'by', 'pressing', 'the', 'tablets', 'through', 'the', 'lidding', 'foil.', 'PVDC', 'foil', 'is', 'in', 'contact', 'with']
    ['the', 'tablets.']
    []
    ['Aluminium', 'blister', 'pack']
    []

From this, the first list I want is:

['The', 'tablets', 'are', 'filled', 'into', 'cylindrically', 'shaped', 'bottles', 'made', 'of', 'white', 'coloured', 'polyethylene.', 'The', 'volumes', 'of', 'the', 'bottles', 'depend', 'on', 'the', 'tablet', 'strength', 'and', 'amount', 'of', 'tablets,', 'ranging', 'from', '20', 'to', '175', 'ml.', 'The', 'screw', 'type', 'cap', 'is', 'made', 'of', 'white', 'coloured','polypropylene', 'and', 'is', 'equipped', 'with', 'a', 'tamper', 'proof', 'ring.'] 

The next list becomes:

['PVC/PVDC', 'blister', 'pack']

The pattern should continue in the same way. My code so far:

import csv, re
filepath = r'C:\Users\techj\Music\Data\Tagged\090388 (1.0,CURRENT,LATEST APPROVED.txt)'

with open(filepath) as f:
        content = f.readlines()
#        s = ' '.join(x for x in content if x)
#        print(s)

        for line in content:
            line = line.split()
            print(line)

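3 Answers:

Answer 0 (score: 0)

This may not be exactly what you want, but I think you are trying to read paragraphs from a file. This code gives you the paragraphs: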

with open(path) as f:
    data=f.read()
paragraphs=data.split("\n\n")

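Now, if you want the words of each paragraph, split each one on whitespace:

all_words=[]
for paragraph in paragraphs:
    # split() with no argument splits on any whitespace, including the
    # newlines inside a paragraph; split(" ") would leave them in the words.
    words=paragraph.split()
    all_words.append(words)
print(all_words)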
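
If the "blank" lines in the file may contain stray spaces or tabs, splitting on the exact string "\n\n" can miss them. A minimal sketch of a more tolerant variant, assuming the whole text fits in memory; the helper name `split_paragraphs` and the sample text are made up for illustration:

```python
import re

def split_paragraphs(text):
    """Return one list of words per paragraph separated by blank lines."""
    # A separator is a newline, optional whitespace, then another newline,
    # so lines that contain only spaces or tabs also count as blank.
    blocks = re.split(r"\n\s*\n", text.strip())
    # split() with no argument splits on any whitespace, including newlines.
    return [block.split() for block in blocks if block.strip()]

sample = (
    "The tablets are filled\n"
    "into bottles.\n"
    "\n"
    "PVC/PVDC blister pack\n"
    " \n"
    "Blisters are made\n"
)
print(split_paragraphs(sample))
```

Each inner list holds the words of one paragraph, which matches the merged lists asked for in the question.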

Answer 1 (score: 0)

Try this:

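filepath = r'C:\Users\techj\Music\Data\Tagged\090388 (1.0,CURRENT,LATEST APPROVED.txt)'

with open(filepath, 'r') as file:
    _temp = []
    for line in file:
        _line = line.split()
        if _line:
            # Non-blank line: add its words to the current group.
            _temp+=_line
        elif _temp:
            # Blank line: emit the finished group, skipping empty ones.
            print(_temp)
            _temp = []
    # Emit the last group if the file does not end with a blank line.
    if _temp:
        print(_temp)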

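For Python 3.8 and later, using an assignment expression:

with open(filepath, 'r') as file:
    _temp = []
    for line in file:
        if (_line:=line.split()):
            # Non-blank line: add its words to the current group.
            _temp+=_line
        elif _temp:
            # Blank line: emit the finished group, skipping empty ones.
            print(_temp)
            _temp = []
    # Emit the last group if the file does not end with a blank line.
    if _temp:
        print(_temp)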
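
The same grouping can also be expressed with `itertools.groupby` from the standard library. This is only a sketch of an alternative, not part of the answer above; the function name `merge_between_blanks` and the in-memory sample lines are assumptions for illustration, and any iterable of lines (such as an open file) could be passed instead:

```python
from itertools import groupby

def merge_between_blanks(lines):
    """Yield one flat word list per run of consecutive non-blank lines."""
    split_lines = (line.split() for line in lines)
    # bool([]) is False, so groupby alternates between runs of blank
    # lines and runs of non-blank lines.
    for has_words, run in groupby(split_lines, key=bool):
        if has_words:
            # Flatten the run of word lists into a single list.
            yield [word for words in run for word in words]

sample_lines = [
    "\n",
    "The tablets are\n",
    "filled into bottles.\n",
    "\n",
    "PVC/PVDC blister pack\n",
    "\n",
]
for group in merge_between_blanks(sample_lines):
    print(group)
```

Because `groupby` closes a run when the key changes or the input ends, the last group is produced even when the input does not end with a blank line.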

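Answer 2 (score: 0)

Since I do not have access to your file but wanted to test the algorithm, I created two generator functions that produce the input lines as lists of strings. The first generator function is based on your code: it reads the file and splits each line into a list of strings. The second, which I used for testing, uses lists of strings that are already split. To take the input from the file, you only need to replace the call to line_producer_2 with a call to line_producer_1.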

def line_producer_1():
    filepath = r'C:\Users\techj\Music\Data\Tagged\090388 (1.0,CURRENT,LATEST APPROVED.txt)'

    with open(filepath) as f:
        for line in f:
            # Split each line into a list of words; blank lines become [].
            yield line.split()

def line_producer_2():
    lines = [
        [],
        ['The', 'tablets', 'are', 'filled', 'into', 'cylindrically', 'shaped', 'bottles', 'made', 'of', 'white', 'coloured'],
        ['polyethylene.', 'The', 'volumes', 'of', 'the', 'bottles', 'depend', 'on', 'the', 'tablet', 'strength', 'and', 'amount', 'of'],
        ['tablets,', 'ranging', 'from', '20', 'to', '175', 'ml.', 'The', 'screw', 'type', 'cap', 'is', 'made', 'of', 'white', 'coloured'],
        ['polypropylene', 'and', 'is', 'equipped', 'with', 'a', 'tamper', 'proof', 'ring.'],
        [],
        ['PVC/PVDC', 'blister', 'pack'],
        [],
        ['Blisters', 'are', 'made', 'in', 'a', 'thermo-forming', 'process', 'from', 'a', 'PVC/PVDC', 'base', 'web.', 'Each', 'tablet'],
        ['is', 'filled', 'into', 'a', 'separate', 'blister', 'and', 'a', 'lidding', 'foil', 'of', 'aluminium', 'is', 'welded', 'on.', 'The', 'blisters'],
        ['are', 'opened', 'by', 'pressing', 'the', 'tablets', 'through', 'the', 'lidding', 'foil.', 'PVDC', 'foil', 'is', 'in', 'contact', 'with'],
        ['the', 'tablets.'],
        [],
        ['Aluminium', 'blister', 'pack'],
        [],
    ]
    for line in lines:
        yield line

accumulated_lines = []
for line in line_producer_2():
    if line:
        accumulated_lines.extend(line)
    elif accumulated_lines:
        print(accumulated_lines)
        accumulated_lines = []
if accumulated_lines:
    print(accumulated_lines)

Prints:

['The', 'tablets', 'are', 'filled', 'into', 'cylindrically', 'shaped', 'bottles', 'made', 'of', 'white', 'coloured', 'polyethylene.', 'The', 'volumes', 'of', 'the', 'bottles', 'depend', 'on', 'the', 'tablet', 'strength', 'and', 'amount', 'of', 'tablets,', 'ranging', 'from', '20', 'to', '175', 'ml.', 'The', 'screw', 'type', 'cap', 'is', 'made', 'of', 'white', 'coloured', 'polypropylene', 'and', 'is', 'equipped', 'with', 'a', 'tamper', 'proof', 'ring.']
['PVC/PVDC', 'blister', 'pack']
['Blisters', 'are', 'made', 'in', 'a', 'thermo-forming', 'process', 'from', 'a', 'PVC/PVDC', 'base', 'web.', 'Each', 'tablet', 'is', 'filled', 'into', 'a', 'separate', 'blister', 'and', 'a', 'lidding', 'foil', 'of', 'aluminium', 'is', 'welded', 'on.', 'The', 'blisters', 'are', 'opened', 'by', 'pressing', 'the', 'tablets', 'through', 'the', 'lidding', 'foil.', 'PVDC', 'foil', 'is', 'in', 'contact', 'with', 'the', 'tablets.']
['Aluminium', 'blister', 'pack']
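
If the merged lists are needed for further processing rather than just printed, the accumulation loop above can be wrapped in a generator. A minimal sketch under that assumption; the name `merged_groups` and the shortened sample input are hypothetical:

```python
def merged_groups(line_lists):
    """Yield the concatenation of each run of non-empty lists."""
    accumulated = []
    for words in line_lists:
        if words:
            accumulated.extend(words)
        elif accumulated:
            # An empty list closes the current group.
            yield accumulated
            accumulated = []
    # Emit the final group when the input does not end with an empty list.
    if accumulated:
        yield accumulated

sample_lists = [
    [],
    ['the', 'tablets.'],
    ['PVC/PVDC', 'blister', 'pack'],
    [],
    [],
    ['Aluminium', 'blister', 'pack'],
]
groups = list(merged_groups(sample_lists))
print(groups)
```

Passing `line_producer_1()` or `line_producer_2()` as the argument would give the same groups as the printed output, collected in a list of lists.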
