Question

I have a list of 8000 entries of name, company, address1, address2, address3 (optional), in the order shown below. It is a Python list:

[John
It Tech
1243 mary drive
florida-32006
mark
Infotech
1245 lonely drive
irony estate 324
Virginia-24531
Steve
king Electronics
.
.
]

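address3 is optional, so it cannot be expected in every sequence. The next name starts right after address2 (or after address3 when it is present). I need to split the list at every nth element by checking whether the 5th element is a name or an address: if it is an address, increase the split size to n + 1; if not, keep splitting at 4. I just want to add the results to Excel columns.

Below is the code I tried. It splits the list at every 4th element and ignores whether the fifth element is a name or an address.

test2 = [final1[x:x+4] for x in range(0, len(final1), 4)]

This gives me the following result: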

[['John','It Tech','1243 mary drive','florida-32006'], 
['mark','Infotech','1245 lonely drive','irony estate 324'],
['Virginia-24531','Steve','king Electronics']] 
** The last list should start from Steve, and the address (Virginia-24531) should be in the previous list. **

I would appreciate any help with this. Thanks a lot in advance.

Answer 1

Your addresses seem to contain digits, and your names* do not. You can use that to split accordingly:

from string import digits

def splitIt(data):
    i = 0
    tmp = []
    for d in data:
        # The first 4 lines are added without a check: every dataset has at least 4 lines.
        # This breaks if a dataset has fewer than 4 lines.
        if i > 3 and not any(c in digits for c in d):
            # 5th (or later) line without digits: it is a NAME, so a new dataset starts
            yield tmp
            tmp = [d]
            i = 1
        else:
            # one of the first 4 lines, or a further address line containing digits
            # (you can have as many lines with digits in them after the 4th as you like)
            tmp.append(d)
            i += 1
    yield tmp


data = ["John",       # no number
"It Tech",
"1243 mary drive",    # has number
"florida-32006",      # has number
"mark",               # no number
"Infotech",
"1245 lonely drive",
"irony estate 324",
"Virginia-24531",
"Steve",
"king Electronics"]

print(list(splitIt(data)))

Output:

[['John', 'It Tech', '1243 mary drive', 'florida-32006'], 
 ['mark', 'Infotech', '1245 lonely drive', 'irony estate 324', 'Virginia-24531'], 
 ['Steve', 'king Electronics']]

*Disclaimer: see "Falsehoods Programmers Believe About Names".
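
Since the stated goal is to get the records into Excel columns, one possible next step is to pad every record to five cells and write CSV, which Excel can open. This is only a minimal sketch, assuming the list has already been split into records; `pad_records`, the column names, and the in-memory buffer are illustrative assumptions and not part of the original question or answer.

```python
import csv
import io

def pad_records(records, width=5):
    # Pad short records with empty strings so every row has `width` cells.
    return [list(rec) + [""] * (width - len(rec)) for rec in records]

records = [
    ["John", "It Tech", "1243 mary drive", "florida-32006"],
    ["mark", "Infotech", "1245 lonely drive", "irony estate 324", "Virginia-24531"],
]

rows = pad_records(records)

# Written to an in-memory buffer here; to create a real file, use
# open("addresses.csv", "w", newline="") instead (the file name is hypothetical).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "company", "address1", "address2", "address3"])
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Padding at the end keeps the first four columns aligned across all rows, so the optional fifth line simply leaves its cell empty when it is missing.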

Answer 2

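The following assumes that addresses contain a hyphen and names do not (very risky). It then cuts the first 4 or 5 elements off the list, depending on whether the last entry [-1] is an address or a name.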

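result = []
while final1:                  # consume final1 from the front until it is empty
    tmp = final1[:5]           # slicing is safe even when fewer than 5 items remain
    if '-' in tmp[-1]:         # last item looks like an address: keep the whole slice
        result.append(tmp)
        final1 = final1[5:]
    else:                      # last item is the next name: this record has only 4 lines
        result.append(final1[:4])
        final1 = final1[4:]
result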

[['John', 'It Tech', '1243 mary drive', 'florida-32006'],
 ['mark','Infotech','1245 lonely drive','irony estate 324','Virginia-24531'],
 ['Steve', 'king Electronics']]

If the 5th element is a name, split the list at the 4th element; if the 5th element is an address, split the list at the 5th element.
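
The rule above can also be sketched as an index-based walk that leaves the input list untouched and takes the name-versus-address test as a parameter. This is a hedged illustration, not either answerer's code: `split_records`, `has_digit`, and the `base` parameter are hypothetical names, and the digit test is the same fragile heuristic discussed earlier.

```python
def split_records(lines, is_address, base=4):
    """Yield records of `base` lines, plus one extra line when it looks like an address."""
    i = 0
    while i < len(lines):
        end = i + base
        # Peek at the line after the mandatory block; take it only if it is an address.
        if end < len(lines) and is_address(lines[end]):
            end += 1
        yield lines[i:end]
        i = end

def has_digit(text):
    # Assumed heuristic: address lines contain a digit, names do not.
    return any(ch.isdigit() for ch in text)

data = [
    "John", "It Tech", "1243 mary drive", "florida-32006",
    "mark", "Infotech", "1245 lonely drive", "irony estate 324", "Virginia-24531",
    "Steve", "king Electronics",
]

records = list(split_records(data, has_digit))
```

Swapping in a different predicate, for example `lambda s: '-' in s`, switches to the hyphen heuristic without changing the splitting logic.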
