Split a list after the 4th element if the 5th element is a name; split after the 5th element if the 5th element is an address

Time: 2019-06-19 20:15:54

Tags: python-3.x pandas list dataframe for-loop

I have a list of 8000 entries made up of name, company, address1, address2 and address3 (optional), in the order shown below. It is a Python list.

[John
It Tech
1243 mary drive
florida-32006
mark
Infotech
1245 lonely drive
irony estate 324
Virginia-24531
Steve
king Electronics
.
.
]

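address3 is optional, so it cannot be expected in every record. The next name starts right after address2 (or after address3 when it is present). I need to split the list at every nth element by checking whether the 5th element is a name or an address: if it is an address, the split size becomes n + 1 (5); if not, it stays at 4. In the end I just want to put the records into Excel columns.

Below is the code I tried. It splits the list at every 4th element and ignores whether the 5th element is a name or an address.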

test2= [final1[x:x+4] for x in range(0, len(final1),4)]

This gives me the following result:

[['John','It Tech','1243 mary drive','florida-32006'], 
['mark','Infotech','1245 lonely drive','irony estate 324'],
['Virginia-24531','Steve','king Electronics']] 
Note: the last list should start with Steve, and the address (Virginia-24531) should be in the previous list.

Could someone please help me solve this? Many thanks in advance.

2 answers:

Answer 0: (score: 1)

Your addresses seem to contain digits, while your names* do not. You can use that to split the list accordingly:

from string import digits

def splitIt(data):
    i = 0
    tmp = []
    for d in data:
        # the first 4 lines are added without a check - every record has at least 4 lines
        if i > 3:            # this will break if a record has fewer than 4 lines
            if not any(c in digits for c in d):  # 5th line has no digits: it is a NAME
                yield tmp
                tmp = [d]
                i = 1
            else:                                # otherwise another address line: keep it
                tmp.append(d)                    # and check the next line for a name (any number
                i += 1                           # of lines with digits may follow the 4th)
        else:
            tmp.append(d)
            i += 1
    yield tmp


data = ["John",       # no digits
"It Tech",
"1243 mary drive",    # has digits
"florida-32006",      # has digits
"mark",               # no digits
"Infotech",
"1245 lonely drive",
"irony estate 324",
"Virginia-24531",
"Steve",
"king Electronics",]

print(list(splitIt(data)))

Output:

[['John', 'It Tech', '1243 mary drive', 'florida-32006'], 
 ['mark', 'Infotech', '1245 lonely drive', 'irony estate 324', 'Virginia-24531'], 
 ['Steve', 'king Electronics']]

* Disclaimer: see "Falsehoods Programmers Believe About Names"
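Since the question's goal is to get the records into Excel columns, here is a minimal sketch of that last step. It assumes the list has already been split into records of 4 or 5 items. The column names and the `to_rows` helper are hypothetical and do not come from the question. Records without a third address line are padded with an empty string so that every row has the same width, and the rows are written as CSV, which Excel can open.

```python
import csv
import io

# Column names are assumed for illustration; the question does not name them.
COLUMNS = ["name", "company", "address1", "address2", "address3"]

def to_rows(records, width=len(COLUMNS)):
    """Pad each record with empty strings so all rows have `width` cells."""
    return [rec + [""] * (width - len(rec)) for rec in records]

records = [
    ["John", "It Tech", "1243 mary drive", "florida-32006"],
    ["mark", "Infotech", "1245 lonely drive", "irony estate 324", "Virginia-24531"],
]

rows = to_rows(records)

# Write to an in-memory buffer here; for a real file use
# open("out.csv", "w", newline="") instead (the file name is an assumption).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

If pandas is installed, `pandas.DataFrame(rows, columns=COLUMNS).to_excel("out.xlsx", index=False)` should produce an actual workbook instead, provided an Excel writer engine such as openpyxl is available.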

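Answer 1: (score: 0)

The following assumes that addresses contain a hyphen and names do not (a very risky assumption). It looks at the next 5 elements of the list and cuts off the first 4 or 5 of them, depending on whether the last entry [-1] is an address or a name.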

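result = []
for x in range(len(final1)):      # upper bound on iterations; the loop exits early via break
    # look at the next 5 entries, or at whatever is left (note: final1 is consumed)
    tmp = final1[:5] if len(final1) >= 5 else final1
    if len(tmp) == 0:
        break

    if '-' in tmp[-1]:            # last entry looks like an address: keep the whole slice
        result.append(tmp)
        final1 = final1[5:]
    else:                         # otherwise cut off at most 4 entries
        result.append(final1[:4])
        final1 = final1[4:]
result

Output:
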
[['John', 'It Tech', '1243 mary drive', 'florida-32006'],
 ['mark','Infotech','1245 lonely drive','irony estate 324','Virginia-24531'],
 ['Steve', 'king Electronics']]
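
Both answers rest on a guess about what separates a name from an address line. A minimal sketch that isolates that guess in one replaceable function might look like the following. `split_records`, `looks_like_address` and `min_len` are hypothetical names, and the digit test is the same heuristic as in the first answer, not a reliable rule.

```python
def looks_like_address(line):
    # Heuristic only: treats any line containing a digit as an address line.
    return any(ch.isdigit() for ch in line)

def split_records(lines, is_address=looks_like_address, min_len=4):
    """Group lines into records of at least `min_len` lines each.

    After the first `min_len` lines of a record, further lines stay in the
    same record for as long as `is_address` reports them as address lines.
    """
    records = []
    current = []
    for line in lines:
        if len(current) >= min_len and not is_address(line):
            records.append(current)   # this line starts a new record
            current = []
        current.append(line)
    if current:
        records.append(current)       # flush the last (possibly short) record
    return records

data = [
    "John", "It Tech", "1243 mary drive", "florida-32006",
    "mark", "Infotech", "1245 lonely drive", "irony estate 324", "Virginia-24531",
    "Steve", "king Electronics",
]

records = split_records(data)
for rec in records:
    print(rec)

# The hyphen heuristic from the second answer can be swapped in the same way.
by_hyphen = split_records(data, is_address=lambda line: "-" in line)
```

Keeping the heuristic in a separate function means that a name containing a digit or a hyphen only requires changing `is_address`, not the grouping loop. Unlike the loop in the second answer, this version leaves the input list untouched.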