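Convert a Python list from a CSV with multi-value fields into nested Python lists, sort the nested list values, and export to CSV

Time: 2015-03-05 17:04:31

Tags: python list csv nested

I used the Python csv module to read a CSV with multi-value fields into a Python list. The output contains fields that each hold several related values.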

['Route', 'Vehicles', 'Vehicle Class', 'Driver_ID', 'Date', 'Start', 'Arrive']
['ABC', 'ZYG098, AB0134, GF0158', 'A1, B2, C3', 'John Doe, Jane Doe, Abraham Lincoln', '20150301', 'A', 'B']
['AC', 'ZGA123', 'C3', 'George Washington', '20150301', 'A', 'C']
['ABC', 'XAZ012, AB0134, YZ089', 'C1, B2, A2 ', 'John Adams, Jane Doe, Thomas Jefferson', '20150302', 'A', 'B']

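I want to convert the Vehicles, Vehicle Class, and Driver_ID fields into nested lists, so that when I sort each sublist in Vehicles (row[1]) to make the vehicles always appear in alphabetical order, the Vehicle Class and Driver_ID values stay in the corresponding, correct order. The header and the rows' sublists would then be arranged like this: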

['Route', 'Vehicles', 'Vehicle Class', 'Driver_ID', 'Date', 'Start', 'Arrive']
['ABC', 'AB0134, GF0158, ZYG098', 'B2, C3, A1', 'Jane Doe, Abraham Lincoln, John Doe', '20150301', 'A', 'B']
['AC', 'ZGA123', 'C3', 'George Washington', '20150301', 'A', 'C']
['ABC', 'AB0134, XAZ012, YZ089', 'B2, C1, A2', 'Jane Doe, John Adams, Thomas Jefferson', '20150302', 'A', 'B']

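So in the output above, each subgroup/list of vehicles is sorted alphabetically, and Vehicle Class and Driver_ID are rearranged as needed to keep their original relationship to their respective vehicles (i.e. Driver_ID John Doe drives Vehicle ZYG098, which is Vehicle Class A1, so those items move within their sublists to reflect that ZYG098 is now last rather than first). If this can be done, how would you export the resulting nested lists back to a CSV with the original header?

Apologies if this is simple or absurd; I am just starting to learn Python. If nested lists are not the best option, I am open to any other solution (for a dictionary, I would need to concatenate fields to create a key, since nothing short of combining Route and Date is unique). If anyone has a solid resource on handling various CSV use cases with Python, a recommendation would be great.

Thanks in advance for your patience and help.

2 answers:

Answer 0: (score: 1)

Now that we are finally on the same page: it takes a bit of work, but this will do what you want: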

import csv


l = [['Route', 'Vehicles', 'Vehicle Class', 'Driver_ID', 'Date', 'Start', 'Arrive'],
     ['ABC', 'ZYG098, AB0134, GF0158', 'A1, B2, C3', 'John Doe, Jane Doe, Abraham Lincoln', '20150301', 'A', 'B'],
     ['AC', 'ZGA123', 'C3', 'George Washington', '20150301', 'A', 'C'],
     ['ABC', 'XAZ012, AB0134, YZ089', 'C1, B2, A2 ', 'John Adams, Jane Doe, Thomas Jefferson', '20150302', 'A', 'B']]
# transpose the original list: rows become columns, columns become rows
it = zip(*l)

# get each column separately, wrapping each one in iter so the first
# element (the header) can be popped off efficiently with next()
route, veh, veh_c, d_id, date, start, arrive = map(iter, it)

# get all headers to write later
headers = next(route), next(veh), next(veh_c), next(d_id), next(date), next(start), next(arrive)

srt_veh = []
key_inds = []

# sort the vehicle elements and keep a record of the old indexes so the
# subelements in Vehicle Class and Driver_ID can be rearranged to match
for x in veh:
    # strip each item before sorting; otherwise the leading space left by
    # ", " sorts ahead of every letter and skews the order
    spl = [s.strip() for s in x.split(",")]
    inds = sorted(range(len(spl)), key=spl.__getitem__)
    key_inds.append(inds)
    srt_veh.append(", ".join(spl[i] for i in inds))

srt_veh_cls = []

# reorder vehicle class based on the old index of the elements in vehicles
# and rejoin the split elements
for ind, ele in enumerate(veh_c):
    spl = ele.split(",")
    srt_veh_cls.append(", ".join([spl[i].strip() for i in key_inds[ind]]))

srt_dr_id = []

# reorder driver ids based on the old index of the elements in vehicles
# and rejoin the split elements
for ind, ele in enumerate(d_id):
    spl = ele.split(",")
    srt_dr_id.append(", ".join([spl[i].strip() for i in key_inds[ind]]))

# transpose again for writing
zipped = zip(route, srt_veh, srt_veh_cls,
             srt_dr_id, date, start, arrive)

Finally, write it out with the csv writer's writerows:

# newline="" is what the csv docs recommend for output files on Python 3
with open("out.csv", "w", newline="") as f:
    wr = csv.writer(f)
    wr.writerow(headers)
    wr.writerows(zipped)

Output:

Route,Vehicles,Vehicle Class,Driver_ID,Date,Start,Arrive
ABC,"AB0134, GF0158, ZYG098","B2, C3, A1","Jane Doe, Abraham Lincoln, John Doe",20150301,A,B
AC,ZGA123,C3,George Washington,20150301,A,C
ABC,"AB0134, XAZ012, YZ089","B2, C1, A2","Jane Doe, John Adams, Thomas Jefferson",20150302,A,B

For Python 2, replace zip with itertools.izip and map with itertools.imap, and open the output file with open("out.csv", "wb"):

from itertools import izip, imap

You could zip more and do a few other things to shorten the code, but I don't think that would help readability.
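For comparison, here is a rough sketch of what a more zip-heavy, row-by-row variant might look like. It is not the code from this answer: the helper name sort_row and the constant MULTI_COLS are made up for illustration, the column positions are assumed from the sample header, and it assumes the three multi-value fields in a row always hold the same number of items. It writes to an in-memory buffer rather than a file so it can be run as-is.

```python
import csv
import io

# Positions of the multi-value fields in the sample header (an assumption):
# 1 = Vehicles, 2 = Vehicle Class, 3 = Driver_ID
MULTI_COLS = (1, 2, 3)


def sort_row(row, cols=MULTI_COLS):
    """Return a copy of row with its multi-value fields reordered by vehicle."""
    # split each multi-value field into a list of stripped items
    split = [[item.strip() for item in row[c].split(",")] for c in cols]
    # zip builds (vehicle, class, driver) tuples; tuples sort on the vehicle first
    groups = sorted(zip(*split))
    out = list(row)
    # zip(*groups) turns the sorted tuples back into one sequence per field
    for c, values in zip(cols, zip(*groups)):
        out[c] = ", ".join(values)
    return out


rows = [
    ['Route', 'Vehicles', 'Vehicle Class', 'Driver_ID', 'Date', 'Start', 'Arrive'],
    ['ABC', 'ZYG098, AB0134, GF0158', 'A1, B2, C3', 'John Doe, Jane Doe, Abraham Lincoln', '20150301', 'A', 'B'],
    ['AC', 'ZGA123', 'C3', 'George Washington', '20150301', 'A', 'C'],
    ['ABC', 'XAZ012, AB0134, YZ089', 'C1, B2, A2 ', 'John Adams, Jane Doe, Thomas Jefferson', '20150302', 'A', 'B'],
]

header, body = rows[0], rows[1:]
buf = io.StringIO()  # stand-in for an open output file
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(sort_row(r) for r in body)
print(buf.getvalue())
```

Because each vehicle travels together with its class and driver inside one tuple, there is no separate index bookkeeping to keep in sync.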

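Answer 1: (score: 0)

To convert to the nested format you describe (this transposes the rows into columns):

nested = list(zip(*lst))  # list() so the result can be reused on Python 3

zip is its own inverse:

orig = list(zip(*nested))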
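As a quick illustration of that round trip, here is a small sketch on a cut-down table. The three-column data is invented for brevity; only the zip calls mirror the lines above.

```python
lst = [
    ["Route", "Vehicles", "Date"],
    ["ABC", "ZYG098, AB0134", "20150301"],
    ["AC", "ZGA123", "20150301"],
]

# zip(*lst) yields one tuple per column; list() materialises it on Python 3
nested = list(zip(*lst))

# applying the same call again restores the rows (as tuples, so convert back)
orig = [list(row) for row in zip(*nested)]

print(nested[1])
print(orig == lst)
```

Note that the transpose only regroups values by column; it does not split the comma-separated items inside a field.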

But maybe what you really want is:

import operator

sort = sorted(lst[1:], key=operator.itemgetter(1))

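This gives you a new list of the data rows sorted by the field at index 1 (Vehicles). Since you haven't changed the format of the data, you should be able to dump it back out as CSV without modification, although you'd need to prepend the original header from lst[0].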
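A minimal sketch of that dump, assuming lst holds the header followed by the data rows from the question. It writes to an io.StringIO buffer purely so the example is self-contained; a file opened with newline="" would work the same way on Python 3. Keep in mind that this reorders whole rows by their Vehicles string and leaves the order of the items inside each field unchanged.

```python
import csv
import io
import operator

lst = [
    ['Route', 'Vehicles', 'Vehicle Class', 'Driver_ID', 'Date', 'Start', 'Arrive'],
    ['ABC', 'ZYG098, AB0134, GF0158', 'A1, B2, C3', 'John Doe, Jane Doe, Abraham Lincoln', '20150301', 'A', 'B'],
    ['AC', 'ZGA123', 'C3', 'George Washington', '20150301', 'A', 'C'],
    ['ABC', 'XAZ012, AB0134, YZ089', 'C1, B2, A2 ', 'John Adams, Jane Doe, Thomas Jefferson', '20150302', 'A', 'B'],
]

header, body = lst[0], lst[1:]

# sort the data rows (not the header) by the Vehicles field
by_vehicles = sorted(body, key=operator.itemgetter(1))

buf = io.StringIO()  # stand-in for an open output file
writer = csv.writer(buf)
writer.writerow(header)        # put the original header back first
writer.writerows(by_vehicles)
print(buf.getvalue())
```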