Creating lists in Python from rows of different lengths

Time: 2016-10-16 05:10:31

Tags: python

I am trying to create a list in Python for each column of my data, which looks like this:

399.75833     561.572000000        399.75833     561.572000000  a_Fe I 399.73920 nm
399.78316     523.227000000        399.78316     523.227000000  
399.80799     455.923000000        399.80799     455.923000000  a_Fe I 401.45340 nm
399.83282     389.436000000        399.83282     389.436000000  
399.85765     289.804000000        399.85765     289.804000000  

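The problem is that the rows of my data have different lengths. Is there any way to pad the shorter rows with spaces so that they all have the same length?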

I would like my data in the following form:

list one= [399.75833, 399.78316, 399.80799, 399.83282, 399.85765]
list two= [561.572000000, 523.227000000, 455.923000000, 389.436000000, 289.804000000]
list three= [a_Fe, " ", a_Fe, " ", " "]

This is the code I use to import the data into Python:

fh  = open('help.bsp').read()
the_list = []
for line in fh.split('\n'):
    print line.strip()
    splits = line.split()
    if  len(splits) ==1 and splits[0]== line.strip():
        splits = line.strip().split(',')
    if splits:the_list.append(splits)

2 answers:

Answer 0 (score: 1)

You need to use izip_longest to build the column lists, because the standard zip stops at the length of the shortest of the lists it is given.

from itertools import izip_longest  # Python 2; on Python 3 use itertools.zip_longest
with open('workfile', 'r') as f:
    fh = f.readlines()

# Process all the rows line by line, skipping blank lines
rows = [line.strip().split() for line in fh if line.strip()]
# Use izip_longest to get all columns, with None's filled in blank spots
cols = list(izip_longest(*rows))
# Then run your type conversions for your final data lists
# (in the sample data, columns 2 and 3 repeat columns 0 and 1)
list_one = [float(i) for i in cols[2]]
list_two = [float(i) for i in cols[3]]
# Since you want " " instead of None for blanks
list_three = [i if i else " " for i in cols[4]]

Output:

>>> print list_one
[399.75833, 399.78316, 399.80799, 399.83282, 399.85765]
>>> print list_two
[561.572, 523.227, 455.923, 389.436, 289.804]
>>> print list_three
['a_Fe', ' ', 'a_Fe', ' ', ' ']
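
On Python 3, izip_longest is named itertools.zip_longest. The following is a minimal sketch of the same approach, assuming the sample rows from the question are held in a string; the names `sample` and `columns_from_text` are made up for illustration. It passes `fillvalue` so that short rows are padded with " " directly instead of None.

```python
from itertools import zip_longest

# Sample rows copied from the question; in practice they would be read from the file.
sample = """\
399.75833     561.572000000        399.75833     561.572000000  a_Fe I 399.73920 nm
399.78316     523.227000000        399.78316     523.227000000
399.80799     455.923000000        399.80799     455.923000000  a_Fe I 401.45340 nm
399.83282     389.436000000        399.83282     389.436000000
399.85765     289.804000000        399.85765     289.804000000
"""


def columns_from_text(text, fill=" "):
    """Hypothetical helper: split whitespace-delimited rows and transpose them."""
    # Skip blank lines so they do not produce empty rows.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # zip_longest pads short rows with `fill` instead of stopping at the shortest row.
    return list(zip_longest(*rows, fillvalue=fill))


cols = columns_from_text(sample)
list_one = [float(x) for x in cols[0]]
list_two = [float(x) for x in cols[1]]
list_three = list(cols[4])

print(list_one)
print(list_two)
print(list_three)
```

Because every row has at least four fields, the float conversion never sees the fill value; only the columns from index 4 onward contain padding.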

Answer 1 (score: 0)

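So your rows are either whitespace-delimited or comma-delimited, and if they are comma-delimited, they contain no whitespace? (Note that if len(splits)==1 is true, then splits[0]==line.strip() is also true.) That matches neither the data you show nor the data you describe.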
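
To illustrate the point about the redundant test, here is a small sketch; `split_row` is a hypothetical name that mirrors the question's logic with the second condition dropped. It also shows that the comma fallback only triggers when a row contains no whitespace at all.

```python
def split_row(line):
    """Hypothetical helper mirroring the question's splitting logic."""
    splits = line.split()
    # A single whitespace-delimited token is always equal to line.strip(),
    # so checking len(splits) == 1 is sufficient.
    if len(splits) == 1:
        splits = line.strip().split(',')
    return splits


# Whenever split() yields exactly one token, that token equals line.strip().
for line in ["1,2,3", "  1,2,3  ", "single", "\tx\n"]:
    tokens = line.split()
    assert len(tokens) == 1 and tokens[0] == line.strip()

print(split_row("1,2,3"))    # commas only: falls back to splitting on ','
print(split_row("1, 2, 3"))  # commas plus spaces: split on whitespace, commas stay attached
print(split_row("1 2 3"))    # whitespace only
```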

To get the lists you want from the data you show:

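with open('help.bsp') as h:
    # Split each non-blank line on whitespace
    the_list = [line.split() for line in h if line.strip()]
# Convert the numeric columns to float, as in the desired output
list_one = [float(d[0]) for d in the_list]
list_two = [float(d[1]) for d in the_list]
# Rows without a fifth field get ' '
list_three = [d[4] if len(d) > 4 else ' ' for d in the_list]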

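If you are reading comma-separated (or similarly delimited) files, I always recommend using the csv module, because it handles many edge cases you may not have considered.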
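
As a rough sketch of what that could look like on Python 3: the comma-separated data below is invented for illustration (the question's file is whitespace-delimited), and it is read from an in-memory string rather than a file. The quoted field shows one such edge case, a comma inside a value.

```python
import csv
import io

# Hypothetical comma-separated version of two rows; the third field is quoted
# because it contains a comma, and is empty in the second row.
data = io.StringIO(
    '399.75833,561.572000000,"a_Fe I, 399.73920 nm"\n'
    '399.78316,523.227000000,\n'
)

rows = list(csv.reader(data))

list_one = [float(r[0]) for r in rows]
list_two = [float(r[1]) for r in rows]
# Replace empty fields with " " to match the form asked for in the question.
list_three = [r[2] or " " for r in rows]

print(list_one)
print(list_two)
print(list_three)
```

A plain `line.split(',')` would have cut the quoted field in two; csv.reader keeps it as one value.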