Question

I am trying to create a list in Python for each column of my data, which looks like this:

399.75833     561.572000000        399.75833     561.572000000  a_Fe I 399.73920 nm
399.78316     523.227000000        399.78316     523.227000000  
399.80799     455.923000000        399.80799     455.923000000  a_Fe I 401.45340 nm
399.83282     389.436000000        399.83282     389.436000000  
399.85765     289.804000000        399.85765     289.804000000

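The problem is that each row of my data has a different length. Is there any way to pad the remaining space in the shorter rows with blanks so that they are all the same length?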

I would like my data in the following form:

list one= [399.75833, 399.78316, 399.80799, 399.83282, 399.85765]
list two= [561.572000000, 523.227000000, 455.923000000, 389.436000000, 289.804000000]
list three= [a_Fe, " ", a_Fe, " ", " "]

This is the code I am using to import the data into Python:

fh  = open('help.bsp').read()
the_list = []
for line in fh.split('\n'):
    print line.strip()
    splits = line.split()
    if  len(splits) ==1 and splits[0]== line.strip():
        splits = line.strip().split(',')
    if splits:the_list.append(splits)

Answer 1

You need to use izip_longest to build the column lists, because the standard zip only runs up to the length of the shortest list it is given.

from itertools import izip_longest
with open('workfile', 'r') as f:
    fh = f.readlines()

# Process all the rows line by line
rows = [line.strip().split() for line in fh]
# Use izip_longest to get all columns, with None's filled in blank spots
cols = [col for col in izip_longest(*rows)]
# Then run your type conversions for your final data lists
list_one = [float(i) for i in cols[2]]
list_two = [float(i) for i in cols[3]]
# Since you want " " instead of None for blanks
list_three = [i if i else " " for i in cols[4]]

Output:

>>> print list_one
[399.75833, 399.78316, 399.80799, 399.83282, 399.85765]
>>> print list_two
[561.572, 523.227, 455.923, 389.436, 289.804]
>>> print list_three
['a_Fe', ' ', 'a_Fe', ' ', ' ']
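
On Python 3 the same function is named zip_longest (izip_longest exists only in Python 2's itertools), and its fillvalue argument can supply the " " placeholder directly. A minimal sketch, assuming Python 3 and using the sample rows from the question inline instead of a file; the name `sample` is only illustrative:

```python
from itertools import zip_longest

# Sample rows copied from the question; in practice these would come from the file.
sample = """\
399.75833     561.572000000        399.75833     561.572000000  a_Fe I 399.73920 nm
399.78316     523.227000000        399.78316     523.227000000
399.80799     455.923000000        399.80799     455.923000000  a_Fe I 401.45340 nm
399.83282     389.436000000        399.83282     389.436000000
399.85765     289.804000000        399.85765     289.804000000
"""

# Split every non-blank line on runs of whitespace.
rows = [line.split() for line in sample.splitlines() if line.strip()]

# Transpose rows into columns; short rows are padded with " " instead of None.
cols = list(zip_longest(*rows, fillvalue=" "))

list_one = [float(v) for v in cols[0]]
list_two = [float(v) for v in cols[1]]
list_three = list(cols[4])

print(list_one)
print(list_two)
print(list_three)
```

Because the padding value is chosen up front, no separate pass is needed to turn None into " ".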

Answer 2

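So, your lines are either whitespace-delimited or comma-delimited, and if a line is comma-delimited it contains no whitespace? (Note that if len(splits)==1 is true, then splits[0]==line.strip() is also true.) That matches neither the data you are showing nor what you describe.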
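
The parenthetical remark can be checked directly: split() with no argument splits on runs of whitespace and ignores leading and trailing whitespace, so a one-element result is necessarily the stripped line. A small sketch with made-up sample lines (Python 3 assumed):

```python
# Hypothetical sample lines, chosen only to illustrate the point.
samples = ["1,2,3", "  1,2,3  ", "1 2 3", "1, 2, 3", ""]

for line in samples:
    splits = line.split()
    if len(splits) == 1:
        # A single token means the line has no internal whitespace,
        # so the second half of the original condition is always true.
        assert splits[0] == line.strip()

# Only lines without internal whitespace would reach the comma-splitting branch.
comma_branch = [line for line in samples if len(line.split()) == 1]
print(comma_branch)
```

In other words, the second test in the question's `if` is redundant, and a line such as "1, 2, 3" would never be split on commas.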

To get the lists you want from the data you show:

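with open('help.bsp') as h:
    # Split each non-blank line on whitespace; blank lines would otherwise break d[0]
    the_list = [ line.split() for line in h if line.strip() ]
list_one = [ d[0] for d in the_list ]
list_two = [ d[1] for d in the_list ]
# Rows without a fifth field get a single space as a placeholder
list_three = [ d[4] if len(d) > 4 else ' ' for d in the_list ]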

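If you are reading a comma-separated (or similarly delimited) file, I always recommend using the csv module - it handles many edge cases you may not have considered.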
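
As a rough illustration of that suggestion, the sketch below reads comma-separated rows of unequal length with csv.reader and pads the short ones. The sample data, the variable names, and the use of io.StringIO in place of a real file are assumptions made for the example; Python 3 is assumed.

```python
import csv
import io
from itertools import zip_longest

# Made-up comma-separated data; io.StringIO stands in for an open file.
data = io.StringIO(
    "399.75833,561.572,a_Fe\n"
    "399.78316,523.227\n"
    '399.80799,455.923,"a_Fe, I"\n'
)

# csv.reader handles quoting, so the quoted comma stays inside one field.
rows = [row for row in csv.reader(data) if row]

# Transpose to columns, padding short rows with a single space.
cols = list(zip_longest(*rows, fillvalue=" "))

wavelengths = [float(v) for v in cols[0]]
labels = list(cols[2])

print(wavelengths)
print(labels)
```

A plain split(',') would have cut the quoted label in two, which is one of the edge cases the csv module takes care of.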

Creating lists from lines of different lengths in Python
