Extracting items from a list with a variable layout

Time: 2019-01-17 13:09:50

Tags: python list extract items

I have an unusual file produced by a Linux program; the first lines of a sample are:

 1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000
 2 1011.720000 1830.340000            0            0            0           725670          ?   2.000000
 3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000
 4 1011.720000 1830.340000            0            0            0        2.178e+06          ?   4.000000
 5 1011.720000 1830.340000            0            0            0       2.8806e+06          ?   5.000000
 6 1011.720000 1830.340000            0            0            0       3.5353e+06          ?   6.000000
 7 1011.720000 1830.340000            0            0            0       4.1598e+06          ?   7.000000
 8 1011.720000 1830.340000            0            0            0       4.7729e+06          ?   8.000000
 9 1011.720000 1830.340000            0            0            0       5.3924e+06          ?   9.000000
10 1011.720000 1830.340000            0            0            0       6.0281e+06          ?  10.000000

I only need to extract two values from each line:

191340
725670
1.4378e+06
2.178e+06
.... etc

1.00000
2.00000
3.00000
4.00000
.... etc
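Both wanted values are numeric strings, and some are in scientific notation. As a small sketch (the sample strings are copied from the listing above), Python's built-in float() parses either form once a field has been isolated:

```python
# Sample field strings taken from the listing above.
samples = ["191340", "725670", "1.4378e+06", "2.178e+06", "1.000000"]

# float() accepts plain decimals as well as scientific notation such as "1.4378e+06".
values = [float(s) for s in samples]
print(values)  # [191340.0, 725670.0, 1437800.0, 2178000.0, 1.0]
```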

This code:

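import csv

# Python 3's csv module expects a file opened in text mode with newline=''.
# The default delimiter is a comma, so each whole line comes back as one field.
with open('NGC1365GaiaPhotomLogTestTenLines.dat', newline='') as infile:
    read = csv.reader(infile)
    for row in read:
        print(row)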

produces:

['         1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000']
['         2 1011.720000 1830.340000            0            0            0           725670          ?   2.000000']
['         3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000']
['         4 1011.720000 1830.340000            0            0            0        2.178e+06          ?   4.000000']
['         5 1011.720000 1830.340000            0            0            0       2.8806e+06          ?   5.000000']
['         6 1011.720000 1830.340000            0            0            0       3.5353e+06          ?   6.000000']
['         7 1011.720000 1830.340000            0            0            0       4.1598e+06          ?   7.000000']
['         8 1011.720000 1830.340000            0            0            0       4.7729e+06          ?   8.000000']
['         9 1011.720000 1830.340000            0            0            0       5.3924e+06          ?   9.000000']
['        10 1011.720000 1830.340000            0            0            0       6.0281e+06          ?  10.000000']

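The problem is that the resulting lists are not nicely separated, comma-delimited items. The items in the input file are separated by spaces, and the number of spaces varies because the format of the values in the first column can vary as well.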
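The difference between splitting on a single space and splitting on runs of whitespace is easy to see on one line of the sample. This is a minimal illustration using only the built-in str.split:

```python
line = "         1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000"

# Splitting on one literal space produces an empty string for every extra space,
# including the leading ones.
single = line.split(" ")
print("" in single)  # True

# With no argument, split() treats any run of whitespace (spaces, tabs) as a
# single separator and ignores leading and trailing whitespace.
fields = line.split()
print(fields)
# ['1', '1011.720000', '1830.340000', '0', '0', '0', '191340', '?', '1.000000']
print(fields[6], fields[8])  # 191340 1.000000
```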

I wouldn't have thought this was difficult, but I have consulted many threads and gotten nowhere.

3 answers:

Answer 0 (score: 3)

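Contrary to the other answers here, I think you should use the csv module. If the file ever contains headers or quoted fields, you will be much happier than if you try to retrofit a custom solution after the fact: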

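import csv

with open('filename', newline='') as infile:
    # skipinitialspace=True ignores the extra spaces that follow each delimiter.
    r = csv.reader(infile, delimiter=' ', skipinitialspace=True)
    for row in r:
        print(row)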
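A self-contained sketch of this approach that keeps only the seventh and ninth fields. It reads two sample lines from an in-memory io.StringIO instead of the real file, and the names col7 and col9 are illustrative. As a defensive assumption, it also drops any empty fields, in case leading or trailing whitespace in the real file produces them:

```python
import csv
import io

# Two lines from the question, held in memory in place of the .dat file.
sample = io.StringIO(
    "         1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000\n"
    "         2 1011.720000 1830.340000            0            0            0           725670          ?   2.000000\n"
)

col7, col9 = [], []
reader = csv.reader(sample, delimiter=" ", skipinitialspace=True)
for row in reader:
    # Drop empty strings, in case leading/trailing whitespace yields empty fields.
    fields = [f for f in row if f]
    if len(fields) == 9:
        col7.append(float(fields[6]))
        col9.append(float(fields[8]))

print(col7)  # [191340.0, 725670.0]
print(col9)  # [1.0, 2.0]
```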

Your file may actually be tab-separated on your machine. In that case, change delimiter=' ' to delimiter='\t' in the code above.

You can also use pandas, which has a more general whitespace mode:

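import pandas as pd

# header=None: the file has no header row. sep=r"\s+" splits on any run of
# whitespace; older code spells this delim_whitespace=True, deprecated in pandas 2.2.
df = pd.read_csv("filename", header=None, sep=r"\s+")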

Answer 1 (score: 2)

Simplified code, crediting @Eugen Constantin Dinca and @tobias_k:

# Text mode, so each row is a str; split() with no argument splits on any run of whitespace.
with open('csv.dat') as infile:
    for row in infile:
        print(row.split())
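Building on the split() approach, here is a sketch of a small reusable helper. The name extract_columns and its cols and expected parameters are hypothetical, chosen for this example. It accepts any iterable of lines (an open text file works, as does a list of strings) and yields the requested fields as floats, skipping lines that do not have the expected number of fields:

```python
from typing import Iterable, Iterator, Tuple


def extract_columns(lines: Iterable[str], cols: Tuple[int, ...] = (6, 8),
                    expected: int = 9) -> Iterator[Tuple[float, ...]]:
    """Yield the fields at positions `cols`, converted to float, for each line."""
    for line in lines:
        fields = line.split()  # any run of whitespace separates fields
        if len(fields) != expected:
            continue  # skip blank or malformed lines
        yield tuple(float(fields[i]) for i in cols)


sample = [
    "         3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000\n",
    "\n",
    "        10 1011.720000 1830.340000            0            0            0       6.0281e+06          ?  10.000000\n",
]
print(list(extract_columns(sample)))  # [(1437800.0, 3.0), (6028100.0, 10.0)]
```

With a real file, the same helper would be used as `with open('csv.dat') as infile: rows = list(extract_columns(infile))`.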

Output:

['1', '1011.720000', '1830.340000', '0', '0', '0', '191340', '?', '1.000000']
['2', '1011.720000', '1830.340000', '0', '0', '0', '725670', '?', '2.000000']
['3', '1011.720000', '1830.340000', '0', '0', '0', '1.4378e+06', '?', '3.000000']
['4', '1011.720000', '1830.340000', '0', '0', '0', '2.178e+06', '?', '4.000000']
['5', '1011.720000', '1830.340000', '0', '0', '0', '2.8806e+06', '?', '5.000000']
['6', '1011.720000', '1830.340000', '0', '0', '0', '3.5353e+06', '?', '6.000000']
['7', '1011.720000', '1830.340000', '0', '0', '0', '4.1598e+06', '?', '7.000000']
['8', '1011.720000', '1830.340000', '0', '0', '0', '4.7729e+06', '?', '8.000000']
['9', '1011.720000', '1830.340000', '0', '0', '0', '5.3924e+06', '?', '9.000000']
['10', '1011.720000', '1830.340000', '0', '0', '0', '6.0281e+06', '?', '10.000000']

Answer 2 (score: 0)

Here is code you can use:

A few points about your code: csv.reader is overkill here. Everything below is done with simple built-ins, with no external dependencies.

Also, don't use names like read for your variables.

lines = """1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000
 2 1011.720000 1830.340000            0            0            0           725670          ?   2.000000
 3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000
 4 1011.720000 1830.340000            0            0            0        2.178e+06          ?   4.000000
 5 1011.720000 1830.340000            0            0            0       2.8806e+06          ?   5.000000
 6 1011.720000 1830.340000            0            0            0       3.5353e+06          ?   6.000000
 7 1011.720000 1830.340000            0            0            0       4.1598e+06          ?   7.000000
 8 1011.720000 1830.340000            0            0            0       4.7729e+06          ?   8.000000
 9 1011.720000 1830.340000            0            0            0       5.3924e+06          ?   9.000000
10 1011.720000 1830.340000            0            0            0       6.0281e+06          ?  10.000000"""

for line in lines.split("\n"):
    toks = line.split() # This should split the line into tokens separated by one or more white space characters. 

    if len(toks) == 9: # Just to make sure there are enough tokens. 
        # do whatever you want
        print(toks[6], toks[8])  # the two wanted values: 7th and 9th fields
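The question lists the two wanted values as two separate columns. One way to get them as two lists, sketched here under the assumption that every valid line has nine fields as in the sample (col7 and col9 are illustrative names), is to collect (seventh, ninth) pairs and transpose them with zip(*...):

```python
lines = """ 1 1011.720000 1830.340000            0            0            0           191340          ?   1.000000
 2 1011.720000 1830.340000            0            0            0           725670          ?   2.000000
 3 1011.720000 1830.340000            0            0            0       1.4378e+06          ?   3.000000"""

# Keep only lines with the expected nine whitespace-separated fields.
pairs = [(toks[6], toks[8]) for toks in map(str.split, lines.splitlines()) if len(toks) == 9]

# zip(*pairs) transposes [(a1, b1), (a2, b2), ...] into (a1, a2, ...) and (b1, b2, ...).
col7, col9 = (list(col) for col in zip(*pairs))

print(col7)  # ['191340', '725670', '1.4378e+06']
print(col9)  # ['1.000000', '2.000000', '3.000000']
```

If no line matches, zip(*pairs) yields nothing and the unpacking raises ValueError, so a real script may want to check that pairs is non-empty first.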