The data in my text file looks like this:
2,20 12,40 13,100 14,300
15,440 16,10 24,50 25,350
26,2322 27,3323 28,9999 29,2152
30,2622 31,50
I want to read this data into two different lists in Python. However, it is not a CSV file. The data is laid out as follows:
mass1,intensity1 mass2,intensity2 mass3,intensity3...
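How should I read the masses and intensities into two different lists? I would like to avoid rewriting the file to make the data tidier and/or to convert it to CSV format.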
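Answer 0: (score: 5)
It looks like you can call line.split() on each line to separate the individual pairs, and then call pair.split(",") on each pair to separate the mass from the intensity.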
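A minimal sketch of that approach follows. It assumes the data is already available as a string (the sample is copied from the question), and `parse_pairs` is a hypothetical helper name, not something from the original answer.

```python
def parse_pairs(text):
    """Split whitespace-separated "mass,intensity" pairs into two lists."""
    masses = []
    intensities = []
    for line in text.splitlines():
        # split() with no argument splits on any run of whitespace,
        # so repeated spaces and blank lines produce no empty items.
        for pair in line.split():
            mass, intensity = pair.split(",")
            masses.append(int(mass))
            intensities.append(int(intensity))
    return masses, intensities


sample = """2,20 12,40 13,100 14,300
15,440 16,10 24,50 25,350
26,2322 27,3323 28,9999 29,2152
30,2622 31,50"""

masses, intensities = parse_pairs(sample)
print(masses)
print(intensities)
```

To read from a file instead, the same loop can iterate over the open file object line by line.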
Answer 1: (score: 1)
mass_results = []
intensity_results = []
with open('in.txt', 'r') as f:
    for line in f:
        # split() with no argument copes with repeated spaces,
        # the trailing newline, and blank lines
        for readings in line.split():
            mass, intensity = readings.split(',')
            mass_results.append(int(mass))
            intensity_results.append(int(intensity))
print('Mass values:')
print(mass_results)
print('Intensity values:')
print(intensity_results)
Output:
Mass values:
[2, 12, 13, 14, 15, 16, 24, 25, 26, 27, 28, 29, 30, 31]
Intensity values:
[20, 40, 100, 300, 440, 10, 50, 350, 2322, 3323, 9999, 2152, 2622, 50]
Answer 2: (score: 0)
import re
# read the file
with open('input.dat', 'r') as f:
    data = f.read()
# grab mass and intensity values using regex
m_re = r'[0-9]+(?=,[0-9]+)'   # digits followed by ",digits"
i_re = r'(?<=[0-9],)[0-9]+'   # digits preceded by "digit,"
mass = re.findall(m_re, data)
intensity = re.findall(i_re, data)
# view results (the values are strings; convert with int() if numbers are needed)
print("Mass values:", mass)
print("Intensity values:", intensity)
print("(Mass,Intensity):", list(zip(mass, intensity)))
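If the 25-line header you mentioned matches the regular expressions and skews the results, you can try replacing the file-input part above with the following:
# read the file
with open('input.dat', 'r') as f:
    lines = f.readlines()[25:]  # ignore first 25 lines
data = ' '.join(lines)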
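As a variation on the regex idea, a single pattern with two capture groups can pull out both values in one pass and keep each mass paired with its own intensity. This is only a sketch: `PAIR_RE` and `extract_pairs` are hypothetical names, and the short sample string stands in for the file contents.

```python
import re

# Two capture groups: group 1 is the mass, group 2 is the intensity.
PAIR_RE = re.compile(r"([0-9]+),([0-9]+)")


def extract_pairs(data):
    """Return (masses, intensities) as two lists of ints."""
    # With two groups, findall returns a list of (mass, intensity) string tuples.
    pairs = PAIR_RE.findall(data)
    masses = [int(m) for m, _ in pairs]
    intensities = [int(i) for _, i in pairs]
    return masses, intensities


sample = "2,20 12,40 13,100 14,300\n15,440 16,10 24,50 25,350\n"
masses, intensities = extract_pairs(sample)
print(masses)
print(intensities)
```

Because `re.findall` returns strings, the explicit `int()` conversion is what turns the matches into numbers.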
Answer 3: (score: 0)
Suppose the input file looks like this:
#this is header
#this is header
#this is header
2,20 12,40 13,100 14,300
15,440 16,10 24,50 25,350
26,2322 27,3323 28,9999 29,2152
30,2622 31,50
You can use re.
Method 1
If the file is very large, read it line by line:
import re
def xy_parser(fname, header_len=3):
    with open(fname) as f:
        for i, line in enumerate(f):
            if i < header_len:
                continue  # skip the header lines
            yield re.findall(r'[0-9]+,[0-9]+', line)
def xy_maker(xy_str):
    # list() so a list is also returned under Python 3, where map() is lazy
    return list(map(float, xy_str.split(',')))
my_xys = []
for xys in xy_parser('xydata.txt'):
    my_xys += [xy_maker(val) for val in xys]
my_xys
#[[2.0, 20.0],
# [12.0, 40.0],
# [13.0, 100.0],
# [14.0, 300.0],
# [15.0, 440.0],
# [16.0, 10.0],
# [24.0, 50.0],
# [25.0, 350.0],
# [26.0, 2322.0],
# [27.0, 3323.0],
# [28.0, 9999.0],
# [29.0, 2152.0],
# [30.0, 2622.0],
# [31.0, 50.0]]
Method 2
I would also like to point out that if the file is not too large, you can read it all at once:
# reuses re and xy_maker from the code above
with open('xydata.txt', 'r') as f:
    header_len = 3
    for i in range(header_len):  # skip the header lines
        f.readline()
    # read from the current file position to the end of the file; newlines
    # become spaces so the last pair of one line is not fused with the next
    data_str = f.read().replace('\n', ' ')
data_xy_str = re.findall(r'[0-9]+,[0-9]+', data_str)
my_xys = [xy_maker(xy_str) for xy_str in data_xy_str]
# yields the same result as above
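Since the question asks for two separate lists, the list of pairs in `my_xys` can be transposed with `zip`. The sketch below uses a short hand-written `my_xys` as a stand-in for the parsed result; the names `masses` and `intensities` are illustrative only.

```python
# Stand-in for the parsed result: a list of [mass, intensity] pairs.
my_xys = [[2.0, 20.0], [12.0, 40.0], [13.0, 100.0]]

# zip(*rows) transposes the rows into one tuple per column;
# list() turns each column tuple into a list.
masses, intensities = (list(column) for column in zip(*my_xys))

print(masses)
print(intensities)
```

Note that the unpacking assumes `my_xys` is not empty; with no pairs, `zip(*my_xys)` yields nothing and the assignment raises a `ValueError`.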