How do I extract specific lines from an input text file and print them in Python?

Time: 2017-01-18 14:04:11

Tags: python

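I have this text file containing FeII emission transition lines. The header is: n_high, n_low, wavelength, intensity (where n_high and n_low are the upper and lower levels of each transition, listed in the order 2 --> 1, ..., 371 --> 1, 3 --> 2, ..., 371 --> 2, ... (and so on till the last chunk) 371 --> 370).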

The input file looks like this:

#n_hi n_lo WL(A) logI
2   1   259811.86   1.158
3   1   149730.41   -2.054
4   1   115894.98   -2.134
5   1   102320.80   -2.389
6   1   53387.13    0.256
7   1   41138.69    -0.277
8   1   35226.70    -1.585
9   1   32068.36    -1.741
10  1   12566.77    2.323
.
.
.
.
369 1   1069.66 1.461
370 1   1065.75 -7.901
371 1   1065.64 -8.011
3   2   353390.47   0.759
4   2   209224.17   -2.390
5   2   168797.89   -2.607
.
.
.
370 369 291200.84   -10.337
371 369 283465.88   -10.436
371 370 10672868.00 -12.012

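There are 68635 lines in total.

The task is to select only the transitions whose wavelength falls within a given range, say [x1, x2], and print each matching row in full to another file.

So what I could do is prepare an algorithm for it: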

for n_low from 1 to 370:
  for n_hi from n_low+1 to 371:
    if x2 <= wavelength <= x1:
      print this row to file
    else:
      exit

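I want to implement this in Python.

3 answers:

Answer 0: (score: 3)

You can use the powerful pandas module.

I use io.StringIO to simulate a file from data, but you should pass your filename in place of f.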

data = '''2   1   259811.86   1.158
3   1   149730.41   -2.054
4   1   115894.98   -2.134
5   1   102320.80   -2.389
6   1   53387.13    0.256
7   1   41138.69    -0.277
8   1   35226.70    -1.585
9   1   32068.36    -1.741
10  1   12566.77    2.323
369 1   1069.66 1.461
370 1   1065.75 -7.901
371 1   1065.64 -8.011
3   2   353390.47   0.759
4   2   209224.17   -2.390
5   2   168797.89   -2.607
370 369 291200.84   -10.337
371 369 283465.88   -10.436
371 370 10672868.00 -12.012'''

import pandas as pd

# simulate file
import io 
f = io.StringIO(data)

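# use your filename instead of `f`
# read the data using runs of whitespace as the separator
# and assign the column names 'n_hi', 'n_lo', 'WL(A)', 'logI'
# (for the real file, also pass comment='#' to skip the '#n_hi ...' header line)
df = pd.read_csv(f, names=['n_hi','n_lo', 'WL(A)', 'logI'], sep=r'\s+')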

#print(df)

# get rows which have 1000 <= WL <= 25000 (between() is inclusive on both ends by default)
selected = df[ df['WL(A)'].between(1000, 25000) ] 
print(selected)

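# index=False keeps the pandas row index out of the output file
selected.to_csv('result.csv', sep=' ', header=False, index=False)

Answer 1: (score: 3)

If you want to use standard Python, the following function should work (assuming the data is tab-delimited):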

def filter_wavelength(x1, x2, input_path, output_path):
    with open(output_path, 'w') as output_file:
        with open(input_path) as input_file:
            for line in input_file:
                try:
                    tokens = line.split('\t')
                    wave_length = float(tokens[2])
                    if x1 <= wave_length <= x2:
                        output_file.write(line)
                except (IndexError, ValueError) as e:
                    # header line or malformed row: report it and move on
                    print(str(e))

Call it like this:

filter_wavelength(1,2,'path/to/input', 'path/to/output')
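The sample in the question looks like it is separated by runs of spaces rather than tabs, and it starts with a `#n_hi n_lo WL(A) logI` header line. Under that assumption, a minimal sketch of a whitespace-splitting variant could look like this. The name `filter_wavelength_ws` and the returned row count are hypothetical additions for illustration, not part of the answer above.

```python
def filter_wavelength_ws(x1, x2, input_path, output_path):
    """Copy rows whose third column lies in [x1, x2]; return how many were written."""
    written = 0
    with open(input_path) as input_file, open(output_path, 'w') as output_file:
        for line in input_file:
            stripped = line.strip()
            # Skip blank lines and the '#n_hi n_lo WL(A) logI' header.
            if not stripped or stripped.startswith('#'):
                continue
            tokens = stripped.split()  # splits on any run of spaces or tabs
            try:
                wave_length = float(tokens[2])
            except (IndexError, ValueError):
                continue  # not a data row, so skip it
            if x1 <= wave_length <= x2:
                output_file.write(line)  # write the original row unchanged
                written += 1
    return written
```

It is called the same way, with the lower bound first, e.g. `filter_wavelength_ws(1000, 25000, 'path/to/input', 'path/to/output')`.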

Answer 2: (score: 1)

If your only concern is WL(A) and you don't need to care about n_hi and n_lo, try this:

def extract_wave_lengths(x1, x2, input_file, output_file):
    with open(input_file, 'r') as ifile, open(output_file, 'w') as ofile:
        next(ifile)  # Skip header
        for line in ifile:
            parts = line.split()
            wave_length = float(parts[2])
            if x2 <= wave_length <= x1:
                ofile.write(line)

Then you can call it like this (note that x1 is the upper bound and x2 the lower bound here):

extract_wave_lengths(100000, 5000, "/path/to/input/file", "/path/to/output/file")
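The answers on this page disagree on whether the lower or the upper bound comes first, so a small helper that accepts the bounds in either order may avoid mistakes. This is only a sketch; `in_wavelength_range` is a hypothetical name, not something from the answers.

```python
def in_wavelength_range(wave_length, x1, x2):
    """Return True if wave_length lies between x1 and x2, whichever is larger."""
    low, high = sorted((x1, x2))  # normalise the order of the bounds
    return low <= wave_length <= high
```

With it, the condition inside the loop could be written as `if in_wavelength_range(wave_length, x1, x2):`, and both `(100000, 5000)` and `(5000, 100000)` would select the same rows.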