Python - Save selected rows and columns from a data file into different files

Date: 2016-04-10 18:26:25

Tags: python split rows

I have a file.dat of this type, but with much more data:

Apr  1 18:15 [n1_Cam_A_120213_O.fits]: 
4101.77    1. -3.5612   3.561   -0.278635 4.707   6.448     #data1
0.03223    0.  0.05278  0.05278  0.00237  0.4393  0.4125    #error1
4088.9     1. -0.404974 0.405   -0.06538  5.819   0.        #data2
   0.      0.  0.01559  0.01559  0.00277  0.1717  0.        #error2
4116.4     1. -0.225521 0.2255  -0.041111 5.153   0.        #data3
   0.      0.  0.01947  0.01947  0.00368  0.4748  0.        #error3
4120.8     1. -0.382279 0.3823  -0.062194 5.774   0.        #data4
   0.      0.  0.01873  0.01873  0.00311  0.3565  0.        #error4

Apr  1 18:15 [n1_Cam_B_120213_O.fits]: 
4101.767   0.9999  -4.57791  4.578   -0.388646 0.03091 7.499    #data1
0.0293     0.       0.03447  0.03447  0.00243  0.00873 0.07529  #error1
4088.9     1.      -0.211493 0.2115  -0.080003 2.483   0.
   0.      0.       0.01091  0.01091  0.00327  0.1275  0.
4116.4     1.      -0.237161 0.2372  -0.040493 5.502   0.
   0.      0.       0.02052  0.02052  0.00231  0.5069  0.
4120.8     1.      -0.320798 0.3208  -0.108827 2.769   0.
   0.      0.       0.0167   0.0167   0.00404  0.1165  0.

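The first line of each data set contains name.fits, the even lines contain values, and the odd lines (except the first) contain the errors of the values on the preceding line. Then there is a blank line and the pattern starts again.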
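
The layout described above can be read into memory one block at a time. The following is only a minimal sketch: `parse_blocks` is a hypothetical helper name, and it assumes that the trailing `#data1` / `#error1` markers are annotations rather than data and that every values line is followed by exactly one error line.

```python
def parse_blocks(lines):
    """Group lines into (name, [(values, errors), ...]) blocks."""
    blocks = []
    pending = None  # values line still waiting for its error line
    for raw in lines:
        # Assumption: anything after '#' is an annotation, not data.
        fields = raw.split('#', 1)[0].split()
        if not fields:
            continue  # blank line between data sets
        if '.fits' in fields[-1]:
            # Header line: the last field looks like "[name.fits]:"
            blocks.append((fields[-1].strip('[]:'), []))
            pending = None
        elif pending is None:
            pending = fields  # values line
        else:
            blocks[-1][1].append((pending, fields))  # error line
            pending = None
    return blocks


sample = """\
Apr  1 18:15 [n1_Cam_A_120213_O.fits]:
4101.77    1. -3.5612   3.561   -0.278635 4.707   6.448     #data1
0.03223    0.  0.05278  0.05278  0.00237  0.4393  0.4125    #error1
4088.9     1. -0.404974 0.405   -0.06538  5.819   0.        #data2
   0.      0.  0.01559  0.01559  0.00277  0.1717  0.        #error2

Apr  1 18:15 [n1_Cam_B_120213_O.fits]:
4101.767   0.9999  -4.57791  4.578   -0.388646 0.03091 7.499
0.0293     0.       0.03447  0.03447  0.00243  0.00873 0.07529
"""
blocks = parse_blocks(sample.splitlines())
for name, pairs in blocks:
    print(name, len(pairs))
```

Detecting the header by its `.fits` field, instead of counting lines, means the sketch does not depend on every block having the same number of rows.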

What I need is to split the information into different files in this way:

name1.fits data1[1] err1[1] data1[2] err1[2] data1[3] err1[3]...
name2.fits data1[1] err1[1] data1[2] err1[2] data1[3] err1[3]...

So the next file would be

name1.fits data2[1] err2[1] data2[2] err2[2] data2[3] err2[3]...
name2.fits data2[1] err2[1] data2[2] err2[2] data2[3] err2[3]...

The first new file for my data would then look like:

n1_Cam_A_120213_O.fits 4101.77  0.03223 1.     0. -3.5612  0.05278 3.561 0.05278 -0.278635 0.00237 4.707   0.4393  6.448 0.4125
n1_Cam_B_120213_O.fits 4101.767 0.0293  0.9999 0. -4.57791 0.03447 4.578 0.03447 -0.388646 0.00243 0.03091 0.00873 7.499 0.07529
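
Building one such output row amounts to interleaving a values line with its error line. A minimal sketch of that step, where `interleave` is a hypothetical helper name and both lines are assumed to be already split into the same number of fields:

```python
def interleave(name, values, errors):
    """Return 'name v1 e1 v2 e2 ...' for one values line and its error line."""
    if len(values) != len(errors):
        raise ValueError('values and errors differ in length')
    parts = [name]
    for v, e in zip(values, errors):
        parts.extend((v, e))
    return ' '.join(parts)


values = '4101.77 1. -3.5612 3.561 -0.278635 4.707 6.448'.split()
errors = '0.03223 0. 0.05278 0.05278 0.00237 0.4393 0.4125'.split()
row = interleave('n1_Cam_A_120213_O.fits', values, errors)
print(row)
```

The fields are kept as strings, so the numbers are written exactly as they appear in the input (for example `1.` stays `1.`); the column alignment of the example above is not reproduced.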

This is what I have done so far:

with open('file.dat','r') as data, open('names.txt', 'a') as nam, open('values.txt', 'a') as val, open('errors.txt', 'a') as err:
    for lines in data.readlines():
        cols = lines.split()    
        if "fits" in lines:
            header = lines.split()
            nam.write(header[3])
        elif float(cols[0]) > 1:
            #print cols[0]
            x=str(cols)        
            val.write(x)
        elif float(cols[0]) < 1:
            #print cols[0]
            y=str(cols)
            err.write(y)                            

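I am just starting with Python. My idea was to separate the names, values and errors into different files and then select the rows and columns I need. But since I will be working with hundreds of rows and files, I would like a more automatic approach. What I want is to read the first 3 lines and write them to file1, then lines 1, 4, 5 and write them to file2, then lines 1, 6, 7 to file3, then lines 1, 8, 9 to file4, then skip the blank line and read lines 11, 12, 13 and write them to file1, then lines 11, 14, 15 to file2, and so on (or something similar).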
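
The line-number pattern described here can be expressed as simple arithmetic. The sketch below uses a hypothetical `destination` function and assumes that every block has exactly four value/error pairs followed by one blank line, as in the sample; if blocks vary in length, detecting the header lines is the more robust route.

```python
def destination(line_no, pairs_per_block=4):
    """Map a 1-based line number to (header_line_no, file_index).

    Returns None for header lines and blank separator lines.
    """
    block_len = 1 + 2 * pairs_per_block + 1  # header + pairs + blank line
    block, offset = divmod(line_no - 1, block_len)
    if offset == 0 or offset == block_len - 1:
        return None
    header = block * block_len + 1
    return header, (offset - 1) // 2 + 1


for n in (2, 3, 4, 5, 8, 9, 12, 13, 14, 15):
    print(n, destination(n))
```

With the default of four pairs, lines 4 and 5 map to header line 1 and file 2, and lines 12 and 13 map to header line 11 and file 1, which matches the sequence listed above.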

2 answers:

Answer 0: (score: 0)

Try the following code. Is this what you need?

The output files are named 0, 1, 2, ...

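out = []       # one output file per data/error pair, named 0, 1, 2, ...
name = None
data = None    # values line waiting for its error line
cur = -1       # index of the output file for the current pair

for i in open('file.dat'):
    i = i.strip()
    if not i:
        continue

    if 'fits' in i:
        # header line: the 4th field is "[name.fits]:"
        name = i.split()[3][1:-2]
        data = None
        cur = -1
    elif data is None:
        # values line: move to the next output file, creating it on first use
        data = i.split()
        cur += 1
        if cur == len(out):
            out.append(open('%d' % cur, 'w'))

        out[cur].write(name)
    else:
        # error line: interleave each value with its error
        for d, e in zip(data, i.split()):
            out[cur].write(' %s %s' % (d, e))

        out[cur].write('\n')
        data = None

for f in out:
    f.close()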
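
As a variation on the same idea, the output files can be managed with `contextlib.ExitStack` so that they are closed even if a line fails to parse. This is a rough Python 3 sketch, not a drop-in replacement: `split_pairs` and the `file<k>.txt` naming are made up for illustration, and trailing `#...` markers are assumed to be annotations.

```python
import os
import tempfile
from contextlib import ExitStack


def split_pairs(src_path, out_dir):
    """Write pair k of every block to out_dir/file<k>.txt; return the paths."""
    paths = []
    with ExitStack() as stack, open(src_path) as src:
        outs = []
        name, values, k = None, None, 0
        for raw in src:
            fields = raw.split('#', 1)[0].split()
            if not fields:
                continue  # blank separator line
            if '.fits' in fields[-1]:
                # header line: start a new block at pair 0
                name, values, k = fields[-1].strip('[]:'), None, 0
            elif values is None:
                values = fields  # values line
            else:
                if k == len(outs):  # first time this pair index is seen
                    path = os.path.join(out_dir, 'file%d.txt' % (k + 1))
                    outs.append(stack.enter_context(open(path, 'w')))
                    paths.append(path)
                row = [name]
                for v, e in zip(values, fields):
                    row += [v, e]
                outs[k].write(' '.join(row) + '\n')
                values, k = None, k + 1
    return paths


# Tiny made-up input with two blocks of two pairs each.
demo = (
    "Apr  1 18:15 [a.fits]:\n"
    "1. 2.\n"
    "0.1 0.2\n"
    "3. 4.\n"
    "0.3 0.4\n"
    "\n"
    "Apr  1 18:15 [b.fits]:\n"
    "5. 6.\n"
    "0.5 0.6\n"
    "7. 8.\n"
    "0.7 0.8\n"
)
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'file.dat')
    with open(src_path, 'w') as f:
        f.write(demo)
    paths = split_pairs(src_path, tmp)
    contents = []
    for p in paths:
        with open(p) as f:
            contents.append(f.read())
print(contents)
```

Each output file is opened once, the first time its pair index appears, so later blocks append rows to it instead of truncating it.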

Answer 1: (score: 0)

I finally managed to get it working, but maybe you can give me some tips to improve it. Here it is:

def readError(error): # original data had errors inside parentheses
    newError = []
    for e in error:
        e = e.replace('(','').replace(')','')
        e = e.split()
        newError.extend(e)
    return newError

with open('file.log','r') as data, open('out1.txt', 'w') as out1, open('out2.txt', 'w') as out2:
    outputs = [out1, out2]   # I needed to write 2 files

    while True:
        lines = data.readline()
        if not lines:        # readline() returns '' at end of file
            break
        cols = lines.strip().split()
        if "fits" in lines:
            name = cols[3].replace('[','').replace(']','').replace(':','') + ' ' + '0' + ' ' + '1'
            # 0 and 1 were some indexes I needed to add to each line

            for out in outputs:
                dato = data.readline().strip().split()
                error_dato = readError(data.readline().strip().split())
                newline = name
                for j in range(0, 7):   # data had 7 columns
                    newline = newline + ' ' + dato[j] + ' ' + error_dato[j]
                print(newline)
                out.write(newline + '\n')

        linea   = data.readline().strip()
        # I don't know why the code doesn't work without this line