Read a file line by line and skip invalid lines when using numpy

Date: 2019-01-18 13:33:36

Tags: python python-3.x numpy

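I'm trying to define a function that opens a txt file and turns it into an N*3 matrix. However, any line that doesn't meet certain conditions should be skipped with an error message, and reading should then continue with the next line.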

Here is my code:

import numpy as np
def dataLoad(filename):
    data=np.loadtxt(filename)
    return data
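As a side note on getting an N*3 matrix: `np.loadtxt` returns a 1-D array when the file contains only one line, while passing `ndmin=2` keeps the result two-dimensional in every case. A minimal sketch, using an in-memory `io.StringIO` with made-up values as a stand-in for the real file:

```python
import io

import numpy as np

# Made-up one-line content standing in for test.txt: temperature, growth rate, bacteria type
text = "20 0.5 1\n"

squeezed = np.loadtxt(io.StringIO(text))          # a single row collapses to shape (3,)
matrix = np.loadtxt(io.StringIO(text), ndmin=2)   # stays two-dimensional: shape (1, 3)

print(squeezed.shape, matrix.shape)               # (3,) (1, 3)
```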

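So I have the matrix, but now I want to read it row by row, skip any row that doesn't meet the conditions, print an error message saying what the error is and on which line it occurred, and then continue.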

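The conditions are:
• The first column must be a number between 10 and 60.
• The second column must be positive.
• The third column must be 1, 2, 3 or 4.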
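The three conditions can be expressed as one check per row. The sketch below uses a hypothetical helper `row_error` (not from the thread) that returns a message for the first condition a row violates, or `None` if the row is valid. It accepts a growth rate of exactly 0, matching the `< 0` checks used in the attempts below; use `<= 0` if zero must be rejected.

```python
def row_error(row):
    """Return a description of the first condition `row` violates, or None if it is valid.

    `row` is any sequence of three numbers (a list or a NumPy row):
    temperature, growth rate, bacteria type. Hypothetical helper for illustration.
    """
    temperature, growth_rate, bacteria = row
    if not 10 <= temperature <= 60:
        return "Temperature is out of range"
    if growth_rate < 0:
        return "Growth rate is negative"
    if bacteria not in (1, 2, 3, 4):   # also works for floats: 2.0 in (1, 2, 3, 4) is True
        return "Bacteria is not 1, 2, 3 or 4"
    return None


print(row_error([20, 0.5, 2]))   # None
print(row_error([5, 0.5, 2]))    # Temperature is out of range
print(row_error([20, 0.5, 7]))   # Bacteria is not 1, 2, 3 or 4
```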

Edit

I tried:

import numpy as np
def dataLoad(filename):
    data=np.loadtxt(filename)
    for row in data:
        if (row[0] < 10) or (row[0] > 60):
            print("Temperature is out of range")
            continue
        elif (row[1]<0):
            print("Growth rate is negative")
            continue
        elif (row[2]!=1) or (row[2]!=2) or (row[2]!=3) or (row[2]!=4):
            print("Bacteria is not 1, 2, 3 or 4")
            continue
    return data

But it prints all the error messages first, and then all the rows, including the ones that should have been excluded.
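Two things in this attempt explain that output. First, `continue` only skips the rest of the loop body; nothing is ever removed from `data`, so `return data` hands back every row. Second, `(x != 1) or (x != 2) or (x != 3) or (x != 4)` is true for every value, because no number equals 1 and 2 at the same time, so every row that passes the first two checks still triggers the bacteria message. A small demonstration of the second point, with `x not in (1, 2, 3, 4)` as the intended test (chaining the `!=` tests with `and` would be equivalent):

```python
# Each value fails at least one of the "!=" tests, so the "or" chain is always True
results = [
    (x, (x != 1) or (x != 2) or (x != 3) or (x != 4), x not in (1, 2, 3, 4))
    for x in (1, 2, 3, 4, 7)
]
for x, chained, membership in results:
    print(x, chained, membership)
# 1 True False
# 2 True False
# 3 True False
# 4 True False
# 7 True True
```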

Edit 2

I also tried reading the file line by line with:

data = open("test.txt", "r")
line = data.readline()
if (line[0] < 10) or (line[0] > 60):
    print("Temperature is out of range")
elif (line[1]<0):
    print("Growth rate is negative")
elif (line[2]!=1) or (line[2]!=2) or (line[2]!=3) or (line[2]!=4):
    print("Bacteria is not 1, 2, 3 or 4")  

I know this won't remove the lines, but I at least hoped it would print the error messages for the right lines. Instead it raises:

 if (line[0] < 10) or (line[0] > 60):

TypeError: '<' not supported between instances of 'str' and 'int'
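The error occurs because `readline()` returns a string, so `line[0]` is the first character of that string (for example `'2'`), not a number, and Python 3 refuses to compare `str` with `int`. The line has to be split into fields and converted before comparing. A minimal sketch, assuming the fields are separated by whitespace (the sample line is made up):

```python
line = "25 0.8 3\n"          # what readline() returns: one string, newline included

print(line[0])              # 2  (only the first character, still a str)
fields = line.split()       # ['25', '0.8', '3']: split on whitespace, newline dropped
temperature, growth_rate, bacteria = (float(f) for f in fields)

print(temperature < 10 or temperature > 60)   # False: 25 is in range
```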

3 answers:

Answer 0 (score: 0)

I think this is the code you need. Let me know if I got anything wrong.

import numpy as np

def dataLoad(filename):
    # read every line of the data file into an N x 3 array
    # (readline() would only read the first line; ndmin=2 keeps a one-line file 2-D)
    data = np.loadtxt(filename, ndmin=2)

    print('raw data:',data)

    #checking the conditions
    #... for first column
    i=0
    while(i<len(data)):
        if data.item(i,0)<10 or 60<data.item(i,0):
            print('error message')
            print('removed',data[i])
            data=np.delete(data,(i),axis=0)
        else:
            i+=1

    print('data after checking 1st condition:',data)

    #... for second column
    i=0
    while(i<len(data)):
        if data.item(i,1)<0:
            print('error message')
            print('removed',data[i])
            data=np.delete(data,(i),axis=0)
        else:
            i+=1

    print('data after checking the 2nd condition:',data)

    #... for third column
    i=0
    while(i<len(data)):
        if data.item(i,2) not in (1,2,3,4):
            print('error message')
            print('removed',data[i])
            data=np.delete(data,(i),axis=0)
        else:
            i+=1

    print('data after checking the 3rd condition:',data)
    return data

print(dataLoad('test.txt'))
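Calling `np.delete` inside a `while` loop copies the array on every removal and loses track of which line of the file each row came from. A different technique, not used by any of the answers here, is to compute one boolean mask per condition with NumPy comparisons and `np.isin`, report the failing rows by their original line number, and keep the rows where every mask is true. A sketch with made-up sample data in an `io.StringIO` standing in for `test.txt`; the name `filter_rows` is hypothetical:

```python
import io

import numpy as np


def filter_rows(data):
    """Report every failed condition and return only the rows of `data` that pass all three."""
    checks = (
        ("Temperature is out of range", (data[:, 0] >= 10) & (data[:, 0] <= 60)),
        ("Growth rate is negative", data[:, 1] >= 0),   # zero allowed; use > 0 to reject it
        ("Bacteria is not 1, 2, 3 or 4", np.isin(data[:, 2], [1, 2, 3, 4])),
    )
    valid = np.ones(len(data), dtype=bool)
    for message, ok in checks:
        # np.flatnonzero(~ok) lists the failing row indices; +1 gives 1-based line numbers
        for i in np.flatnonzero(~ok):
            print(message, "in line", i + 1)
        valid &= ok
    return data[valid]


sample = io.StringIO("20 0.5 1\n5 0.5 2\n30 -1 3\n40 0.2 9\n")
clean = filter_rows(np.loadtxt(sample, ndmin=2))
print(clean)
```

The messages come out grouped by condition rather than in file order, and a row that breaks several conditions is reported once per condition.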

Answer 1 (score: 0)

I think your problem is that you never change the data when a condition is met. Something like the following code should solve it:

import numpy as np
def dataLoad(filename):
    data=np.loadtxt(filename, ndmin=2)  # ndmin=2 keeps a one-line file N x 3
    retdata = []  # rows that pass every check
    for row in data:
        if (row[0] < 10) or (row[0] > 60):
            print("Temperature is out of range")
            continue
        elif (row[1]<0):
            print("Growth rate is negative")
            continue
        elif row[2] not in (1, 2, 3, 4):  # a chain of != joined by "or" is always True
            print("Bacteria is not 1, 2, 3 or 4")
            continue
        retdata.append(row)  # inside the loop, so every valid row is kept
    return np.array(retdata)  # back to an N x 3 array

Hope it helps.

Answer 2 (score: 0)

It's me again :) This code is the fixed version of Edit 2 from the question:

with open("test.txt", "r") as f:
    data = f.readlines()
for row_number, raw_line in enumerate(data, start=1):  # rows numbered from 1
    # split() turns "1 2 3\n" into ['1','2','3'] (any whitespace, newline dropped);
    # float() converts each field, so decimal growth rates such as 0.5 also work
    line = [float(n) for n in raw_line.split()]
    if (line[0] < 10) or (line[0] > 60):
        print("Temperature is out of range in row", row_number)
    elif (line[1]<0):
        print("Growth rate is negative in row", row_number)
    elif line[2] not in (1, 2, 3, 4):  # the "!= ... or !=" chain was always True
        print("Bacteria is not 1, 2, 3 or 4 in row", row_number)

PS: I assume that each line of test.txt has exactly the form "a b c", where a, b and c are numbers. If it's different, let me know and I'll fix it.

PPS: As you know, this code won't remove the invalid rows; it only prints the error messages.
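To also drop the invalid rows, the same line-by-line loop can collect the rows that pass every check and turn them into an N*3 array at the end, which combines this answer's approach with the filtering idea from Answer 1. A sketch under stated assumptions: the function name `load_valid_rows` is hypothetical, blank lines are skipped, every other line is assumed to hold exactly three whitespace-separated numbers, and an `io.StringIO` with made-up content stands in for `test.txt`:

```python
import io

import numpy as np


def load_valid_rows(lines):
    """Print an error for every invalid line and return the valid rows as an N x 3 array."""
    valid = []
    for line_number, raw_line in enumerate(lines, start=1):
        if not raw_line.strip():
            continue                                  # ignore blank lines
        temperature, growth_rate, bacteria = (float(n) for n in raw_line.split())
        if not 10 <= temperature <= 60:
            print("Temperature is out of range in line", line_number)
        elif growth_rate < 0:
            print("Growth rate is negative in line", line_number)
        elif bacteria not in (1, 2, 3, 4):
            print("Bacteria is not 1, 2, 3 or 4 in line", line_number)
        else:
            valid.append([temperature, growth_rate, bacteria])
    return np.array(valid).reshape(-1, 3)            # reshape keeps shape (0, 3) if nothing is valid


# A real file works the same way: with open("test.txt") as f: result = load_valid_rows(f)
result = load_valid_rows(io.StringIO("20 0.5 1\n70 0.5 2\n\n35 1.2 4\n"))
print(result)
```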