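I am trying to define a function that opens a txt file and turns it into an N*3
matrix. However, lines that do not meet certain conditions should be skipped, with an error message displayed, before reading continues with the remaining lines.
This is my code: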
import numpy as np

def dataLoad(filename):
    data = np.loadtxt(filename)
    return data
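So I have the matrix, but now I want to read it line by line, skip any line that does not meet the conditions, display an error message stating what the error is and on which line it occurred, and then continue.
The conditions are:
• The first column must be a number between 10 and 60.
• The second column must be a positive number.
• The third column must be 1, 2, 3 or 4.
I tried: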
import numpy as np

def dataLoad(filename):
    data = np.loadtxt(filename)
    for row in data:
        if (row[0] < 10) or (row[0] > 60):
            print("Temperature is out of range")
            continue
        elif (row[1]<0):
            print("Growth rate is negative")
            continue
        elif (row[2]!=1) or (row[2]!=2) or (row[2]!=3) or (row[2]!=4):
            print("Bacteria is not 1, 2, 3 or 4")
            continue
    return data
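But it gives me all the error messages at the start, followed by all the lines, including the ones that should be excluded.
I also tried reading the file line by line with the following: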
data = open("test.txt", "r")
line = data.readline()
if (line[0] < 10) or (line[0] > 60):
    print("Temperature is out of range")
elif (line[1]<0):
    print("Growth rate is negative")
elif (line[2]!=1) or (line[2]!=2) or (line[2]!=3) or (line[2]!=4):
    print("Bacteria is not 1, 2, 3 or 4")
I know it will not remove the lines, but I at least hoped it would give me the error messages for the right lines. Instead it returns:
if (line[0] < 10) or (line[0] > 60):
TypeError: '<' not supported between instances of 'str' and 'int'
Answer 0: (score: 0)
I think this is the code you need. Let me know if I got something wrong.
import numpy as np

def dataLoad(filename):
    # read the whole file into an N x 3 array
    # (ndmin=2 keeps the result two-dimensional even for a single row)
    data = np.loadtxt(filename, ndmin=2)
    print('raw data:', data)
    # checking the conditions
    # ... for the first column: must be between 10 and 60
    i = 0
    while i < len(data):
        if data[i, 0] < 10 or data[i, 0] > 60:
            print('Temperature is out of range')
            print('removed', data[i])
            # after deleting row i the next row moves to index i, so i is not advanced
            data = np.delete(data, i, axis=0)
        else:
            i += 1
    print('data after checking 1st condition:', data)
    # ... for the second column: must not be negative
    i = 0
    while i < len(data):
        if data[i, 1] < 0:
            print('Growth rate is negative')
            print('removed', data[i])
            data = np.delete(data, i, axis=0)
        else:
            i += 1
    print('data after checking the 2nd condition:', data)
    # ... for the third column: must be 1, 2, 3 or 4
    i = 0
    while i < len(data):
        if data[i, 2] not in (1, 2, 3, 4):
            print('Bacteria is not 1, 2, 3 or 4')
            print('removed', data[i])
            data = np.delete(data, i, axis=0)
        else:
            i += 1
    print('data after checking the 3rd condition:', data)
    return data

print(dataLoad('test.txt'))
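As an alternative to deleting rows one at a time, the three checks can also be written as boolean masks over the whole array. This is only a sketch, assuming the file has already been loaded into a two-dimensional N*3 array of numbers. The name `filter_rows` is made up for illustration, and the second check follows the question's own code (`< 0`), so a growth rate of exactly zero is kept.

```python
import numpy as np

def filter_rows(data):
    """Return only the rows of an N x 3 array that pass all three checks.

    Hypothetical helper for illustration; prints one message per rejected row.
    """
    data = np.asarray(data, dtype=float)

    # One boolean per row for each condition that can fail.
    bad_temp = (data[:, 0] < 10) | (data[:, 0] > 60)
    bad_growth = data[:, 1] < 0
    bad_bacteria = ~np.isin(data[:, 2], (1, 2, 3, 4))

    # Report the first failing condition of each rejected row.
    for i in range(len(data)):
        if bad_temp[i]:
            print("Row", i, ": temperature is out of range")
        elif bad_growth[i]:
            print("Row", i, ": growth rate is negative")
        elif bad_bacteria[i]:
            print("Row", i, ": bacteria is not 1, 2, 3 or 4")

    # Keep the rows for which no condition failed.
    keep = ~(bad_temp | bad_growth | bad_bacteria)
    return data[keep]
```

It could be called as `filter_rows(np.loadtxt('test.txt', ndmin=2))`. Row numbers here are zero-based indices into the loaded array, which stay stable because nothing is deleted while the checks run.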
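Answer 1: (score: 0)
I think your problem is that you never change the data once a condition is met, so the invalid rows are still returned. Something like the code below should help you solve it: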
import numpy as np

def dataLoad(filename):
    data = np.loadtxt(filename)
    retdata = []  # rows that pass every check
    for row in data:
        if (row[0] < 10) or (row[0] > 60):
            print("Temperature is out of range")
            continue
        elif (row[1] < 0):
            print("Growth rate is negative")
            continue
        elif row[2] not in (1, 2, 3, 4):
            # membership test; a chain of != joined by "or" would be true for every row
            print("Bacteria is not 1, 2, 3 or 4")
            continue
        retdata.append(row)
    return retdata
Hope that helps.
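A note on the third check as written in the question: `(x != 1) or (x != 2) or (x != 3) or (x != 4)` is true for every value of `x`, because no number equals all four values at once, so every row that reaches this check is reported as invalid. A small sketch comparing it with a membership test (both function names are invented for the comparison):

```python
def chained_or(x):
    # The condition from the question: true for every x, since x cannot
    # equal 1, 2, 3 and 4 at the same time.
    return (x != 1) or (x != 2) or (x != 3) or (x != 4)

def not_member(x):
    # True only when x is none of the allowed values.
    return x not in (1, 2, 3, 4)

for value in (1.0, 3.0, 7.0):
    print(value, chained_or(value), not_member(value))
# prints:
# 1.0 True False
# 3.0 True False
# 7.0 True True
```

The equivalent form with comparisons would join them with `and` instead of `or`.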
Answer 2: (score: 0)
It is me again :) This code is the fixed version of Edit 2 in the question description:
data = open("test.txt", "r").readlines()
for raw_line in range(len(data)):
    # split "1 2 3" into ['1', '2', '3'], then convert each piece to a number ([1.0, 2.0, 3.0]);
    # float rather than int so that decimal values also parse
    line = [float(n) for n in data[raw_line].split()]
    if (line[0] < 10) or (line[0] > 60):
        print("Temperature is out of range in row", raw_line)
    elif (line[1] < 0):
        print("Growth rate is negative in row", raw_line)
    elif line[2] not in (1, 2, 3, 4):
        print("Bacteria is not 1, 2, 3 or 4 in row", raw_line)
PS: I assume that each line of test.txt has exactly the format "a b c", where a, b and c are numbers. If it is different, let me know and I will fix it.
PPS: As you know, this code does not remove the invalid rows; it only prints the error messages.
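To also drop the invalid rows and return the N*3 matrix the question asks for, the same loop can collect the rows that pass. This is a sketch under the same assumption that each line holds three whitespace-separated numbers. `load_valid_rows` is a hypothetical name, and skipping blank or unparseable lines is an added choice rather than something stated in the question.

```python
import numpy as np

def load_valid_rows(filename):
    """Read the file line by line and keep only rows that pass all checks."""
    valid = []
    with open(filename, "r") as handle:
        # start=1 so that reported numbers match line numbers in the file
        for line_no, raw in enumerate(handle, start=1):
            fields = raw.split()
            if not fields:
                continue  # ignore blank lines
            try:
                temperature, growth, bacteria = (float(f) for f in fields)
            except ValueError:
                # raised for non-numeric text and for a wrong number of fields
                print("Line", line_no, ": expected three numbers")
                continue
            if temperature < 10 or temperature > 60:
                print("Line", line_no, ": temperature is out of range")
                continue
            if growth < 0:
                print("Line", line_no, ": growth rate is negative")
                continue
            if bacteria not in (1, 2, 3, 4):
                print("Line", line_no, ": bacteria is not 1, 2, 3 or 4")
                continue
            valid.append([temperature, growth, bacteria])
    # reshape keeps the result N x 3 even when no row passes
    return np.array(valid).reshape(-1, 3)
```

Calling `load_valid_rows("test.txt")` prints one message per rejected line and returns the remaining rows as a NumPy array.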