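I am working on a project where we first have to filter the data to remove anything invalid. This means that if a row in the loaded data contains letters or words, that row must be removed. Is my code below sufficient to do this?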
import numpy as np

def dataLoad(filename):
    # Load the data and define the variables:
    rawData = np.loadtxt(filename)
    rawTemperature, rawGrowthrate, rawBacteria = np.loadtxt(filename, unpack=True)
    print("You have chosen to work with the file {:s}".format(filename))
    # Removing invalid data:
    # Empty list to collect the invalid-data messages:
    InvalidData = []
    # Vector of ones:
    Erase = np.ones(len(rawData))
    # The loop looks through every row of the matrix:
    for i in range(len(rawData)):
        # Rows that contain invalid data are recorded in InvalidData,
        # and the one in the i'th place of Erase is switched to a zero.
        if rawTemperature[i] < 10 or rawTemperature[i] > 60 or rawTemperature[i] == (""):
            InvalidData.insert(i, 'In line %d invalid Temperature' % (i+1))
            Erase[i] = 0
        if rawGrowthrate[i] < 0 or rawGrowthrate[i] == (""):
            InvalidData.insert(i, 'In line %d invalid Growth rate' % (i+1))
            Erase[i] = 0
        if rawBacteria[i] < 0 or rawBacteria[i] > 4 or rawBacteria[i] == (""):
            InvalidData.insert(i, 'In line %d invalid Bacteria' % (i+1))
            Erase[i] = 0
Answer 0 (score: 0)
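I'm not sure whether you want to remove the whole row or only strip the letters while keeping the digits and other characters.
To check whether a row contains letters or words, you can use the regular expression [a-zA-Z]: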
Regex to match only letters
https://docs.python.org/2/library/re.html
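As a minimal sketch of that check, assuming each row is available as a plain string (the sample rows and the helper name contains_letters are made up for illustration), re.search returns a match object as soon as any letter appears:

```python
import re

# Compiled pattern matching any single ASCII letter.
LETTER = re.compile(r'[a-zA-Z]')

def contains_letters(text):
    """Hypothetical helper: True if text contains at least one ASCII letter."""
    return LETTER.search(text) is not None

# Made-up sample rows in the three-column layout from the question.
rows = ["20.5 0.12 1", "abc 0.30 2", "35.0 0.45 3"]
flags = [contains_letters(row) for row in rows]
print(flags)  # [False, True, False]
```

Note that this pattern also flags values such as 1e-3, nan or inf, because they contain letters. If those should count as valid numbers, the check would need to be loosened.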
If you want to remove the letters, you can use re.sub to replace them with the empty string '':
import re
s = "ExampleString123"
replaced = re.sub('[a-zA-Z]', '', s)
print(replaced)  # prints: 123
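For a NumPy example, see "Numpy array Regex sub".
If you want to delete whole rows from a NumPy array, you can select them with the regular expression [a-zA-Z] (see "Selecting elements in numpy array using regular expressions") and then delete them (see "deleting rows in numpy array").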
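A rough sketch of that select-then-delete approach, assuming the data has been read into a NumPy array of strings (the sample values below are made up, and the names bad_rows and clean are only for illustration):

```python
import re
import numpy as np

# Made-up sample: cells are kept as strings so that letters can be detected.
data = np.array([
    ["20.5", "0.12", "1"],
    ["abc", "0.30", "2"],
    ["35.0", "0.45", "x3"],
    ["41.0", "0.50", "4"],
])

letter = re.compile(r'[a-zA-Z]')

# Indices of the rows in which at least one cell contains a letter.
bad_rows = [i for i, row in enumerate(data)
            if any(letter.search(cell) for cell in row)]

# Delete those rows, then convert what is left to floats.
clean = np.delete(data, bad_rows, axis=0).astype(float)

print(bad_rows)  # [1, 2]
print(clean)
```

The rows have to be held as strings at this stage. np.loadtxt with its default float conversion raises a ValueError when it meets a cell like "abc", so it never gets as far as the range checks in the question.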