Question

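I'm working on a project where the first step is to filter the data so that invalid data is removed. That means that if a row in the data we load contains letters/words, the row has to be removed. Is my code below sufficient for this?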

import numpy as np

def dataLoad(filename):
    # The data is loaded and the variables are defined:
    rawData = np.loadtxt(filename)
    rawTemperature, rawGrowthrate, rawBacteria = np.loadtxt(filename, unpack=True)
    print("You have chosen to work with the file {:s}".format(filename))
    # Removing invalid data:
    # Empty list to save the invalid-data messages in:
    InvalidData = []
    # Vector of ones (1 = keep the row, 0 = erase it):
    Erase = np.ones(len(rawData))

    # The loop looks through every row in the matrix:
    for i in range(len(rawData)):
        # Rows that contain invalid data are recorded in InvalidData,
        # and the one at the i-th position is switched to a zero.
        if rawTemperature[i] < 10 or rawTemperature[i] > 60 or rawTemperature[i] == "":
            InvalidData.insert(i, 'In line %d invalid Temperature' % (i + 1))
            Erase[i] = 0
        if rawGrowthrate[i] < 0 or rawGrowthrate[i] == "":
            InvalidData.insert(i, 'In line %d invalid Growth rate' % (i + 1))
            Erase[i] = 0
        if rawBacteria[i] < 0 or rawBacteria[i] > 4 or rawBacteria[i] == "":
            InvalidData.insert(i, 'In line %d invalid Bacteria' % (i + 1))
            Erase[i] = 0

Answer 1

I'm not sure whether you want to remove the whole row, or only strip the letters and keep the digits and other characters.

To check whether a row contains letters or words, you can use the regular expression [a-zA-Z] (see "Regex to match only letters").

https://docs.python.org/2/library/re.html
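
As a rough illustration of that check, here is a minimal sketch using re.search. The helper name contains_letters and the sample lines are made up for this example and are not taken from the question.

```python
import re

# Hypothetical helper: True if the line contains at least one ASCII letter.
LETTER = re.compile(r'[a-zA-Z]')

def contains_letters(line):
    return LETTER.search(line) is not None

print(contains_letters("25.0 0.5 2"))   # False
print(contains_letters("25.0 abc 2"))   # True
```

Note that this pattern would also flag numbers written in scientific notation such as 1e-3, because of the letter e, so it may need adjusting if your data uses that format.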

If you only want to remove the letters, you can use re.sub and replace them with the empty string '':

import re
s = "ExampleString123"
# Replace every ASCII letter with the empty string, leaving "123"
replaced = re.sub('[a-zA-Z]', '', s)
print(replaced)

For a NumPy example, see "Numpy array Regex sub".

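If you want to remove whole rows from a NumPy array, you can select them with the regular expression [a-zA-Z] (see "Selecting elements in numpy array using regular expressions") and then delete them (see "deleting rows in numpy array").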
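
One possible way to put this together is to filter the text lines before NumPy parses them, rather than filtering an existing array. The sketch below assumes whitespace-separated columns like the ones in the question; the function name load_numeric_rows and the sample rows are invented for illustration.

```python
import re
import numpy as np

LETTER = re.compile(r'[a-zA-Z]')

def load_numeric_rows(lines):
    """Drop blank lines and lines containing a letter, then parse the rest as floats."""
    kept = [line for line in lines if line.strip() and not LETTER.search(line)]
    # np.loadtxt accepts a list of strings; ndmin=2 keeps a 2-D result
    # even if only one row survives.
    return np.loadtxt(kept, ndmin=2)

# Made-up sample data: rows 2 and 3 contain words and should be dropped.
sample = [
    "25.0 0.5 2\n",
    "abc 0.3 1\n",
    "40.0 0.7 four\n",
    "30.0 0.9 3\n",
]
data = load_numeric_rows(sample)
print(data.shape)  # (2, 3)
```

With a real file you could pass the open file object instead of a list, for example `with open(filename) as f: data = load_numeric_rows(f)`. The range checks from the question (temperature, growth rate, bacteria type) would then run on the array that comes back.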
