My function returns only one line from the file instead of all the lines

Time: 2019-06-04 18:00:37

Tags: python regex function

I wrote a function that reads each line of a .csv file and converts it to an array.

The first three lines of my file:

GTCAAGCATACCCCCGAGCATAGCCAGAGGCTAGTTCTACGCGGTGTAGGTGGCCGACAGCTTCGCGGCCCAAGGATGAGATCAGTAAACCCCGTTGGCAGAAATCTATGTTCATT
AGCCTGGTGCAGGTAGCGCAGCTGCTAAGGTCCCTATCGCGGTAGA
AACACTTGGTCCGACACAATTTTTTGTCTCTGCGAGTTTTGTGTGA

The code I wrote:

import re
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
def test(logfile):
    with open(logfile) as f:
        for line in f:
            line = line.lower()
            line = re.sub('[^acgt]', 'z', line)
            my_array = np.array(list(line))
            label_encoder = LabelEncoder()
            label_encoder.fit(np.array(['a','c','g','t','z']))
            integer_encoded = label_encoder.transform(my_array)
            onehot_encoder = OneHotEncoder(sparse=False, dtype=int, n_values=5)
            integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
            onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
            onehot_encoded = np.delete(onehot_encoded, -1, 1)
            return onehot_encoded


It returns only the first line of the file, not all of them. Can you help me get an array for every line in the file?

2 answers:

Answer 0 (score: 2)

Your return statement is inside the loop, so the loop runs once and then the function returns.

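You need to declare a variable outside the loop, append each line's result to it inside the loop, and then return the filled list after the loop finishes.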
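The control-flow difference can be seen in isolation with a toy example. first_only and all_lines are hypothetical names, and upper-casing a string stands in for the one-hot encoding:

```python
from typing import List


def first_only(lines: List[str]) -> List[str]:
    for line in lines:
        # return exits the whole function during the very first iteration
        return [line.upper()]
    return []  # only reached when lines is empty


def all_lines(lines: List[str]) -> List[str]:
    result = []  # created once, before the loop
    for line in lines:
        result.append(line.upper())  # collect this line's result and keep looping
    return result  # runs only after every line has been processed
```

first_only(["a", "b", "c"]) gives ["A"], while all_lines(["a", "b", "c"]) gives ["A", "B", "C"].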

import re
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
def test(logfile):
    out_arr = [] # <-- list to hold one array per line
    with open(logfile) as f:
        for line in f:
            line = line.lower()
            line = re.sub('[^acgt]', 'z', line)
            my_array = np.array(list(line))
            label_encoder = LabelEncoder()
            label_encoder.fit(np.array(['a','c','g','t','z']))
            integer_encoded = label_encoder.transform(my_array)
            onehot_encoder = OneHotEncoder(sparse=False, dtype=int, n_values=5)
            integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
            onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
            onehot_encoded = np.delete(onehot_encoded, -1, 1)
            out_arr.append(onehot_encoded) # <--- append instead of return
    return out_arr # <-- now that the loop is over, return the whole list
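A NumPy-only sketch of the same accumulate-then-return fix, not taken from either answer. It assumes the trailing newline should not be encoded (the original code maps it to 'z', which becomes an extra all-zero row at the end of each line) and avoids OneHotEncoder's n_values and sparse parameters, which newer scikit-learn releases deprecated and later removed. one_hot_line and encode_lines are hypothetical names:

```python
from typing import Iterable, List

import numpy as np

# Columns a, c, g, t: the same order LabelEncoder's sorted classes produce.
BASES = np.array(list("acgt"))


def one_hot_line(line: str) -> np.ndarray:
    """One-hot encode one sequence line into an (n_chars, 4) int array."""
    # Assumption: strip the trailing newline instead of encoding it.
    chars = np.array(list(line.strip().lower()), dtype="U1")
    # Compare each character against the four bases via broadcasting;
    # any other character (the 'z' case) matches nothing -> all-zero row.
    return (chars[:, None] == BASES[None, :]).astype(int)


def encode_lines(lines: Iterable[str]) -> List[np.ndarray]:
    """Encode every line; accepts an open file object or a list of strings."""
    encoded = []
    for line in lines:
        encoded.append(one_hot_line(line))
    return encoded  # returned once, after the loop has consumed every line
```

With a real file, pass the open handle from inside a with open(logfile) as f: block, as encode_lines(f). Returning a list keeps each line's length visible; if line boundaries are not needed, np.vstack on the result merges everything into one (total_chars, 4) array.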

Answer 1 (score: 0)

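Watch your indentation. The return statement is inside the for loop, which means the function returns at the end of the first iteration and finishes. Try something like this: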

import re
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
def test(logfile):
    with open(logfile) as f:
        for line in f:
            line = line.lower()
            line = re.sub('[^acgt]', 'z', line)
            my_array = np.array(list(line))
            label_encoder = LabelEncoder()
            label_encoder.fit(np.array(['a','c','g','t','z']))
            integer_encoded = label_encoder.transform(my_array)
            onehot_encoder = OneHotEncoder(sparse=False, dtype=int, n_values=5)
            integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
            onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
            onehot_encoded = np.delete(onehot_encoded, -1, 1)
    return onehot_encoded  # outside the loop now, but this keeps only the last line's encoding
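Another option, not proposed in either answer, is to make the function a generator: yield passes back one line's result without ending the function, so the caller can loop over it or wrap it in list(). A minimal sketch using the same NumPy encoding and the same newline-stripping assumption; iter_encoded is a hypothetical name:

```python
from typing import Iterable, Iterator

import numpy as np

# Columns a, c, g, t: the same order LabelEncoder's sorted classes produce.
BASES = np.array(list("acgt"))


def iter_encoded(lines: Iterable[str]) -> Iterator[np.ndarray]:
    """Lazily yield one (n_chars, 4) one-hot array per input line."""
    for line in lines:
        # Assumption: the trailing newline is dropped before encoding.
        chars = np.array(list(line.strip().lower()), dtype="U1")
        # yield hands back this line's result but, unlike return, resumes the loop
        yield (chars[:, None] == BASES[None, :]).astype(int)
```

Because nothing is computed until the caller iterates, this also avoids holding every encoded line in memory at once for large files.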