I wrote a function that reads each line of a .csv file and converts it to an array.
The first three lines of my file:
GTCAAGCATACCCCCGAGCATAGCCAGAGGCTAGTTCTACGCGGTGTAGGTGGCCGACAGCTTCGCGGCCCAAGGATGAGATCAGTAAACCCCGTTGGCAGAAATCTATGTTCATT
AGCCTGGTGCAGGTAGCGCAGCTGCTAAGGTCCCTATCGCGGTAGA
AACACTTGGTCCGACACAATTTTTTGTCTCTGCGAGTTTTGTGTGA
The code I wrote:
import re
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def test(logfile):
    with open(logfile) as f:
        for line in f:
            line = line.lower()
            line = re.sub('[^acgt]', 'z', line)
            my_array = np.array(list(line))
            label_encoder = LabelEncoder()
            label_encoder.fit(np.array(['a','c','g','t','z']))
            integer_encoded = label_encoder.transform(my_array)
            onehot_encoder = OneHotEncoder(sparse=False, dtype=int, n_values=5)
            integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
            onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
            onehot_encoded = np.delete(onehot_encoded, -1, 1)
            return onehot_encoded
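It returns only the first line of the file, not all of them. Could you help me return the arrays for every line?
Answer 0: (score: 2)
Your return is inside the loop, so the loop body runs once and the function returns.
You need to declare a variable outside the loop and append to it inside the loop, then return the filled list after the loop finishes.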
import re
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def test(logfile):
    out_arr = []  # <-- object to hold output.
    with open(logfile) as f:
        for line in f:
            line = line.lower()
            line = re.sub('[^acgt]', 'z', line)
            my_array = np.array(list(line))
            label_encoder = LabelEncoder()
            label_encoder.fit(np.array(['a','c','g','t','z']))
            integer_encoded = label_encoder.transform(my_array)
            onehot_encoder = OneHotEncoder(sparse=False, dtype=int, n_values=5)
            integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
            onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
            onehot_encoded = np.delete(onehot_encoded, -1, 1)
            out_arr.append(onehot_encoded)  # <--- append instead of return
    return out_arr  # <-- now that the loop is over, return the whole list
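For reference, the same accumulate-then-return pattern can be sketched with NumPy alone, which sidesteps the `n_values` and `sparse` arguments that later scikit-learn releases removed. The names `one_hot_lines` and `ALPHABET` are hypothetical, and stripping the trailing newline is an assumption about the desired output (in the code above, the newline character is replaced by `z` and ends up as an extra all-zero row).

```python
import numpy as np

ALPHABET = "acgt"  # hypothetical constant: column order of the one-hot matrix


def one_hot_lines(path):
    """Return a list with one (sequence_length, 4) int array per non-empty line."""
    # Lookup table: row i encodes byte value i; bytes outside ALPHABET stay all-zero.
    table = np.zeros((256, len(ALPHABET)), dtype=int)
    for column, base in enumerate(ALPHABET):
        table[ord(base), column] = 1

    encoded = []  # declared before the loop so it survives every iteration
    with open(path, "rb") as handle:
        for raw in handle:
            raw = raw.strip().lower()  # assumption: drop the newline, ignore case
            if not raw:
                continue  # skip blank lines
            codes = np.frombuffer(raw, dtype=np.uint8)
            encoded.append(table[codes])  # fancy indexing picks one row per base
    return encoded  # returned once, after all lines were processed
```

Because the sample lines differ in length, the result is a list of arrays rather than one rectangular array; stacking them would require padding first.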
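Answer 1: (score: 0)
Watch the indentation. The return statement is inside the for loop, which means the function returns at the end of the first iteration and is then finished. Try something like this: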
import re
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def test(logfile):
    with open(logfile) as f:
        for line in f:
            line = line.lower()
            line = re.sub('[^acgt]', 'z', line)
            my_array = np.array(list(line))
            label_encoder = LabelEncoder()
            label_encoder.fit(np.array(['a','c','g','t','z']))
            integer_encoded = label_encoder.transform(my_array)
            onehot_encoder = OneHotEncoder(sparse=False, dtype=int, n_values=5)
            integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
            onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
            onehot_encoded = np.delete(onehot_encoded, -1, 1)
    # Now outside the for loop; note that this holds only the last line's array.
    return onehot_encoded
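To make the effect of the return placement concrete, here is a toy sketch that compares three variants on a plain list instead of a file. All function names are made up for illustration.

```python
def return_inside_loop(items):
    for item in items:
        result = item.upper()
        return result  # exits the function during the first iteration


def return_after_loop(items):
    result = None
    for item in items:
        result = item.upper()  # overwritten on every iteration
    return result  # runs once, but only the last value survives


def collect_then_return(items):
    results = []
    for item in items:
        results.append(item.upper())
    return results  # every value is kept


lines = ["acg", "tta", "ggc"]
print(return_inside_loop(lines))   # ACG
print(return_after_loop(lines))    # GGC
print(collect_then_return(lines))  # ['ACG', 'TTA', 'GGC']
```

Only the third variant gives one result per input line, which is what the question asks for.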