I'd like to split the generator data into training and test sets without converting it to dense data, in order to reduce RAM consumption.
However,

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
# Data set
ds = np.array([
    ('Alice', 0),
    ('Bob', 1),
    ('Charlie', 1),
])
x = ds[:, 0]
y = ds[:, 1]
# Change texts into numeric vectors
max_sequence = max(x, key=len)
vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(len(max_sequence))
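# (VocabularyProcessor tokenizes each text, maps tokens to integer ids,
#  and pads/truncates to max_document_length; fit_transform yields these
#  id vectors lazily, i.e. it returns a generator rather than an array)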
text_processed = vocab_processor.fit_transform(x)
print(type(text_processed)) # <class 'generator'>
# Split into training and test
x_train, \
x_test, \
y_train, \
y_test = train_test_split(text_processed, y)
complains:
train_test_split
TypeError: Singleton array array(<generator object VocabularyProcessor.transform at 0x116f6f830>, dtype=object) cannot be considered a valid collection
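If it helps, my reading (an assumption on my part, checked with the toy reproduction below) is that train_test_split validates its inputs and needs collections that support len() and indexing; a generator supports neither, so sklearn wraps it into a 0-d object array and rejects it. The toy data here is mine and is independent of TensorFlow:

import numpy as np
from sklearn.model_selection import train_test_split

rows = [[1, 2], [3, 4], [5, 6]]    # stand-in for the transformed id vectors
labels = [0, 1, 1]

gen = (row for row in rows)        # same kind of object as text_processed
try:
    train_test_split(gen, labels)  # generators have no len() or indexing
except TypeError as err:
    print(err)                     # exact wording may vary across sklearn versions

dense = np.array(list((row for row in rows)))  # fresh generator, fully materialized
x_train, x_test, y_train, y_test = train_test_split(dense, labels)
print(x_train.shape, x_test.shape)  # works, but the whole matrix now sits in RAM

So the error is about the generator itself, not about VocabularyProcessor; wrapping it in list(...) makes the call succeed, but that performs exactly the dense conversion I want to avoid.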
How can the split be done on the generator itself? Suppose I'm trying this spam/ham tutorial with more data and longer texts.
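One direction I've considered (a sketch under my own assumptions, not taken from the tutorial; split_stream and the TFRecord idea are mine): since a split only needs to route each sample into one of two buckets, it can be done in a single pass over the generator, deciding per sample with a seeded RNG, so the full dense matrix never exists at once:

import random

def split_stream(x_gen, y_seq, test_ratio=0.25, seed=42):
    """Route each (vector, label) pair to 'train' or 'test' lazily.

    One pass over the generator; nothing is accumulated here, so the
    caller can stream each sample straight to disk instead of RAM.
    """
    rng = random.Random(seed)
    for x_vec, label in zip(x_gen, y_seq):
        bucket = 'test' if rng.random() < test_ratio else 'train'
        yield bucket, x_vec, label

# Usage with the names above (text_processed is the fit_transform generator);
# writers and serialize are hypothetical sinks, e.g. per-split TFRecord writers:
# for bucket, x_vec, label in split_stream(text_processed, y):
#     writers[bucket].write(serialize(x_vec, label))

I'm not sure whether this is idiomatic, or whether sklearn/TensorFlow already provides a split that consumes generators directly.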