I'm new to Keras and ran into a problem when trying to print the shape of my data so that I can use it as input_shape. Here is my code so far:
df = pd.read_csv(pathname, encoding = "ISO-8859-1")
df = df[['content_cleaned', 'meaningful']]
df = df.sample(frac=1) #Shuffling the data
X = np.asarray(df[['content_cleaned']])
y = np.asarray(df[['meaningful']])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21)
tokenizer = Tokenizer()
X_train = keras.preprocessing.text.Tokenizer(num_words=100)
X_test = keras.preprocessing.text.Tokenizer(num_words=100)
encoder = LabelBinarizer()
encoder.fit(y_train)
y_train = encoder.transform(y_train)
encoder.fit(y_test)
y_test = encoder.transform(y_test)
print(X_train.shape)
The code fails at the final print statement. The error message is:
AttributeError: 'Tokenizer' object has no attribute 'shape'
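Again, I'm new to this and can't figure out how to get past this error. Any help would be great!
Edit: I modified the code a bit to try to implement another user's suggestion. Here is the changed code: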
# Create tokenizer
tokenizer = Tokenizer(num_words=100) #No row has more than 100 words.
#Tokenize the predictors (text)
X_train = tokenizer.sequences_to_matrix(X_train, mode="binary")
X_test = tokenizer.sequences_to_matrix(X_test, mode="binary")
It fails on the line that assigns X_train. The error message is:
TypeError: '>=' not supported between instances of 'str' and 'int'
Edit 2: With the following change, the code runs. However, when I run the print command, nothing is printed:
X_train = tokenizer.sequences_to_matrix(int(input(X_train)), mode="binary")
X_test = tokenizer.sequences_to_matrix(int(input(X_test)), mode="binary")
Answer 0 (score: 0)
I believe this is because, although you first make it a NumPy array...
X = np.asarray(df[['content_cleaned']])
...and feed it data...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21)
...you then make it a Tokenizer object, which obviously has no 'shape' attribute.
X_train = keras.preprocessing.text.Tokenizer(num_words=100)
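To make the difference concrete: constructing a Tokenizer only creates an empty encoder; it does not encode anything. The usual pattern is to fit the encoder on the training text and then apply it to both splits, which in Keras would presumably mean calling `tokenizer.fit_on_texts(...)` followed by `tokenizer.texts_to_matrix(..., mode="binary")`. The `sequences_to_matrix` call from the edit expects lists of integer word indices, which is likely why passing raw strings raised the `'>='` TypeError. The sketch below is not the Keras implementation; it is a plain-Python stand-in that shows the same fit-then-transform idea. The helper names `fit_vocabulary` and `texts_to_binary_matrix` and the sample sentences are hypothetical, and reserving column 0 is an assumption made to mirror how Keras indexes words.

```python
from collections import Counter


def fit_vocabulary(texts, num_words):
    """Map the most frequent words to column indices 1..num_words-1.

    Column 0 is left unused (an assumption, mirroring Keras-style indexing).
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    most_common = [word for word, _ in counts.most_common(num_words - 1)]
    return {word: index + 1 for index, word in enumerate(most_common)}


def texts_to_binary_matrix(texts, vocab, num_words):
    """Encode each text as a row of 0/1 flags, one column per known word."""
    matrix = []
    for text in texts:
        row = [0] * num_words
        for word in text.lower().split():
            index = vocab.get(word)
            if index is not None:  # words unseen during fitting are skipped
                row[index] = 1
        matrix.append(row)
    return matrix


# Hypothetical sample data standing in for the 'content_cleaned' column.
train_texts = ["the cat sat", "the dog barked loudly"]
test_texts = ["the cat barked"]

NUM_WORDS = 100
vocab = fit_vocabulary(train_texts, NUM_WORDS)  # fit on the training text only
X_train = texts_to_binary_matrix(train_texts, vocab, NUM_WORDS)
X_test = texts_to_binary_matrix(test_texts, vocab, NUM_WORDS)

# The encoded data, not the encoder, is what has a shape.
print((len(X_train), len(X_train[0])))
```

Note also that `np.asarray(df[['content_cleaned']])` with double brackets produces a 2-D array of shape (n, 1), so the text column would probably need to be flattened (for example by selecting `df['content_cleaned']`) before it is handed to a tokenizer.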