I am working with the example dataset at the following link.
https://www.kaggle.com/enirtium/gender-voice/data
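I am trying to open the .csv file (perhaps I am opening it incorrectly) and build a network of fully connected layers. I then try to train it, but I get an input shape mismatch error:
"ValueError: Error when checking input: expected dense_1_input to have shape (None, 2800) but got array with shape (3168, 1)"
My code is as follows: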
import csv
import numpy
import string
from keras.models import Sequential
from sklearn.model_selection import train_test_split
import numpy as np
from keras import models
from keras import layers
path = r'/Users/username/Desktop/voice.csv'
meanfreq = []
sd = []
median = []
label = []
with open(path, 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)
    for line in csv_reader:
        #print(line['meanfreq'])
        meanfreq.append(line[0])
        sd.append(line[1])
        median.append(line[2])
        if line[20] == "female":
            label.append(1)
        else:
            label.append(0)
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(2800,)))
network.add(layers.Dense(1, activation='sigmoid'))
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
network.fit(meanfreq, label, epochs=5, batch_size=128)
scores = network.evaluate(meanfreq, label)
print("\n%s: %.2f%%" % (network.metrics_names[1], scores[1]*100))
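I suspect that either I am not opening the .csv file correctly (it is read into plain Python lists) or there is some other problem. I am new to neural networks and Python. I want to open this csv file and use 70% of its data for training and 30% for testing.
Answer 0 (score: 0)
Reading the data seems fine.
I assume you have a dataset that looks like: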
mean_freq, label
.12 0
.45 1
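You want to train a classifier. At the moment the model expects every training example to have 2800 features, because of input_shape=(2800,), but you only have 1 feature: mean_freq.
The mistake here is that you tried to tell Keras how many training examples to use when declaring the model. You cannot do that there; the number of examples is determined later, when you fit the model.
So the input_shape passed to the Keras Dense layer should be (1,) for a single feature. If you wanted to use both the mean and the median frequency, you would need two features, (2,), and so on.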
# note change from 2800 to 1
network.add(layers.Dense(512, activation='relu', input_shape=(1,)))
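To make the difference between samples and features concrete, here is a minimal NumPy sketch. The short lists are hypothetical stand-ins for the values read from voice.csv (csv.reader yields strings, so they need converting to floats); only the resulting array shapes matter.

```python
import numpy as np

# Hypothetical stand-ins for the lists built while reading voice.csv.
# csv.reader yields strings, so the values start out as text.
meanfreq = ["0.059", "0.066", "0.077", "0.151"]
median = ["0.032", "0.040", "0.036", "0.158"]
label = [0, 0, 1, 1]

# One feature: turn the flat list into a column of shape (n_samples, 1).
X_one = np.array(meanfreq, dtype=float).reshape(-1, 1)

# Two features: stack the lists as columns, giving shape (n_samples, 2).
X_two = np.column_stack([meanfreq, median]).astype(float)

y = np.array(label)

print(X_one.shape)  # (4, 1) -> matches input_shape=(1,)
print(X_two.shape)  # (4, 2) -> matches input_shape=(2,)
```

The first axis is the number of examples and never appears in input_shape; only the second axis (the feature count) does.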
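You can split the training and test sets in many ways. My suggestion is to do something like this:
train_size = 2800
# csv.reader returns strings, so convert to a float column of shape (n_samples, 1)
X = np.array(meanfreq, dtype=float).reshape(-1, 1)
y = np.array(label)
X_train, y_train = X[:train_size], y[:train_size]
# the test set is everything after the first train_size rows
X_test, y_test = X[train_size:], y[train_size:]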
Then fit the model on the training set and score it on the test set.
network.fit(X_train, y_train, epochs=5, batch_size=128)
scores = network.evaluate(X_test, y_test)
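Slicing off the first rows for training only gives a fair test set if the rows are in random order. As a hedged sketch, the rows can be shuffled before splitting using NumPy alone. The helper name shuffled_split, the 70/30 default (taken from the question) and the toy data are illustrative assumptions, not part of the original code.

```python
import numpy as np

def shuffled_split(X, y, train_fraction=0.7, seed=0):
    """Shuffle the rows, then split them into train and test parts.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # random row order
    n_train = int(round(train_fraction * len(X)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 10 examples with one feature each.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([0] * 5 + [1] * 5)

X_train, X_test, y_train, y_test = shuffled_split(X, y)
print(X_train.shape, X_test.shape)
```

Fixing the seed keeps the split reproducible between runs, and the same index arrays are applied to X and y so features and labels stay aligned.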
Edit to reflect the comments:
If your training data has 20 features, then you tell Keras:
# note change from 2800 to 20
network.add(layers.Dense(512, activation='relu', input_shape=(20,)))
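You still have to do the work of getting your data into the shape needed for training and testing, but the template above is how you fit and evaluate the model.
I would also note that if you are going to do modeling (as you are), there are better ways to read csv data. Take a look at using a pandas DataFrame.
A better (and more standard) way to create the train and test split: take a look at sklearn's train_test_split.
Edit 2: a quick model for the voice data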
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from keras.models import Model
from keras.layers import Dense, Input
# get data ready
data = pd.read_csv('voice.csv')
data.shape
# split out features and label
X = data.iloc[:, :-1].values
y = data.iloc[:, -1]
# map category to binary
y = np.where(y == 'male', 1, 0)
enc = OneHotEncoder()
# reshape y to be column vector
y_ = enc.fit_transform(y.reshape(-1, 1)).toarray()
X_train, X_test, y_train, y_test = train_test_split(
    X, y_, train_size=0.80, random_state=42)
# model using keras functional style
inp = Input(shape =(20, ))
dense = Dense(128)(inp)
out = Dense(2, activation='sigmoid')(dense)
model = Model(inputs=[inp], outputs=[out])
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100, batch_size=128)
model.evaluate(X_test, y_test)
Answer 1 (score: 0)
Yes, it can also be written like this:
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from keras import models, layers
# get data ready
data = pd.read_csv('voice.csv')
data.shape
# split out features and label
X = data.iloc[:, :-1].values
y = data.iloc[:, -1]
# map category to binary
y = np.where(y == 'male', 1, 0)
enc = OneHotEncoder()
# reshape y to be column vector
y_ = enc.fit_transform(y.reshape(-1, 1)).toarray()
X_train, X_test, y_train, y_test = train_test_split(
    X, y_, train_size=0.80, random_state=42)
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(20,)))
# softmax pairs with categorical_crossentropy on one-hot labels
network.add(layers.Dense(2, activation='softmax'))
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
network.fit(X_train, y_train, epochs=100, batch_size=128)
network.evaluate(X_test, y_test)