Gender voice classification with Python

Time: 2018-03-29 16:26:30

Tags: python neural-network keras

I am working with the sample dataset at the following link.

https://www.kaggle.com/enirtium/gender-voice/data

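I am trying to open the .csv file (maybe I am opening it incorrectly) and build a fully connected neural network. I then try to train it, but unfortunately I get an input shape mismatch:

"ValueError: Error when checking input: expected dense_1_input to have shape (None, 2800) but got array with shape (3168, 1)"

My code is as follows: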

import csv
import numpy
import string

from keras.models import Sequential
from sklearn.model_selection import train_test_split
import numpy as np

from keras import models
from keras import layers

path = r'/Users/username/Desktop/voice.csv'

meanfreq = []
sd = []
median = []
label = []

with open(path, 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    next(csv_reader)

    for line in csv_reader:
        #print(line['meanfreq'])
        meanfreq.append(line[0])
        sd.append(line[1])
        median.append(line[2])

        if line[20] == "female":
            label.append(1)
        else:
            label.append(0)   

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(2800,)))
network.add(layers.Dense(1, activation='sigmoid'))

network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

network.fit(meanfreq, label, epochs=5, batch_size=128)
scores = network.evaluate(meanfreq, label)
print("\n%s: %.2f%%" % (network.metrics_names[1], scores[1]*100))

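I think maybe I am not opening the .csv file correctly (it is read into a plain "list"), or there is some other problem. Unfortunately, I am new to neural networks and Python. I want to open this csv file and use 70% of its data for training and the remaining 30% for testing.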

2 Answers:

Answer 0: (score: 0)

Reading the data seems fine.

I assume you have a dataset that looks like:

mean_freq, label
.12         0
.45         1

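You want to train a classifier. Currently the model expects each training example to have 2800 features, input_shape=(2800,), but you only have 1 feature: mean_freq.

The mistake here is that you tried to tell Keras how many training examples to use when declaring the model. You cannot do that here; you do it later, when you fit the model.

So the input_shape for the Keras Dense layer should be (1,) for a single feature. If you were to use both mean and median frequency, you would need two features, (2,), and so on.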

# note change from 2800 to 1
network.add(layers.Dense(512, activation='relu', input_shape=(1,)))
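
To make the samples-versus-features distinction concrete, here is a minimal NumPy sketch. It assumes the meanfreq column was read as a list of strings (which is what csv.reader returns); the values are made up for illustration. The list is converted to floats and reshaped to (n_samples, n_features), and the shape of a single example is what would be passed as input_shape.

```python
import numpy as np

# Hypothetical values standing in for the meanfreq column;
# csv.reader returns every field as a string.
meanfreq = ["0.059781", "0.066009", "0.077316", "0.151228"]

# Convert to floats and make it 2-D: one row per sample, one column per feature.
X = np.array(meanfreq, dtype=float).reshape(-1, 1)

n_samples, n_features = X.shape
# Keras' input_shape describes ONE example, so it excludes the sample axis.
input_shape = X.shape[1:]

print(X.shape)       # (4, 1)
print(input_shape)   # (1,)
```

The number of rows (4 here, 3168 in the question) never appears in input_shape; it only matters when the array is passed to fit.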

There are several ways to split the data into training and test sets. My suggestion is to do something like this:

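train_size = 2800
# meanfreq and label are the lists built while reading the CSV
X_train = meanfreq[:train_size]
y_train = label[:train_size]
# everything after the first train_size rows is held out for testing
X_test = meanfreq[train_size:]
y_test = label[train_size:]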

Then fit the model on the training set and score it on the test set.

network.fit(X_train, y_train, epochs=5, batch_size=128)
scores = network.evaluate(X_test, y_test)
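
A slice-based split takes rows in file order. If the rows happen to be grouped by label, the held-out part could contain mostly one class, so shuffling before splitting is usually safer. Below is a minimal NumPy sketch of a shuffled 70/30 split; the arrays are made up, and the helper name shuffled_split is hypothetical.

```python
import numpy as np

def shuffled_split(X, y, train_fraction=0.7, seed=0):
    """Shuffle the rows, then split them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))                  # random row order
    n_train = int(round(len(X) * train_fraction))    # number of training rows
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Made-up data: 10 samples, 1 feature, labels grouped by class.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([0] * 5 + [1] * 5)

X_train, X_test, y_train, y_test = shuffled_split(X, y)
print(X_train.shape, X_test.shape)  # (7, 1) (3, 1)
```

The same index arrays are applied to X and y, so each feature row stays paired with its label.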

Edit to reflect the comments:

If your training data has 20 features, then you tell Keras:

# note change from 2800 to 20
network.add(layers.Dense(512, activation='relu', input_shape=(20,)))

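You still need to do the work of getting your data into the shape required for training and testing, but the template above is how you would fit and evaluate the model.

I would also note that if you are going to do modeling (as you probably know), there are better ways to read csv data. Have a look at using a pandas DataFrame. For a better (more standard) way to create the train and test split, look at sklearn's train_test_split.

Edit 2: A quick model for the voice data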
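
As a rough sketch of the pandas and train_test_split suggestion, the snippet below reads CSV data into a DataFrame and splits it 70/30. To stay self-contained it parses a tiny made-up CSV from a string rather than voice.csv; the two feature columns and all values are illustrative assumptions, not the real dataset.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny made-up stand-in for voice.csv (the real file has 20 feature columns).
csv_text = """meanfreq,sd,label
0.06,0.064,male
0.07,0.067,male
0.08,0.084,male
0.15,0.072,male
0.14,0.079,male
0.16,0.057,female
0.18,0.038,female
0.19,0.036,female
0.20,0.035,female
0.17,0.040,female
"""
data = pd.read_csv(io.StringIO(csv_text))

# Features as a 2-D float array of shape (n_samples, n_features).
X = data.drop(columns="label").values
# Same encoding as the question: 1 = female, 0 = male.
y = (data["label"] == "female").astype(int).values

# 70% train / 30% test; stratify=y keeps the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)
```

With the real file, X.shape[1] gives the number of features to pass as input_shape.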

import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from keras.models import Model
from keras.layers import Dense, Input

# get data ready
data = pd.read_csv('voice.csv')
data.shape
# split out features and label
X = data.iloc[:, :-1].values
y = data.iloc[:, -1]
# map category to binary
y = np.where(y == 'male', 1, 0)
enc = OneHotEncoder()
# reshape y to be column vector
y_ = enc.fit_transform(y.reshape(-1, 1)).toarray()
X_train, X_test, y_train, y_test = train_test_split(
              X, y_,  train_size=0.80, random_state=42)

# model using keras functional style
inp = Input(shape =(20, ))
dense = Dense(128)(inp)
out = Dense(2, activation='sigmoid')(dense)
model = Model(inputs=[inp], outputs=[out])
model.compile(loss='binary_crossentropy', optimizer='adam', 
             metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100, batch_size=128)
model.evaluate(X_test, y_test)
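
For reference, the label handling in the snippet above can be pictured with plain NumPy. This sketch uses a few made-up labels and np.eye in place of OneHotEncoder (a swapped-in shortcut, not what the answer uses) to show that each label becomes a two-column row, which is why the output layer has 2 units.

```python
import numpy as np

labels = np.array(["male", "female", "female", "male"])  # made-up labels

# Map category to binary: male -> 1, female -> 0 (same mapping as above).
y = np.where(labels == "male", 1, 0)

# One-hot encode: row i of the identity matrix is the encoding of class i.
y_onehot = np.eye(2)[y]

print(y)         # [1 0 0 1]
print(y_onehot)
# [[0. 1.]
#  [1. 0.]
#  [1. 0.]
#  [0. 1.]]
```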

Answer 1: (score: 0)

Yes,

it can work like this:

import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
# models and layers are used below, so import them as well
from keras import models, layers

# get data ready
data = pd.read_csv('voice.csv')
data.shape
# split out features and label
X = data.iloc[:, :-1].values
y = data.iloc[:, -1]
# map category to binary
y = np.where(y == 'male', 1, 0)
enc = OneHotEncoder()
# reshape y to be column vector
y_ = enc.fit_transform(y.reshape(-1, 1)).toarray()
X_train, X_test, y_train, y_test = train_test_split(
              X, y_,  train_size=0.80, random_state=42)

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(20,)))
# softmax pairs with categorical_crossentropy on one-hot labels
network.add(layers.Dense(2, activation='softmax'))

network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])


network.fit(X_train, y_train, epochs=100, batch_size=128)
network.evaluate(X_test, y_test)