I am training two GMM classifiers, one per label, on MFCC features. I concatenate all the MFCC frames belonging to a class and fit one classifier on them, and at test time I sum, for each classifier, the probabilities it assigns to its label.
def createGMMClassifiers():
    label_samples = {}
    for label, sample in training.iteritems():
        labelstack = np.empty((50,13))
        for feature in sample:
            #debugger.set_trace()
            labelstack = np.concatenate((labelstack,feature))
        label_samples[label] = labelstack
    for label in label_samples:
        #debugger.set_trace()
        classifiers[label] = mixture.GMM(n_components = n_classes)
        classifiers[label].fit(label_samples[label])
    for sample in testing['happy']:
        classify(sample)

def classify(testMFCC):
    probability = {'happy': 0, 'sad': 0}
    for name, classifier in classifiers.iteritems():
        prediction = classifier.predict_proba(testMFCC)
        for probforlabel in prediction:
            probability[name] += probforlabel[0]
    print 'happy ', probability['happy'], 'sad ', probability['sad']
    if probability['happy'] > probability['sad']:
        print 'happy'
    else:
        print 'sad'
But my results do not seem consistent, and I find it hard to believe this is only due to the default random_state=None, because all predictions are usually the same label for all of the test data, yet each run often gives the exact opposite result (see output 1 and output 2).

So my question is: am I doing something obviously wrong when training the classifiers?
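(For reference, the kind of run-to-run nondeterminism suspected above can be ruled out by pinning the seed. A minimal sketch, assuming the modern GaussianMixture API in place of the older mixture.GMM and synthetic frames in place of real MFCCs:)

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = rng.randn(200, 13)  # synthetic stand-in for stacked MFCC frames

# With random_state pinned, EM initialization (and hence the fit) is repeatable
gmm_a = GaussianMixture(n_components=2, random_state=0).fit(X)
gmm_b = GaussianMixture(n_components=2, random_state=0).fit(X)
print(np.allclose(gmm_a.means_, gmm_b.means_))
```

If the fitted means still differ with the seed pinned, the inconsistency comes from the data pipeline rather than the initialization.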
Output 1:
happy 123.559202732 sad 122.409167294
happy
happy 120.000879032 sad 119.883786657
happy
happy 124.000069307 sad 123.999928962
happy
happy 118.874574047 sad 118.920941127
sad
happy 117.441353421 sad 122.71924156
sad
happy 122.210579428 sad 121.997571901
happy
happy 120.981752603 sad 120.325940128
happy
happy 126.013713257 sad 125.885047394
happy
happy 122.776016525 sad 122.12320875
happy
happy 115.064172476 sad 114.999513909
happy
Output 2:
happy 123.559202732 sad 122.409167294
happy
happy 120.000879032 sad 119.883786657
happy
happy 124.000069307 sad 123.999928962
happy
happy 118.874574047 sad 118.920941127
sad
happy 117.441353421 sad 122.71924156
sad
happy 122.210579428 sad 121.997571901
happy
happy 120.981752603 sad 120.325940128
happy
happy 126.013713257 sad 125.885047394
happy
happy 122.776016525 sad 122.12320875
happy
happy 115.064172476 sad 114.999513909
happy
I asked a related question earlier and got a correct answer; the link is below.
Having different results every run with GMM Classifier
Edit: added the main function that collects the data and splits it into training and testing sets.
def main():
    happyDir = dir+'happy/'
    sadDir = dir+'sad/'
    training["sad"] = []
    training["happy"] = []
    testing["happy"] = []
    #TestSet
    for wavFile in os.listdir(happyDir)[::-1][:10]:
        #print wavFile
        fullPath = happyDir+wavFile
        testing["happy"].append(sf.getFeatures(fullPath))
    #TrainSet
    for wavFile in os.listdir(happyDir)[::-1][10:]:
        #print wavFile
        fullPath = happyDir+wavFile
        training["happy"].append(sf.getFeatures(fullPath))
    for wavFile in os.listdir(sadDir)[::-1][10:]:
        fullPath = sadDir+wavFile
        training["sad"].append(sf.getFeatures(fullPath))
    #Ensure the number of files in set
    print "Test(Happy): ", len(testing['happy'])
    print "Train(Happy): ", len(training['happy'])
    createGMMClassifiers()
Edit 2: changed the code according to the answer. Still getting similarly inconsistent results.
Answer 0 (score: 0)
For classification tasks, tuning the parameters you pass to the classifier matters, and many classification algorithms are sensitive to this choice: simply changing some of a model's parameters can produce hugely different results. It is also important to try different algorithms rather than using a single algorithm for every classification task.

For this problem, you can try different classification algorithms to verify that your data is fine, and try different parameters with different values for each classifier; then you can narrow down where the problem is.

Another approach is to use grid search to explore and tune the best parameters for a specific classifier; see: http://scikit-learn.org/stable/modules/grid_search.html
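The grid search suggested above can be sketched as follows (a hypothetical illustration: the classifier, dataset, and parameter grid are placeholders, using scikit-learn's current GridSearchCV API):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Placeholder data: 13 features, mirroring the MFCC dimensionality
X, y = make_classification(n_samples=100, n_features=13, random_state=0)

# Try every combination of these parameter values with 3-fold cross-validation
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

search.best_score_ then reports the cross-validated accuracy of the best parameter combination, which helps separate "bad parameters" from "bad data".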
Answer 1 (score: 0)
Your code does not make much sense: you recreate the classifiers for every new training sample.

The correct scheme for the training code should look like this:
label_samples = {}
classifiers = {}

# First we collect all samples per label into one array of frames
for label, sample in samples:
    label_samples.setdefault(label, []).append(sample)

# Then we train a classifier on every label's data
for label in label_samples:
    classifiers[label] = mixture.GMM(n_components = n_classes)
    classifiers[label].fit(np.concatenate(label_samples[label]))
Your decoding code is fine.
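A self-contained sketch of this scheme (synthetic Gaussian frames stand in for real MFCCs, GaussianMixture stands in for the older mixture.GMM, and decoding here compares per-label average log-likelihoods via score(), one common decision rule):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Synthetic stand-ins: each label gets five "recordings" of 50 frames x 13 dims
samples = [('happy', rng.randn(50, 13) + 1.0) for _ in range(5)]
samples += [('sad', rng.randn(50, 13) - 1.0) for _ in range(5)]

# First collect all frames per label...
label_samples = {}
for label, sample in samples:
    label_samples.setdefault(label, []).append(sample)

# ...then train exactly one GMM per label, once
classifiers = {}
for label, chunks in label_samples.items():
    classifiers[label] = GaussianMixture(n_components=2, random_state=0)
    classifiers[label].fit(np.concatenate(chunks))

# Decode: pick the label whose GMM assigns the test frames the higher likelihood
test = rng.randn(50, 13) + 1.0  # drawn from the "happy" distribution
scores = {label: clf.score(test) for label, clf in classifiers.items()}
print(max(scores, key=scores.get))
```

The key difference from the question's code is that each classifier is constructed and fitted exactly once, on all of its label's frames at the same time.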