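I am trying to classify 30-second music samples into one of four genres: ["electronic", "hip hop", "jazz", "rock"]. I would appreciate any help.
I have generated my own dataset from mp3 files. In my "dataset" directory I have 100 songs arranged by genre, 25 per genre, in subdirectories (i.e. I have "electronic", "hip hop", etc. subdirectories).
So far I have extracted a 30-second sample from each mp3 file, normalized the signal so that no sample exceeds -32 or -18 dB, mixed it down to mono, and converted it to wav (using pyDub).
Next, I used librosa to extract MFCCs (mel-frequency cepstral coefficients) for each of the 1292 frames in every song. I then used sklearn's preprocessing module to scale the data to zero mean and unit variance.
I save these values to a csv file of MFCCs, where each row is a frame and each column is one of the 12 coefficients.
To give you an idea, here are the first 20 frames of this song: Julio Bashmore - Au Seve.mp3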
1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
2.4841536110972022,-0.5831476248573247,0.37058328670683277,-1.4599220579565508,-0.35449671920732007,-1.1326787825224918,0.5880762356956317,-0.8108172607843107,0.010134004741811507,-0.14931018884055094,0.8707111843072819,0.1667143116197902
2.0939135826765907,-0.4778720879089441,0.26530387765936375,-1.7076132053582773,0.11305806361775678,-1.310823349961563,1.1669240812438573,-0.8627333493359391,-0.19252158214293175,-0.039523355794829566,0.6658161856594883,0.2860711396454278
1.0127547898943148,-0.6547501371081066,-0.202002065081406,-1.7889468252345162,1.1632837017143651,-0.9288351063974712,2.070078331574107,-0.7601750354687623,-0.27909671541985936,-0.13713166210030908,0.2267359005199065,0.27808482310773774
0.06572087004052392,-1.8740496505118946,-0.9604185325425617,-1.0163364869696865,1.5840872642483552,-0.16659361108422382,1.7806813371087853,0.055159751832777354,0.6842054675590546,0.42350598071605017,-0.3324771084186967,-0.24348528197848257
0.7159690101152768,-2.6235135217332606,-0.9099658866643047,-0.19653348650619468,1.0348534863167884,-0.6771927176675163,1.0703663687805878,0.3981886714210787,0.8503521825769755,0.4055860454830591,-0.11841556456925736,0.05030541244676532
0.8810398765824345,-2.7727001749452045,-0.8484274387283207,0.14839104995756489,0.9124992899968386,-0.5987705973726993,0.6471053665081234,0.43190059553550836,0.9028748015921237,0.3425604687141461,-0.20209176692016032,0.15561852907964296
1.5217565976091,-2.5946551685896044,-0.3924558895014341,0.36743931340001096,0.9126773048246598,-0.7581004315396501,0.4463892360730688,0.42969123923287,0.7276796949470707,-0.0079165602005986,-0.580154306587985,-0.07235102966750707
2.08861621898524,-1.2804976691396324,0.46640912894919145,0.14007051920782673,0.9100754665932002,-1.51168507329552,0.7161640071116147,0.34780954351977644,0.30123647629161765,-1.103443008391695,-0.7900432022174468,-0.2847124076141728
1.300078728794466,-0.6136862665584394,0.5321920666343034,-0.25881789165042973,1.2648582642185016,-1.7504670292559645,1.4050993480861744,0.354988549965
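I tried simply compiling the 1292 frame vectors of every song into one large vector per genre and using that as the input to scikit-learn's kNN algorithm. The results were very poor: the predictions are just a vector full of "rock".
I am fairly sure I am not going about this correctly at all, but I have the following two functions. The first creates this feature vector for each genre. The second just trains the classifier using each genre vector and a vector filled with that genre's label.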
    import csv
    import itertools

    import numpy as np

    def create_np_vector(db_read, start_row):
        num_frames = 1292
        num_songs = 25
        num_coefs = 12
        #vector for all features of every song/sample of that genre
        genre_vec = np.empty([(num_frames * num_songs), num_coefs])
        db_reader = csv.reader(db_read)
        for row in itertools.islice(db_reader, start_row, start_row + 25):
            id = row[0]
            path = row[2]
            mfcc_file = path + "/csv/" + id + ".csv"
            mfcc_reader = csv.reader(open(mfcc_file, 'r'))
            frame_num = 0
            for mfcc_row in mfcc_reader:
                frame_vec = np.array(mfcc_row)
                genre_vec[frame_num] = frame_vec
                frame_num = frame_num + 1
        return genre_vec

    def train_knn(db_read, knn):
        genres = ["electronic", "hip hop", "jazz", "rock"]
        line_num = [0, 26, 51, 76]
        x = 0
        for genre in genres:
            vec = create_np_vector(db_read, line_num[x])
            x = x + 1
            print(genre)
            knn.fit(vec, [genre for x in range(25*1292)])
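What exactly should I be doing? I have been trying to use this as a resource: http://modelai.gettysburg.edu/2012/music/, but I am still lost. Should I compute a mean vector and a covariance matrix for each audio file?
Even if I did that, what would I do with the two of them for each file?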
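Answer 0 (score: 0)
MFCCs are a good approach for starters. However, feeding every frame of 12 coefficients to the classifier will most likely overwhelm it. Take the mean over the frames of each song and your performance should improve. A complete solution is described by Luis Pedro Coelho and Willi Richert in Building Machine Learning Systems with Python.
You can get a free trial from the website. It is enough to read the chapter on this topic: music genre classification.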
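A minimal sketch of the "take the mean" idea, assuming each song's MFCCs are already loaded as a (1292, 12) array. The `songs_by_genre` dict and the helper names are hypothetical, and the random data only stands in for the real csv files. Note that scikit-learn's `fit` replaces any previously fitted model, so calling it once per genre, as in the question, leaves a classifier that has only seen the last genre ("rock"). Here `fit` is called a single time on all songs.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def song_mean_vector(frames):
    """Collapse a (num_frames, num_coefs) MFCC matrix into one mean vector."""
    return np.asarray(frames, dtype=float).mean(axis=0)

def build_dataset(songs_by_genre):
    """Build one feature row and one label per song.

    songs_by_genre is a hypothetical dict: genre name -> list of MFCC matrices.
    """
    X, y = [], []
    for genre, songs in songs_by_genre.items():
        for frames in songs:
            X.append(song_mean_vector(frames))
            y.append(genre)
    return np.vstack(X), np.array(y)

# Synthetic stand-in for the real data: 25 songs per genre, 1292 frames x 12 coefs.
rng = np.random.default_rng(0)
genres = ["electronic", "hip hop", "jazz", "rock"]
songs_by_genre = {
    genre: [rng.normal(loc=i, scale=1.0, size=(1292, 12)) for _ in range(25)]
    for i, genre in enumerate(genres)
}

X, y = build_dataset(songs_by_genre)

# Fit once on every song of every genre.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
```

With real data you would hold out some songs (or use cross-validation) rather than predicting on the training set.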
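Answer 1 (score: 0)
With only 25 samples per class and 1292 frames of features per sample, it is hard to find any pattern; the data is too noisy. To reduce the complexity, you could try computing summary statistics of the MFCC data, such as the mean/std/min/max/skew/kurtosis of each band. To capture some of the variation over time, you may also want to include summaries of the MFCC frame deltas.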
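The summary statistics above can be sketched roughly as follows, using only NumPy. This is an illustration under assumptions: the function names are made up, the deltas are a plain first difference between consecutive frames (not a smoothed delta such as the one librosa provides), and the random matrix stands in for one song's MFCCs.

```python
import numpy as np

def summarize(frames):
    """Per-band mean, std, min, max, skew and excess kurtosis.

    frames has shape (num_frames, num_bands); the result has 6 * num_bands values.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    # Avoid dividing by zero for bands that never change.
    safe_std = np.where(std > 0, std, 1.0)
    z = (frames - mean) / safe_std
    skew = (z ** 3).mean(axis=0)
    kurtosis = (z ** 4).mean(axis=0) - 3.0  # excess kurtosis
    return np.concatenate(
        [mean, std, frames.min(axis=0), frames.max(axis=0), skew, kurtosis]
    )

def song_features(frames):
    """Summaries of the MFCCs plus summaries of their frame-to-frame deltas."""
    frames = np.asarray(frames, dtype=float)
    deltas = np.diff(frames, axis=0)  # simple first difference between frames
    return np.concatenate([summarize(frames), summarize(deltas)])

# Stand-in for one song: 1292 frames of 12 coefficients.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(1292, 12))
features = song_features(mfcc)
```

Each song then becomes a single row of 144 values (6 statistics x 12 bands, for both the MFCCs and their deltas), which is a far smaller input for kNN than 1292 x 12 raw frames.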
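MFCC features were originally designed for speech. You may want to try some features that are better suited to music. libessentia and librosa both have feature extractors for tempo, musical key, and so on.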