I have a DataFrame like this:
0 1 2
filename CF02_B1_D1_M3.wav F02_B2_D1_M2.wav F02_B3_D6_M3.wav
datatype train train test
label 1 1 6
feature0 18.2796 17.8995 18.0531
feature1 -3.92135 -15.5039 -31.0344
feature2 13.6118 -0.741729 7.87929
feature3 -7.25019 -0.0536188 -18.6119
feature4 -11.7736 -6.73465 0.682173
feature5 -18.265 4.39842 -5.3771
Here is the code:
from python_speech_features import mfcc
import scipy.io.wavfile as wav
import numpy as np
import os
import pandas as pd

filenames, datatype, labels = [], [], []
fitur = [[] for i in range(2392)]  # one list per feature
path = r"C:\Users\HEWLETT PACKARD"  # raw string so the backslashes are not read as escapes
for item in os.listdir(path):
    if item.endswith('.wav'):
        parts = item.split('_')
        # read from the listed directory, not the current working directory
        (rate, sig) = wav.read(os.path.join(path, item))
        mfcc_feat = mfcc(sig, rate, nfilt=26, numcep=13)
        feat = np.asarray(mfcc_feat[:, :])
        feature = feat.ravel()
        # files whose second name part ends in '3' (e.g. B3) are test data
        if parts[1][1] == '3':
            data_type = 'test'
        else:
            data_type = 'train'
        label = parts[2][1]
        filenames.append(item)
        datatype.append(data_type)
        labels.append(label)
        for i in range(2392):
            fitur[i].append(np.squeeze(feature[i]))

dataset = [filenames, datatype, labels]
dataset.extend(fitur)
column = ['filename', 'datatype', 'label']
column.extend(['feature'+str(i) for i in range(2392)])
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the same column order
df = pd.DataFrame(dict(zip(column, dataset)), columns=column)
df = df.transpose()
print(df)
I extracted features from some wav files and split them into train and test data. Then I put them in a DataFrame. How do I select the data whose datatype is "train"?
Answer 0 (score: 2)
IIUC:
In [24]: df.loc['feature0':, df.columns[df.loc['datatype']=='train']]
Out[24]:
0 1
feature0 18.2796 17.8995
feature1 -3.92135 -15.5039
feature2 13.6118 -0.741729
feature3 -7.25019 -0.0536188
feature4 -11.7736 -6.73465
feature5 -18.265 4.39842
feature6 -18.1045 -1.88591
feature7 -10.3347 -12.4131
feature8 -15.5189 1.84178
feature9 -13.8793 -2.21513
feature10 -11.2372 -14.6925
feature11 -13.1699 7.65947
feature12 -13.2874 3.11805
feature13 18.529 17.9096
You may also want to transpose it so that it is better suited for machine learning:
In [36]: col_mask = df.loc['datatype']=='train'
In [37]: col_mask
Out[37]:
0 True
1 True
2 False
Name: datatype, dtype: bool
In [38]: df.loc['feature0':, df.columns[col_mask]].T.set_index(df.loc['filename'][col_mask])
Out[38]:
feature0 feature1 feature2 feature3 feature4 ... feature9 feature10 feature11 feature12 feature13
filename ...
CF02_B1_D1_M3.wav 18.2796 -3.92135 13.6118 -7.25019 -11.7736 ... -13.8793 -11.2372 -13.1699 -13.2874 18.529
F02_B2_D1_M2.wav 17.8995 -15.5039 -0.741729 -0.0536188 -6.73465 ... -2.21513 -14.6925 7.65947 3.11805 17.9096
[2 rows x 14 columns]
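A minimal, self-contained sketch of the transposed selection, assuming a toy frame built from the first two feature rows shown in the question (the name X_train is my own, not from the original post). One thing to watch for: every column of the original frame mixes strings and numbers, so the selected features come out with object dtype. The sketch converts them with astype(float), which most ML libraries will want.

```python
import pandas as pd

# Toy frame shaped like the one in the question (values copied from its first rows).
df = pd.DataFrame(
    {
        0: ["CF02_B1_D1_M3.wav", "train", "1", 18.2796, -3.92135],
        1: ["F02_B2_D1_M2.wav", "train", "1", 17.8995, -15.5039],
        2: ["F02_B3_D6_M3.wav", "test", "6", 18.0531, -31.0344],
    },
    index=["filename", "datatype", "label", "feature0", "feature1"],
)

# Boolean mask over the columns: True where the file is training data.
col_mask = df.loc["datatype"] == "train"

# Keep only the feature rows of the train columns, then put one file per row.
X_train = df.loc["feature0":, col_mask].T.set_index(df.loc["filename"][col_mask])

# The values are still Python objects at this point; make them numeric.
X_train = X_train.astype(float)

print(X_train)
print(X_train.dtypes)
```

If the conversion raises, that usually means a non-numeric row (such as label) slipped into the selection.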
Answer 1 (score: 1)
I believe you need:
df = df.loc[:, df.loc['datatype'] == 'train']
print (df)
0 1
filename CF02_B1_D1_M3.wav F02_B2_D1_M2.wav
datatype train train
label 1 1
feature0 18.2796 17.8995
feature1 -3.92135 -15.5039
feature2 13.6118 -0.741729
feature3 -7.25019 -0.0536188
feature4 -11.7736 -6.73465
feature5 -18.265 4.39842
feature6 -18.1045 -1.88591
feature7 -10.3347 -12.4131
feature8 -15.5189 1.84178
feature9 -13.8793 -2.21513
feature10 -11.2372 -14.6925
feature11 -13.1699 7.65947
feature12 -13.2874 3.11805
feature13 18.529 17.9096
Then, if needed, remove the first 3 rows by name:
df = df.drop(['filename','datatype','label'])
Or by position:
df = df.iloc[3:]
print (df)
0 1
feature0 18.2796 17.8995
feature1 -3.92135 -15.5039
feature2 13.6118 -0.741729
feature3 -7.25019 -0.0536188
feature4 -11.7736 -6.73465
feature5 -18.265 4.39842
feature6 -18.1045 -1.88591
feature7 -10.3347 -12.4131
feature8 -15.5189 1.84178
feature9 -13.8793 -2.21513
feature10 -11.2372 -14.6925
feature11 -13.1699 7.65947
feature12 -13.2874 3.11805
feature13 18.529 17.9096
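If the end goal is a train/test split with labels, it may be simpler to transpose once and filter rows instead of columns. This is only a sketch under assumptions: the toy frame reuses the first two feature rows from the question, and the names samples, is_train, feature_cols, X_train, y_train, X_test and y_test are hypothetical, not from the original post.

```python
import pandas as pd

# Toy frame shaped like the one in the question (values copied from its first rows).
df = pd.DataFrame(
    {
        0: ["CF02_B1_D1_M3.wav", "train", "1", 18.2796, -3.92135],
        1: ["F02_B2_D1_M2.wav", "train", "1", 17.8995, -15.5039],
        2: ["F02_B3_D6_M3.wav", "test", "6", 18.0531, -31.0344],
    },
    index=["filename", "datatype", "label", "feature0", "feature1"],
)

# One row per file, one column per field.
samples = df.T

# Row mask and the list of feature column names.
is_train = samples["datatype"] == "train"
feature_cols = [c for c in samples.columns if c.startswith("feature")]

# Features become floats; labels were stored as single characters, so cast to int.
X_train = samples.loc[is_train, feature_cols].astype(float)
y_train = samples.loc[is_train, "label"].astype(int)
X_test = samples.loc[~is_train, feature_cols].astype(float)
y_test = samples.loc[~is_train, "label"].astype(int)

print(X_train.shape, X_test.shape)
```

Filtering rows this way keeps filename, datatype and label available as ordinary columns, so there is no need to drop the first three rows afterwards.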