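I've been working on a streaming audio fingerprinter in Python and have tried several different libraries/services (Gracenote, AcoustID, ACRCloud), but I can't get any of them to work.
My current code uses python-sounddevice to record raw float32 audio from my microphone (or directly from the sound card, to reduce noise) into a ring buffer. Every 5 seconds I read the buffer, convert it to 16-bit PCM, and pass it to acoustid's fingerprint function.
My code: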
import array
import struct
import time as time2  # aliased because the callback has a parameter named `time`

import acoustid
import numpy as np
import sounddevice as sd

API_KEY = "..."  # my AcoustID API key (omitted here)
RATE = 16000     # sample rate in Hz
duration = 15    # seconds of audio kept in the ring buffer
run_time = 60    # total recording time in seconds


class Buffer(object):
    """Ring buffer that mirrors its data so read() can return one contiguous slice."""

    def __init__(self, size, dtype=np.float32):
        self.size = size
        self.buf = np.zeros(self.size * 2, dtype=dtype)
        self.i = 0

    def extend(self, data):
        if len(data.shape) > 1:
            raise ValueError("data must be a flat array")
        l = data.size
        if l > self.size:
            raise ValueError("data cannot be larger than size")
        start = (self.i % self.size)
        end = start + l
        start_2 = start + self.size
        end_2 = end + self.size
        self.i += l
        if end < self.buf.size:
            self.buf[start:end] = data
        if end_2 < self.buf.size:
            self.buf[start_2:end_2] = data

    def read(self):
        start = (self.i % self.size)
        end = start + self.size
        return self.buf[start:end]


def float_to_16_bit_pcm(raw_floats):
    """Convert float samples to little-endian 16-bit PCM bytes."""
    floats = array.array('f', raw_floats)
    # struct's 'h' format requires integers, so truncate explicitly
    samples = [int(sample * 32767) for sample in floats]
    raw_ints = struct.pack("<%dh" % len(samples), *samples)
    return raw_ints


def callback(indata, frames, time, status):  # outdata would be the 5th argument if this were not an InputStream
    global i
    if status:
        print(status)
    b.extend(indata.squeeze())
    elapsed_time = time2.time() - start
    # once the buffer has filled, fingerprint on every 50th callback
    if elapsed_time > duration and i % 50 == 0:
        aud = b.read()
        pcm16 = float_to_16_bit_pcm(aud)
        fp = acoustid.fingerprint(RATE, 1, pcm16)
        response = acoustid.lookup(API_KEY, fp, duration)
        print(response)
    i += 1


b = Buffer(duration * RATE)
i = 0                 # callback counter
start = time2.time()  # recording start time

with sd.InputStream(samplerate=RATE, dtype=np.float32, channels=1, callback=callback):
    sd.sleep(int(run_time * 1000))
The response comes back with an error: {u'status': u'error', u'error': {u'message': u'invalid fingerprint', u'code': 3}}
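One thing that may be worth checking against this error is the PCM conversion step. Below is a minimal sketch, standard library only, that clamps each float to [-1.0, 1.0], converts it to an integer, and yields the audio as blocks of little-endian 16-bit bytes. The name `float_to_pcm16_blocks` and the `block_size` default are hypothetical. As far as I can tell from pyacoustid's docstring, the `pcmiter` argument of `acoustid.fingerprint` is meant to be an iterable of such byte-string blocks rather than a single byte string, but treat that as an assumption to verify.

```python
import struct


def float_to_pcm16_blocks(samples, block_size=4096):
    """Yield little-endian 16-bit PCM byte strings from float samples.

    `samples` is any sequence of floats nominally in [-1.0, 1.0];
    values outside that range are clamped so they cannot overflow int16.
    """
    for offset in range(0, len(samples), block_size):
        chunk = samples[offset:offset + block_size]
        # Clamp, scale to the int16 range, and truncate to an integer.
        ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in chunk]
        yield struct.pack("<%dh" % len(ints), *ints)


# Four samples split into two blocks of two samples (four bytes) each.
blocks = list(float_to_pcm16_blocks([0.0, 0.5, -1.0, 2.0], block_size=2))
```

If the assumption about `pcmiter` holds, the generator (or a one-element list containing the packed bytes) could be passed in place of the raw byte string.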
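I know the creator of Chromaprint did mention that, as of Chromaprint 1.4 (in C++), it is now possible to feed it a continuous audio stream and get fingerprints out of it: https://oxygene.sk/2016/12/chromaprint-1-4-released/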
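For reference, the windowing I am doing by hand (keep the latest N samples, hand them off every so often) can be sketched independently of any fingerprinting library. This is only an illustration under my own assumptions: `latest_windows`, `size`, and `hop` are hypothetical names, and it uses a `collections.deque` in place of my NumPy ring buffer. It says nothing about how Chromaprint's streaming mode works internally.

```python
from collections import deque


def latest_windows(chunks, size, hop):
    """Yield the most recent `size` samples each time `hop` new samples arrive.

    `chunks` is an iterable of sample sequences, e.g. the blocks an audio
    callback delivers. Nothing is yielded until `size` samples have been seen.
    """
    window = deque(maxlen=size)  # old samples fall off the left automatically
    seen = 0                     # total samples received so far
    next_emit = size             # sample count at which the next window is due
    for chunk in chunks:
        window.extend(chunk)
        seen += len(chunk)
        if seen >= next_emit:
            yield list(window)
            next_emit = seen + hop


# Five chunks of two samples; a 4-sample window re-read every 2 samples.
stream = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
result = list(latest_windows(stream, size=4, hop=2))
```

With my settings this would correspond to `size = 15 * 16000` and `hop = 5 * 16000`.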
Does anyone have experience with this, or any advice to offer?
Thanks