How to get accurate timing in Python using a microphone

Asked: 2018-10-31 23:24:49

Tags: python signal-processing detection timing pyaudio

I'm trying to do beat detection with my PC's microphone, and then use the timestamps of the beats to calculate the distance between several consecutive beats. I chose Python because there is a lot of material available and it's quick to develop with. By searching the internet I came up with this simple code (no advanced peak detection or anything else yet; that can come later if needed):

import pyaudio
import struct
import math
import time


SHORT_NORMALIZE = (1.0/32768.0)


def get_rms(block):
    # RMS amplitude is defined as the square root of the
    # mean over time of the square of the amplitude.
    # so we need to convert this string of bytes into
    # a string of 16-bit samples...

    # we will get one short out for each
    # two chars in the string.
    count = len(block) // 2  # integer division: two bytes per 16-bit sample
    format = "%dh" % (count)
    shorts = struct.unpack(format, block)

    # iterate over the block.
    sum_squares = 0.0
    for sample in shorts:
        # sample is a signed short in +/- 32768.
        # normalize it to 1.0
        n = sample * SHORT_NORMALIZE
        sum_squares += n*n

    return math.sqrt(sum_squares / count)


CHUNK = 32
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

elapsed_time = 0
prev_detect_time = time.perf_counter()  # start the clock now so the first beat doesn't print a huge bogus interval

while True:
    data = stream.read(CHUNK)
    amplitude = get_rms(data)
    if amplitude > 0.05:  # value set by observing graphed data captured from mic
        elapsed_time = time.perf_counter() - prev_detect_time
        if elapsed_time > 0.1:  # guard against multiple spikes at beat point
            print(elapsed_time)
            prev_detect_time = time.perf_counter()

def close_stream():
    stream.stop_stream()
    stream.close()
    p.terminate()
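
(As an aside, the per-sample loop in get_rms can be vectorized. A minimal sketch assuming NumPy is installed; get_rms_np is not part of the original code, just an equivalent of the loop above:)

import numpy as np

def get_rms_np(block):
    # Interpret the raw bytes as signed 16-bit samples, scale to +/-1.0,
    # then take the root of the mean of the squares in one vectorized pass.
    samples = np.frombuffer(block, dtype=np.int16) * SHORT_NORMALIZE
    return float(np.sqrt(np.mean(samples * samples)))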

The code works very well in silence, and I was quite happy the first couple of times I ran it, but then I tested its accuracy and was somewhat less satisfied. To test it I used two methods: a phone with a metronome set to 60 bpm (emitting tic-toc sounds into the microphone), and an Arduino hooked up to a buzzer, triggered at a 1 Hz rate by an accurate Chronodot RTC. The buzzer beeps into the microphone, triggering a detection. The results with both methods look similar (the numbers represent the distance between two beat detections, in seconds):

0.9956681643835616
1.0056331689497717
0.9956100091324198
1.0058207853881278
0.9953449497716891
1.0052103013698623
1.0049350136986295
0.9859074337899543
1.004996383561644
0.9954095342465745
1.0061518904109583
0.9953025753424658
1.0051235068493156
1.0057199634703196
0.984839305936072
1.00610396347032
0.9951862648401821
1.0053146301369864
0.9960100821917806
1.0053391780821919
0.9947373881278523
1.0058608219178105
1.0056580091324214
0.9852110319634697
1.0054473059360731
0.9950465753424638
1.0058237077625556
0.995704694063928
1.0054566575342463
0.9851026118721435
1.0059882374429243
1.0052523835616398
0.9956161461187207
1.0050863926940607
0.9955758173515932
1.0058052968036577
0.9953960913242028
1.0048014611872205
1.006336876712325
0.9847434520547935
1.0059712876712297

Now I'm quite confident that the Arduino, at least, is accurate to 1 ms (which is the target accuracy). The results tend to be off by ±5 ms, but occasionally by as much as 15 ms, which is unacceptable. Is there a way to achieve better accuracy, or is this a limitation of Python / the soundcard / something else? Thank you!
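
(One way to gauge how much of this error comes from OS scheduling rather than from the audio data itself is to time the blocking reads directly. A minimal diagnostic sketch, reusing stream and CHUNK from the code above; jitter in the printed spacings is scheduling jitter, not soundcard jitter:)

import time

prev = time.perf_counter()
for _ in range(100):
    stream.read(CHUNK)  # blocks until the next chunk is available
    now = time.perf_counter()
    print(now - prev)   # wall-clock spacing between successive reads
    prev = now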

EDIT: After incorporating tom10's and barny's suggestions, the code looks like this:

import pyaudio
import struct
import math
import psutil
import os


def set_high_priority():
    # Note: psutil.HIGH_PRIORITY_CLASS is Windows-only; on Unix you would
    # lower the niceness instead, e.g. p.nice(-10) with sufficient privileges.
    p = psutil.Process(os.getpid())
    p.nice(psutil.HIGH_PRIORITY_CLASS)


SHORT_NORMALIZE = (1.0/32768.0)


def get_rms(block):
    # RMS amplitude is defined as the square root of the
    # mean over time of the square of the amplitude.
    # so we need to convert this string of bytes into
    # a string of 16-bit samples...

    # we will get one short out for each
    # two chars in the string.
    count = len(block) // 2  # integer division: two bytes per 16-bit sample
    format = "%dh" % (count)
    shorts = struct.unpack(format, block)

    # iterate over the block.
    sum_squares = 0.0
    for sample in shorts:
        # sample is a signed short in +/- 32768.
        # normalize it to 1.0
        n = sample * SHORT_NORMALIZE
        sum_squares += n*n

    return math.sqrt(sum_squares / count)


CHUNK = 4096
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RUNTIME_SECONDS = 10

set_high_priority()

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

elapsed_time = 0
prev_detect_time = 0
amplitudes = []  # needed by the loop below; collected for later graphing
TIME_PER_CHUNK = 1000 / RATE * CHUNK
SAMPLE_GROUP_SIZE = 32  # 1 sample = 2 bytes, group is closest to 1 msec elapsing
TIME_PER_GROUP = 1000 / RATE * SAMPLE_GROUP_SIZE

for i in range(0, int(RATE / CHUNK * RUNTIME_SECONDS)):
    data = stream.read(CHUNK)
    time_in_chunk = 0
    group_index = 0
    for j in range(0, len(data), (SAMPLE_GROUP_SIZE * 2)):
        group = data[j:(j + (SAMPLE_GROUP_SIZE * 2))]
        amplitude = get_rms(group)
        amplitudes.append(amplitude)
        if amplitude > 0.02:
            current_time = (elapsed_time + time_in_chunk)
            time_since_last_beat = current_time - prev_detect_time
            if time_since_last_beat > 500:
                print(time_since_last_beat)
                prev_detect_time = current_time
        time_in_chunk = (group_index+1) * TIME_PER_GROUP
        group_index += 1
    elapsed_time = (i+1) * TIME_PER_CHUNK

stream.stop_stream()
stream.close()
p.terminate()

With this code I got the following results (the unit this time is milliseconds rather than seconds):

999.909297052154
999.9092970521542
999.9092970521542
999.9092970521542
999.9092970521542
1000.6349206349205
999.9092970521551
999.9092970521524
999.9092970521542
999.909297052156
999.9092970521542
999.9092970521542
999.9092970521524
999.9092970521542

Unless I've made a mistake somewhere, this looks far better than before and reaches sub-millisecond accuracy; the remaining jitter matches the 32-sample group resolution (1000 / 44100 × 32 ≈ 0.726 ms, which is exactly the gap between the 999.909 and 1000.635 values above). Thanks to tom10 and barny for their help.

1 Answer:

Answer 0 (score: 4):

The reason you're not getting the correct timing for the beats is that you're missing chunks of the audio data: the soundcard is reading the chunks, but you're not collecting the data before it's overwritten by the next chunk.
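
(You can often observe this directly: in PyAudio's blocking mode, stream.read raises an error when the input buffer has overflowed, i.e. when a chunk was lost before you collected it. A minimal sketch; the exception_on_overflow flag is assumed to be available, as in recent PyAudio releases:)

try:
    data = stream.read(CHUNK)  # raises on input overflow by default
except OSError as e:
    print("input overflow, an audio chunk was dropped:", e)

# alternatively, read without raising and accept silent data loss:
data = stream.read(CHUNK, exception_on_overflow=False)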

First, though, for this problem you need to distinguish between two different things: timing accuracy and real-time responsiveness.

The timing accuracy of the soundcard should be very good, much better than a millisecond, and you should be able to capture all of that accuracy in the data you read from the soundcard. The real-time responsiveness of your computer's OS, by contrast, will be very poor, much worse than a millisecond. That is, you should easily be able to identify audio events (such as beats) to within a millisecond, but not identify them at the moment they happen (instead, 30-200 ms later, depending on your system). This arrangement usually works fine for computers, because general human perception of the timing of events is much coarser than a millisecond (with the exception of rare specialized perceptual systems, such as comparing auditory events between the two ears, etc.).
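
(This is why deriving timestamps from the sample count, rather than from a wall clock read when the chunk arrives, recovers the soundcard's accuracy. A minimal sketch of the idea, assuming the stream, CHUNK, and RATE objects from the question:)

total_frames = 0
while True:
    data = stream.read(CHUNK)
    # Sample k of this chunk occurred at (total_frames + k) / RATE seconds
    # of stream time, no matter how late the OS delivered the chunk.
    chunk_start_time = total_frames / RATE
    total_frames += CHUNK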

The specific problem with your code is that CHUNK is much too small for the OS to query the soundcard once per block. You have it set to 32, so at 44100 Hz the OS would need to get to the soundcard every 0.7 ms, which is far too short for a computer that's responsible for many other tasks. If your OS doesn't fetch a block before the next one comes in, the original block is overwritten and lost.

To get this working in a way that's consistent with the constraints above, make CHUNK much larger than 32, more like 1024 (as in the PyAudio examples). Depending on your computer and what else it's doing, even that may not be long enough.
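
(Beyond simply enlarging CHUNK, PyAudio also offers a callback mode, in which PortAudio hands each filled buffer to a Python function from its own thread, so a busy main loop can't cause blocking reads to fall behind. This is an option the answer doesn't cover; a minimal sketch:)

import time
import pyaudio

frames = []

def callback(in_data, frame_count, time_info, status):
    # Invoked by PortAudio for every filled buffer; queue the raw bytes
    # and return paContinue to keep the stream running.
    frames.append(in_data)
    return (None, pyaudio.paContinue)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=44100,
                input=True, frames_per_buffer=1024,
                stream_callback=callback)

while stream.is_active():
    time.sleep(0.1)  # beat detection over `frames` would run here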

If this approach doesn't work for you, you'll probably need a dedicated real-time system like the Arduino. (Though often that isn't necessary, so think twice before deciding you need the Arduino. Typically, when I've seen people need true real-time, it's when they're trying to interact in some quantitative way with a human: for example, flash a light, have the person tap a button, flash another light, have the person tap another button, and so on, to measure response times.)