Split an AKAudioFile into chunks separated by silence

Posted: 2018-07-19 21:39:47

Tags: ios swift audiokit

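Suppose an AKAudioFile was created by an AKNodeRecorder and contains a series of spoken words, each separated from the next by at least 1 second. What is the best way to end up with a series of files, each containing a single word?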

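I believe this could be done if there were a way to iterate over the file in 100 ms chunks and measure the average amplitude of each chunk. "Silent" chunks would be those whose average amplitude falls below some arbitrarily small threshold. While iterating, when a chunk with a non-silent amplitude is found, you could take that chunk's start timestamp and create an audio file that begins there and ends at the start time of the next "silent" chunk.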
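The chunked approach described in the question can be sketched in plain Swift, independent of AudioKit. This is a minimal sketch assuming the audio has already been read into a mono `[Float]` array normalized to -1...1; the function name `silenceSeparatedSegments` and the default 0.01 threshold are hypothetical choices, not AudioKit API.

```swift
/// Hypothetical helper: returns (start, end) times in seconds of non-silent regions.
/// Assumes `samples` is mono audio normalized to -1...1.
func silenceSeparatedSegments(samples: [Float],
                              sampleRate: Double,
                              chunkDuration: Double = 0.1,
                              silenceThreshold: Float = 0.01) -> [(start: Double, end: Double)] {
    // Number of samples per chunk (about 100 ms by default).
    let chunkSize = max(1, Int((sampleRate * chunkDuration).rounded()))
    var segments: [(start: Double, end: Double)] = []
    var segmentStart: Double? = nil

    var chunkStart = 0
    while chunkStart < samples.count {
        let chunkEnd = min(chunkStart + chunkSize, samples.count)

        // Average absolute amplitude of this chunk.
        var sum: Float = 0
        for i in chunkStart..<chunkEnd { sum += abs(samples[i]) }
        let average = sum / Float(chunkEnd - chunkStart)
        let time = Double(chunkStart) / sampleRate

        if average >= silenceThreshold {
            // The first loud chunk after silence opens a segment.
            if segmentStart == nil { segmentStart = time }
        } else if let start = segmentStart {
            // The first silent chunk after sound closes the segment.
            segments.append((start: start, end: time))
            segmentStart = nil
        }
        chunkStart = chunkEnd
    }

    // Close a segment that runs to the end of the audio.
    if let start = segmentStart {
        segments.append((start: start, end: Double(samples.count) / sampleRate))
    }
    return segments
}
```

Because a segment closes at the first silent chunk, a short pause inside a word would split it. Requiring several consecutive silent chunks (for example, enough to span the 1-second gap from the question) before closing a segment would make the split more robust.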

Any suggestions would be greatly appreciated, whether they use the manual approach above or AudioKit's built-in processing techniques.

1 answer:

Answer 0 (score: 0)

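I don't have a complete solution, but I've started working on something similar. This function could serve as a starting point for what you need. Basically, you want to read the file into a buffer and then analyze the buffer data. At that point, you can slice it into smaller buffers and write those to files.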

```swift
import AVFoundation
import AudioKit // for AKLog

// The answer shows only the method; `AudioBoundaryDetector` is a hypothetical container class.
public class AudioBoundaryDetector {

    /// Returns boundary times in seconds, alternating segment start / segment end.
    public class func guessBoundaries(url: URL, sensitivity: Double = 1) -> [Double]? {
        var out: [Double] = []

        guard let audioFile = try? AVAudioFile(forReading: url) else { return nil }
        let processingFormat = audioFile.processingFormat
        let frameCount = AVAudioFrameCount(audioFile.length)

        guard let pcmBuffer = AVAudioPCMBuffer(pcmFormat: processingFormat, frameCapacity: frameCount) else { return nil }

        do {
            audioFile.framePosition = 0
            try audioFile.read(into: pcmBuffer, frameCount: frameCount)
        } catch {
            AKLog("ERROR: Couldn't read data into buffer. \(error)")
            return nil
        }

        // The processing format is deinterleaved Float32, so floatChannelData is available.
        guard let channelData = pcmBuffer.floatChannelData else { return nil }
        let channelCount = Int(pcmBuffer.format.channelCount)
        let bufferLength = 1024                          // analysis window, in samples
        let inThreshold: Double = 0.001 / sensitivity    // level that starts a segment
        let outThreshold: Double = 0.0001 * sensitivity  // level that ends a segment
        let minSegmentDuration: Double = 1               // minimum seconds between boundaries
        var counter = 0
        var thresholdCrossed = false
        // Holds absolute sample values, so the window average is a mean absolute level, not a true RMS.
        var rmsBuffer = [Float](repeating: 0, count: bufferLength)
        // Start below zero so a segment may begin within the first second of the file.
        var lastTime: Double = -minSegmentDuration

        AKLog("inThreshold", inThreshold, "outThreshold", outThreshold)

        for i in 0 ..< Int(pcmBuffer.frameLength) {
            // n is the channel; samples from all channels share one analysis window
            for n in 0 ..< channelCount {
                let sample = channelData[n][i]

                if counter == rmsBuffer.count {
                    let time = Double(i) / processingFormat.sampleRate

                    // Average as Double so it can be compared with the Double thresholds.
                    let avg = Double(rmsBuffer.reduce(0, +) / Float(rmsBuffer.count))

                    if avg > inThreshold && !thresholdCrossed && time - lastTime > minSegmentDuration {
                        // Window became loud: start of a segment.
                        thresholdCrossed = true
                        out.append(time)
                        lastTime = time
                    } else if avg <= outThreshold && thresholdCrossed && time - lastTime > minSegmentDuration {
                        // Window became quiet: end of a segment.
                        thresholdCrossed = false
                        out.append(time)
                        lastTime = time
                    }
                    counter = 0
                }
                rmsBuffer[counter] = abs(sample)
                counter += 1
            }
        }

        return out
    }
}
```
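The answer stops at finding the boundaries; the remaining step it mentions, slicing the audio into smaller buffers and writing each to its own file, might look like the following. This is a sketch assuming `boundaries` alternates start and end times in seconds, as `guessBoundaries` returns them, and that an unpaired final start runs to the end of the file. The function `writeSegments` and the `segment-N.caf` file names are hypothetical; only AVFoundation's `AVAudioFile` and `AVAudioPCMBuffer` are used.

```swift
import AVFoundation

/// Hypothetical helper: writes one file per (start, end) pair in `boundaries`.
/// Assumes `boundaries` alternates start/end times in seconds; an unpaired
/// final start is treated as running to the end of the source file.
func writeSegments(from sourceURL: URL, boundaries: [Double], to directory: URL) throws -> [URL] {
    let source = try AVAudioFile(forReading: sourceURL)
    let format = source.processingFormat
    let sampleRate = format.sampleRate
    var outputs: [URL] = []

    var index = 0
    while index < boundaries.count {
        // Convert times to frame positions, clamped to the file length.
        let startFrame = min(AVAudioFramePosition(boundaries[index] * sampleRate), source.length)
        let endFrame = index + 1 < boundaries.count
            ? min(AVAudioFramePosition(boundaries[index + 1] * sampleRate), source.length)
            : source.length
        index += 2

        let frameCount = AVAudioFrameCount(max(0, endFrame - startFrame))
        guard frameCount > 0,
              let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frameCount) else { continue }

        // Read just this segment into its own buffer.
        source.framePosition = startFrame
        try source.read(into: buffer, frameCount: frameCount)

        // Write it with the same file settings and processing format as the source.
        let url = directory.appendingPathComponent("segment-\(outputs.count).caf")
        let output = try AVAudioFile(forWriting: url,
                                     settings: source.fileFormat.settings,
                                     commonFormat: format.commonFormat,
                                     interleaved: format.isInterleaved)
        try output.write(from: buffer)
        outputs.append(url) // `output` is closed when it goes out of scope
    }
    return outputs
}
```

Pass it the array returned by `guessBoundaries` together with the recording's URL and a writable directory. Reading each segment separately, rather than slicing the single large buffer `guessBoundaries` allocates, keeps peak memory to one segment at a time.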