Should an extracted audio sample be contained in its original source when comparing bytes?

Time: 2014-12-29 20:15:16

Tags: java audio javasound

Suppose I have an audio wav file containing a sentence:

+-----------+----------------------------------------+
| meta data | 'Audio recognition sometimes is trick' |.wav
+-----------+----------------------------------------+

Now suppose I open this audio in Audacity and, guided by the waveform it draws, extract the word 'sometimes' and save it to another file.

+-----------+-------------+
| meta data | 'sometimes' |.wav
+-----------+-------------+

Then I use this Java code to get only the audio data from both files:

    //...
    Path source = Paths.get("source.wav");
    Path sample = Paths.get("sometimes.wav");
    int index = compare(transform(source), transform(sample));
    System.out.println("Shouldn't I be greater than -1!? " + (index > -1));
    //...

    private int compare(int[] source, int[] sample) throws IOException {
        return Collections.indexOfSubList(Arrays.asList(source), Arrays.asList(sample));
    }

    private int[] transform(Path audio) throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream ais = AudioSystem.getAudioInputStream(
                new ByteArrayInputStream(Files.readAllBytes(audio)))) {

            AudioFormat format = ais.getFormat();
            byte[] audioBytes = new byte[(int) (ais.getFrameLength() * format.getFrameSize())];
            int nlengthInSamples = audioBytes.length / 2;
            int[] audioData = new int[nlengthInSamples];
            for (int i = 0; i < nlengthInSamples; i++) {
                int LSB = audioBytes[2*i]; /* First byte is LSB (low order) */
                int MSB = audioBytes[2*i+1]; /* Second byte is MSB (high order) */
                audioData[i] = (MSB << 8) | (255 & LSB);
            }
            return audioData;
        }
    }

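Now here comes my question.

Given the extraction described above, should this code be able to find the audio data bytes of 'sometimes' in the original audio file?

I also tried comparing the contents as strings, but had no luck at all:

    new String(source).contains(new String(sample));

Can anyone point out what I am missing here?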

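1 Answer:

Answer 0: (score: 0)

@Phil, you're the man! Your tip led me to the solution!

  1. Extracting a sample with Audacity encodes the sample bytes in a somewhat different way;

  2. I wrote a Java program that identifies silence in the source audio, and then used it to split the source into samples word by word;

  3. Comparing the source against the new, non-Audacity samples gives a match!

  4. Here are the new transform and compare: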

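    // Requires: java.io.*, java.nio.charset.StandardCharsets, java.nio.file.Path, javax.sound.sampled.*

    private int compare(byte[] captchaData, byte[] sampleData) {
        // ISO-8859-1 maps every byte to exactly one char, so the search is byte-exact;
        // the platform default charset (e.g. UTF-8) may replace invalid byte sequences.
        return new String(captchaData, StandardCharsets.ISO_8859_1)
                .indexOf(new String(sampleData, StandardCharsets.ISO_8859_1));
    }
    
    private byte[] transform(Path audio) throws IOException, UnsupportedAudioFileException {
        // try-with-resources closes both the audio stream and the output buffer
        try (AudioInputStream ais = AudioSystem.getAudioInputStream(audio.toFile());
             ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            AudioFormat format = ais.getFormat();
            // Read whole frames at a time: 1024 frames per chunk
            int nBufferSize = 1024 * format.getFrameSize();
            byte[] abBuffer = new byte[nBufferSize];
            int nBytesRead;
            while ((nBytesRead = ais.read(abBuffer)) > -1) {
                baos.write(abBuffer, 0, nBytesRead);
            }
            // Only the audio data is returned; the file header is not part of the stream
            return baos.toByteArray();
        }
    }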
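
A side note on the comparison step: going through `String` makes the result depend on how the bytes are decoded into chars, and in the question's first attempt `Arrays.asList(int[])` yields a single-element `List<int[]>`, so `Collections.indexOfSubList` compares array references rather than samples. Searching the byte arrays directly avoids both issues. The following is only a minimal sketch; the class `ByteSearch` and the helper `indexOf` are hypothetical names that are not part of the code above, and the naive scan is chosen for clarity rather than speed.

```java
public class ByteSearch {

    /**
     * Hypothetical helper: returns the index of the first occurrence of
     * needle inside haystack, or -1 if it does not occur.
     */
    public static int indexOf(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return 0; // same convention as String.indexOf("")
        }
        // Try every start position where the needle could still fit
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i; // all bytes matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up bytes for illustration, not real audio data
        byte[] source = {0, 12, -128, 7, 127, -1, 3};
        byte[] sample = {-128, 7, 127};
        System.out.println(indexOf(source, sample)); // prints 2
    }
}
```

With something like this, `compare` could return `ByteSearch.indexOf(captchaData, sampleData)` without any `String` conversion.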
    

    The splitter:

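    // Requires: java.util.ArrayList, java.util.Arrays, java.util.List

    private List<byte[]> split(byte[] audioData) {
        System.out.println(audioData.length);
        List<byte[]> byteList = new ArrayList<>();
        int zeroCounter = 0; // length of the current run of near-zero bytes
        int lastPos = 0;     // start of the segment currently being collected
        for (int i = 0; i < audioData.length; i++) {
            if (audioData[i] >= -1 && audioData[i] <= 1) {
                zeroCounter++; //too many leading 'zeros' could indicate silence or very low noise...
            } else if (zeroCounter > 0) {
                // A quiet run just ended; treat it as a word gap only if it was long enough
                if (zeroCounter > 2000) {
                    int from = lastPos;
                    int to = i - (zeroCounter / 2); // cut in the middle of the quiet run
                    byteList.add(Arrays.copyOfRange(audioData, from, to));
                    System.out.println("split from: " + from + " to: " + to);
                    lastPos = to;
                }
                zeroCounter = 0;
            }
        }
        // Note: whatever follows the last split point is not added to the list
        return byteList;
    }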
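
The splitter above applies its threshold to individual bytes. If the data is 16-bit signed little-endian PCM (an assumption; check `AudioFormat` for the actual sample size and byte order), the same idea can be applied to whole samples instead. This is only a sketch under that assumption: the class `PcmSamples` and the helpers `toSamples` and `isQuiet` are hypothetical names, not part of the code above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmSamples {

    /**
     * Hypothetical helper: decodes 16-bit signed little-endian PCM bytes
     * into samples. A trailing odd byte, if any, is ignored.
     */
    public static short[] toSamples(byte[] pcm) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }

    /** Hypothetical helper: true if the sample's amplitude is within the threshold. */
    public static boolean isQuiet(short sample, int threshold) {
        // Widen to int first so that Short.MIN_VALUE does not overflow in abs()
        return Math.abs((int) sample) <= threshold;
    }

    public static void main(String[] args) {
        // Made-up bytes: the samples 1, -1 and 16384 in little-endian order
        byte[] pcm = {0x01, 0x00, (byte) 0xFF, (byte) 0xFF, 0x00, 0x40};
        for (short s : toSamples(pcm)) {
            System.out.println(s + " quiet=" + isQuiet(s, 1));
        }
    }
}
```

A run of consecutive quiet samples could then be counted the same way `zeroCounter` counts bytes, with the run length and the threshold tuned to the recording.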
    

    Thanks!