Question

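I'm writing my own MP3 decoder in Python, but I'm having some trouble decoding ID3 tags. I don't want to use an existing library such as mutagen or eyeD3; I want to follow the ID3v2 specification myself.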

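The problem is that the frame data is encoded in some format I can't print. In the debugger I can see that the value is "Hideaway", but there is a strange character in front of it, as you can see here: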

'data': '\x00Hideaway'

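I have the following questions: What kind of encoding is this? How do I decode and print the string? Do you think other MP3 files use different encodings in their ID3 tags?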

By the way, I have a UTF-8 declaration at the top of the file:

# -*- coding: utf-8 -*-

I am reading the file with Python's regular I/O methods (read()).

Answer 1

The sequence \x00 means that a single byte with the value zero sits in front of the H. So your string looks like this:

Zero - H - i - d - e ...
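To see that leading byte for yourself, here is a small Python 3 sketch. The literal `data` is only a stand-in for whatever your frame parser read from the file.

```python
# Stand-in for the raw frame body read from the file (assumed value).
data = b"\x00Hideaway"

# In Python 3, indexing a bytes object yields an integer.
first_byte = data[0]
rest = data[1:]

print(first_byte)  # 0
print(rest)        # b'Hideaway'
```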

Strings normally contain letters or digits, not zero bytes. Perhaps this usage is specific to ID3v2?

Looking at the ID3v2 standard (http://id3.org/id3v2.4.0-structure), we find this:

Frames that allow different types of text encoding contains a text
encoding description byte. Possible encodings:

 $00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
 $01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
       strings in the same frame SHALL have the same byteorder.
       Terminated with $00 00.
 $02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
       Terminated with $00 00.
 $03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.

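So the leading zero byte is the text encoding byte: it says that the text that follows is ISO-8859-1 encoded and runs until the next zero byte.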
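The terminator rules in the quoted table could be sketched roughly as follows in Python 3. The helper name `cut_at_terminator` and its `wide` flag are made up for illustration, and the sketch assumes the encoding byte has already been removed from `payload`.

```python
def cut_at_terminator(payload, wide):
    """Return the bytes of `payload` that precede the first terminator.

    `wide` is True for the UTF-16 encodings ($01, $02), whose terminator is
    two zero bytes, and False for ISO-8859-1 and UTF-8 ($00, $03), whose
    terminator is a single zero byte.
    """
    if not wide:
        return payload.split(b"\x00", 1)[0]
    # UTF-16 code units are 2 bytes wide, so only test even offsets;
    # otherwise the zero half of an ASCII character could be mistaken
    # for part of the terminator.
    for i in range(0, len(payload) - 1, 2):
        if payload[i:i + 2] == b"\x00\x00":
            return payload[:i]
    return payload  # no terminator found: use the whole payload


print(cut_at_terminator(b"Hideaway\x00", wide=False))  # b'Hideaway'
```

Stepping through the UTF-16 case two bytes at a time is a deliberate choice: a plain search for `b"\x00\x00"` could match across the boundary between two characters.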

Your program might handle it like this:

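title = fp.read(number_of_bytes)
# The first byte is the text encoding marker. Compare a one-byte slice so the
# test works on byte strings in both Python 2 and Python 3.
if title[:1] == b'\x00':
    title = title[1:].decode('iso-8859-1')
elif title[:1] == b'\x03':
    title = title[1:].decode('utf-8')
# ... and likewise for the other encoding bytes ($01, $02) in the table above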
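A fuller version of that dispatch could map each encoding byte from the quoted table to a Python codec. This is only a sketch for Python 3: `ID3V2_CODECS` and `decode_text_frame` are hypothetical names, the input is assumed to be the non-empty body of a single text frame, and trailing terminators are simply stripped after decoding.

```python
# Encoding byte -> Python codec name, following the table quoted from the spec.
ID3V2_CODECS = {
    0: "iso-8859-1",
    1: "utf-16",     # BOM present; Python's utf-16 codec reads and removes it
    2: "utf-16-be",  # no BOM, big-endian
    3: "utf-8",
}


def decode_text_frame(data):
    """Decode a text frame body whose first byte is the encoding marker."""
    encoding_byte = data[0]              # an int in Python 3
    codec = ID3V2_CODECS[encoding_byte]  # KeyError for an unknown marker
    text = data[1:].decode(codec)
    # Drop any terminator that was stored at the end of the text.
    return text.rstrip("\x00")


print(decode_text_frame(b"\x00Hideaway"))  # Hideaway
```

Once the bytes are decoded into a `str`, printing works as usual. Frames that hold several strings separated by terminators would need extra handling that this sketch does not attempt.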
