Question

I have an input string that looks like this:

4BFC434845000000

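Every two characters of the input string represent the hexadecimal code of one ISO-8859-1 character.

The first two characters in the example (4B) represent the number 4B₁₆, which is K in ISO-8859-1.
The next two characters (FC) represent the number FC₁₆, which is the German u umlaut (ü) in ISO-8859-1.

The example string above represents Küche (stored here as KüCHE), the German word for kitchen.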

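The input string is guaranteed to be 16 characters long, so the resulting string is always 8 characters long. Unused characters (as in the example) are 00.

I know I can use iconv or other PHP functions to convert an ISO-8859-1 string to another character encoding. But I don't know how to convert an ISO-8859-1 charcode (e.g. FC₁₆ or 252₁₀) to a UTF-8 character.

Of course, I could use an associative array that maps every character code to the character it represents: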

$table = array(
  0x4B => 'K',
  0xFC => 'ü',
  // ...
);

What is the best way to achieve the same thing? Is there a PHP function that does this?

Answer 1

This is rather simple: convert the hex string to binary, then convert the ISO-8859-1 binary to UTF-8 binary.

Optionally strip the trailing 00 (NUL) bytes at some point.
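The two steps described in this answer can be sketched in PHP roughly as follows. This is a minimal illustration, not necessarily the answerer's exact code: it assumes the built-in hex2bin() and iconv() functions are available, and decodeLatin1Hex is a hypothetical helper name introduced here.

```php
<?php
// Sketch: decode a hex string of ISO-8859-1 bytes into a UTF-8 string.
// decodeLatin1Hex is a hypothetical helper name, not part of PHP.
function decodeLatin1Hex(string $hex): string
{
    // Step 1: "4BFC4348..." -> raw bytes, one byte per ISO-8859-1 character.
    $bytes = @hex2bin($hex);
    if ($bytes === false) {
        throw new InvalidArgumentException('Input is not valid hexadecimal.');
    }

    // Optional: drop the trailing 00 (NUL) padding bytes.
    $bytes = rtrim($bytes, "\0");

    // Step 2: re-encode the ISO-8859-1 bytes as UTF-8.
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('Conversion to UTF-8 failed.');
    }
    return $utf8;
}

echo decodeLatin1Hex('4BFC434845000000'), PHP_EOL; // prints KüCHE
```

Stripping the padding before the conversion is a design choice for this sketch; it could just as well be done afterwards, since NUL bytes are encoded identically in both character sets. If the iconv extension is not available, mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1') should work as a substitute for the second step.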

Converting ISO-8859-1 charcodes to UTF-8
