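I have a problem with the Telegram Bot API. I am trying to extract URLs from a message. They are described by MessageEntity objects, whose offset and length are specified in UTF-16 code units. I have tried many ways of getting the substring out of the text (mb_convert_encoding, iconv, json_encode, and so on), but I never get the correct link. It works for plain text without emoji, but not for text that contains them.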
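To illustrate the mismatch, here is a minimal sketch that counts the same string three ways. The sample text is made up, and it assumes the mbstring extension is available. An emoji outside the Basic Multilingual Plane is one code point but two UTF-16 code units, so an offset given in UTF-16 code units points one position too far when it is used as a character index.

```php
<?php
// Made-up sample: one emoji, a space, then a 19-character URL.
$text = "😀 https://example.com";

// UTF-8 bytes: the emoji takes 4 bytes.
$utf8Bytes = strlen($text);                      // 24
// Unicode code points: the emoji counts as 1.
$codePoints = mb_strlen($text, 'UTF-8');         // 21
// UTF-16 code units: the emoji is a surrogate pair, so it counts as 2.
$utf16Units = intdiv(strlen(mb_convert_encoding($text, 'UTF-16LE', 'UTF-8')), 2); // 22

// For this text the URL entity would have offset 3 and length 19 (UTF-16 units).
// Treating that offset as a code point index skips one character too many.
$wrong = mb_substr($text, 3, 19, 'UTF-8');       // "ttps://example.com"

printf("bytes=%d, code points=%d, UTF-16 units=%d\n", $utf8Bytes, $codePoints, $utf16Units);
echo $wrong, "\n";
```

Each emoji of this kind that appears before the link shifts the result by one more character, which is why the extraction only breaks on messages that contain them.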
Answer 0 (score: 0)
$output = json_decode(file_get_contents('php://input'), true);
$message = $output['message']['text'];
// Messages without links carry no 'entities' key, so fall back to an empty list.
$entities = $output['message']['entities'] ?? [];

function getURLs($message, $entities) {
    $URLs = [];
    // Re-encode as UTF-16LE so that every code unit is exactly 2 bytes.
    //$message_encode = iconv('utf-8', 'utf-16le', $message);
    $message_encode = mb_convert_encoding($message, 'UTF-16LE', 'UTF-8');
    foreach ($entities as $entity) {
        // text_link entities carry the target URL directly.
        if (!empty($entity['url'])) {
            $URLs[] = $entity['url'];
        }
        // url entities point into the text: offset and length are in UTF-16 code units.
        if ($entity['type'] == 'url') {
            $URL16 = substr($message_encode, $entity['offset'] * 2, $entity['length'] * 2);
            //$URLs[] = iconv('utf-16le', 'utf-8', $URL16);
            $URLs[] = mb_convert_encoding($URL16, 'UTF-8', 'UTF-16LE');
        }
    }
    return $URLs;
}

$URLs = getURLs($message, $entities);
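You can use either iconv or mb_convert_encoding, as long as the same encoding is used for both conversions. With mb_convert_encoding, UTF-16LE and UTF-16 both work. With iconv, UTF-16LE is the safer choice, because on some platforms iconv prepends a byte order mark when the target is plain UTF-16, which would shift every offset by one code unit. See also: PHP - length of string containing emojis/special chars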
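The same idea can be sketched as a small standalone helper and checked against a message that starts with an emoji. The name utf16Substr and the sample message are hypothetical, made up for illustration; they are not part of PHP or the Bot API. The sketch assumes the mbstring extension is available.

```php
<?php
/**
 * Cut a substring using an offset and length given in UTF-16 code units.
 * utf16Substr is a hypothetical helper name used only for this example.
 */
function utf16Substr(string $text, int $offset, int $length): string
{
    // In UTF-16LE every code unit is exactly 2 bytes, so byte position = unit * 2.
    $utf16 = mb_convert_encoding($text, 'UTF-16LE', 'UTF-8');
    $slice = substr($utf16, $offset * 2, $length * 2);
    return mb_convert_encoding($slice, 'UTF-8', 'UTF-16LE');
}

// Made-up data shaped like the relevant part of a Telegram message.
// The emoji occupies UTF-16 units 0 and 1, so the URL starts at unit 9.
$message = [
    'text' => "👍 docs: https://example.com/a and more",
    'entities' => [
        ['type' => 'url', 'offset' => 9, 'length' => 21],
    ],
];

$urls = [];
foreach ($message['entities'] as $entity) {
    if ($entity['type'] === 'url') {
        $urls[] = utf16Substr($message['text'], $entity['offset'], $entity['length']);
    }
}

print_r($urls); // contains "https://example.com/a"
```

Keeping the conversion in one helper means the encoding name appears in a single place, which avoids encoding with one variant and decoding with another.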