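I need to split SMS messages longer than 160 characters into multiple parts so that I can send long text messages.
Some SMS APIs do the splitting for you (they support multipart messages), but I am working with several providers, so I have to split the messages myself.
Splitting the message is simple. My question is what to do when the SMS message contains characters that are "escaped" and use up 2 characters.
For those who don't know what I'm talking about:
Even in 7-bit encoding, some characters are "escaped", which means they "use up" 2 characters. In the default 7-bit encoding, they are: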
{}[]\|^~€
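Source: https://stackoverflow.com/a/7061794/158126
For example, this string is 35 characters long:
The amount of this payment is €100.
However, when it is sent through an SMS provider it is actually 36 characters long, because the euro sign is "escaped" and takes up two characters.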
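To make the counting rule concrete, here is a minimal sketch that computes the length a provider would see. The helper name gsm_septet_length() is hypothetical (it is not part of the code in this post), and it assumes the message contains only GSM 7-bit characters plus the escaped characters listed above.

```php
<?php
// Hypothetical helper, not from the original post: length of a message in
// GSM 7-bit units, assuming it contains only GSM 7-bit characters.
function gsm_septet_length(string $message): int
{
    // The escaped characters listed above; each one is sent as ESC + character
    $escaped = '/[\^{}\\\\\[~\]|€]/u';

    // Number of characters (not bytes) in the UTF-8 string
    $chars = preg_match_all('/./us', $message);

    // Every escaped character costs one extra unit
    $extra = preg_match_all($escaped, $message);

    return $chars + $extra;
}

// A 35-character sample string containing one euro sign
echo gsm_septet_length('The amount of this payment is €100.'), "\n"; // 36
```

A single preg_match_all call counts all escaped characters at once, so no per-character loop is needed for this step.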
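There are plenty of questions about splitting SMS messages, but none of them take into account that these "escaped" characters can cause problems.
So I wrote a function to deal with this. I have tested it and it works, so hopefully it can help someone else.
Coming back to my question: I feel my code is very inefficient. I run preg_match in a loop
several times, and I'm not sure whether there is a better solution.
Does anyone have suggestions on how to make this code more efficient?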
function sms_message_parts($message) {
    // Message parts
    $parts = array();
    // The default encoding is utf16 (unicode) until proven otherwise
    $encoding = 'utf16';
    // Characters that are allowed in 7bit messages
    $gsm_7bit_chars = '@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !"#¤%&\'\(\)\*+,-\.\/0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà';
    // Characters that are allowed in 7bit_ex messages
    $gsm_7bit_ex_chars = '\^{}\\\\\[~\]|€';
    // Message lengths
    $message_lengths = array(
        '7bit' => 160,
        '7bit_ex' => 160,
        'utf16' => 70
    );
    // Detect encoding of message
    if (preg_match("/^[" . $gsm_7bit_chars . "]*$/u", $message) == 1)
        $encoding = '7bit';
    elseif (preg_match("/^[" . $gsm_7bit_chars . $gsm_7bit_ex_chars . "]*$/u", $message) == 1)
        $encoding = '7bit_ex';
    // Determine how long each part of the message can be
    $max_parts_length = $message_lengths[$encoding];
    // Length of the message
    $message_length = mb_strlen($message, 'UTF-8');
    // 7bit_ex message
    // Escaped characters found so we need to find the REAL length
    // and split the message differently
    if ($encoding == '7bit_ex') {
        // Count how many extra characters are required as a result of
        // the 7bit_ex characters
        $extra_chars = 0;
        for ($i = 0; $i < $message_length; $i++) {
            if (preg_match("/^[" . $gsm_7bit_ex_chars . "]*$/u", mb_substr($message, $i, 1, 'UTF-8')) == 1)
                $extra_chars++;
        }
        // New message length
        $new_message_length = $message_length + $extra_chars;
        // Is this going to be a multipart message?
        if ($new_message_length > $max_parts_length) {
            // Split the message
            $start = 0;
            while (true) {
                // Take at most one full part's worth of the remaining characters
                $chars_left = $message_length - $start;
                $split_length = min($chars_left, $max_parts_length);
                // Extract the message part
                $part = mb_substr($message, $start, $split_length, 'UTF-8');
                // Check to see if this part has any escaped characters. The last
                // part is checked too, because its escaped characters can still
                // push it over the limit.
                $part_extra_chars = 0;
                for ($i = 0; $i < $split_length; $i++) {
                    if (preg_match("/^[" . $gsm_7bit_ex_chars . "]*$/u", mb_substr($part, $i, 1, 'UTF-8')) == 1)
                        $part_extra_chars++;
                }
                // If the escaped characters make this part too long, deduct them from
                // the amount of characters in this part (always keeping at least one
                // character so the loop advances)
                if ($split_length + $part_extra_chars > $max_parts_length) {
                    $split_length = max(1, $split_length - $part_extra_chars);
                    $part = mb_substr($message, $start, $split_length, 'UTF-8');
                }
                // Add part to parts array
                $parts[] = trim($part);
                $start = $start + $split_length;
                // We've reached the end of the message
                if ($start >= $message_length)
                    break;
            }
        // It's a single message
        } else {
            $parts[] = $message;
        }
    // 7bit and utf16 (unicode) messages don't have escaped characters
    } else {
        // Is this going to be a multipart message? Split this part before adding to the
        // parts array
        if ($message_length > $max_parts_length) {
            // Split the message into parts
            $total_messages = ceil($message_length / $max_parts_length);
            $start = 0;
            for ($i = 0; $i < $total_messages; $i++) {
                $parts[] = trim(mb_substr($message, $start, $max_parts_length, 'UTF-8'));
                $start = $start + $max_parts_length;
            }
        // It's a single message
        } else {
            $parts[] = $message;
        }
    }
    return array('parts' => $parts, 'encoding' => $encoding);
}
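One possible way to avoid calling preg_match once per character is to walk the message a single time and keep a running cost per part. The following is only a sketch of that idea, not the function above: sms_split_gsm() is a hypothetical name, it assumes the text has already been classified as 7-bit or 7bit_ex, and unlike the function above it does not trim the parts.

```php
<?php
// Sketch only: sms_split_gsm() is a hypothetical name, and the input is
// assumed to be UTF-8 text that fits the GSM 7-bit alphabet (with escapes).
function sms_split_gsm(string $message, int $limit = 160): array
{
    // Lookup table of characters that cost two units (ESC + character)
    $escaped = array_flip(['^', '{', '}', '\\', '[', '~', ']', '|', '€']);

    $parts = [];
    $current = '';
    $used = 0; // units used so far by $current

    // Split into UTF-8 characters once, instead of one mb_substr per index
    $chars = preg_split('//u', $message, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    foreach ($chars as $char) {
        $cost = isset($escaped[$char]) ? 2 : 1;

        // Start a new part when this character would not fit completely,
        // so an escaped character is never cut in half
        if ($used + $cost > $limit) {
            $parts[] = $current;
            $current = '';
            $used = 0;
        }

        $current .= $char;
        $used += $cost;
    }

    // Keep the final part (an empty message yields one empty part)
    if ($current !== '' || $parts === []) {
        $parts[] = $current;
    }

    return $parts;
}
```

With this approach each part is filled up to the limit, so a message made of 160 euro signs would come out as two parts of 80 characters each. The array lookup replaces the repeated regular expression checks.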
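Answer 0 (score: 0)
If you only want to skip specific characters, you can use a regular expression.
Something like: (?#comment)
The regex engine ignores everything between (?# and ).
Or you can use a regular expression with replace_all,
replacing the characters you want with the empty string ''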
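As a rough illustration of the replacement idea, here is a sketch in PHP. PHP has no replace_all function, so preg_replace() is assumed here as the equivalent, and the sample message is made up for the example.

```php
<?php
// Assumption: preg_replace() stands in for the "replace_all" mentioned above.
$pattern = '/[\^{}\\\\\[~\]|€]/u';

// Made-up sample message containing three escaped characters
$message = 'Total: €100 {approx}';

// Replace every escaped character with the empty string; $count receives
// the number of replacements that were made
$stripped = preg_replace($pattern, '', $message, -1, $count);

echo $stripped, "\n"; // Total: 100 approx
echo $count, "\n";    // 3
```

Note that this removes the characters from the text that gets sent. If only the number of escaped characters is needed, the $count value alone could be used and the stripped string discarded.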