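A Unicode string can contain surrogate pairs (emoji in particular). I need to truncate such a string to n characters. How can I do this safely, without breaking any emoji?
Answer 0 (score: 0)
The following code should solve your problem: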
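// True if C lies in one of three Unicode blocks of combining marks
FUNCTION IsDiacritical(C : CHAR) : BOOLEAN;
  VAR
    W : WORD ABSOLUTE C;
  BEGIN
    Result:=((W>=$1AB0) AND (W<=$1AFF)) OR  // Combining Diacritical Marks Extended
            ((W>=$0300) AND (W<=$036F)) OR  // Combining Diacritical Marks
            ((W>=$1DC0) AND (W<=$1DFF))     // Combining Diacritical Marks Supplement
  END;
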
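// Requires System.Character in the USES clause (for IsSurrogatePair).
// Removes the first "character" from S and returns it: one code point
// (a surrogate pair counts as one) plus any combining marks that follow it.
FUNCTION GetNextChar(VAR S : STRING) : STRING;
  VAR
    P : Integer;
  BEGIN
    IF S='' THEN EXIT('');
    // The base character takes two Chars when it is a surrogate pair
    IF (LENGTH(S)>1) AND IsSurrogatePair(S,1) THEN P:=3 ELSE P:=2;
    // Combining marks come after their base character, so take them as well
    WHILE (P<=LENGTH(S)) AND IsDiacritical(S[P]) DO INC(P);
    Result:=COPY(S,1,P-1);
    DELETE(S,1,P-1)
  END;
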
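// Like COPY(S,1,n), but n counts the units returned by GetNextChar, not Chars.
// S is passed by value, so the caller's string is left untouched.
FUNCTION GetStringByCodePoints(S : STRING ; CodePoints : Cardinal) : STRING;
  VAR
    I : Cardinal;
  BEGIN
    Result:='';
    FOR I:=1 TO CodePoints DO Result:=Result+GetNextChar(S)
  END;

// Like SetLength, but the new length is given in the same units.
// The .Length string helper comes from System.SysUtils.
PROCEDURE SetLengthByCodePoints(VAR S : STRING ; CodePoints : Cardinal);
  BEGIN
    SetLength(S,GetStringByCodePoints(S,CodePoints).Length)
  END;
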
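GetStringByCodePoints is analogous to COPY, and SetLengthByCodePoints is analogous to SetLength. Both, however, take a number of code points ("visible characters" or control characters) rather than a number of Chars.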
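To make the difference between Chars and code points concrete, here is a minimal standalone sketch. It does not reuse the functions above and it ignores combining marks. CodeUnitsFor is a hypothetical helper name introduced only for this illustration, and the code assumes 1-based string indexing, as on desktop Delphi compilers.

```pascal
program TruncateDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper, not part of the answer's code: returns how many
// UTF-16 code units the first MaxCodePoints code points of S occupy.
function CodeUnitsFor(const S: string; MaxCodePoints: Integer): Integer;
var
  I, Count: Integer;
begin
  I := 1;      // assumes 1-based string indexing
  Count := 0;
  while (I <= Length(S)) and (Count < MaxCodePoints) do
  begin
    // A high surrogate ($D800..$DBFF) followed by a low surrogate
    // ($DC00..$DFFF) encodes a single code point in two code units.
    if (Ord(S[I]) >= $D800) and (Ord(S[I]) <= $DBFF) and (I < Length(S)) and
       (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
      Inc(I, 2)
    else
      Inc(I);
    Inc(Count);
  end;
  Result := I - 1;
end;

var
  S: string;
begin
  // 'a', U+1F600 written as a surrogate pair, 'b': 3 code points, 4 Chars
  S := 'a' + #$D83D#$DE00 + 'b';
  // Keep the first two code points: 'a' and the complete emoji
  S := Copy(S, 1, CodeUnitsFor(S, 2));
  Writeln(Length(S)); // prints 3
end.
```

Before truncation Length(S) is 4. A plain Copy(S, 1, 2) would cut the emoji in half and leave a lone high surrogate at the end of the string.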
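If there are further ranges of combining diacritical marks, the relevant function can be extended to include them. The three groups I check are the ones I found with a simple Google search.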
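As an illustration of such an extension, the sketch below adds two more Unicode blocks of combining marks: Combining Diacritical Marks for Symbols (U+20D0..U+20FF) and Combining Half Marks (U+FE20..U+FE2F). The name IsDiacriticalEx is hypothetical, and the list is still not a complete set of combining characters.

```pascal
program DiacriticalDemo;
{$APPTYPE CONSOLE}

// Hypothetical extended variant of the range check; the name and the choice
// of extra blocks are assumptions made for this illustration.
function IsDiacriticalEx(C: Char): Boolean;
begin
  case Ord(C) of
    $0300..$036F,  // Combining Diacritical Marks
    $1AB0..$1AFF,  // Combining Diacritical Marks Extended
    $1DC0..$1DFF,  // Combining Diacritical Marks Supplement
    $20D0..$20FF,  // Combining Diacritical Marks for Symbols
    $FE20..$FE2F:  // Combining Half Marks
      Result := True;
  else
    Result := False;
  end;
end;

begin
  Writeln(IsDiacriticalEx(#$20D7)); // U+20D7, combining right arrow above
  Writeln(IsDiacriticalEx('e'));    // an ordinary base letter
end.
```

A case statement over ranges keeps each block on its own line, so adding another block is a one-line change.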