Question

I have not worked with C for a long time. I have some questions about Chinese words and strncpy.

char* testString = "你好嗎?";
sizeof(testString) => it prints out 4.
strlen(testString) => it prints out 10.

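I run into a problem when I want to copy the string into another char array.

char msgArray[7]; /* This is just an example. The buffer size is restricted because of some limitations. */

If I want to copy the data, I need to check: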

if (sizeof(testString) < sizeof(msgArray)) {
    strncpy(msgArray, testString, sizeof(msgArray));
}

This causes a problem: only part of the data gets copied.

The comparison should actually be:

if (strlen(testString) < sizeof(msgArray)) {

}
else {
   printf("too long");
}

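But I don't understand why this is the case.

If I want to limit the number of characters (including Unicode characters such as Chinese), how should I define the array? I don't think I can use a char[] array.

Thanks a lot for all replies.

My solution: I finally decided to cut the string so that it fits within the byte limit.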
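
The question does not show the code used to cut the string. A minimal sketch of one way to do it, assuming the text is valid UTF-8, is to back up from the byte limit until the cut no longer falls inside a multi-byte character. The helper name utf8_truncate_copy is hypothetical, not something from the original post.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy as much of src as fits in dst (dst_size bytes,
   including the terminating null) without cutting a UTF-8 character in half.
   Assumes src is valid UTF-8. Returns the number of bytes copied. */
size_t utf8_truncate_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;

    size_t len = strlen(src);
    size_t n = len < dst_size ? len : dst_size - 1;

    /* If the cut lands on a continuation byte (10xxxxxx), it is inside a
       multi-byte character, so back up to the start of that character. */
    while (n > 0 && n < len && ((unsigned char)src[n] & 0xC0) == 0x80)
        n--;

    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With a 7-byte buffer and the 10-byte UTF-8 form of "你好嗎?", this keeps the first two characters (6 bytes) and drops the rest.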

Answer 1

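A pointer is not an array. testString is a pointer, so sizeof(testString) gives the size of the pointer, not the size of the string it points to.

strlen works differently: it only applies to null-terminated char arrays and string literals, and it returns the number of chars that precede the null character.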
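
The difference can be sketched as follows. The string is written as explicit UTF-8 bytes so the result does not depend on how the compiler reads the source file; that encoding is an assumption, and the function names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* "你好嗎?" spelled out as UTF-8 bytes: three 3-byte characters plus '?'. */
#define HELLO_UTF8 "\xE4\xBD\xA0\xE5\xA5\xBD\xE5\x97\x8E?"

static const char *hello_ptr = HELLO_UTF8;   /* pointer to the literal */
static const char hello_arr[] = HELLO_UTF8;  /* array initialized from it */

/* Size of the pointer itself: typically 4 or 8, whatever it points to. */
size_t size_of_pointer(void) { return sizeof hello_ptr; }

/* Size of the array: 10 bytes of text plus the terminating null. */
size_t size_of_array(void) { return sizeof hello_arr; }

/* Number of bytes before the terminating null. */
size_t length_in_bytes(void) { return strlen(hello_ptr); }
```

Under the UTF-8 assumption, size_of_array() is 11 and length_in_bytes() is 10, while size_of_pointer() depends only on the platform.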

Answer 2

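The behavior of char* testString = "你好嗎?" depends on the compiler. One option is to find out what your compiler is doing by printing the individual chars with %d. It is probably generating a UTF-8 literal.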
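
A small sketch of that kind of inspection is below. It formats the bytes in hexadecimal rather than with %d, simply because hex is easier to compare against UTF-8 tables; the helper name dump_bytes and the output format are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write each byte of s into out as two hex digits
   separated by spaces (as many as fit). Returns the number of bytes in s. */
size_t dump_bytes(const char *s, char *out, size_t out_size)
{
    size_t len = strlen(s);
    size_t used = 0;

    if (out_size > 0)
        out[0] = '\0';

    /* Each entry needs at most 3 characters plus the terminating null. */
    for (size_t i = 0; i < len && used + 4 <= out_size; i++) {
        used += (size_t)snprintf(out + used, out_size - used,
                                 i ? " %02X" : "%02X",
                                 (unsigned)(unsigned char)s[i]);
    }
    return len;
}
```

If the compiler stores the literal as UTF-8, the dump of "你好嗎?" should begin with E4 BD A0, the three bytes that encode 你.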

In Standard C11 you can write one of the following:

char const *testString = u8"你好嗎?";   // UTF-8 encoding

or

wchar_t const *testString = L"你好嗎?"; // UTF-16 or UCS-4 encoding

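With either of these strings, there is no way in Standard C to work with Unicode characters as such. You can only work with code points and/or C characters. strlen or wcslen, respectively, will give the number of C characters in the string, but this may not correspond to the number of glyphs displayed.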
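
To illustrate the gap between C characters and code points, here is a minimal sketch that counts code points in a UTF-8 string by skipping continuation bytes. It assumes the input is valid UTF-8, the function name is hypothetical, and it still counts code points rather than displayed glyphs.

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a valid UTF-8 string.
   Every byte that is not a continuation byte (10xxxxxx) starts a new
   code point. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the UTF-8 form of "你好嗎?", strlen reports 10 bytes while this function reports 4 code points.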

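If your compiler does not conform to the latest standard (that is, it gives errors for the lines above), then to write portable code you will need to use only ASCII in your source file.

To embed Unicode in a string literal, you can use '\xNN' escapes with the UTF-8 hex codes.

In either case, your best bet is to use a third-party Unicode library such as ICU.

For the second part of the question, I assume you are using UTF-8. The result of strlen(testString) + 1 is the number of chars you need to copy. You say you are stuck with a fixed-size 7-byte buffer. If that is true, the code could be: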

char buf[7];

if ( strlen(testString) > 6 )
    exit(1);   // or jump to some other error handling

strcpy(buf, testString);

strncpy should be avoided because in some cases it does not null-terminate its buffer; you can always replace it with strcpy or snprintf.
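
As a rough sketch of the snprintf alternative: snprintf always null-terminates when the buffer size is nonzero, and its return value reveals whether the source had to be truncated. The helper name copy_if_fits and its return convention are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (dst_size bytes) with snprintf.
   Returns 0 if the whole string fit, -1 if it was truncated or an error
   occurred. When dst_size > 0, dst is always null-terminated. */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    /* snprintf returns the length the full output would have had. */
    int n = snprintf(dst, dst_size, "%s", src);

    return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

Note that on truncation the cut happens at a byte boundary, so a multi-byte UTF-8 character may be split; the caller should treat -1 as an error rather than use the partial result.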

Answer 3

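Typically you can use wchar_t to represent Unicode (non-English) characters; each character may take 2 or 4 bytes. If you really want to count characters quickly, use uint32_t (unsigned int) instead of char / wchar_t, because UTF-32 guarantees that every character (including non-English characters) has the same 4-byte size.

sizeof(testString) only gives you the size of the pointer itself, which is 4 on a 32-bit system and 8 on a 64-bit system.

If you are using wchar_t, use wcslen to get the string length; if you are using uint32_t, you need to write your own strlen function.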
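
The answer's own code did not survive in this copy of the thread. A minimal sketch of such a function, assuming the string is stored as UTF-32 code units terminated by a zero element, might look like this (the name utf32_len is made up here):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strlen for UTF-32: count the 32-bit code units that come
   before the terminating zero. One unit is one code point. */
size_t utf32_len(const uint32_t *s)
{
    size_t len = 0;

    while (s[len] != 0)
        len++;
    return len;
}
```

For example, the array { 0x4F60, 0x597D, 0x55CE, 0x3F, 0 } holds "你好嗎?" and has length 4.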


Answer 4

I'm not a professional, but you can try something like this:

char* testString = "你好嗎?\0"; // explicit \0 is redundant: string literals are already null-terminated
int arr_len = 0;
while (testString[arr_len])   // count bytes up to the first null character
    arr_len++;

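As a result, it returns 10, which is the number of array elements, so if you multiply it by the size of a single byte, you get the actual length of the string.

Regards, Pavel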

sizeof, strlen, and strncpy in C with Chinese words

4 answers: