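.NET System::String to UTF-8 bytes stored in a char*

Asked: 2011-07-06 12:19:12

Tags: .net c++ string char unmanaged

I am including some unmanaged C++ code in a .NET project. To do that, I need to convert a System::String into UTF-8 bytes stored in a char*.

I am not sure whether this is the best, or even a correct, way to do it, and I would appreciate it if someone could take a look and give feedback.

Thanks,

/David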

// Copy into blank VisualStudio C++/CLR command line solution.
#include "stdafx.h"
#include <stdio.h>

using namespace System;
using namespace System::Text;
using namespace System::Runtime::InteropServices;

// Test for calling with char* argument.
void MyTest(const char* buffer)
{
    printf_s("%s\n", buffer);
    return;
}

int main()
{

   // Create a UTF-8 encoding.
   UTF8Encoding^ utf8 = gcnew UTF8Encoding;

   // A Unicode string with two characters outside an 8-bit code range.
   String^ unicodeString = L"This unicode string contains two characters with codes outside an 8-bit code range, Pi (\u03a0) and Sigma (\u03a3).";
   Console::WriteLine(unicodeString);

   // Encode the string.
   array<Byte>^encodedBytes = utf8->GetBytes(unicodeString);

   // Get pointer to unmanaged char array
   int size = Marshal::SizeOf(encodedBytes[0]) * encodedBytes->Length;
   IntPtr pnt = Marshal::AllocHGlobal(size);
   Marshal::Copy(encodedBytes, 0, pnt, encodedBytes->Length);

   // Ugly, but necessary?
   char *charPnt= (char *)pnt.ToPointer();
   MyTest(charPnt);
   Marshal::FreeHGlobal(pnt);

}

1 Answer:

Answer 0 (score: 12)

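  1. You don't need to create an encoder instance; you can use the static instance, Encoding::UTF8.

  2. If the called function does not expect a pointer into the HGlobal heap, you can use ordinary C/C++ memory allocation (new or malloc) for the buffer.

  3. In your example the function does not take ownership of the buffer, so you don't need a copy at all; pinning the managed array is enough.

  4. Something like this: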

    // Encode the text as UTF8
    array<Byte>^ encodedBytes = Encoding::UTF8->GetBytes(unicodeString);
    
    // prevent GC moving the bytes around while this variable is on the stack
    pin_ptr<Byte> pinnedBytes = &encodedBytes[0];
    
    // Call the function, typecast from byte* -> char* is required
    MyTest(reinterpret_cast<char*>(pinnedBytes), encodedBytes->Length);
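
The call above passes a length as a second argument, while the MyTest in the question takes only a pointer. As a rough sketch of what a length-aware native function could look like (this overload is hypothetical and not part of the original post; returning the printf result is also just an illustrative choice), plain C++ might read:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical two-argument counterpart of MyTest (an assumption, not from
// the original post). It prints exactly `length` bytes, so the buffer does
// not have to end with a zero byte.
// Returns whatever printf returns: the number of characters written,
// or a negative value on an output error.
int MyTest(const char* buffer, int length)
{
    assert(buffer != nullptr && length >= 0);

    // "%.*s" takes the maximum number of bytes to print from the argument list.
    return std::printf("%.*s\n", length, buffer);
}
```

Because the precision caps how many bytes are read, this variant never runs past the end of the pinned array even when no terminator is present.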
    

    Or, if you need a zero-terminated string, as most C functions do (including the example in the OP), you should append a zero byte.

    // Encode the text as UTF8, making sure the array is zero terminated
    array<Byte>^ encodedBytes = Encoding::UTF8->GetBytes(unicodeString + "\0");
    
    // prevent GC moving the bytes around while this variable is on the stack
    pin_ptr<Byte> pinnedBytes = &encodedBytes[0];
    
    // Call the function, typecast from byte* -> char* is required
    MyTest(reinterpret_cast<char*>(pinnedBytes));
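
For point 2, where the native code needs to keep its own copy, the bytes can be copied into ordinary C++ memory instead of the HGlobal heap. The following is only a minimal sketch in standard C++: the helper name CopyZeroTerminated is made up for illustration, and in C++/CLI the source pointer would presumably be the pinned pointer obtained from the managed array (Byte corresponds to unsigned char).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not from the original post): copies `length` bytes
// into a std::vector<char> and leaves one extra zero byte at the end, so
// the result can be handed to functions expecting a C string.
std::vector<char> CopyZeroTerminated(const unsigned char* bytes, std::size_t length)
{
    assert(bytes != nullptr || length == 0);

    // Allocate length + 1 chars, all initialised to '\0'; the last one
    // remains as the terminator after the copy.
    std::vector<char> copy(length + 1, '\0');
    if (length > 0)
    {
        std::memcpy(copy.data(), bytes, length);
    }
    return copy;
}
```

The result's data() pointer can then be passed to a function such as MyTest(const char*). The vector releases its memory automatically, so there is nothing to pair with FreeHGlobal, and the copy remains valid after the managed array is unpinned.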