Question

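I know that strings are immutable: any change to a string just creates a new string in memory (and marks the old one as free). However, I'm wondering whether the logic below is sound, i.e., whether you can actually modify the contents of a string in a roundabout way.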

using System.Runtime.InteropServices;

const string baseString = "The quick brown fox jumps over the lazy dog!";

//initialize a new string
string candidateString = new string('\0', baseString.Length);

//Pin the string
GCHandle gcHandle = GCHandle.Alloc(candidateString, GCHandleType.Pinned);
try
{
    //Copy the contents of the base string to the candidate string
    unsafe
    {
        char* cCandidateString = (char*) gcHandle.AddrOfPinnedObject();
        for (int i = 0; i < baseString.Length; i++)
        {
            cCandidateString[i] = baseString[i];
        }
    }
}
finally
{
    //Unpin the string so the GC is free to move it again
    gcHandle.Free();
}

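Does this approach really change the contents of candidateString (without creating a new candidateString in memory), or does the runtime see through my trick and treat it as a normal string?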

Answer 1

Your example works just fine, thanks to several factors:

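candidateString lives in the managed heap, so it's safe to modify. Compare this with baseString, which is interned. If you try to modify an interned string, unexpected things may happen, since that one object is shared by every occurrence of the same literal. Even though it seems to work today, there's no guarantee that strings won't live in write-protected memory at some point. This is very similar to assigning a string constant to a char* variable in C and then modifying it; in C, that's undefined behavior.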
You preallocated enough space in candidateString, so you're not overflowing the buffer.

The character data is not stored at offset 0 of the String object. It's stored at an offset equal to RuntimeHelpers.OffsetToStringData:

public static int OffsetToStringData
{
    // This offset is baked in by string indexer intrinsic, so there is no harm
    // in getting it baked in here as well.
    [System.Runtime.Versioning.NonVersionable] 
    get {
        // Number of bytes from the address pointed to by a reference to
        // a String to the first 16-bit character in the String.  Skip 
        // over the MethodTable pointer, & String 
        // length.  Of course, the String reference points to the memory 
        // after the sync block, so don't count that.  
        // This property allows C#'s fixed statement to work on Strings.
        // On 64 bit platforms, this should be 12 (8+4) and on 32 bit 8 (4+4).
#if WIN32
        return 8;
#else
        return 12;
#endif // WIN32
    }
}

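...except that GCHandle.AddrOfPinnedObject special-cases two types: string and array types. Instead of returning the address of the object itself, it returns a pointer to the data. See the source code in CoreCLR: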

// Get the address of a pinned object referenced by the supplied pinned
// handle.  This routine assumes the handle is pinned and does not check.
FCIMPL1(LPVOID, MarshalNative::GCHandleInternalAddrOfPinnedObject, OBJECTHANDLE handle)
{
    FCALL_CONTRACT;

    LPVOID p;
    OBJECTREF objRef = ObjectFromHandle(handle);

    if (objRef == NULL)
    {
        p = NULL;
    }
    else
    {
        // Get the interior pointer for the supported pinned types.
        if (objRef->GetMethodTable() == g_pStringClass)
            p = ((*(StringObject **)&objRef))->GetBuffer();
        else if (objRef->GetMethodTable()->IsArray())
            p = (*((ArrayBase**)&objRef))->GetDataPtr();
        else
            p = objRef->GetData();
    }

    return p;
}
FCIMPLEND
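To observe this special case from managed code, here is a minimal sketch (not from the original answer) that reads the pinned string back through the returned address with Marshal.ReadInt16 instead of an unsafe pointer. The class and method names are hypothetical, and it assumes a CLR where AddrOfPinnedObject behaves as in the source above, so each 2-byte read should match the corresponding character.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedAddressDemo
{
    // Returns true if AddrOfPinnedObject points at s[0] (the character data)
    // rather than at the start of the object (the MethodTable pointer).
    public static bool AddressPointsAtFirstChar(string s)
    {
        GCHandle handle = GCHandle.Alloc(s, GCHandleType.Pinned);
        try
        {
            IntPtr data = handle.AddrOfPinnedObject();
            for (int i = 0; i < s.Length; i++)
            {
                // Each char is one 2-byte UTF-16 code unit.
                char c = (char)Marshal.ReadInt16(data, i * sizeof(char));
                if (c != s[i])
                    return false;
            }
            return true;
        }
        finally
        {
            handle.Free(); // unpin so the GC may move the string again
        }
    }
}
```

If the handle returned the object's own address instead, the first read would hit part of the MethodTable pointer and the comparison would fail.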

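In short, the runtime lets you play with its data and won't complain. You're using unsafe code, after all. I've seen worse runtime tinkering than this, including creating reference types on the stack ;-)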

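If your final string is shorter than the allocated one, remember to add a \0 after all the characters (at offset Length). This won't overflow: every string has an implicit null character at the end to simplify interop scenarios.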
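The implicit terminator can be observed the same way. This sketch uses hypothetical names and relies on the CLR implementation detail described here rather than on a documented guarantee. It reads the UTF-16 code unit at offset Length, one slot past the last character:

```csharp
using System;
using System.Runtime.InteropServices;

public static class TerminatorDemo
{
    // Reads the code unit just past the last character (offset Length).
    // On the CLR this slot is allocated and holds '\0', but it is not
    // counted in s.Length.
    public static char CharAfterEnd(string s)
    {
        GCHandle handle = GCHandle.Alloc(s, GCHandleType.Pinned);
        try
        {
            IntPtr data = handle.AddrOfPinnedObject();
            return (char)Marshal.ReadInt16(data, s.Length * sizeof(char));
        }
        finally
        {
            handle.Free();
        }
    }
}
```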

Now, here's how StringBuilder creates a string, in StringBuilder.ToString:

[System.Security.SecuritySafeCritical]  // auto-generated
public override String ToString() {
    Contract.Ensures(Contract.Result<String>() != null);

    VerifyClassInvariant();

    if (Length == 0)
        return String.Empty;

    string ret = string.FastAllocateString(Length);
    StringBuilder chunk = this;
    unsafe {
        fixed (char* destinationPtr = ret)
        {
            do
            {
                if (chunk.m_ChunkLength > 0)
                {
                    // Copy these into local variables so that they are stable even in the presence of race conditions
                    char[] sourceArray = chunk.m_ChunkChars;
                    int chunkOffset = chunk.m_ChunkOffset;
                    int chunkLength = chunk.m_ChunkLength;

                    // Check that we will not overrun our boundaries.
                    if ((uint)(chunkLength + chunkOffset) <= ret.Length && (uint)chunkLength <= (uint)sourceArray.Length)
                    {
                        fixed (char* sourcePtr = sourceArray)
                            string.wstrcpy(destinationPtr + chunkOffset, sourcePtr, chunkLength);
                    }
                    else
                    {
                        throw new ArgumentOutOfRangeException("chunkLength", Environment.GetResourceString("ArgumentOutOfRange_Index"));
                    }
                }
                chunk = chunk.m_ChunkPrevious;
            } while (chunk != null);
        }
    }
    return ret;
}

So yes, it uses unsafe code. And yes, you can optimize your code by using fixed, since this kind of pinning is lighter-weight than allocating a GC handle:
const string baseString = "The quick brown fox jumps over the lazy dog!";

//initialize a new string
string candidateString = new string('\0', baseString.Length);

//Copy the contents of the base string to the candidate string
unsafe
{
    fixed (char* cCandidateString = candidateString)
    {
        for (int i = 0; i < baseString.Length; i++)
            cCandidateString[i] = baseString[i];
    }
}

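When you use fixed, the GC only finds out that the object needs to be pinned when it encounters it during a collection. If no collection happens, the GC isn't even involved. When you use GCHandle, a handle is registered with the GC every time.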

Answer 2

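As others have pointed out, there are a few rare cases where mutating a String object is useful. I'll give one example below, along with a useful code snippet.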

Use case/background

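While everyone should be a big fan of the really excellent character-encoding support .NET has always offered, sometimes it may be preferable to cut that overhead, especially when doing a lot of round-tripping between 8-bit (legacy) characters and managed strings (i.e., typical interop scenarios).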

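As I hinted, .NET is particularly emphatic that you must explicitly specify a text Encoding for any and all conversions of non-Unicode character data to and from managed String objects. This rigorous control at the periphery is truly commendable, since it ensures that once you have a string inside the managed runtime, you never have to worry; everything is just wide Unicode. Even UTF-8 is largely banished from this pristine realm.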

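(By contrast, recall a certain other popular scripting language that notoriously botched this whole area, ultimately resulting in years of parallel 2.x and 3.x versions, all due to sweeping Unicode changes in the latter.)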

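So .NET pushes all of that mess to the interop boundary and enforces Unicode (UTF-16) once you're inside, but this philosophy requires that the encoding/decoding work, done "once and for all", be exhaustively rigorous, and because of this the .NET Encoding/Encoder classes can become a performance bottleneck. If you're moving a lot of text from wide (Unicode) to simple fixed 7- or 8-bit narrow ANSI, ASCII, etc. (note that I'm not talking about MBCS or UTF-8, where you'll want to use the encoders!), the .NET encoding paradigm can seem like overkill.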

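Furthermore, it may be that you don't know, or don't care, to specify an Encoding. Perhaps all you care about is a fast and accurate round trip of the low byte of a 16-bit Char. If you look at the .NET source code, even System.Text.ASCIIEncoding may be too bulky in some situations.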

Code snippets

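Thin strings: 8-bit characters stored directly in a managed String, one "thin char" per wide Unicode character, without bothering with character encoding/decoding during round trips.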

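All of these methods ignore/strip the upper byte of each 16-bit Unicode character, transmitting only each low byte exactly as-is. Obviously, the Unicode text can only be recovered successfully after a round trip if those upper bits aren't relevant.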
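The byte/char mapping these methods rely on can be written without any unsafe code, which makes the lossy case easy to see. This is a safe reference sketch with hypothetical names (Narrow, Widen, SurvivesRoundTrip), not the answer's optimized implementation; it allocates a temporary array in each direction, which is exactly the overhead the answer's snippets avoid.

```csharp
using System;

public static class ThinRoundTrip
{
    // Keep only the low byte of each UTF-16 code unit (high byte discarded).
    public static byte[] Narrow(string s)
    {
        var bytes = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
            bytes[i] = unchecked((byte)s[i]);
        return bytes;
    }

    // Zero-extend each byte back to a char: 0xNN -> U+00NN.
    public static string Widen(byte[] bytes)
    {
        var chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
            chars[i] = (char)bytes[i];
        return new string(chars);
    }

    // True when s survives Narrow -> Widen unchanged,
    // i.e. no character has a nonzero high byte.
    public static bool SurvivesRoundTrip(string s) => Widen(Narrow(s)) == s;
}
```

For example, "caf\u00E9" survives the round trip because every character is at most U+00FF, whereas "\u0141" (Ł) narrows to 0x41 and comes back as "A".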

/// <summary> Convert byte array to "thin string" </summary>
public static unsafe String ToThinString(this byte[] src)
{
    int c;
    var ret = String.Empty;
    if ((c = src.Length) > 0)
        fixed (char* dst = (ret = new String('\0', c)))
            do
                dst[--c] = (char)src[c];  // fill new String by in-situ mutation
            while (c > 0);

    return ret;
}

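In the direction just shown, which typically brings native data into the managed world, you often won't have a managed byte array to begin with. So rather than allocating a temporary one just to call this function, you can process the raw native bytes directly into a managed string. As before, this bypasses all character encoding.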

The (obvious) range checks that such an unsafe function would need are omitted for clarity:

public static unsafe String ToThinString(byte* pSrc, int c)
{
    var ret = String.Empty;
    if (c > 0)
        fixed (char* dst = (ret = new String('\0', c)))
            do
                dst[--c] = (char)pSrc[c];  // fill new String by in-situ mutation
            while (c > 0);

    return ret;
}

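The advantage of String mutation here is that you avoid a temporary allocation by writing directly into the final one. Even if you avoided the extra allocation by using stackalloc, there would still be an unnecessary re-copy of the whole thing when you eventually call the String(Char*, int, int) constructor: obviously there's no way to associate the data you just laboriously prepared with a String object that doesn't exist until you're finished!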
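On .NET Core 2.1 and later, string.Create offers a supported way to get the same benefit of writing directly into the final allocation without mutating an already existing string: it allocates the string and hands your callback a writable Span<char> over its buffer before returning it. This is a swapped-in technique, not the answer's code; the class name is hypothetical, and only the byte[] overload is sketched.

```csharp
using System;

public static class ThinStringCreate
{
    // Widen each byte to one char, writing straight into the new string.
    // string.Create allocates the string once and lets the callback fill it
    // through a Span<char> before the string is returned to the caller.
    public static string ToThinString(byte[] src)
    {
        if (src == null) throw new ArgumentNullException(nameof(src));
        if (src.Length == 0) return string.Empty;

        return string.Create(src.Length, src, (span, bytes) =>
        {
            for (int i = 0; i < span.Length; i++)
                span[i] = (char)bytes[i]; // zero-extend: 0xNN -> U+00NN
        });
    }
}
```

The callback receives the source array as its state argument rather than capturing it, which keeps the lambda free of closure allocations.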

For completeness...

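Here's the mirror code that reverses the operation to get back a byte array (even though this direction doesn't happen to illustrate the string-mutation technique). This is the direction you would typically use to send Unicode text out of the managed .NET runtime, for use by a legacy app.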

/// <summary> Convert "thin string" to byte array </summary>
public static unsafe byte[] ToByteArr(this String src)
{
    int c;
    byte[] ret = null;
    if ((c = src.Length) > 0)
        fixed (byte* dst = (ret = new byte[c]))
            do
                dst[--c] = (byte)src[c];
            while (c > 0);

    return ret ?? new byte[0];
}

Can you change the contents of an (immutable) string via unsafe methods?
