Question

I have a byte array and want to find the first occurrence (if any) of a specific byte.

Can you help me do this in an elegant and efficient way?

/// <summary>
/// Finds the first occurrence of a specific byte in a byte array.
/// If not found, returns -1.
/// </summary>
public int GetFirstOccurance(byte byteToFind, byte[] byteArray)
{

}

Answer 1

public static int GetFirstOccurance(byte byteToFind, byte[] byteArray)
{
   return Array.IndexOf(byteArray,byteToFind);
}

It will return -1 if the byte is not found.
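A quick sketch of the -1 behavior, plus the `Array.IndexOf<T>(T[] array, T value, int startIndex, int count)` overload, which restricts the search to a range. The sample data is made up:

```csharp
using System;

byte[] data = { 0x10, 0x20, 0x30, 0x20 };

// First occurrence of 0x20 is at index 1.
int first = Array.IndexOf(data, (byte)0x20);

// A byte that does not occur yields -1.
int missing = Array.IndexOf(data, (byte)0x99);

// Search only indices 2..3 (startIndex = 2, count = 2); the match at index 3 is found.
int inRange = Array.IndexOf(data, (byte)0x20, 2, 2);

Console.WriteLine($"{first} {missing} {inRange}"); // prints "1 -1 3"
```

Note the `(byte)` casts. With a bare `int` literal such as `0x20`, C# type inference cannot settle on `T = byte`, so the call binds to the non-generic `Array.IndexOf(Array, object)` overload, which compares a boxed `int` against boxed `byte` elements and never finds a match.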

Or, as Sam pointed out, as an extension method:

public static int GetFirstOccurance(this byte[] byteArray, byte byteToFind)
{
   return Array.IndexOf(byteArray,byteToFind);
}

Or make it generic:

public static int GetFirstOccurance<T>(this T[] array, T element)
{
   return Array.IndexOf(array,element);
}

Then you can just write:

int firstIndex = byteArray.GetFirstOccurance(byteValue);

Answer 2

Array.IndexOf?

Answer 3

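Since you mentioned efficiency, here is some heavily optimized C# code I've written that uses native addressing and maximal qword-aligned reads to cut the number of memory accesses by a factor of 8. I would be surprised if there were a faster way to scan for a byte in memory in .NET.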

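This returns the index (relative to address src) of the first occurrence of byte 'v' within the range of memory that starts at offset i and continues for length c. It returns -1 if byte v is not found.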

// fast IndexOf byte in memory. (To use this with a managed byte[] array, see below.)
public unsafe static int IndexOfByte(byte* src, byte v, int i, int c)
{
    ulong t;
    byte* p, pEnd;

    // byte-by-byte until p is 8-byte aligned (or the count runs out)
    for (p = src + i; ((long)p & 7) != 0; c--, p++)
        if (c == 0)
            return -1;
        else if (*p == v)
            return (int)(p - src);

    // broadcast v into all eight byte lanes
    ulong r = v; r |= r << 8; r |= r << 16; r |= r << 32;

    // aligned scan, one qword (8 bytes) per iteration
    for (pEnd = p + (c & ~7); p < pEnd; p += 8)
    {
        t = *(ulong*)p ^ r;                                       // matching bytes become 0
        t = (t - 0x0101010101010101) & ~t & 0x8080808080808080;  // flag zero bytes
        if (t != 0)
        {
            t &= (ulong)-(long)t;                                 // keep the lowest flag only
            return (int)(p - src) + dbj8[t * 0x07EDD5E59A4E28C2 >> 58];
        }
    }

    // remaining tail bytes
    for (pEnd += c & 7; p < pEnd; p++)
        if (*p == v)
            return (int)(p - src);

    return -1;
}
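To see what the aligned loop computes, here is the same broadcast, XOR and underflow test applied to a single 64-bit word in safe C#. This is a sketch with made-up data: the helper name FirstMatchInWord is hypothetical, and BitOperations.TrailingZeroCount (System.Numerics, .NET Core 3.0 and later) stands in for the de Bruijn lookup when converting the isolated flag bit into a byte index. Lane 0 is the least significant byte, which is the first byte in memory on a little-endian machine.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: index (0..7) of the lowest byte lane of `word` equal to v, or -1.
static int FirstMatchInWord(ulong word, byte v)
{
    ulong r = v;
    r |= r << 8; r |= r << 16; r |= r << 32;                      // broadcast v into all 8 lanes

    ulong t = word ^ r;                                            // matching lanes become 0x00
    t = (t - 0x0101010101010101UL) & ~t & 0x8080808080808080UL;    // flag zero lanes

    if (t == 0)
        return -1;                                                 // no lane matched

    t &= (ulong)-(long)t;                                          // keep only the lowest flag
    return BitOperations.TrailingZeroCount(t) >> 3;                // bit 8k+7 -> lane k
}

// Lanes from least to most significant: 11 22 33 44 55 33 77 88
ulong w = 0x8877335544332211UL;
Console.WriteLine(FirstMatchInWord(w, 0x33)); // 2
Console.WriteLine(FirstMatchInWord(w, 0x88)); // 7
Console.WriteLine(FirstMatchInWord(w, 0x99)); // -1
```

The answer's table lookup and TrailingZeroCount give the same lane for an isolated flag bit; the table version avoids depending on a newer API.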

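Don't be alarmed by the single multiplication you see; it is performed at most once per call to this function, for the final de Bruijn lookup. The read-only lookup table it uses is a simple shared list of 64 byte values that requires one-time initialization: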

// elsewhere in the static class...
readonly static sbyte[] dbj8 =
{
     7, -1, -1, -1, -1,  5, -1, -1,
    -1,  4, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1,
     6, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1,  3, -1, -1, -1, -1,
    -1, -1,  1, -1,  2,  0, -1, -1,
};

The -1 values are never accessed, so if you prefer you can leave them as zero, as in this alternative initialization:

static MyStaticClass()
{
    dbj8 = new sbyte[64];  // initialize the lookup table (alternative to the above)
    dbj8[0x00] = 7;
    dbj8[0x18] = 6;
    dbj8[0x05] = 5;
    dbj8[0x09] = 4;
    dbj8[0x33] = 3;
    dbj8[0x3C] = 2;
    dbj8[0x3A] = 1;
    /* dbj8[0x3D] = 0; */
}
readonly static sbyte[] dbj8;
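The table entries need not be taken on faith: they follow from the multiplier 0x07EDD5E59A4E28C2. After t &= -t, a match in byte lane k leaves exactly bit 8k+7 set, so only eight of the 64 slots are ever read. A small sketch that recomputes those slots with the same hash expression as IndexOfByte (the constant name DeBruijn is just a local label):

```csharp
using System;

const ulong DeBruijn = 0x07EDD5E59A4E28C2UL;

var dbj8 = new sbyte[64];           // unused slots stay 0, as in the alternative above
for (int k = 0; k < 8; k++)
{
    ulong isolated = 1UL << (8 * k + 7);            // lowest flag bit for lane k
    int slot = (int)(isolated * DeBruijn >> 58);    // same hash as IndexOfByte
    dbj8[slot] = (sbyte)k;
    Console.WriteLine($"dbj8[0x{slot:X2}] = {k}");
}
// prints dbj8[0x3D] = 0, dbj8[0x3A] = 1, dbj8[0x3C] = 2, dbj8[0x33] = 3,
//        dbj8[0x09] = 4, dbj8[0x05] = 5, dbj8[0x18] = 6, dbj8[0x00] = 7
```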

For completeness, here is how to use the function with the method prototype the OP gave in the original question:

/// Finds the first occurrence of a specific byte in a byte array.
/// If not found, returns -1.
public static unsafe int GetFirstOccurance(byte byteToFind, byte[] byteArray)
{
    fixed (byte* p = byteArray)
        return IndexOfByte(p, byteToFind, 0, byteArray.Length);
}
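When unit-testing a routine like this, a deliberately simple reference version of the same contract makes a convenient oracle. A sketch, assuming a managed array stands in for src; IndexOfByteReference is a hypothetical name and makes no attempt to be fast:

```csharp
using System;

// Hypothetical oracle: search data[start .. start + count - 1] for v and return
// the index relative to data[0] (like (p - src) in the unsafe code), or -1.
static int IndexOfByteReference(byte[] data, byte v, int start, int count)
{
    for (int k = start; k < start + count; k++)
        if (data[k] == v)
            return k;
    return -1;
}

byte[] sample = { 1, 2, 3, 4, 5, 3 };
Console.WriteLine(IndexOfByteReference(sample, 3, 0, 6)); // 2
Console.WriteLine(IndexOfByteReference(sample, 3, 3, 3)); // 5
Console.WriteLine(IndexOfByteReference(sample, 3, 3, 2)); // -1
```

Comparing the fast routine against such an oracle over many random arrays, offsets and lengths exercises the unaligned head, the qword loop and the tail.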

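Discussion
My code is a bit intricate, so a detailed examination is left as an exercise for the interested reader. You can study another take on the general approach of gang-wise (word-at-a-time) memory searching in the .NET internal method Buffer.IndexOfByte, but that code has significant drawbacks compared to mine: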

Most significantly, the .NET code scans only 4 bytes at a time instead of 8 as mine does.

It is a non-public method, so you would need to use reflection to call it.

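The .NET code has a "performance leak" where the t1 != 0 check gives a false positive and the four checks that follow are wasted. Note their "fall-through" case: because of this false positive, they need four final checks (thus allowing fall-through) to maintain correctness, instead of just three.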

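The .NET code's false positive comes from an inherently inferior bitwise computation based on overflow of the carry bit from one byte to the next. This leads to two's-complement asymmetries (evidenced by their use of the constants 0x7efefeff or 0x81010100) and the occasional "left-egress" (i.e., loss) of information about the most significant byte, which is the real problem. In contrast, I use an underflow (borrow) computation, in which a byte can only be affected by the bytes below it, so the lowest flagged byte is always a genuine match. My method gives a conclusive result in all cases, with no false-positive or "fall-through" processing.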

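My code uses a branchless technique for the final lookup. A handful of non-branching logical operations (plus one multiplication in this case) is generally considered to perform better than an extended if-else structure, since the latter can defeat the CPU's branch prediction. This matters even more for my 8-byte scanner, because without the lookup my final check would need twice as many if-else conditions as a 4-byte gang-wise scanner.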
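A safe, single-word sketch of the two points above, with made-up data and hypothetical helper names. The underflow mask is nonzero only when some lane really is zero, and its lowest flag always marks a genuine match; a borrow out of a zero lane can additionally flag a 0x01 lane directly above it, which is why only the lowest flag is ever used. The sketch then maps the isolated flag to a lane index twice: once with an if-else cascade of the kind the answer avoids, and once with the answer's branchless de Bruijn table, rebuilt locally.

```csharp
using System;

// Underflow flag mask as in IndexOfByte: bit 8k+7 set when lane k is flagged.
static ulong ZeroLaneFlags(ulong x) =>
    (x - 0x0101010101010101UL) & ~x & 0x8080808080808080UL;

// Branching alternative: binary search over the 8 possible flag positions.
static int LaneByBranches(ulong bit)
{
    int lane = 0;
    if ((bit & 0xFFFFFFFF00000000UL) != 0) lane += 4;  // lanes 4..7
    if ((bit & 0xFFFF0000FFFF0000UL) != 0) lane += 2;  // lanes 2,3,6,7
    if ((bit & 0xFF00FF00FF00FF00UL) != 0) lane += 1;  // odd lanes
    return lane;
}

// Branchless alternative: the de Bruijn table, rebuilt here.
const ulong DeBruijn = 0x07EDD5E59A4E28C2UL;
var table = new sbyte[64];
for (int k = 0; k < 8; k++)
    table[(int)((1UL << (8 * k + 7)) * DeBruijn >> 58)] = (sbyte)k;
int LaneByTable(ulong bit) => table[(int)(bit * DeBruijn >> 58)];

// Lanes from least significant: 00 01 FF FF FF FF FF FF
ulong flags = ZeroLaneFlags(0xFFFFFFFFFFFF0100UL);
Console.WriteLine($"0x{flags:X16}");  // 0x0000000000008080 (lane 1 flagged only via the borrow)

ulong lowest = flags & (ulong)-(long)flags;  // keep only the lowest flag
Console.WriteLine($"{LaneByBranches(lowest)} {LaneByTable(lowest)}");  // 0 0

// No zero lane means no flag at all, so "t != 0" never fires spuriously.
Console.WriteLine(ZeroLaneFlags(0x0101010101010101UL) == 0);  // True
```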

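Of course, if you don't care about all these details, you can just copy and use the code; I have unit-tested it fairly exhaustively and verified correct behavior for all well-formed inputs. So while the core functionality is ready to use, you will probably want to add argument checking.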
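One possible shape for that argument checking, as a sketch. IndexOfChecked is a hypothetical name, and to keep the snippet compilable without unsafe code it searches a span slice with MemoryExtensions.IndexOf instead of calling IndexOfByte. To use the fast routine, the last two lines would become a fixed block returning IndexOfByte(p, value, start, count), which already reports an index relative to the array start.

```csharp
using System;

// Hypothetical bounds-checked wrapper around a byte search.
static int IndexOfChecked(byte[] array, byte value, int start, int count)
{
    if (array is null)
        throw new ArgumentNullException(nameof(array));
    if ((uint)start > (uint)array.Length)
        throw new ArgumentOutOfRangeException(nameof(start));
    if ((uint)count > (uint)(array.Length - start))
        throw new ArgumentOutOfRangeException(nameof(count));

    // Search the validated slice and map the slice-relative result back to an array index.
    int k = array.AsSpan(start, count).IndexOf(value);
    return k < 0 ? -1 : start + k;
}

byte[] buf = { 9, 8, 7, 8 };
Console.WriteLine(IndexOfChecked(buf, 8, 2, 2)); // 3
```

The unsigned casts fold the "negative" and "too large" checks into one comparison each.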

[Edit:]

String.IndexOf(String s, Char ch, int ix_start, int count) ... fast!

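Since the method above has been working successfully in my projects, I extended it to cover 16-bit searches. Here is the same code adapted to search for a 16-bit short, ushort, or char primitive instead of a byte. This adapted method was also independently verified against its own corresponding unit tests.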

// A class can have only one static constructor: if you also use the
// static-constructor initialization of dbj8 shown earlier, merge the two bodies.
static MyStaticClass()
{
    dbj16 = new sbyte[64];
    /* dbj16[0x3A] = 0; */
    dbj16[0x33] = 1;
    dbj16[0x05] = 2;
    dbj16[0x00] = 3;
}
readonly static sbyte[] dbj16;

public unsafe static int IndexOf(ushort* src, ushort v, int i, int c)
{
    ulong t;
    ushort* p, pEnd;

    // element-by-element until p is 8-byte aligned (or the count runs out)
    for (p = src + i; ((long)p & 7) != 0; c--, p++)
        if (c == 0)
            return -1;
        else if (*p == v)
            return (int)(p - src);

    // broadcast v into all four 16-bit lanes
    ulong r = ((ulong)v << 16) | v;
    r |= r << 32;

    // aligned scan, four elements (one qword) per iteration
    for (pEnd = p + (c & ~7); p < pEnd; p += 4)
    {
        t = *(ulong*)p ^ r;
        t = (t - 0x0001000100010001) & ~t & 0x8000800080008000;
        if (t != 0)
        {
            i = dbj16[(t & (ulong)-(long)t) * 0x07EDD5E59A4E28C2 >> 58];
            return (int)(p - src) + i;
        }
    }

    // remaining tail elements
    for (pEnd += c & 7; p < pEnd; p++)
        if (*p == v)
            return (int)(p - src);

    return -1;
}
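The dbj16 entries come from the same multiplier as dbj8: with 16-bit lanes, a match in lane k (k = 0..3) leaves only bit 16k+15 set after t &= -t. A safe sketch that recomputes the four slots and then runs the 16-bit lane test on one made-up word (the constant name DeBruijn is just a local label):

```csharp
using System;

const ulong DeBruijn = 0x07EDD5E59A4E28C2UL;

var dbj16 = new sbyte[64];
for (int k = 0; k < 4; k++)
{
    int slot = (int)((1UL << (16 * k + 15)) * DeBruijn >> 58);
    dbj16[slot] = (sbyte)k;
    Console.WriteLine($"dbj16[0x{slot:X2}] = {k}");  // slots 0x3A, 0x33, 0x05, 0x00
}

// Lanes from least significant: 1111 2222 3333 2222; search for 0x2222.
ulong word = 0x2222333322221111UL;
ushort v = 0x2222;
ulong r = ((ulong)v << 16) | v;   // same broadcast as the 16-bit IndexOf
r |= r << 32;
ulong t = word ^ r;
t = (t - 0x0001000100010001UL) & ~t & 0x8000800080008000UL;
t &= (ulong)-(long)t;             // lowest flag only
int lane = dbj16[(int)(t * DeBruijn >> 58)];
Console.WriteLine(lane);          // 1
```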

Here are the various overloads for the remaining 16-bit primitives, plus String (shown last):

public static int IndexOf(this char[] rg, char v) => IndexOf(rg, v, 0);

public unsafe static int IndexOf(this char[] rg, char v, int i, int c = -1)
{
    // c < 0 means "search to the end of the array"
    if (rg != null && (c = c < 0 ? rg.Length - i : c) > 0)
        fixed (char* p = rg)
            return IndexOf((ushort*)p, v, i, c);
    return -1;
}

public static int IndexOf(this short[] rg, short v) => IndexOf(rg, v, 0);

public unsafe static int IndexOf(this short[] rg, short v, int i, int c = -1)
{
    if (rg != null && (c = c < 0 ? rg.Length - i : c) > 0)
        fixed (short* p = rg)
            return IndexOf((ushort*)p, (ushort)v, i, c);
    return -1;
}

public static int IndexOf(this ushort[] rg, ushort v) => IndexOf(rg, v, 0);

public unsafe static int IndexOf(this ushort[] rg, ushort v, int i, int c = -1)
{
    if (rg != null && (c = c < 0 ? rg.Length - i : c) > 0)
        fixed (ushort* p = rg)
            return IndexOf(p, v, i, c);
    return -1;
}

public unsafe static int IndexOf(String s, Char ch, int i = 0, int c = -1)
{
    if (s != null && (c = c < 0 ? s.Length - i : c) > 0)
        fixed (char* p = s)
            return IndexOf((ushort*)p, ch, i, c);
    return -1;
}

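Note that the String overload is not marked as an extension method, because this higher-performance replacement would never be called that way (a built-in instance method with the same name always takes precedence over an extension method). To use it as an extension on String instances, you can rename it. For example, IndexOf__(this String s, ...) would make it appear next to the built-in method name in IntelliSense listings, which may be a helpful reminder to opt in. Otherwise, if you don't need the extension syntax, just make sure you call this optimized version directly, as a static method of its own class, whenever you want to use it instead of s.IndexOf(Char ch).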

Find the first occurrence of a specific byte in a byte[] array (C#)
