Question

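I'm looking for a way to search a byte array for every occurrence of any pattern from a string array, and to save those matches to a text file.

So far I have loaded a file and converted its data into a byte array. I wrote a for loop that runs over the length of the data to perform the searches.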

byte[] test = System.IO.File.ReadAllBytes(openFileDialog1.FileName);

string hex = BitConverter.ToString(test).Replace("-", string.Empty);

for (int i = 0; i < hex.Length; i++) {
    //String array with some of the patterns I'm looking for in the byte array
    string[] patterns = { "05805A6C", "0580306C", "05801B6C" };

    //I get the index if the pattern is found at i position
    int indice = hex.IndexOf("05805A6C", i);
    //Do some calculations to get the offset I desire to register
    indice = indice + 8;
    int index = (indice / 2);
    //Transform the index into hexadecimal
    string outputHex = int.Parse(index.ToString()).ToString("X");
    //Output the index as an hexadecimal offset address
    MessageBox.Show("0x" + outputHex);
    // i gets the value of the indice and the loop starts again at this position
    i = indice;
}

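My approach only works for a single pattern. Right now I get every offset address of "05805A6C" in the file, but my goal is a complete search over the whole pattern array.

How can I run the same search but take every pattern in the string array into account?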

Answer 1

I haven't run this against a full set of test cases, but...

using System.Collections.Generic;
using System.Linq;

public static class ByteArrayExtensions
{

    public static int IndexOfAny(this byte[] source, byte[][] anyOf)
    {
        return IndexOfAny(source, anyOf, 0);
    }

    public static int IndexOfAny(this byte[] source, byte[][] anyOf, int startIndex)
    {
        var sanitisedAnyOf = new List<byte[]>(anyOf.Where(b => b != null && b.Length > 0 && b.Length <= source.Length));

        if ( startIndex < 0 ) startIndex = 0;

        for ( int i = startIndex ; i < source.Length ; ++ i )
        {
            var testByte = source[i];

            // Check all the anyOf arrays to see if they start a new possible match, and could fit in the remaining data
            for ( int anyOfIndex = 0 ; anyOfIndex < sanitisedAnyOf.Count ; ++ anyOfIndex )
            {
                if ( sanitisedAnyOf[anyOfIndex][0] == testByte && sanitisedAnyOf[anyOfIndex].Length + i <= source.Length )
                {
                    // This is a possible match here, scan forwards to see if it is a complete match
                    int checkScanIndex;
                    for ( checkScanIndex = 0 ; checkScanIndex < sanitisedAnyOf[anyOfIndex].Length ; ++ checkScanIndex )
                    {
                        if ( source[i + checkScanIndex] != sanitisedAnyOf[anyOfIndex][checkScanIndex] )
                        {
                            // It didn't match
                            break;
                        }
                    }

                    if ( checkScanIndex == sanitisedAnyOf[anyOfIndex].Length )
                    {
                        // This completely matched
                        return i;
                    }
                }
            }
        }

        return -1;
    }
}

Test code:

void Test()
{
    var anyOf = new byte[][]
    {
        new byte[] { 0xF4, 0xF0 },
        new byte[] { 0x05, 0x80, 0x5A, 0x6C }, 
        new byte[] { 0x05, 0x80, 0x30, 0x6C }, 
        new byte[] { 0x05, 0x80, 0x1B, 0x6C },
        new byte[] { 0x05, 0x05, 0x05, 0x6C },
        new byte[] { },
        new byte[1024]
    };

    var source = new byte[]
    {
        0xF4, 0xF0, 0x58, 0x05, 0xA6, 0xCD, 0x34, 0x05, 0x80, 0xF3, 0x67, 0x5C, 0x05, 0x80, 0x5A, 0x6C, 
        0x58, 0xBF, 0x05, 0x80, 0x5C, 0xFE, 0xB4, 0x8C, 0x05, 0x80, 0x30, 0x05, 0x80, 0x30, 0x6C, 0x77, 
        0x11, 0x70, 0x99, 0xD9, 0xAA, 0xCE, 0x95, 0xDF, 0x17, 0x11, 0x83, 0xCB, 0xF2, 0x0B, 0x73, 0xB8, 
        0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x6C, 0x5A, 0x78, 0x05, 0x80, 0x1B, 0x6C
    };

    var matchIndices = new List<int>();
    int matchIndex = -1;
    while ( ( matchIndex = source.IndexOfAny(anyOf, matchIndex + 1) ) >= 0 )
    {
        matchIndices.Add(matchIndex);
    }

    var output = string.Join(", ", matchIndices.Select(i => i.ToString()));
}

Returns:

output = 0, 12, 27, 54, 60

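This extension method adds an IndexOfAny() method to byte arrays. It takes an array of byte arrays and returns the index of the first match in the source array. I believe this solves the original problem while also fixing a couple of potential issues introduced by comparing as hex.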

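The problems I have with the string hex comparison are:

It uses at least twice the memory, because the binary file is stored a second time as hex (two characters per byte), and
it can match strings that sit on a half-byte (nibble) boundary rather than on a byte boundary.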

For an example of the second case, look at source[1] through source[5], which contain:

{ 0xF0, 0x58, 0x05, 0xA6, 0xCD }.AsHex() => "F05805A6CD"

where the hex comparison would incorrectly match the bytes:

{ 0x05, 0x80, 0x5A, 0x6C }.AsHex() => "05805A6C"
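
If you do want to stay with the hex string, one possible workaround is to reject matches that start at an odd character index, since those begin in the middle of a byte. This is only a rough sketch: HexSearchSketch and AlignedMatches are names made up for illustration, and it returns the byte offset where each match starts (not the "+8" offset from the question).

```csharp
using System;
using System.Collections.Generic;

public static class HexSearchSketch
{
    // Hypothetical helper: byte offsets of matches that start on a byte boundary.
    public static List<int> AlignedMatches(byte[] data, string hexPattern)
    {
        var offsets = new List<int>();
        if (string.IsNullOrEmpty(hexPattern)) return offsets;

        // BitConverter.ToString produces upper-case hex separated by '-'
        string hex = BitConverter.ToString(data).Replace("-", string.Empty);

        int pos = 0;
        while ((pos = hex.IndexOf(hexPattern, pos, StringComparison.Ordinal)) != -1)
        {
            // An odd character index means the match starts halfway through a byte
            if (pos % 2 == 0) offsets.Add(pos / 2);
            pos++; // move on by one character so overlapping matches are not skipped
        }
        return offsets;
    }
}
```

With the five bytes above, a plain hex.IndexOf("05805A6C") finds the pattern at character index 1, while AlignedMatches returns an empty list. This fixes the false positives but not the extra memory.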

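I'm looking into a more efficient approach that can take its source data from a stream instead of a byte array. That would allow much larger files to be scanned, since they would not need to be loaded into memory for the comparison. I ran into a problem when attempting this: a short match that starts later in the array takes precedence over a longer match that started earlier but has not finished comparing yet. For example: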

var source = new byte[] { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F };
var anyOf = new byte[][]
{
    new byte[] { 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09 },
    new byte[] { 0x05 }
};

would return the match for 0x05 at index 5, rather than the correct match at index 3, which has not finished comparing yet.
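
One way around that ordering problem might be to read the stream in chunks and only test a start position once enough bytes are buffered for the longest pattern to fit, carrying the untested tail over to the next chunk. The following is a sketch of that idea rather than a tested implementation; StreamSearchSketch, IndexesOfAny and the chunkSize parameter are names I made up, and it yields every start position instead of only the first.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class StreamSearchSketch
{
    // Hypothetical helper: yields the absolute offset of every position where any pattern starts.
    public static IEnumerable<long> IndexesOfAny(Stream source, byte[][] anyOf, int chunkSize = 4096)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        var patterns = anyOf.Where(p => p != null && p.Length > 0).ToArray();
        if (patterns.Length == 0) yield break;

        int overlap = patterns.Max(p => p.Length) - 1;
        var buffer = new byte[chunkSize + overlap];
        int valid = 0;        // number of usable bytes currently in the buffer
        long bufferStart = 0; // absolute stream offset of buffer[0]
        bool endOfStream = false;

        while (!endOfStream)
        {
            int read = source.Read(buffer, valid, buffer.Length - valid);
            endOfStream = read == 0;
            valid += read;

            // Before the end of the stream, only test positions where the longest pattern fits
            int limit = endOfStream ? valid : valid - overlap;
            for (int i = 0; i < limit; i++)
            {
                if (AnyMatchAt(buffer, valid, i, patterns))
                    yield return bufferStart + i;
            }

            if (limit > 0)
            {
                // Move the untested tail to the front; Array.Copy copes with overlapping ranges
                int keep = valid - limit;
                Array.Copy(buffer, limit, buffer, 0, keep);
                bufferStart += limit;
                valid = keep;
            }
        }
    }

    private static bool AnyMatchAt(byte[] buffer, int valid, int start, byte[][] patterns)
    {
        foreach (var p in patterns)
        {
            if (start + p.Length > valid) continue; // pattern does not fit in the buffered data
            int k = 0;
            while (k < p.Length && buffer[start + k] == p[k]) k++;
            if (k == p.Length) return true;
        }
        return false;
    }
}
```

For the example above, wrapping source in a MemoryStream and calling IndexesOfAny with a chunkSize of 4 yields 3 and then 5, so the longer match that starts earlier is reported first.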

Hope this helps.

Answer 2

I'm not sure I understand your intent, but this is what I have in mind:

//String array with some of the patterns I'm looking for in the byte array
string[] patterns = { "05805A6C", "0580306C", "05801B6C" };

foreach (string p in patterns)
{
    int i = 0;

    // terminate the loop when no more occurrences are found;
    // using a for loop with i++ is probably wrong since
    // it skips one additional character after a found pattern
    while (true)
    {
        // index of the pattern at or after position i, -1 if not found
        int indice = hex.IndexOf(p, i);

        // stop here, otherwise a bogus offset is reported for the -1 result
        if (indice == -1) break;

        //Do some calculations to get the offset I desire to register
        i = indice + p.Length; // skip the pattern occurrence itself
        int index = (i / 2);

        //Transform the index into hexadecimal
        string outputHex = index.ToString("X");

        //Output the index as an hexadecimal offset address
        MessageBox.Show("0x" + outputHex);
    }
}

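By handling the patterns one at a time you also get more orderly output. In addition, you can define a dedicated method for the single-pattern search.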
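
Such a dedicated method could look roughly like this. It is a sketch, not drop-in code: PatternSearchSketch, FindOffsets and SaveOffsets are hypothetical names, and the output path is whatever file you want to write to. It keeps the convention of reporting the byte offset just past each occurrence, and writes the results to a text file as the question asks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PatternSearchSketch
{
    // Hypothetical helper: byte offsets just past each occurrence of one pattern in the hex string.
    public static List<int> FindOffsets(string hex, string pattern)
    {
        var offsets = new List<int>();
        if (string.IsNullOrEmpty(pattern)) return offsets;

        int i = 0;
        while (true)
        {
            int indice = hex.IndexOf(pattern, i, StringComparison.Ordinal);
            if (indice == -1) break;   // no more occurrences
            i = indice + pattern.Length; // skip the occurrence itself
            offsets.Add(i / 2);          // two hex characters per byte
        }
        return offsets;
    }

    // Hypothetical helper: writes one "0x..." line per offset, pattern by pattern.
    public static void SaveOffsets(string hex, string[] patterns, string path)
    {
        var lines = patterns
            .SelectMany(p => FindOffsets(hex, p))
            .Select(o => "0x" + o.ToString("X"));
        File.WriteAllLines(path, lines);
    }
}
```

Calling PatternSearchSketch.SaveOffsets(hex, patterns, path) then replaces the MessageBox calls with lines in a file.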

Edit: regarding the ordering question (I take it you mean reordering from largest to smallest, right?), change the code like this:

//String array with some of the patterns I'm looking for in the byte array
string[] patterns = { "05805A6C", "0580306C", "05801B6C" };

foreach (string p in patterns)
{
    List<int> allIndices = new List<int>();

    int i=0;
    int indice = 0;

    // terminate the loop when no more occurrences are found;
    // using a for loop with i++ is probably wrong since
    // it skips one additional character after a found pattern
    while (indice != -1)
    {
        // index of the pattern at or after position i, -1 if not found
        indice = hex.IndexOf(p, i);

        i = indice + 8; // skip the pattern occurrence itself

        // temporarily store the indices that occurred
        if (indice != -1) allIndices.Add(i);
    }

    // does what it says :-)
    allIndices.Reverse();

    // separate loop for the output
    foreach (int j in allIndices)
    {
        //Do some calculations to get the offset I desire to register
        int index = (j / 2);

        //Transform the index into hexadecimal
        string outputHex = index.ToString("X");

        //Output the index as an hexadecimal offset address
        MessageBox.Show("0x" + outputHex);
    }
}

How do I search a byte array for the patterns in a string array?
