Question

I have a key of known size, for example:

String key = "A B C"; // Unknown / This is what I need to guess in the end
int keySize = key.length(); // Known

I know that the key and the text contain only the following characters:

String AVAILABLE_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .,!?-"; // Known

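I have some texts that were encoded by XOR-ing the text with the key. The encode method does the following: it checks that the key and the UPPERCASE text are not null, not empty, and contain only valid characters, then creates the UTF-8 byte arrays of both Strings and XORs them together into one byte[]. (If the text is longer than the key, the key is repeated.)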
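For reference, here is a minimal sketch of what such an encode method could look like. The class name `XorEncodeSketch` and the helper `isValid` are hypothetical, and the sketch assumes that only the text is upper-cased while the key is validated as given:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical class name; a sketch of the encode method described above.
class XorEncodeSketch {
    static final String AVAILABLE_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .,!?-";

    // True if s is non-null, non-empty and uses only characters from AVAILABLE_CHARS.
    static boolean isValid(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (char c : s.toCharArray()) {
            if (AVAILABLE_CHARS.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    // XORs the upper-cased text with the repeated key; returns null on invalid input.
    // Assumption: only the text is upper-cased, the key is checked as-is.
    static byte[] encode(String key, String text) {
        if (text == null) {
            return null;
        }
        String upper = text.toUpperCase(Locale.ROOT);
        if (!isValid(key) || !isValid(upper)) {
            return null;
        }
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] t = upper.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[t.length];
        for (int i = 0; i < t.length; i++) {
            out[i] = (byte) (t[i] ^ k[i % k.length]); // key wraps around
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("A B C", "THIS IS A TEST")));
        System.out.println(Arrays.toString(encode("A B C", "AND LET'S SEE HOW THIS GOES")));
    }
}
```

With the key "A B C", the first call should reproduce the bytes listed as ENCRYPTED TEXT 1, and the second should print null because of the apostrophe.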

byte[][] encryptedTexts = new byte[5][];
// The original texts are Unknown, the encrypted byte-arrays are Known
encryptedTexts[0] = encode(key, "THIS IS A TEST");
encryptedTexts[1] = encode(key, "This is another test!"); // Note: encode first makes the String UPPERCASE, so this encrypts correctly.
encryptedTexts[2] = encode(key, "SOME OTHER RANDOM TEXT");
encryptedTexts[3] = encode(key, "AND LET'S SEE HOW THIS GOES"); // Should return null since ' in LET'S isn't valid
encryptedTexts[4] = encode(key, "OK, THAT WILL BE ENOUGH FOR NOW..");

After encoding, I have the following encrypted byte arrays (Arrays.toString(byte_array)):

ENCRYPTED TEXT 1: [21, 104, 11, 115, 99, 8, 115, 98, 97, 99, 21, 101, 17, 116]
ENCRYPTED TEXT 2: [21, 104, 11, 115, 99, 8, 115, 98, 97, 13, 14, 116, 10, 101, 17, 97, 116, 7, 115, 23, 96]
ENCRYPTED TEXT 3: [18, 111, 15, 101, 99, 14, 116, 10, 101, 17, 97, 114, 3, 110, 7, 14, 109, 98, 116, 6, 25, 116]
ENCRYPTED TEXT 4: null
ENCRYPTED TEXT 5: [14, 107, 110, 0, 23, 9, 97, 22, 0, 20, 8, 108, 14, 0, 1, 4, 0, 7, 110, 12, 20, 103, 10, 0, 5, 14, 114, 98, 110, 12, 22, 14, 108]

So, now my question: how can I get the key by knowing only the encrypted texts and the key size?

Some thoughts of my own:

I know you can easily get the key by XOR-ing the original text with the encrypted text. Problem: I don't have the original text.
I know you can partly decrypt one text by using another text's repeated words (like " the ") and then guess the other part. Problems: This only works when the text(s) are pretty long, contain the guessed word (like " the ") and ARE words in general. This method won't work when the original texts are also just randomly generated characters, even when the size is very large / 100,000+.
I know that XOR-ing the same characters with each other will return a 0-byte. In the example above, with the 5th encrypted text, we see a few 0's. When a 0 is found this means that the original text and the key share the same character at the same index. Problem: I don't have the original text.
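To make the first point concrete, here is a small sketch of recovering the key when one plaintext is known. The class and method names are hypothetical; the data is ENCRYPTED TEXT 1 from above together with its known original text:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating key recovery from a known plaintext.
class KnownPlaintextSketch {
    // XOR the ciphertext with the known plaintext; the first keySize bytes
    // of the result are the key, since c[i] = p[i] ^ key[i % keySize].
    static String recoverKey(byte[] cipher, String plain, int keySize) {
        byte[] p = plain.getBytes(StandardCharsets.UTF_8);
        if (p.length != cipher.length || cipher.length < keySize) {
            throw new IllegalArgumentException("plaintext and ciphertext do not line up");
        }
        byte[] key = new byte[keySize];
        for (int i = 0; i < keySize; i++) {
            key[i] = (byte) (cipher[i] ^ p[i]);
        }
        return new String(key, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] text1 = {21, 104, 11, 115, 99, 8, 115, 98, 97, 99, 21, 101, 17, 116};
        System.out.println("[" + recoverKey(text1, "THIS IS A TEST", 5) + "]");
    }
}
```

This only works because the plaintext is known here, which is exactly what is missing in the actual problem.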

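Is it possible to get the key when you only know the encrypted byte arrays (an infinite number of them) and the key size? If so, what is the best approach?

Some notes:

I don't care about decrypting the encrypted texts; my goal is to get the key String.
If you are going to post example code, please do so in Java, since that is the programming language I'm working in.
This is just an assignment (not for school, but for a Java course), so I'm not going to use it to crack anything. (Although I might laugh at people who use XOR encryption with a repeated key. XOR encryption should only be done with a truly randomly generated key that is at least as long as the text, also known as a one-time pad. Quote: "With a key that is truly random, the result is a one-time pad, which is unbreakable even in theory." [source].)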

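Edit 1:

OK, forget the randomly generated unencrypted texts; let's assume I have a large encrypted English text. If I know in advance that the text is English, I can use a Letter Frequency Analysis Table. So then I know not only the encrypted texts and the key size, but also the frequencies of the letters. How can I use these additional frequencies to get the key? (Assume I have an infinite number of encrypted texts at my disposal for recreating/guessing the key.)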

Answer 1

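You may only be interested in the key, but try to focus on recovering one of the plaintexts instead. Once you have a plaintext, XOR-ing it with its ciphertext trivially yields the key.

Start by XOR-ing pairs of ciphertexts together (truncating the longer one if their lengths differ). This cancels the key and leaves you with pairs of English sentence fragments XOR-ed together.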
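As a small illustration of that first step, the sketch below XORs two of the ciphertexts from the question. The class and method names are hypothetical. Because both texts were encrypted with the same key at the same offsets, the key bytes cancel out and only the XOR of the two plaintexts remains:

```java
import java.util.Arrays;

// Hypothetical helper: XOR of two ciphertexts, truncated to the shorter one.
class CiphertextPairSketch {
    static byte[] xorTruncated(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            // (p1 ^ k) ^ (p2 ^ k) == p1 ^ p2, so the key disappears
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] text1 = {21, 104, 11, 115, 99, 8, 115, 98, 97, 99, 21, 101, 17, 116};
        byte[] text2 = {21, 104, 11, 115, 99, 8, 115, 98, 97, 13, 14, 116, 10, 101,
                        17, 97, 116, 7, 115, 23, 96};
        // Leading zero bytes reveal that both plaintexts start with the same characters.
        System.out.println(Arrays.toString(xorTruncated(text1, text2)));
    }
}
```

A zero byte in the result means the two plaintexts have the same character at that position, without revealing which character it is.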

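Assuming an unlimited supply of ciphertexts, we can take a simple approach:

Take one ciphertext and XOR it with, say, 1000 other ciphertexts. Look at all positions where the 6th bit (mask 0x20, the value of ' ') is 1 in about 90% of the pairs. At those positions the plaintext of the first ciphertext must contain one of [ .,!?-], and with roughly 80% probability it is a space. Assume it is a space and compute what the corresponding key byte would have to be if that is true.

Repeat this for a bunch of other ciphertexts and you will be able to determine which of the [ .,!?-] really were spaces (about 80% of the guesses at a given position will produce the same key value).

Here is an implementation in Java. It typically needs a few thousand messages to find the key: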

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Random;

public class MultitimePad {
    private final static int SPACE_TEST_NUM = 10;
    private final static int SPACE_TEST_MIN = 8;
    private final static int KEY_GUESS_MIN = 10;
    private final static double KEY_GUESS_MIN_PERCENTAGE = 0.8;

    public static void main(String[] args) throws IOException {
        MultitimePad p = new MultitimePad();
        byte[] key = new byte[256];
        new Random().nextBytes(key);
        byte[][] messages = p.generate(key);
        byte[] solvedKey = p.solve(key.length, messages);
        if (compareBytes(key, solvedKey)) {
            System.out.println("Success");
        } else {
            System.out.println("Failure");
        }
    }

    private byte[][] generate(byte[] key) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get("src/ulysses.txt"));
        byte[] filteredData = new byte[data.length];
        int filteredDataLength = 0;
        for (int i = 0; i < data.length; i++) {
            byte p = data[i];
            if (p >= 'a' && p <= 'z') {
                filteredData[filteredDataLength] = (byte) (p - 'a' + 'A');
                filteredDataLength++;
            } else if (p >= 'A' && p <= 'Z') {
                filteredData[filteredDataLength] = p;
                filteredDataLength++;
            } else if (p == ' ' || p == '.' || p == ',' || p == '!' || p == '?' || p == '-') {
                filteredData[filteredDataLength] = p;
                filteredDataLength++;
            }
        }
        int numMessages = filteredDataLength / key.length;
        byte[][] messages = new byte[numMessages][];
        for (int i = 0; i < numMessages; i++) {
            messages[i] = new byte[key.length];
            for (int j = 0; j < key.length; j++) {
                byte p = filteredData[i * key.length + j];
                messages[i][j] = (byte) (p ^ key[j]);
            }
        }
        return messages;
    }

    private static boolean compareBytes(byte[] b1, byte[] b2) {
        if (b1.length != b2.length) {
            return false;
        }
        for (int i = 0; i < b1.length; i++) {
            if (b1[i] != b2[i]) {
                return false;
            }
        }
        return true;
    }

    private byte[] solve(int length, byte[][] messages) {
        byte[] key = new byte[length];
        for (int i = 0; i < length; i++) {
            key[i] = solvePosition(i, messages);
        }
        return key;
    }

    private byte solvePosition(int pos, byte[][] messages) {
        int[] keyGuessCount = new int[256];
        int totalKeyGuess = 0;
        for (int i = 0; i < messages.length - SPACE_TEST_NUM; i++) {
            int success = 0;
            for (int j = 0; j < SPACE_TEST_NUM; j++) {
                // Compare with the next SPACE_TEST_NUM messages (i + j + 1, so never with itself).
                if (((messages[i][pos] ^ messages[i + j + 1][pos]) & ' ') != 0) {
                    success++;
                }
            }
            if (success >= SPACE_TEST_MIN) {
                int keyGuess = (messages[i][pos] ^ ' ') & 0xFF;
                keyGuessCount[keyGuess]++;
                totalKeyGuess++;
                if (keyGuessCount[keyGuess] >= KEY_GUESS_MIN && keyGuessCount[keyGuess] > totalKeyGuess *
                        KEY_GUESS_MIN_PERCENTAGE) {
                    System.out.println("Found " + pos + " using " + (i + 1 + SPACE_TEST_NUM) + " messages");
                    return (byte) keyGuess;
                }
            }
        }
        throw new IllegalArgumentException("Too few messages");
    }
}

Answer 2

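Since you only allow a subset of characters in the key and the data, the encrypted text leaks information about both. Look at the binary representation of the allowed input: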

           01000001 : A              01010001 : Q    
           01000010 : B              01010010 : R    
           01000011 : C              01010011 : S    
           01000100 : D              01010100 : T    
           01000101 : E              01010101 : U    
           01000110 : F              01010110 : V    
           01000111 : G              01010111 : W    
           01001000 : H              01011000 : X    
           01001001 : I              01011001 : Y                           
           01001010 : J              01011010 : Z                           
           01001011 : K          >   00100000 :     <  7th bit is 0
           01001100 : L          >   00101110 : .   <      ""
           01001101 : M          >   00101100 : ,   <      ""
           01001110 : N          >   00100001 : !   <      "" 
           01001111 : O          >   00111111 : ?   <      "" 
           01010000 : P          >   00101101 : -   <      ""

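Notice the layout of the bits. One pattern is that 6 of the allowed characters have the 7th bit (mask 0x40, counting the least significant bit as bit 1) set to 0, while all the other allowed characters have this bit set to 1.

Now take a close look at the first encrypted string: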

ENCRYPTED TEXT 1:        21       104        11       115        99         8  ...
Binary:            00010101  01101000  00001011  01110011  01100011  00001000  ...
                    ^         ^         ^         ^         ^         ^
  Bit 7             0         1         0         1         1         0

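Notice how the 7th bit toggles across the encrypted data. The first byte has its 7th bit set to 0, which can only happen if the key and the data both have a 0 in that bit, or both have a 1 in that bit. From this we can deduce:

The first character of the key and the first character of the data are both in the range [A-Z]

or

The first character of the key and the first character of the data are both in [ .,!?-]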
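The check described above can be sketched in a few lines. The class and method names are hypothetical; the only fact used is that letters A-Z have bit 0x40 set while the six other allowed characters do not:

```java
// Hypothetical helper: classify each ciphertext byte by its 0x40 bit.
class BitSevenSketch {
    // True when key character and data character are in the same class
    // (both in [A-Z], or both in " .,!?-"), because then the 0x40 bits cancel.
    static boolean sameClass(byte cipherByte) {
        return (cipherByte & 0x40) == 0;
    }

    public static void main(String[] args) {
        byte[] text1 = {21, 104, 11, 115, 99, 8, 115, 98, 97, 99, 21, 101, 17, 116};
        for (int i = 0; i < text1.length; i++) {
            System.out.println(i + ": " + text1[i] + " -> "
                    + (sameClass(text1[i]) ? "same class" : "different class"));
        }
    }
}
```

On its own this tells you only whether key and data agree in class at each position; it takes several ciphertexts to decide which class the key character itself belongs to.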

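This only shows the most obvious pattern, but the technique can be applied to all bits, and if repeated it can be used to build a statistical model of the possible key and data values. If you have a repeating key, you will probably get enough leakage this way that only the actual key and data remain possible.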
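One way to act on this leakage, sketched here under my own assumptions rather than as the answer's exact method, is plain elimination: for each key position, keep only those allowed characters that decode every ciphertext byte at that position (and at every repetition of it) to another allowed character. The names `KeyCandidateSketch`, `candidates` and `decodesToValid` are hypothetical:

```java
// Hypothetical sketch: eliminate impossible key characters per position.
class KeyCandidateSketch {
    static final String AVAILABLE_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .,!?-";

    // True if key character k at position pos decodes every affected byte to an allowed character.
    static boolean decodesToValid(char k, int pos, int keySize, byte[][] ciphertexts) {
        for (byte[] c : ciphertexts) {
            if (c == null) {
                continue; // e.g. a text that failed validation during encoding
            }
            for (int i = pos; i < c.length; i += keySize) {
                if (AVAILABLE_CHARS.indexOf((char) (c[i] ^ k)) < 0) {
                    return false;
                }
            }
        }
        return true;
    }

    // result[pos] holds all key characters still possible at that position.
    static String[] candidates(int keySize, byte[][] ciphertexts) {
        String[] result = new String[keySize];
        for (int pos = 0; pos < keySize; pos++) {
            StringBuilder sb = new StringBuilder();
            for (char k : AVAILABLE_CHARS.toCharArray()) {
                if (decodesToValid(k, pos, keySize, ciphertexts)) {
                    sb.append(k);
                }
            }
            result[pos] = sb.toString();
        }
        return result;
    }

    public static void main(String[] args) {
        byte[][] data = {
            {21, 104, 11, 115, 99, 8, 115, 98, 97, 99, 21, 101, 17, 116},
            {21, 104, 11, 115, 99, 8, 115, 98, 97, 13, 14, 116, 10, 101, 17, 97, 116, 7, 115, 23, 96},
            {18, 111, 15, 101, 99, 14, 116, 10, 101, 17, 97, 114, 3, 110, 7, 14, 109, 98, 116, 6, 25, 116},
            null,
            {14, 107, 110, 0, 23, 9, 97, 22, 0, 20, 8, 108, 14, 0, 1, 4, 0, 7, 110, 12, 20, 103, 10,
             0, 5, 14, 114, 98, 110, 12, 22, 14, 108}
        };
        String[] c = candidates(5, data);
        for (int pos = 0; pos < c.length; pos++) {
            System.out.println(pos + ": [" + c[pos] + "]");
        }
    }
}
```

The real key character always survives this filter. How quickly the candidate sets shrink to a single character depends on how many ciphertexts are available, and any positions that stay ambiguous would still need a statistical tie-breaker such as letter frequencies.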

Getting the key String by knowing only the XOR-encrypted byte arrays and the key size
