
Suppose I have a txt file in which each line is one string. Is there an efficient way to find the top 10 most frequent substrings?

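The difficulty is that the number of candidate substrings of a given string is too large. A string of length N has C(N,0)+C(N,1)+...+C(N,N) = 2^N candidates in total if any selection of characters counts (N(N+1)/2 if only contiguous substrings count).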
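To make the size of the search space concrete, the sketch below compares two ways of counting candidates for a string of length N: the sum C(N,0)+...+C(N,N), which equals 2^N and counts every selection of character positions, and N(N+1)/2, which counts only contiguous substrings. Which definition applies to the question is an assumption here, and the function names are hypothetical.

```python
from math import comb


def count_position_subsets(n):
    """C(n,0) + C(n,1) + ... + C(n,n): every selection of character positions."""
    return sum(comb(n, k) for k in range(n + 1))


def count_contiguous_substrings(n):
    """Number of non-empty (start, end) pairs, i.e. contiguous substrings by position."""
    return n * (n + 1) // 2


n = 20
print(count_position_subsets(n))       # 1048576, which is 2**20
print(count_contiguous_substrings(n))  # 210
```

If only contiguous substrings are meant, the per-string count grows quadratically rather than exponentially, so the remaining cost is mostly in aggregating counts over every line of the file.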

=============================================== [Update]

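This question is similar to "[a link] Algorithm to find the most common substrings in a string", but the two are not the same. The difference is that I am trying to find the top 10 frequent substrings across all strings, whereas "[a link] Algorithm to find the most common substrings in a string" only looks for the most common substrings within a single string, which is only a local optimum.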

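A substring found by the method in "[a link] Algorithm to find the most common substrings in a string" may be the most common one within its own string without being the most common one across all strings. For example, suppose I have 10 strings, each with its own most common substring:
str1 sub_str1 -- 4 times
str2 sub_str2 -- 4 times
..
str10 sub_str10 -- 4 times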

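The most common substring of each string is different, and each one occurs 4 times. There may be another substring, call it sub_minor, that appears in every string but only once in each. In that case sub_minor is the most common substring overall, because it occurs 10 times, more than any of the sub_str substrings.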

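All of the sub_str results are only local optima, not the global optimum. My question is about the global optimum, which is what makes it different from "[a link] Algorithm to find the most common substrings in a string".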
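The local-versus-global distinction can be illustrated with a brute-force sketch. It assumes "substring" means a contiguous run of characters and that the data fits in memory; the function name `top_k_substrings`, its parameters, and the sample strings are all hypothetical. It is not an efficient answer to the question, only a reference for what the desired output is.

```python
from collections import Counter


def top_k_substrings(strings, k=10, min_len=1, max_len=None):
    """Count every contiguous substring over all strings and return the k most frequent."""
    counts = Counter()
    for s in strings:
        n = len(s)
        for i in range(n):
            # Furthest end index allowed for a substring starting at i.
            end = n if max_len is None else min(n, i + max_len)
            for j in range(i + min_len, end + 1):
                counts[s[i:j]] += 1
    return counts.most_common(k)


# Ten strings: each has its own character repeated 4 times, plus one shared "#".
strings = [c * 4 + "#" for c in "abcdefghij"]

print(top_k_substrings([strings[0]], k=1))  # [('a', 4)]   best within the first string only
print(top_k_substrings(strings, k=1))       # [('#', 10)]  best across all strings
```

Here "#" plays the role of sub_minor: it is never the winner inside any single string, yet it wins globally. The sketch stores a counter entry for every distinct substring, so `max_len` is there to cap memory on long lines. For large inputs, a generalized suffix tree or suffix array is the usual alternative to enumerating substrings explicitly.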

How to find the top 10 frequent substrings in a database of strings

0 answers: