Question

我有一个300,000多个单词的数组，最长的单词是38个字符。我正在构建一个应用程序，根据您已输入的内容，告诉您下一个最可能的字母。例如，当您输入单词＆＃34; orange＆＃34;该应用程序将首先说：

You entered 'o'
x.xx% of the time, the next letter is 'n'

You entered 'or'
'x.xx% of the time, the next letter is 'a'

You entered 'ora'
x.xx% of the time, the next letter is 'n'

等等。

到目前为止我存储数据的方式是这样的，在一个名为perc的字典中

通用：

dict[mainkey][subkey] = subkey_value

实际值：

perc[previousLetter][letter] = percentage   # how often does letter follow previousLetter?
perc['a']['b'] = 3.15%         # 'b' follows 'a' 3.15% of the time

这种结构非常简约，我喜欢能够为更长的字符串做同样的事情，例如：

perc['ora']['n']               # 'n' follows 'ora' 0.5% of the time

但是考虑到我必须考虑38个字符的所有排列，那个词典很快会变得非常大...... Yikes！

更有效的方法是将更长的字符串存储为key？

从字符串中存储单个字符的最有效方法？

0 个答案: