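T9 system in Python

Date: 2012-08-22 14:07:18

Tags: python

I'm trying to build a T9 system like the one on mobile phones, but driven from a keyboard instead. I could really use some advice on how to go about it.

I have already found a text file containing the words I want to use. I'd like the number 2 key to stand for 'abc', 3 = 'def', 4 = 'ghi', and so on. If anyone could steer me in the right direction or help me get started, that would be great.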

3 answers:

Answer 0 (score: 4)

Here is a brute-force T9 emulator:

import itertools 

n2l={2:'abc',3:'def',4:'ghi',5:'jkl',6:'mno',7:'pqrs',8:'tuv',9:'wxyz'}

with open('/usr/share/dict/words','r') as di:  # UNIX 250k unique word list 
    all_words={line.strip() for line in di}

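def combos(*nums):
    # Letter groups for the pressed keys, e.g. (2, 3) -> ['abc', 'def']
    groups=[n2l[i] for i in nums]
    # Every possible prefix, as a tuple so str.startswith can test them all at once
    return tuple(''.join(p) for p in itertools.product(*groups))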

def t9(*nums):
    combo=combos(*nums)
    return sorted(word for word in all_words if word.startswith(combo))

def try_it(*nums):
    l=list(t9(*nums))
    print('  {:10} {:10,} words'.format(','.join(str(i) for i in nums),len(l)))
    if len(l)<100:
        print(nums,'yields:',l)

try_it(2)
try_it(2,3)
try_it(2,3,4)
try_it(2,3,3,4)
try_it(2,3,3,4,5)

Prints:

  2              41,618 words
  2,3             4,342 words
  2,3,4             296 words
  2,3,3,4           105 words
  2,3,3,4,5          16 words
(2, 3, 3, 4, 5) yields: ['aedile', 'aedileship', 'aedilian', 'aedilic', 'aedilitian', 
    'aedility', 'affiliable', 'affiliate', 'affiliation', 'bedikah', 'befilch', 
    'befile', 'befilleted', 'befilmed', 'befilth', 'cedilla']

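You can see that, starting from 250,000 words (a very large set), it takes 5 digits to converge to a manageable size.

While this code is illustrative and can get you started, you will need two more things:

  1. A smaller set of words ;-) and
  2. A ranking of the more common words to show in the T9 auto-complete area of your user interface. (I.e., 'affiliate' or 'affiliation' is more likely to be the desired word for (2,3,3,4,5) than 'aedile' or 'befilth'. These need to be ranked somehow...)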

Option 2:

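    Here is a quick attempt at weighting. I read the same large dictionary (the common Unix 'words' file) and then weight those words using Project Gutenberg's The Adventures of Sherlock Holmes. You could use any good collection of text to do this.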

    from collections import Counter
    import re
    import itertools 
    
    all_words=Counter()
    n2l={2:'abc',3:'def',4:'ghi',5:'jkl',6:'mno',7:'pqrs',8:'tuv',9:'wxyz'}
    with open('/usr/share/dict/words','r') as di:  # UNIX 250k unique word list 
         all_words.update({line.strip() for line in di if len(line) < 6}) 
    
    with open('holmes.txt','r') as fin:   # http://www.gutenberg.org/ebooks/1661.txt.utf-8
        for line in fin:
             all_words.update([word.lower() for word in re.findall(r'\b\w+\b',line)])
    
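    def combos(*nums):
        # Letter groups for the pressed keys, e.g. (2, 3) -> ['abc', 'def']
        groups=[n2l[i] for i in nums]
        # Every possible prefix, as a tuple so str.startswith can test them all at once
        return tuple(''.join(p) for p in itertools.product(*groups))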
    
    def t9(*nums):
        combo=combos(*nums)
        c1=combos(nums[0])
        first_cut=(word for word in all_words if word.startswith(c1))
        return (word for word in first_cut if word.startswith(combo))
    
    def try_it(*nums):
        s=set(t9(*nums))
        n=10
        print('({}) produces {:,} words. Top {}:'.format(','.join(str(i) for i in nums),
                len(s),min(n,len(s))))
        for i, word in enumerate(
              [w for w in sorted(all_words,key=all_words.get, reverse=True) if w in s],1):
            if i<=n:
                print ('\t{:2}:  "{}" -- weighted {}'.format(i, word, all_words[word]))
    
        print()        
    
    try_it(2)
    try_it(2,3)
    try_it(2,3,4)
    try_it(2,3,3,4)
    try_it(6,6,8,3)   
    try_it(2,3,3,4,5)      
    

    Prints:

    (2) produces 2,584 words. Top 10:
         1:  "and" -- weighted 3089
         2:  "a" -- weighted 2701
         3:  "as" -- weighted 864
         4:  "at" -- weighted 785
         5:  "but" -- weighted 657
         6:  "be" -- weighted 647
         7:  "all" -- weighted 411
         8:  "been" -- weighted 394
         9:  "by" -- weighted 372
        10:  "are" -- weighted 356
    
    (2,3) produces 261 words. Top 10:
         1:  "be" -- weighted 647
         2:  "been" -- weighted 394
         3:  "before" -- weighted 166
         4:  "after" -- weighted 99
         5:  "between" -- weighted 60
         6:  "better" -- weighted 51
         7:  "behind" -- weighted 50
         8:  "certainly" -- weighted 45
         9:  "being" -- weighted 45
        10:  "bed" -- weighted 40
    
    (2,3,4) produces 25 words. Top 10:
         1:  "behind" -- weighted 50
         2:  "being" -- weighted 45
         3:  "began" -- weighted 25
         4:  "beg" -- weighted 13
         5:  "ceiling" -- weighted 10
         6:  "beginning" -- weighted 7
         7:  "begin" -- weighted 6
         8:  "beggar" -- weighted 6
         9:  "begging" -- weighted 4
        10:  "begun" -- weighted 4
    
    (2,3,3,4) produces 5 words. Top 5:
         1:  "additional" -- weighted 4
         2:  "addition" -- weighted 3
         3:  "addicted" -- weighted 1
         4:  "adding" -- weighted 1
         5:  "additions" -- weighted 1
    
    (6,6,8,3) produces 11 words. Top 10:
         1:  "note" -- weighted 38
         2:  "notes" -- weighted 9
         3:  "move" -- weighted 5
         4:  "moved" -- weighted 4
         5:  "novel" -- weighted 4
         6:  "movement" -- weighted 3
         7:  "noted" -- weighted 2
         8:  "moves" -- weighted 1
         9:  "moud" -- weighted 1
        10:  "november" -- weighted 1
    
    (2,3,3,4,5) produces 0 words. Top 0:
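
Both versions above scan the whole word list on every call. One possible alternative, shown here only as a minimal sketch, is to translate each word to its digit string once and index every prefix of that string, so a lookup becomes a single dictionary access. The names `to_digits`, `build_index` and the tiny `WORDS` sample are hypothetical and not part of the code above; the keypad mapping is the same as `n2l`.

```python
from collections import defaultdict

# Standard phone keypad layout, same mapping as n2l above.
N2L = {2: 'abc', 3: 'def', 4: 'ghi', 5: 'jkl',
       6: 'mno', 7: 'pqrs', 8: 'tuv', 9: 'wxyz'}

# Translation table: 'a' -> '2', 'b' -> '2', ..., 'z' -> '9'.
L2N = str.maketrans({ch: str(d) for d, group in N2L.items() for ch in group})

def to_digits(word):
    """Return the digit string a word is typed with, e.g. 'bed' -> '233'."""
    return word.lower().translate(L2N)

def build_index(words):
    """Map every digit prefix to the set of words that start with it."""
    index = defaultdict(set)
    for word in words:
        digits = to_digits(word)
        if not digits.isdigit():      # skip words with characters not on the keypad
            continue
        for end in range(1, len(digits) + 1):
            index[digits[:end]].add(word)
    return index

# Hypothetical sample; in practice this would be the loaded word list.
WORDS = ['be', 'bed', 'been', 'after', 'cedilla', 'note', 'move']
index = build_index(WORDS)

print(sorted(index['23']))    # words whose first two keys are 2, 3
print(sorted(index['6683']))  # words whose first four keys are 6, 6, 8, 3
```

The trade-off is memory: every word is stored once per prefix length, which is probably acceptable for a trimmed word list but may be heavy for the full 250k-word file. The candidates for a prefix could still be sorted by the `Counter` weights from Option 2.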
    

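Answer 1 (score: 0)

A naive approach is to generate every possible letter combination that a given digit sequence can produce. Note that these combinations are simply the Cartesian product of N tuples of letters, one tuple per digit, where N is the length of the word. To get all the combinations you can use itertools.product, for example: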

itertools.product(*(letters(d) for d in digits))

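where letters is a function such that letters('2') returns 'abc' and so on, and digits is the string of digits representing the word. Then iterate over your word list and look for matches.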
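
A minimal sketch of that approach, assuming the standard keypad mapping from the question (2 = 'abc', 3 = 'def', ...). The `KEYPAD` table, the `matches` helper and the sample word list are hypothetical names introduced only for illustration:

```python
import itertools

# Assumed keypad mapping, keyed by digit characters.
KEYPAD = {'2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
          '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz'}

def letters(d):
    """Return the letters printed on key d, e.g. letters('2') -> 'abc'."""
    return KEYPAD[d]

def matches(digits, words):
    """Return the words that spell out exactly the given digit string."""
    # Cartesian product of the letter groups, one group per digit.
    candidates = {''.join(combo)
                  for combo in itertools.product(*(letters(d) for d in digits))}
    # Keep only the combinations that are real words.
    return sorted(candidates & set(words))

words = ['bed', 'add', 'ace', 'note']   # hypothetical word list
print(matches('233', words))
```

The number of combinations grows roughly as 3^N to 4^N, so this is only practical for short digit strings.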

Answer 2 (score: 0)

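using System;
using System.Collections.Generic;
using System.Linq;

    class T9Printer   // class name assumed; the original class header was cut off
    {
        Dictionary<char,char[]> btnDict = new Dictionary<char,char[]>()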
        {
            {'0',new char[]{'A','B','C'}},
            {'1',new char[]{'D','E','F'}},
            {'2',new char[]{'G','H','I'}},
            {'3',new char[]{'J','K','L'}},
            {'4',new char[]{'M','N','O'}},
            {'5',new char[]{'P','Q'}},
            {'6',new char[]{'R','S','T'}},
            {'7',new char[]{'U','V','W'}},
            {'8',new char[]{'X','Y','Z'}},
            {'9',new char[]{'#','@','.'}}
        };

        public void PrintT9(string input)
        {
            char[] T9Suggestion = new char[input.Length];
            FillPosition(T9Suggestion, 0, input.ToArray<char>());
        }

        void FillPosition(char[] array, int position, char[] input)
        {
            char[] alphabets = btnDict[input[position]];
            foreach (char alphabet in alphabets)
            {
                array[position] = alphabet;
                if (position == array.Length - 1)
                {
                    string s = new string(array);
                    Console.Write(s+",");
                }
                else
                {
                    FillPosition(array, position + 1, input);
                }
            }

        }
    }

http://coding4geeks.blogspot.com/2015/01/t9-dictionary.html