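gematria function - processing text according to numerical values

Time: 2014-07-07 02:45:10

Tags: python dictionary nltk

I am trying to process a text, namely the Bible, to extract the numerical value of each of its words according to a dictionary: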

def gematria(book):

    dict = {
              'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 
              'f':80, 'g':3, 'h':8,'i':10, 'j':10,
              'k':20, 'l':30, 'm':40, 'n':50, 'o':70,
              'p':80, 'q':100,'r':200, 's':300,
              't':400, 'u':6, 'v':6, 'w':800, 'x':60, 
              'y':10, 'z':7
           }

Using the NLTK module, I got this far:

import re
import nltk

raw = nltk.corpus.gutenberg.raw(book)
tokens = nltk.word_tokenize(raw)
words_and_numbers = [w.lower() for w in tokens]
# keep tokens that contain at least one character other than a digit or a colon
words = [w for w in words_and_numbers if re.search('[^0-9:0-9]', w)]
vocab = sorted(set(words))
nested = [list(w) for w in vocab]

I end up with a list containing one list of letters per word, i.e. [['h', 'o', 'l', 'y'], ['b', 'i', 'b', 'l', 'e'], ...]
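One thing worth checking at this step is what the regex filter lets through. The sketch below runs the same filter on a small hand-written token list (a hypothetical stand-in for the nltk.word_tokenize output, so no corpus download is needed). It suggests that verse references such as "1:1" are dropped but punctuation tokens survive, which would later raise a KeyError on a plain dictionary lookup. The stricter letters-only filter is an assumption added here, not part of the question.

```python
import re

# Hypothetical sample standing in for the output of nltk.word_tokenize
tokens = ["1:1", "In", "the", "beginning", ",", "God", "1", "."]

lowered = [t.lower() for t in tokens]

# Filter from the question: keeps any token with at least one character
# that is neither a digit nor a colon (the class is equivalent to [^0-9:])
kept = [t for t in lowered if re.search('[^0-9:0-9]', t)]

# Stricter alternative (assumption): keep only tokens made entirely of a-z
letters_only = [t for t in lowered if re.fullmatch('[a-z]+', t)]

print(kept)
print(letters_only)
```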

To process a single word and get its numerical value, the following list comprehension (followed by the sum() function) works:

word_value_1 = [dict[letter] for letter in nested[0]]
sum(word_value_1)

word_value_2 = [dict[letter] for letter in nested[1]]
sum(word_value_2)

(...)

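Question: how can I write a single list comprehension or loop that returns the numerical values of all the words in the large list?

2 Answers:

Answer 0: (score: 1)

Gematria mod 9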

txt=input('enter text: ')
print(sum([ord(letter)-96 for letter in list("".join(txt.split()))]) % 9)

ord('a') = 97, so ord('a') - 96 = 1; the spaces are removed first. The list() call could be dropped, but I left it in for clarity.
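As a rough sketch of the same idea wrapped in a function: the one-liner above assumes the input contains only lowercase letters and spaces, so the version below lowercases the text and skips anything outside a-z. The function name gematria_mod9 and the choice to ignore non-letters are assumptions made for illustration. Note that it uses the simple a=1 ... z=26 scheme from this answer, not the value table from the question.

```python
def gematria_mod9(text):
    """Sum a=1 ... z=26 over the ASCII letters in text, then reduce modulo 9."""
    # ord('a') is 97, so ord(ch) - 96 maps 'a' to 1, 'b' to 2, and so on
    total = sum(ord(ch) - 96 for ch in text.lower() if 'a' <= ch <= 'z')
    return total % 9

print(gematria_mod9("abc"))  # 1 + 2 + 3 = 6, and 6 % 9 = 6
```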

Answer 1: (score: 0)

Assuming nested = [['h', 'o', 'l', 'y'], ['b', 'i', 'b', 'l', 'e']]

print([sum([dict[letter] for letter in word]) for word in nested])

Output

[118, 49]
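If it helps to keep each value next to its word, the same comprehension can build a mapping instead of a bare list. This is only a sketch under a few assumptions: the table is copied from the question but renamed VALUES so it does not shadow the built-in dict, word_value is a hypothetical helper, and characters missing from the table (punctuation, digits) are counted as 0 rather than raising a KeyError.

```python
# Letter values copied from the question's table
VALUES = {
    'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5,
    'f': 80, 'g': 3, 'h': 8, 'i': 10, 'j': 10,
    'k': 20, 'l': 30, 'm': 40, 'n': 50, 'o': 70,
    'p': 80, 'q': 100, 'r': 200, 's': 300,
    't': 400, 'u': 6, 'v': 6, 'w': 800, 'x': 60,
    'y': 10, 'z': 7,
}

def word_value(word):
    # Assumption: characters not in the table contribute 0
    return sum(VALUES.get(ch, 0) for ch in word.lower())

vocab = ["holy", "bible"]
values_by_word = {w: word_value(w) for w in vocab}
print(values_by_word)  # {'holy': 118, 'bible': 49}
```

Working from vocab directly also makes the intermediate nested list unnecessary, since a string can be iterated letter by letter.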