I'm trying to process a text, namely the Bible, to extract the numeric value of each of its words according to a dictionary:
import re
import nltk

def gematria(book):
    # Letter-to-number table (note: the name shadows the built-in dict)
    dict = {
        'a':1, 'b':2, 'c':3, 'd':4, 'e':5,
        'f':80, 'g':3, 'h':8,'i':10, 'j':10,
        'k':20, 'l':30, 'm':40, 'n':50, 'o':70,
        'p':80, 'q':100,'r':200, 's':300,
        't':400, 'u':6, 'v':6, 'w':800, 'x':60,
        'y':10, 'z':7
    }
Using the NLTK module, I got this far:
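    # Load the raw text of the chosen Gutenberg book and split it into tokens
    raw = nltk.corpus.gutenberg.raw(book)
    tokens = nltk.word_tokenize(raw)
    words_and_numbers = [w.lower() for w in tokens]
    # Keep tokens with at least one character that is not a digit or a colon
    words = [w for w in words_and_numbers if re.search(r'[^0-9:]', w)]
    vocab = sorted(set(words))
    # Split every distinct word into a list of its letters
    nested = [list(w) for w in vocab]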
I end up with a list of lists holding the letters of each word,
i.e. [['h', 'o', 'l', 'y'], ['b', 'i', 'b', 'l', 'e'], ...]
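For readers without the Gutenberg corpus installed, the same preprocessing can be sketched on a short sample string. This is only an illustration under assumptions: the sample text is made up, and `re.findall` stands in for `nltk.word_tokenize`, so the tokenization details will differ from NLTK's.

```python
import re

# Hypothetical sample text standing in for nltk.corpus.gutenberg.raw(book).
raw = "The Holy Bible 1:1 In the beginning"

# Assumption: a simple regex replaces nltk.word_tokenize. It keeps only runs
# of ASCII letters, so verse references such as "1:1" are dropped here.
tokens = re.findall(r"[A-Za-z]+", raw)

words = [w.lower() for w in tokens]
vocab = sorted(set(words))          # distinct words, alphabetical
nested = [list(w) for w in vocab]   # each word as a list of letters

print(vocab)
print(nested)
```

Note that a Python string already iterates letter by letter, so the `nested` step is optional when the only goal is to look up each letter.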
To process a single word and get its numeric value, the following list comprehension (followed by the function sum()) works:
word_value_1 = [dict[letter] for letter in nested[0]]
sum(word_value_1)
word_value_2 = [dict[letter] for letter in nested[1]]
sum(word_value_2)
(...)
Question: how can I write a single list comprehension or loop that returns the numeric values of all the words in the large list?
Answer 0: (score: 1)
Gematria mod 9
txt=input('enter text: ')
print(sum([ord(letter)-96 for letter in list("".join(txt.split()))])*9)
ord('a') = 97, so ord('a') - 96 = 1, and the split/join removes the whitespace. The list() call could be dropped, but I left it in for clarity.
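A slightly more defensive variant of the same idea, offered as a sketch and not as the answerer's code: it lowercases the input and skips anything outside a-z, so capitals and punctuation do not distort the total. The function name `ordinal_value` is made up for illustration, and it returns the plain a=1 ... z=26 sum without the final multiplication.

```python
def ordinal_value(text):
    """Sum letter positions (a=1 ... z=26), ignoring every non-letter."""
    # Lowercase first so 'H' and 'h' count the same; the range check skips
    # spaces, digits, and punctuation.
    return sum(ord(ch) - 96 for ch in text.lower() if 'a' <= ch <= 'z')

print(ordinal_value("Holy Bible"))  # 60 for "holy" + 30 for "bible" = 90
```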
Answer 1: (score: 0)
Assuming nested = [['h', 'o', 'l', 'y'],['b', 'i', 'b', 'l', 'e']]
print([sum([dict[letter] for letter in word]) for word in nested])
Output:
[118, 49]
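If each word should stay next to its value, a dict comprehension over the vocabulary is one option. The sketch below assumes the letter table from the question (renamed `values` here to avoid shadowing the built-in `dict`) and a hypothetical helper `word_value`. It uses `.get(letter, 0)` so characters missing from the table, such as apostrophes, count as zero instead of raising `KeyError`; whether zero is the right treatment is a judgment call.

```python
# Letter table copied from the question; the name `values` is illustrative.
values = {
    'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5,
    'f': 80, 'g': 3, 'h': 8, 'i': 10, 'j': 10,
    'k': 20, 'l': 30, 'm': 40, 'n': 50, 'o': 70,
    'p': 80, 'q': 100, 'r': 200, 's': 300,
    't': 400, 'u': 6, 'v': 6, 'w': 800, 'x': 60,
    'y': 10, 'z': 7,
}

def word_value(word):
    """Sum the table values of a word's letters; unknown characters add 0."""
    return sum(values.get(letter, 0) for letter in word.lower())

# A string iterates letter by letter, so no nested list is needed.
vocab = ['holy', 'bible', "god's"]
totals = {word: word_value(word) for word in vocab}
print(totals)
```

With the real data, `vocab` would be the sorted vocabulary built in the question, and `totals['holy']` would give the same 118 as above.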