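I am writing a Python 3 script that takes the words in a text file and converts them into numbers (my own scheme, not ASCII, so no ord function). I have assigned each letter to an integer and would like each word to be the sum of its letters' numerical values. The goal is to group the words that share the same numerical value in a dictionary. I am having great trouble recombining the split words as numbers and adding them together. I am completely stuck on this script (it is not finished yet).
By the way, I know there is an easier way to create the l_n dictionary below, but since I have already written it out I am too lazy to change it right now; I will do so once the script is finished.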
l_n = {
"A": 1, "a": 1,
"B": 2, "b": 2,
"C": 3, "c": 3,
"D": 4, "d": 4,
"E": 5, "e": 5,
"F": 6, "f": 6,
"G": 7, "g": 7,
"H": 8, "h": 8,
"I": 9, "i": 9,
"J": 10, "j": 10,
"K": 11, "k": 11,
"L": 12, "l": 12,
"M": 13, "m": 13,
"N": 14, "n": 14,
"O": 15, "o": 15,
"P": 16, "p": 16,
"Q": 17, "q": 17,
"R": 18, "r": 18,
"S": 19, "s": 19,
"T": 20, "t": 20,
"U": 21, "u": 21,
"V": 22, "v": 22,
"W": 23, "w": 23,
"X": 24, "x": 24,
"Y": 25, "y": 25,
"Z": 26, "z": 26,
}
words_list = []

def read_words(file):
    opened_file = open(file, "r")
    contents = opened_file.readlines()
    for i in range(len(contents)):
        words_list.extend(contents[i].split())
    opened_file.close()
    return words_list

read_words("file1.txt")
new_words_list = list(set(words_list))
numbers_list = []
w_n = {}

def words_to_numbers(new_words_list, l_n):
    local_list = new_words_list[:]
    local_number_list = []
    for word in local_list:
        local_number_list.append(word.split())
        for key in l_n:
            local_number_list = local_number_list.replace(  # I am stuck on the logic in this section.

words_to_numbers(new_words_list, l_n)
print(local_list)
I have searched Stack Overflow for an answer but could not find one.
Thanks for your help.
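Answer 0: (score: 6)
You will have to deal with punctuation, but you only need to sum the values of each word's letters and group the words by that sum, which you can do with a defaultdict: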
lines = """am writing a Python script that will take words in a text file and convert them into numbers (my own, not ASCII, so no ord function).
I have assigned each letter to an integer and would like each word to be the sum of its letters' numerical value.
The goal is to group each word with the same numerical value into a dictionary.
I am having great trouble recombining the split words as numbers and adding them together"""
from collections import defaultdict
d = defaultdict(list)
for line in lines.splitlines():
    for word in line.split():
        # Key: sum of the word's letter values; characters not in l_n count as 0
        d[sum(l_n.get(ch,0) for ch in word)].append(word)
from pprint import pprint as pp
pp(dict(d))
Output:
{1: ['a', 'a', 'a'],
7: ['be'],
9: ['I', 'I'],
14: ['am', 'am'],
15: ['an'],
17: ['each', 'each', 'each'],
19: ['and', 'and', 'and'],
20: ['as'],
21: ['of'],
23: ['in'],
28: ['is'],
29: ['no'],
32: ['file'],
33: ['the', 'The', 'the', 'the'],
34: ['so'],
35: ['to', 'to', 'goal', 'to'],
36: ['have'],
37: ['take', 'ord', 'like'],
38: ['(my', 'same'],
39: ['adding'],
41: ['ASCII,'],
46: ['them', 'them'],
48: ['its'],
49: ['that', 'not'],
51: ['great'],
52: ['own,'],
53: ['sum'],
56: ['will'],
58: ['into', 'into'],
60: ['word', 'word', 'with'],
61: ['value.', 'value', 'having'],
69: ['text'],
75: ['would'],
76: ['split'],
77: ['group'],
78: ['assigned', 'integer'],
79: ['words', 'words'],
80: ['letter'],
85: ['script'],
92: ['numbers', 'numbers'],
93: ['trouble'],
96: ['numerical', 'numerical'],
97: ['convert'],
98: ['Python', 'together'],
99: ["letters'"],
100: ['writing'],
102: ['function).'],
109: ['recombining'],
118: ['dictionary.']}
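We take the sum of all the letters in a word and use it as the key, then append the word to the list stored under that key. defaultdict creates an empty list the first time a key is used, so all words with the same sum end up in one list.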
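To see the grouping on a small scale, here is a minimal sketch. The `values` table (lowercase letters only), the `word_value` helper, and the short word list are assumptions made for illustration; they are not part of the answer above.

```python
from collections import defaultdict
from string import ascii_lowercase

# Assumed stand-in for l_n: lowercase letters only, a=1 ... z=26
values = {ch: i for i, ch in enumerate(ascii_lowercase, start=1)}

def word_value(word):
    # Characters missing from the table (digits, punctuation) count as 0
    return sum(values.get(ch, 0) for ch in word)

groups = defaultdict(list)
for word in ["goal", "to", "take", "ord", "like", "be"]:
    # The first access to a new key creates an empty list automatically
    groups[word_value(word)].append(word)

print(dict(groups))
```

Here "goal" is 7 + 15 + 1 + 12 = 35 and "to" is 20 + 15 = 35, so both land under the key 35, while "take", "ord" and "like" all sum to 37.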
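Also, as John commented, you can store only the lowercase letters in the dict and call .lower on each word:
sum(l_n.get(ch,0) for ch in word.lower())
If you want to remove all punctuation, you can use str.translate:
from collections import defaultdict
from string import punctuation
d = defaultdict(list)
# Python 3: build a table that deletes punctuation
# (word.translate(None, punctuation) only works in Python 2)
table = str.maketrans("", "", punctuation)
for line in lines.splitlines():
    for word in line.split():
        word = word.translate(table)
        d[sum(l_n.get(ch,0) for ch in word)].append(word)
If you do not want duplicate words to appear, store the words in sets instead of lists. With lists, the str.translate version outputs: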
{1: ['a', 'a', 'a'],
7: ['be'],
9: ['I', 'I'],
14: ['am', 'am'],
15: ['an'],
17: ['each', 'each', 'each'],
19: ['and', 'and', 'and'],
20: ['as'],
21: ['of'],
23: ['in'],
28: ['is'],
29: ['no'],
32: ['file'],
33: ['the', 'The', 'the', 'the'],
34: ['so'],
35: ['to', 'to', 'goal', 'to'],
36: ['have'],
37: ['take', 'ord', 'like'],
38: ['my', 'same'],
39: ['adding'],
41: ['ASCII'],
46: ['them', 'them'],
48: ['its'],
49: ['that', 'not'],
51: ['great'],
52: ['own'],
53: ['sum'],
56: ['will'],
58: ['into', 'into'],
60: ['word', 'word', 'with'],
61: ['value', 'value', 'having'],
69: ['text'],
75: ['would'],
76: ['split'],
77: ['group'],
78: ['assigned', 'integer'],
79: ['words', 'words'],
80: ['letter'],
85: ['script'],
92: ['numbers', 'numbers'],
93: ['trouble'],
96: ['numerical', 'numerical'],
97: ['convert'],
98: ['Python', 'together'],
99: ['letters'],
100: ['writing'],
102: ['function'],
109: ['recombining'],
118: ['dictionary']}
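The set-based variant is not shown above, so the following is only a sketch of how it might look in Python 3. The `values` table, the `group_unique` function, and the sample sentence are hypothetical names and data introduced here, and lowercasing every word (so that "The" and "the" collapse into one entry) is an assumption rather than something the answer prescribes.

```python
from collections import defaultdict
from string import ascii_lowercase, punctuation

# Assumed lowercase-only table, a=1 ... z=26
values = {ch: i for i, ch in enumerate(ascii_lowercase, start=1)}
# Translation table that deletes every ASCII punctuation character
strip_punct = str.maketrans("", "", punctuation)

def group_unique(text):
    groups = defaultdict(set)  # a set keeps each word at most once
    for word in text.split():
        word = word.translate(strip_punct).lower()
        if word:  # skip tokens that consisted of punctuation only
            groups[sum(values.get(ch, 0) for ch in word)].add(word)
    return groups

sample = "The goal is to group the words, the same words!"
result = group_unique(sample)
print({key: sorted(words) for key, words in sorted(result.items())})
```

Sets are unordered, so the sketch sorts each group before printing to get stable output.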
Answer 1: (score: 1)
I think this is also a good way to do it:
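import string
letters = string.ascii_lowercase  # string.lowercase exists only in Python 2

def give_sum(word):
    # Sum the 1-based alphabet positions of the letters; skip everything else
    ans = 0
    for ch in word:
        if ch.lower() in letters:
            value = letters.find(ch.lower()) + 1
            ans += value
    return ans

w_n = {}
with open('file1.txt') as f:
    for line in f:
        for word in line.split():
            # Append instead of assigning, so words with equal sums share a key
            w_n.setdefault(give_sum(word), []).append(word)
print(w_n)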
PS: optimize the code to suit your requirements.
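Answer 2: (score: 0)
As you mentioned, this is not the best approach, but if we code it exactly your way, this is the finished code; I checked it and it works.
You need to change the code of def words_to_numbers so that it computes the value of each string from your l_n dictionary and stores the result in w_n, where each key is a string and each value is a list.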
l_n = {
"A": 1, "a": 1,
"B": 2, "b": 2,
"C": 3, "c": 3,
"D": 4, "d": 4,
"E": 5, "e": 5,
"F": 6, "f": 6,
"G": 7, "g": 7,
"H": 8, "h": 8,
"I": 9, "i": 9,
"J": 10, "j": 10,
"K": 11, "k": 11,
"L": 12, "l": 12,
"M": 13, "m": 13,
"N": 14, "n": 14,
"O": 15, "o": 15,
"P": 16, "p": 16,
"Q": 17, "q": 17,
"R": 18, "r": 18,
"S": 19, "s": 19,
"T": 20, "t": 20,
"U": 21, "u": 21,
"V": 22, "v": 22,
"W": 23, "w": 23,
"X": 24, "x": 24,
"Y": 25, "y": 25,
"Z": 26, "z": 26,
}
words_list = []

def read_words(file):
    opened_file = open(file, "r")
    contents = opened_file.readlines()
    for i in range(len(contents)):
        words_list.extend(contents[i].split())
    opened_file.close()
    return words_list

read_words("file1.txt")
new_words_list = list(set(words_list))
print("new_word_list", new_words_list)
numbers_list = []
w_n = {}

def words_to_numbers(new_words_list, l_n):
    local_list = new_words_list[:]
    for word in local_list:
        tmp = 0
        for ch in word:
            # .get skips punctuation and digits instead of raising KeyError
            tmp += l_n.get(ch, 0)
        if str(tmp) in w_n:
            w_n[str(tmp)].append(word)
        else:
            tmp_lis = []
            tmp_lis.append(word)
            w_n[str(tmp)] = tmp_lis
    return w_n

print("the_answer_is ==> ", words_to_numbers(new_words_list, l_n))
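The question mentions a shorter way to build l_n without saying which one. One possibility, sketched here under that assumption, is to generate the table from the standard alphabet constants; the `word_value` helper is a hypothetical name added for the demonstration.

```python
from string import ascii_lowercase, ascii_uppercase

# Same mapping as the hand-written l_n: both cases, A/a=1 ... Z/z=26
l_n = {}
for position, (lower, upper) in enumerate(zip(ascii_lowercase, ascii_uppercase), start=1):
    l_n[lower] = position
    l_n[upper] = position

def word_value(word):
    # Characters outside the table contribute 0
    return sum(l_n.get(ch, 0) for ch in word)

print(len(l_n), l_n["A"], l_n["z"], word_value("Python"))
```

With this table "Python" evaluates to 16 + 25 + 20 + 8 + 15 + 14 = 98, which matches the key it appears under in the first answer's output.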