Question

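I am writing a program that takes a string of words and converts it into numbers, so that hi bye hi hello becomes 0 1 0 2. I used a dictionary to do this, which is why I am having trouble with the next part. I then need to compress this into a text file, and later decompress it and reconstruct the original string. This is the bit I am stumped on.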

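The way I would like to do it is to store the indexes (the 0 1 0 2 bit) in the text file together with the dictionary contents, so the text file would contain 0 1 0 2 and {hi:0, bye:1, hello:2}.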

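To decompress, or read this back into the Python file, I would like to use the indexes (this is how I will refer to the 0 1 0 2 from now on) to take each word out of the dictionary and reconstruct the sentence. So if a 0 comes up, the program looks in the dictionary, finds the entry whose definition is 0, and pulls that word out to put into the string. In this case it would find hi and take that.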

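I hope this is understandable and that at least one person knows how to do it. I am sure it is possible, but I have been unable to find anything here or elsewhere on the internet that mentions this subject.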

Answer 1

Yes, you can just use regular dicts and lists to store the data, and use json or pickle to persist the data to disk.

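import pickle

s = 'hi hello hi bye'
words = s.split()

# Give each distinct word the next free index, in order of first appearance.
d = {}
for word in words:
    if word not in d:
        d[word] = len(d)

# Encode the sentence as a list of indexes.
data = [d[word] for word in words]

# pickle writes bytes, so the file has to be opened in binary mode.
with open('/path/to/file', 'wb') as f:
    pickle.dump({'lookup': d, 'data': data}, f)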

Then, to read it back in:

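# Open in binary mode, since pickle data is bytes.
with open('/path/to/file', 'rb') as f:
    dic = pickle.load(f)

d = dic['lookup']
# Invert word -> index into index -> word so each index looks up its word directly.
reverse_d = {v: k for k, v in d.items()}
data = dic['data']
words = [reverse_d[index] for index in data]
line = ' '.join(words)
print(line)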
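
Since the question asks for a text file, here is a minimal sketch of the same round trip using json instead of pickle. The helper names `save` and `load` and the file name `compressed.json` are assumptions made for this example and are not part of the answer above.

```python
import json
import os
import tempfile

def save(path, sentence):
    """Encode a sentence as word indexes and write it, with the lookup table, as JSON text."""
    words = sentence.split()
    lookup = {}
    for word in words:
        if word not in lookup:
            lookup[word] = len(lookup)
    data = [lookup[word] for word in words]
    with open(path, 'w') as f:
        json.dump({'lookup': lookup, 'data': data}, f)

def load(path):
    """Read the JSON file back and rebuild the original sentence."""
    with open(path) as f:
        stored = json.load(f)
    # Invert word -> index into index -> word for direct lookups.
    reverse = {index: word for word, index in stored['lookup'].items()}
    return ' '.join(reverse[index] for index in stored['data'])

# Hypothetical file location, chosen only for this example.
path = os.path.join(tempfile.gettempdir(), 'compressed.json')
save(path, 'hi bye hi hello')
restored = load(path)
print(restored)
```

The resulting file is plain text and holds both the index list and the lookup table, which is the layout the question describes.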

Answer 2

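Because I don't know exactly how you created your key mapping, the best I can do is guess. Here I have created two functions: one writes a string to a txt file based on a key mapping, and the other reads a txt file and returns a string based on the key mapping. I hope this works for you, or at least gives you a solid understanding of the process! Good luck!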

import os

def pack(out_file, string, conversion_map):
    """Replace each mapped word with its integer key and write the result to out_file."""
    out_string = ''
    for word in string.split(' '):
        for key, value in conversion_map.items():
            if word.lower() == value.lower():
                out_string += str(key) + ' '
                break
        else:
            # No entry matched this word, so keep it unchanged.
            out_string += word + ' '

    with open(out_file, 'w') as f:
        f.write(out_string)

    return out_string.rstrip()

def unpack(in_file, conversion_map, on_lookup_error=None):
    """Read in_file and replace each integer key with its mapped word."""
    if not os.path.exists(in_file):
        return

    with open(in_file, 'r') as f:
        contents = f.read()
    out_string = ''
    # split() with no argument ignores the trailing space that pack() writes.
    for word in contents.split():
        for key, value in conversion_map.items():
            if word.lower() == str(key).lower():
                out_string += str(value) + ' '
                break
        else:
            # No key matched: call the error handler if given, else keep the word.
            if on_lookup_error:
                on_lookup_error()
            else:
                out_string += str(word) + ' '
    return out_string.rstrip()

def fail_on_lookup():
    print('We failed to find all words in our key map.')
    raise Exception

string = 'Hello, my first name is thelazyscripter'
# Despite its name, this map goes from integer key to word.
word_to_int_map = {0: 'first',
                   1: 'name',
                   2: 'is',
                   3: 'TheLazyScripter',
                   4: 'my'}

d = pack('data', string, word_to_int_map)  # pack and write the data based on the conversion map

print(d)  # the data that was written to the file
print(unpack('data', word_to_int_map))  # here we unpack the data from the file
print(unpack('data', word_to_int_map, fail_on_lookup))  # raises, because 'Hello,' is not in the map

Answer 3

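TheLazyScripter gave a nice workaround for the problem, but its runtime characteristics are not good: for each reconstructed word you have to loop through the whole dict.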

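I would say you chose the wrong dict design. To be efficient, a lookup should be done in one step, so you should have the numbers as keys and the words as values.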
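
To illustrate the difference, here is a small sketch. The dictionary contents and the function name `word_by_scan` are made up for this example.

```python
# Words as keys (the layout in the question): finding the word for an index needs a scan.
word_to_index = {'hi': 0, 'bye': 1, 'hello': 2}

def word_by_scan(mapping, index):
    """Loop over every entry until the matching index is found."""
    for word, i in mapping.items():
        if i == index:
            return word
    raise KeyError(index)

# Numbers as keys: the same lookup is a single dictionary access.
index_to_word = {0: 'hi', 1: 'bye', 2: 'hello'}

scanned = word_by_scan(word_to_index, 2)
direct = index_to_word[2]
print(scanned, direct)
```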

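Since your problem looks like a great computer science homework assignment (I'll consider it for my students ;-) ), I'll just give you a sketch of the solution: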

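- use word in my_dict.values() #(adapt for py2/py3) to test whether the word is already in the dictionary
- if it is not, insert the next available index as the key and the word as the value
- you are done

To reconstruct the sentence, just
- loop through your list of numbers
- use each number as the key into your dictionary and print(my_dict[key])

Prepare exception handling for the case where a key is not in the dict (this should not happen if you control the whole process, but it is good practice).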

This solution is much more efficient than your approach (and easier to implement).
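
The steps above could be sketched roughly as follows. This is one possible reading of the outline, not a definitive implementation; the function names `encode` and `decode` are assumptions made for the example.

```python
def encode(sentence):
    """Build a number -> word dictionary and the list of numbers for a sentence."""
    my_dict = {}
    numbers = []
    for word in sentence.split():
        if word not in my_dict.values():
            # Insert the next available index as the key, the word as the value.
            my_dict[len(my_dict)] = word
        # Find the key stored for this word (this is a scan over the items).
        key = next(k for k, v in my_dict.items() if v == word)
        numbers.append(key)
    return my_dict, numbers

def decode(my_dict, numbers):
    """Rebuild the sentence with one dictionary lookup per number."""
    words = []
    for key in numbers:
        try:
            words.append(my_dict[key])
        except KeyError:
            # Should not happen if the same process wrote the data.
            raise ValueError('index %r is not in the dictionary' % (key,))
    return ' '.join(words)

my_dict, numbers = encode('hi bye hi hello')
sentence = decode(my_dict, numbers)
print(numbers)
print(sentence)
```

Note that in this sketch only the decoding step gets the one-step lookup; encoding still scans the stored values for each word.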

How to define a word from a dictionary

3 answers: