Rebuilding a sentence from a dictionary and a position list

Time: 2016-01-21 20:26:43

Tags: python

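I have managed to get my program to store a sentence or two in a dictionary while also building a list of word positions.

What I need to do now is recreate the original sentence from the dictionary and the position list. I have searched a lot, but the results I found were either not what I need or were confusing and over my head.

Any help is much appreciated, thank you.

Here is my code so far: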

sentence = ("This Sentence is a very, very good sentence. Did you like my very good sentence?")           

print ('This is the sentence:', sentence)       

punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]         

for punct in punctuation:                    

    sentence = sentence.replace(punct," %s" % punct)            

print ('This is the sentence with spaces before the punctuations:', sentence)         

words_list = sentence.split()           

print ('A list of the words in the sentence:', words_list)         

dictionary = {}             

word_pos_list = []      

counter = 0                

for word in words_list:                     

    if word not in dictionary:              
        counter += 1                        
        dictionary[word] = counter          

    word_pos_list.append(dictionary[word])      

print ('The positions of the words in the sentence are:', word_pos_list)  

John

3 answers:

Answer 0: (score: 0)

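As mentioned in the comments, a dictionary is not a sorted data structure. Still, if you are breaking a sentence apart, indexing it in a dictionary, and trying to put it back together, you can try OrderedDict from the collections library.

That said, this is written without any further context or knowledge of how you are splitting your sentence (punctuation, etc.). If you are doing any kind of natural language processing (NLP), I suggest looking into NLTK.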

from collections import OrderedDict
In [182]: def index_sentence(s):
   .....:     # enumerate() keeps repeated words; list.index() would only ever return the first match
   .....:     return {i: word for i, word in enumerate(s.split(' '))}
   .....:

In [183]: def build_sentence_from_dict(d):
   .....:     # sort by position so the result does not depend on dictionary ordering
   .....:     return ' '.join(OrderedDict(sorted(d.items())).values())
   .....:

In [184]: s
Out[184]: 'See spot jump over the brown fox.'

In [185]: indexed = index_sentence(s)

In [186]: indexed
Out[186]: {0: 'See', 1: 'spot', 2: 'jump', 3: 'over', 4: 'the', 5: 'brown', 6: 'fox.'}

In [187]: build_sentence_from_dict(indexed)
Out[187]: 'See spot jump over the brown fox.'
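
One thing to watch when indexing words by position: list.index() always returns the first match, so a sentence with repeated words loses entries, while enumerate() yields every position once. A minimal sketch of the difference, using a made-up three-word sentence (the variable names are only for illustration):

```python
sentence = "very very good"
words = sentence.split(' ')

# list.index() returns the first occurrence, so both 'very' entries map to key 0
by_index = {words.index(w): w for w in words}

# enumerate() gives each word its own position
by_enumerate = {i: w for i, w in enumerate(words)}

print(by_index)      # {0: 'very', 2: 'good'}
print(by_enumerate)  # {0: 'very', 1: 'very', 2: 'good'}

# Rebuild in position order, without relying on dictionary ordering
rebuilt = ' '.join(by_enumerate[i] for i in sorted(by_enumerate))
print(rebuilt)       # very very good
```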

Answer 1: (score: 0)

To rebuild the sentence from the list, you have to invert the position mapping:

# reconstruct
reversed_dictionary = {x:y for y, x in dictionary.items()}
print(' '.join(reversed_dictionary[x] for x in word_pos_list))

This can be done more neatly with a defaultdict (a dictionary with a predefined default value, in your case a list of positions for each word):

#!/usr/bin/env python3.4

from collections import defaultdict

# preprocessing
sentence = ("This Sentence is a very, very good sentence. Did you like my very good sentence?")           
punctuation = '()?:;,.!/"\''  # a plain string, so the loop below visits one character at a time
for punct in punctuation:                    
    sentence = sentence.replace(punct," %s" % punct)

# using defaultdict this time
word_to_locations = defaultdict(list)
for position, word in enumerate(sentence.split()):
    word_to_locations[word].append(position)

# word -> list of locations
print(word_to_locations)

# location -> word
location_to_word = dict((y, x) for x in word_to_locations for y in word_to_locations[x])
print(location_to_word)

# reconstruct
print(' '.join(location_to_word[x] for x in range(len(location_to_word))))
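
If the goal is to get the original text back including punctuation, the space that the preprocessing step puts in front of each punctuation mark has to be removed again after joining. Below is a rough sketch of the full round trip in the style of the question's code. The function names encode and decode are made up for this example, and it assumes the original sentence uses single spaces and never has a space directly before a punctuation mark (such a space would be lost).

```python
PUNCTUATION = '()?:;,.!/"\''

def encode(sentence, punctuation=PUNCTUATION):
    """Return (word -> number dictionary, list of numbers), as in the question."""
    for punct in punctuation:
        sentence = sentence.replace(punct, " %s" % punct)
    dictionary = {}
    positions = []
    for word in sentence.split():
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1
        positions.append(dictionary[word])
    return dictionary, positions

def decode(dictionary, positions, punctuation=PUNCTUATION):
    """Invert the mapping, join the words, then undo the punctuation spacing."""
    number_to_word = {number: word for word, number in dictionary.items()}
    text = ' '.join(number_to_word[number] for number in positions)
    # Remove the space that encode() inserted before each punctuation mark
    for punct in punctuation:
        text = text.replace(" %s" % punct, punct)
    return text

sentence = "This Sentence is a very, very good sentence. Did you like my very good sentence?"
dictionary, positions = encode(sentence)
print(decode(dictionary, positions) == sentence)  # True
```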

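Answer 2: (score: 0)

The randomness of dictionary keys is not the problem here; the problem is the failure to record every position at which a word was seen, repeated or not. The following does that, then unwinds the dictionary to produce the original sentence, without punctuation: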

from collections import defaultdict

sentence = ("This Sentence is a very, very good sentence. Did you like my very good sentence?")           

print ('This is the sentence:', sentence)       

punctuation = set('()?:;\\,.!/"\'')  

sentence = ''.join(character for character in sentence if character not in punctuation)

print ('This is the sentence with no punctuation:', sentence)

words = sentence.split()

print('A list of the words in the sentence:', words)         

dictionary = defaultdict(list)            

last_word_position = 0   

for word in words:                     

    last_word_position += 1                        

    dictionary[word].append(last_word_position)         

print('A list of unique words in the sentence and their positions:', dictionary.items())         

# Now the tricky bit to unwind our random dictionary:

sentence = []

for position in range(1, last_word_position + 1):
    sentence.extend([word for word, positions in dictionary.items() if position in positions])

print(*sentence)

Output of the various print() statements:

This is the sentence: This Sentence is a very, very good sentence. Did you like my very good sentence?
This is the sentence with no punctuation: This Sentence is a very very good sentence Did you like my very good sentence
A list of the words in the sentence: ['This', 'Sentence', 'is', 'a', 'very', 'very', 'good', 'sentence', 'Did', 'you', 'like', 'my', 'very', 'good', 'sentence']
A list of unique words in the sentence and their positions: dict_items([('Sentence', [2]), ('is', [3]), ('a', [4]), ('very', [5, 6, 13]), ('This', [1]), ('my', [12]), ('Did', [9]), ('good', [7, 14]), ('you', [10]), ('sentence', [8, 15]), ('like', [11])])
This Sentence is a very very good sentence Did you like my very good sentence
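
The unwinding step above scans the whole dictionary once per position. As a possible alternative, the words can be written straight into a preallocated list in a single pass over the dictionary. This is only a sketch using a shortened sentence; the names positions_by_word and rebuilt are made up for the example.

```python
from collections import defaultdict

words = "This Sentence is a very very good sentence".split()

# word -> list of 1-based positions, as in the answer above
positions_by_word = defaultdict(list)
for position, word in enumerate(words, start=1):
    positions_by_word[word].append(position)

# Fill a preallocated list instead of searching the dictionary for each position
rebuilt = [None] * len(words)
for word, positions in positions_by_word.items():
    for position in positions:
        rebuilt[position - 1] = word

print(' '.join(rebuilt))  # This Sentence is a very very good sentence
```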