How to sort unique words in order of appearance?

Time: 2017-01-10 19:05:51

Tags: python python-3.x

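For part of my work I need to identify the unique words in a text and write them to a text file. I understand how to write text to a text file, but I don't understand how to structure this code so that it produces the right output in the text file (for example, if I were to input "the world of the flowers is a small world to be in"):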

restart = True
while restart == True:
    option = input("Would you like to compress or decompress this file?\nIf you would like to compress type c \nIf you would like to decompress type d.\n").lower()

    if option == 'c':

        text = input("Please type the text you would like to compress.\n")
        text = text.split()
        for count,word in enumerate(text):

            if text.count(word) < 2:
                order.append (max(order)+1)

            else:
                order.append (text.index(word)+1)



        print (uniqueWords)
        print (order)
        break
    elif option == 'd':
        pass

    else:
        print("Sorry that was not an option")

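The first line should contain the unique words and the second line the order of the words, so that the text can be decompressed later. I have no problem with the decompression or with ordering the numbers; the only problem is getting the unique words in order. Any help is much appreciated!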

4 answers:

Answer 0: (score: 2)

text = "the world of the flowers is a small world to be in"
words = text.split()
unique_ordered = []
for word in words:
    if word not in unique_ordered:
        unique_ordered.append(word)
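
To get the two-line file the question describes, the loop above can be paired with the word positions and written out. This is only a minimal sketch: the helper names `unique_in_order` and `write_compressed`, and the space-separated file layout, are assumptions for illustration and not part of the original answer.

```python
def unique_in_order(words):
    """Return the words with duplicates dropped, keeping first-seen order."""
    unique_ordered = []
    for word in words:
        if word not in unique_ordered:
            unique_ordered.append(word)
    return unique_ordered


def write_compressed(text, path):
    """Write the unique words on line 1 and the 1-based positions on line 2.

    The file layout (space-separated values) is an assumed format.
    """
    words = text.split()
    unique_ordered = unique_in_order(words)
    # Position of each original word within the unique list, counted from 1.
    order = [unique_ordered.index(word) + 1 for word in words]
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(" ".join(unique_ordered) + "\n")
        handle.write(" ".join(str(n) for n in order) + "\n")
    return unique_ordered, order
```

Calling `write_compressed(text, "compressed.txt")` (a hypothetical filename) would leave the unique words on the first line of the file and the positions on the second.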

Answer 1: (score: 0)

from collections import OrderedDict
text = "the world of the flowers is a small world to be in"
words = text.split()
print(list(OrderedDict.fromkeys(words)))

Output

['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
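
On Python 3.7 and later, a plain `dict` is guaranteed to keep insertion order, so the same idea should work without the import. A minimal sketch under that version assumption (the variable name `unique_words` is only illustrative):

```python
text = "the world of the flowers is a small world to be in"
words = text.split()

# dict.fromkeys keeps the first occurrence of each key, in insertion order
# (guaranteed by the language from Python 3.7 onwards).
unique_words = list(dict.fromkeys(words))
print(unique_words)
```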

Answer 2: (score: 0)

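This is an interesting problem. You can use a dictionary to store the index of each word's first occurrence and to check whether the word has already been seen: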

string = "the world of the flowers is a small world to be in"

dct = {}
words = []
indices = []
idx = 1
for substring in string.split():
    # Check if you've seen it already.
    if substring in dct:
        # Already seen it, so append the index of the first occurrence
        indices.append(dct[substring])
    else:
        # Add it to the dictionary with the index and just append the word and index
        dct[substring] = idx
        words.append(substring)
        indices.append(idx)
        idx += 1


>>> print(words)
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
>>> print(indices)
[1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]
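
As a sanity check on the two lists, the original sentence can be rebuilt from them. This is a rough sketch rather than part of the original answer; the function name `decompress` is hypothetical.

```python
def decompress(words, indices):
    """Rebuild the sentence from unique words and their 1-based indices."""
    return " ".join(words[i - 1] for i in indices)


words = ['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']
indices = [1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]

restored = decompress(words, indices)
print(restored)
```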

If you don't need the indices, several external modules also provide a function that returns the unique words in order of appearance:

>>> from iteration_utilities import unique_everseen
>>> list(unique_everseen(string.split()))
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']

>>> from more_itertools import unique_everseen
>>> list(unique_everseen(string.split()))
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']

>>> from toolz import unique
>>> list(unique(string.split()))
['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']

Answer 3: (score: 0)

To remove duplicate entries from a list while preserving order, check the answers to "How do you remove duplicates from a list in whilst preserving order?". For example:

my_sentence = "the world of the flowers is a small world to be in"
wordlist = my_sentence.split()

# Accepted approach in linked post 
def get_ordered_unique(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

unique_list = get_ordered_unique(wordlist)
# where `unique_list` holds:
#     ['the', 'world', 'of', 'flowers', 'is', 'a', 'small', 'to', 'be', 'in']

Then, to print the positions of the words, you can use a list comprehension with list.index():

>>> [unique_list.index(word)+1 for word in wordlist]
[1, 2, 3, 1, 4, 5, 6, 7, 2, 8, 9, 10]
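
Because `list.index()` scans the list on every call, longer texts may benefit from building a word-to-position dictionary once and looking positions up in it. A minimal sketch of that variation, assuming Python 3.7+; `position_of` is a name chosen here for illustration, and `dict.fromkeys` stands in for `get_ordered_unique` so the block runs on its own:

```python
my_sentence = "the world of the flowers is a small world to be in"
wordlist = my_sentence.split()

# Stand-in for get_ordered_unique() so this block is self-contained.
unique_list = list(dict.fromkeys(wordlist))

# Map each unique word to its 1-based position once ...
position_of = {word: pos for pos, word in enumerate(unique_list, start=1)}

# ... so each lookup is a dictionary access instead of a list scan.
positions = [position_of[word] for word in wordlist]
print(positions)
```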