Question

Here are four lines from a sample text file...

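The Star Schema is the simplest style of data mart schema

The star schema consists of one or more fact tables referencing any number of dimension tables

Pay attention to the bogus schema

Cheers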

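The Python code should create an alphabetically sorted list like the one below, with duplicate words removed and capitalized words sorted first.

The final output should look like this...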

["Cheers", "Pay", "Schema", "Star", "The", "any", "bogus", ...]

Answer 1

You can use sorted(s.split()) to sort the words of a string the way you want:

>>> s = 'The Star Schema is the simplest style of data mart schema'
>>> sorted(s.split())
['Schema', 'Star', 'The', 'data', 'is', 'mart', 'of', 'schema', 'simplest', 'style', 'the']
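The capitalized words come first because Python compares strings by Unicode code point, and the uppercase ASCII letters sit before the lowercase ones. A minimal sketch to illustrate; the `words` list is just a made-up sample, and the case-insensitive sort is shown only for contrast:

```python
words = ['the', 'Star', 'schema', 'Schema']

# Default ordering compares code points: 'S' (83) sorts before 's' (115),
# so every capitalized word lands ahead of the lowercase ones.
default_order = sorted(words)

# For contrast: a case-insensitive sort, which is NOT what the question wants.
folded_order = sorted(words, key=str.lower)

print(default_order)  # ['Schema', 'Star', 'schema', 'the']
print(folded_order)   # ['schema', 'Schema', 'Star', 'the']
```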

To remove duplicates you can use a set, but sets are unordered, so you need to convert the result back to a list (which sorted does implicitly):

sorted(set(s.split()))

That should be the final answer. Reading the string from a file should be pretty straightforward.
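For completeness, here is one way the file-reading part might look. This is only a sketch: the helper name `sorted_unique_words` and the throwaway demo file are assumptions made for illustration, and it splits on whitespace only, so punctuation stays attached to words.

```python
import os
import tempfile

def sorted_unique_words(path):
    """Return the unique whitespace-separated words in a file, sorted."""
    with open(path, encoding='utf-8') as f:
        # read() gives the whole file as one string; split() breaks it
        # on any whitespace, including newlines.
        return sorted(set(f.read().split()))

# Demo with a temporary file standing in for the real input (hypothetical content).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('The Star Schema is the simplest style of data mart schema\n')
    tmp.write('Cheers\n')
    demo_path = tmp.name

result = sorted_unique_words(demo_path)
os.remove(demo_path)
print(result)
```

Reading the whole file at once is fine for small inputs; for very large files, iterating line by line and updating a set would keep memory use lower.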

Answer 2

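The steps you should follow:

Read the file into a list of lines
Parse the lines into words and add them to your list
Remove duplicates
Sort

This code should do it: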

import re # use this library for splitting off words

all_words = [] # initialize list to store the words

with open('my_file.txt') as f: # best way to open a file
   for line in f:
       line = line.strip() # remove trailing newline
       words = re.findall(r'\w+', line) # extract the words, ignoring punctuation (re.split would yield empty strings)
       all_words += words

# looping is done now, and all lines have been read

all_words = set(all_words) # remove duplicates
all_words = sorted(all_words) # sort (capitalized words will come first)
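To try the same idea without a file on disk, the sketch below wraps it in a function that accepts any iterable of lines. The function name `unique_sorted_words` and the in-memory sample text are assumptions for illustration. It uses re.findall(r'\w+', ...) to pull out words, so trailing punctuation and blank lines do not produce empty strings.

```python
import io
import re

def unique_sorted_words(lines):
    """Collect the words from an iterable of lines, deduplicated and sorted."""
    seen = set()
    for line in lines:
        # findall returns only the runs of word characters, never ''.
        seen.update(re.findall(r'\w+', line))
    return sorted(seen)  # capitalized words sort before lowercase ones

# io.StringIO behaves like an open text file (hypothetical sample content).
sample = io.StringIO(
    'The Star Schema is the simplest style of data mart schema.\n'
    '\n'
    'Cheers!\n'
)
result = unique_sorted_words(sample)
print(result)
```

Because the function only iterates over its argument, an open file object from `with open(...) as f` can be passed in the same way.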

How to parse a text file in Python to create a sorted list with duplicates removed
