Question

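The data I get from my loop produces a series of strings, which are sentences retrieved from a database. The data structure in my database needs to contain duplicates, but I want to omit the duplicates in the output. Suppose my loop and its results look like this: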

for text in document:
   print(text)

Output:

He goes to school.
He works here.
we are friends.
He goes to school.
they are leaving us alone.
..........

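How can I set a condition so that the program reads all of the generated output and, if it finds a duplicate result (for example, "He goes to school."), shows me only one record instead of several identical ones?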

Answer 1

already_printed = set()
for text in document:
   if text not in already_printed:
       print(text)
       already_printed.add(text)

Answer 2

You can use a set, like this (note that a set is unordered, so the original order is not preserved):

values = set(document)
for text in values:
   print(text)

Or you can use a list:

temp_list = []
for text in document:
   if text not in temp_list:
       temp_list.append(text)
       print(text)

Or you can use a dict:

temp_dict = {}
for text in document:
   if text not in temp_dict:
       temp_dict[text] = 1
       print(text)
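
As a hedged aside on the dict variant: since Python 3.7, dicts preserve insertion order, so dict.fromkeys can drop duplicates while keeping the first occurrence of each sentence in place. A minimal sketch, assuming document is an iterable of strings (the sample list below is made up from the question's output):

```python
# Sample data standing in for the database rows (an assumption for illustration).
document = [
    "He goes to school.",
    "He works here.",
    "we are friends.",
    "He goes to school.",
    "they are leaving us alone.",
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this keeps the first occurrence of each sentence.
unique_texts = list(dict.fromkeys(document))

for text in unique_texts:
    print(text)
```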

Answer 3

Split the document on '\n', or read it line by line into arr = []; that is, inside the for loop, store each row with arr.append(row.lower()).

arr = list(set(arr)) will then remove the duplicates.
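
The steps above might look like the following sketch. The string raw is a hypothetical stand-in for the real document, and sorted() is used here only to make the result order predictable, since a set has no defined order.

```python
# Hypothetical input: the whole document as one newline-separated string.
raw = "He goes to school.\nHe works here.\nHe goes to school.\nhe works here."

arr = []
for row in raw.split("\n"):
    # Lowercase each row so that case differences do not count as distinct.
    arr.append(row.lower())

# Converting to a set drops duplicates; sorted() gives a stable order.
arr = sorted(set(arr))
print(arr)
```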

Answer 4

If case doesn't matter, you can use a comprehension to lowercase each sentence before deduplicating:

for text in set(i.lower() for i in document):
    print(text)
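
Note that the loop above prints the lowercased text, in arbitrary order. If the original casing and order should be kept, one possible sketch is to track the lowercased forms in a set while yielding the original strings. The helper name unique_ignore_case and the sample list are hypothetical.

```python
def unique_ignore_case(texts):
    """Yield each text once, comparing case-insensitively, in first-seen order."""
    seen = set()
    for text in texts:
        key = text.lower()
        if key not in seen:
            seen.add(key)
            yield text


# Sample data (an assumption for illustration).
document = ["He goes to school.", "he goes to school.", "We are friends."]

result = list(unique_ignore_case(document))
for text in result:
    print(text)
```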

Answer 5

Use Python's built-in set to remove the duplicates:

documents = ["He goes to school", "He works here. we are friends", "He goes to school", "they are leaving us alone"]

list(set(documents))