
Question

Using the following code from https://stackoverflow.com/a/11899925, I am able to find out whether a word is unique (by checking whether it was used once or more than once):

helloString = ['hello', 'world', 'world']
count = {}
for word in helloString :
   if word in count :
      count[word] += 1
   else:
      count[word] = 1

But if I have a string with hundreds of words, how can I count the number of unique words in that string?

For example, my code has:

uniqueWordCount = 0
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
count = {}
for word in words :
   if word in count :
      count[word] += 1
   else:
      count[word] = 1

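How would I set uniqueWordCount to 6? Usually I am really good at solving these types of algorithmic puzzles, but I have been unsuccessful with figuring this one out. I feel as if it is right under my nose.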

Answer 1

The best way to solve this is to use the set collection type. A set is a collection in which all elements are unique. Therefore:

unique = set([ 'one', 'two', 'two']) 
len(unique) # is 2

You can use a set from the start, adding words to it as you go:

unique.add('three')

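This discards any duplicates as they are added. Alternatively, you can collect all the elements in a list and pass the list to the set() function, which removes the duplicates at that point. The example I gave above shows this pattern: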

unique = set([ 'one', 'two', 'two'])
unique.add('three')

# unique now contains {'one', 'two', 'three'}
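
As a rough sketch of the incremental approach applied to the list from the question (the names `words`, `seen`, and `unique_word_count` are illustrative and not part of the original answer):

```python
# Illustrative input; mirrors the list used in the question.
words = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

seen = set()
for word in words:
    # add() leaves the set unchanged when the word is already present
    seen.add(word)

unique_word_count = len(seen)
print(unique_word_count)  # 7
```

Building the set in a loop like this is only needed when the words arrive one at a time; with the whole list in hand, `len(set(words))` gives the same result.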

Answer 2

Sets

You can also convert a list to a set, in which all elements have to be unique. Elements that are not unique (duplicates) are discarded:

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
helloSet = set(helloString) #=> {'doing', 'how', 'are', 'world', 'you', 'hello', 'today'} (order may vary)
uniqueWordCount = len(set(helloString)) #=> 7

For further reading, see the Python documentation on sets.

Counter

You can also use a Counter, which additionally tells you how often each word was used, should you still need that information.

from collections import Counter

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)
len(counter) #=> 7
counter["world"] #=> 2
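
If the frequencies matter, a short sketch along these lines shows how they might be queried. The variable names `top_word`, `top_count`, and `repeated` are assumptions made for illustration; `most_common` is a standard `Counter` method.

```python
from collections import Counter

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)

# most_common(n) returns the n most frequent (word, count) pairs
top_word, top_count = counter.most_common(1)[0]
print(top_word, top_count)  # world 2

# Words that occur more than once
repeated = [word for word, n in counter.items() if n > 1]
print(repeated)  # ['world']
```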

Loop

At the end of your loop you can check len(count). Also, you mistyped helloString as words in the for statement.
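
A minimal sketch of the corrected loop, assuming the only change needed is to iterate over `helloString` instead of the undefined name `words`:

```python
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
count = {}
for word in helloString:  # was: for word in words
    if word in count:
        count[word] += 1
    else:
        count[word] = 1

# Each distinct word is one key in the dictionary
uniqueWordCount = len(count)
print(uniqueWordCount)  # 7
```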

Answer 3

You can use collections.Counter:

helloString = ['hello', 'world', 'world']

from collections import Counter

c = Counter(helloString)

print("There are {} unique words".format(len(c)))
print('They are')

for k, v in c.items():
    print(k)

I know the question doesn't specifically ask for it, but to maintain order:

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

c = OrderedCounter(helloString)

print("There are {} unique words".format(len(c)))
print('They are')

for k, v in c.items():
    print(k)
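
On Python 3.7 and later, a plain dict preserves insertion order, so if only the first-seen order of the distinct words is needed, a simpler sketch may be enough. The name `ordered_unique` is illustrative.

```python
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

# dict.fromkeys keeps one key per distinct word, in first-seen order (Python 3.7+)
ordered_unique = list(dict.fromkeys(helloString))

print(len(ordered_unique))  # 7
print(ordered_unique)  # ['hello', 'world', 'how', 'are', 'you', 'doing', 'today']
```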

Answer 4

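In your current code, you can either increment uniqueWordCount in the else case where you already set count[word], or just look up the number of keys in the dictionary: len(count)

If you only want to know the number of unique elements, take the length of the set of elements: len(set(helloString))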
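
The first option could be sketched as follows, assuming the loop iterates over `helloString` (the question's code iterates over an undefined `words`):

```python
uniqueWordCount = 0
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
count = {}
for word in helloString:
    if word in count:
        count[word] += 1
    else:
        # First time this word is seen: record it and count it as distinct
        count[word] = 1
        uniqueWordCount += 1

print(uniqueWordCount)  # 7
```

All three approaches agree here: `uniqueWordCount`, `len(count)`, and `len(set(helloString))` are each 7 for this list.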

Answer 5

I may be misreading the question, but I believe the goal is to find all elements that occur only once in the list.

from collections import Counter
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)
uniques = [value for value, count in counter.items() if count == 1]

This gives us 6 items, because "world" occurs twice in our list:

>>> uniques
['you', 'are', 'doing', 'how', 'today', 'hello']
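
The two readings of "unique" give different numbers for this list, which may explain the 6 expected in the question. A small sketch comparing them (the names `distinct_count` and `once_only_count` are illustrative):

```python
from collections import Counter

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)

# Distinct words: every word counted once, however often it occurs
distinct_count = len(counter)

# Words that occur exactly once: 'world' is excluded because it occurs twice
once_only_count = sum(1 for n in counter.values() if n == 1)

print(distinct_count)   # 7
print(once_only_count)  # 6
```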

Answer 6

Counter is an efficient way to do this. This code works similarly to Counter:

text = ['hello', 'world']

# create empty dictionary
freq_dict = {}
 
# loop through text and count words
for word in text:
    # set the default value to 0
    freq_dict.setdefault(word, 0)
    # increment the value by 1
    freq_dict[word] += 1
 


for key,value in freq_dict.items():
    if value == 1:
         print(f'Word "{key}" has single appearance in the list')

Word "hello" has single appearance in the list
Word "world" has single appearance in the list


Answer 7

I would use a set to do this.

def stuff(helloString):
    hello_set = set(helloString)
    return len(hello_set)

Count the number of unique words in a list

7 answers:
