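I'm an undergraduate, new here, and I enjoy programming. I ran into a problem while practicing and would like to ask for help.
Given a string and an integer n, return the nth most common word and its count, ignoring case.
Make sure all letters of the returned word are lowercase!
Hint: the split() function and a dictionary may be useful.
Example:
Input: "apple apple apple blue BlUe call", 2
Output: the list ["blue", 2]
My code is as follows: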
from collections import Counter
def nth_most(str_in, n):
    split_it = str_in.split(" ")
    array = []
    for word, count in Counter(split_it).most_common(n):
        list = [word, count]
        array.append(count)
        array.sort()
        if len(array) - n <= len(array) - 1:
            c = array[len(array) - n]
            return [word, c]
The test results are as follows:
Traceback (most recent call last):
  File "/grade/run/test.py", line 10, in test_one
    self.assertEqual(nth_most('apple apple apple blue blue call', 3), ['call', 1])
  File "/grade/run/bin/nth_most.py", line 10, in nth_most
    c = array[len(array) - n]
IndexError: list index out of range
and
Traceback (most recent call last):
  File "/grade/run/test.py", line 20, in test_negative
    self.assertEqual(nth_most('awe Awe AWE BLUE BLUE call', 1), ['awe', 3])
AssertionError: Lists differ: ['BLUE', 2] != ['awe', 3]
First differing element 0:
'BLUE'
'awe'
I don't know what is wrong with my code.
Thank you very much for your help!
Answer 0 (score: 3)
Counter returns the most common elements in order, so you can do this:
list(Counter(str_in.lower().split()).most_common(n)[-1]) # n is nth most common word
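As a minimal sketch, the one-liner can be wrapped in the nth_most function from the question. It assumes n should lie between 1 and the number of distinct words; the explicit range check and the ValueError are my own additions, not something the original task specifies.

```python
from collections import Counter

def nth_most(str_in, n):
    # Lowercase first so 'BlUe' and 'blue' count as the same word;
    # split() with no argument also ignores leading or repeated spaces.
    ranked = Counter(str_in.lower().split()).most_common()
    if not 1 <= n <= len(ranked):
        # Assumption: an out-of-range n is treated as an error.
        raise ValueError("n must be between 1 and the number of distinct words")
    word, count = ranked[n - 1]
    return [word, count]

print(nth_most("apple apple apple blue BlUe call", 2))  # ['blue', 2]
```

Lowercasing before counting is what fixes the ['BLUE', 2] != ['awe', 3] failure in the question, since 'awe', 'Awe' and 'AWE' are otherwise counted as three different words.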
Answer 1 (score: 2)
Since you are using Counter, use it wisely:
import collections
def nth_most(str_in, n):
    c = sorted(collections.Counter(w.lower() for w in str_in.split()).items(), key=lambda x: x[1])
    return list(c[-n])  # convert to list as it seems to be the expected output
print(nth_most("apple apple apple blue BlUe call", 2))
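This builds a word-frequency dictionary, sorts the items by value (the second element of each tuple), and picks the nth element from the end.
It prints ['blue', 2].
What if two words have the same frequency (a tie) in first or second place? Then this solution doesn't work. Instead, sort the occurrence counts, take the nth largest count, and then go through the counter dict again to extract every word that matches it.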
def nth_most(str_in, n):
    c = collections.Counter(w.lower() for w in str_in.split())
    nth_occs = sorted(c.values())[-n]
    return [[k, v] for k, v in c.items() if v == nth_occs]
print(nth_most("apple apple apple call blue BlUe call woot", 2))
This time it prints:
[['call', 2], ['blue', 2]]
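Note that sorted(c.values()) keeps duplicate counts, so with ties n indexes into the list of all counts rather than into distinct frequency ranks: with the input above, n=3 would also select the count 2. If n is meant to be the nth distinct frequency, one possible variation is to deduplicate the counts first. This is only a sketch; the name nth_most_distinct is hypothetical, and returning an empty list for an out-of-range n is an assumption.

```python
import collections

def nth_most_distinct(str_in, n):
    c = collections.Counter(w.lower() for w in str_in.split())
    # Distinct counts, highest first, e.g. [3, 2, 1]
    distinct = sorted(set(c.values()), reverse=True)
    if not 1 <= n <= len(distinct):
        return []  # assumption: no such rank means no matches
    target = distinct[n - 1]
    # Collect every word whose count equals the nth distinct count
    return [[k, v] for k, v in c.items() if v == target]

text = "apple apple apple call blue BlUe call woot"
print(nth_most_distinct(text, 2))  # [['call', 2], ['blue', 2]]
print(nth_most_distinct(text, 3))  # [['woot', 1]]
```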
Answer 2 (score: 2)
def nth_common(lowered_words, check):
    m = []
    for i in lowered_words:
        m.append((i, lowered_words.count(i)))
    for i in set(m):
        # print(i)
        if i[1] == check:  # check whether the tuple's second element (the occurrence count) equals check
            print(i, "found")
    del m[:]  # clear the list so it can be reused
words = ['apple', 'apple', 'apple', 'blue', 'BLue', 'call', 'cAlL']
lowered_words = [x.lower() for x in words]  # ignore uppercase
check = 2  # the occurrence count to look for
nth_common(lowered_words, check)
Output:
('blue', 2) found
('call', 2) found
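The same check can also be written with a single pass over the words instead of calling count() for every element, returning the matches rather than printing them. This is a sketch under my own assumptions: words_with_count is a hypothetical name, and the result is sorted only to make the order deterministic, since iterating over a set gives no guaranteed order.

```python
def words_with_count(words, check):
    counts = {}
    for w in words:
        key = w.lower()  # ignore case
        counts[key] = counts.get(key, 0) + 1
    # Keep only the words that occur exactly `check` times
    return sorted((w, c) for w, c in counts.items() if c == check)

words = ['apple', 'apple', 'apple', 'blue', 'BLue', 'call', 'cAlL']
print(words_with_count(words, 2))  # [('blue', 2), ('call', 2)]
```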
Answer 3 (score: 1)
Traceback (most recent call last):
  File "/grade/run/test.py", line 10, in test_one
    self.assertEqual(nth_most('apple apple apple blue blue call', 3), ['call', 1])
  File "/grade/run/bin/nth_most.py", line 10, in nth_most
    c = array[len(array) - n]
IndexError: list index out of range
To get rid of this list index error, just write:
maxN = 1000  # change according to your max length
array = [0 for _ in range(maxN)]
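As a side note on where the error comes from, it can be reproduced in isolation. The sketch below assumes the index is computed while array still holds only the first count, which is consistent with the traceback for n = 3: len(array) - n is then -2, and a one-element list has no element at index -2.

```python
array = [3]  # assumption: only the first count has been appended so far
n = 3
index = len(array) - n  # 1 - 3 == -2

try:
    array[index]
    message = None
except IndexError as exc:
    message = str(exc)

print(index, message)  # -2 list index out of range
```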
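Answer 4 (score: 0)
Even without the collections module, you can get the same result:
import re
paragraph = "Nory was a Catholic because her mother was a Catholic, and Nory's mother was a Catholic because her father was a Catholic, and her father was a Catholic because his mother was a Catholic, or had been."
def nth_common(n, p):
    # split on non-word characters and drop the empty strings left by punctuation at the ends
    words = [w for w in re.split(r'\W+', p.lower()) if w]
    word_count = {}
    for i in words:
        if i in word_count:
            word_count[i] += 1
        else:
            word_count[i] = 1
    # highest count first; words with equal counts keep their order of first appearance
    sorted_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
    return sorted_count[n - 1]
nth_common(3, paragraph)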
The output will be ('catholic', 6)
The word counts, sorted by count: [('was', 6), ('a', 6), ('catholic', 6), ('because', 3), ('her', 3), ('mother', 3), ('nory', 2), ('and', 2), ('father', 2), ('s', 1), ('his', 1), ('or', 1), ('had', 1), ('been', 1)]