Question

I'm trying to write, from scratch, a function f(x, n) that returns a sorted list of the words that occur n or more times.

For example:

f("the apple the banana the apple", 2)
>>> ['apple', 'the']

because the and apple are the only words that occur two or more times.

Another example:

f("the kid jumped off the roof", 1)
>>> ['jumped', 'kid', 'off', 'roof', 'the']

My attempt so far, which doesn't work:

def f(x, n):
   words = list(x.split())
   a= ""
   for word in words:
     if len(word) >= n:
       a += word
         return(list(word))

Answer 1

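This iterates over the items in the list produced by the split and adds each item to a dictionary with a count of 1 if it is not already in the dictionary. If the item already exists, its corresponding value is incremented by 1. The word acts as the key and the count as the value.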

def f(x, n):
    words = x.split()
    d = {}
    for word in words:
        if word in d:
            d[word] += 1
        else:
            d[word] = 1
    print([i for i, j in d.items() if j >= n])

f("the apple the banana the apple", 2)

Output:

['the', 'apple']
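
As a hedged variation on the function above (not the answerer's own code): the question asks for a sorted list to be returned rather than printed, so this sketch returns `sorted(...)`. The name `words_at_least` is made up for illustration.

```python
def words_at_least(text, n):
    """Return, sorted, the words that occur at least n times in text."""
    counts = {}
    for word in text.split():
        # dict.get returns 0 for a word that has not been seen yet
        counts[word] = counts.get(word, 0) + 1
    return sorted(word for word, count in counts.items() if count >= n)


print(words_at_least("the apple the banana the apple", 2))  # ['apple', 'the']
```

Returning the list instead of printing it lets the caller reuse the result.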

Answer 2

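The problem with the function you provided is that you are actually checking the length of each word (by doing if len(word)...), rather than its frequency in the string.

You can simply use collections.Counter and a list comprehension, like this: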

from collections import Counter

def f(string, n):
    count = Counter(string.split()).items()
    return [i for (i, j) in count if j >= n]

print(f("the apple the banana the apple", 2))

Output (the order is not guaranteed to be sorted and can differ between Python versions):

['apple', 'the']

Answer 3

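Here is an efficient solution.
Since you mentioned 'from scratch', I will write this code without importing any modules.

Logic:
1. Iterate over the list of words (only once) {O(n) complexity} and use a dictionary to count the occurrences. A dictionary is ideal because it cannot contain duplicate keys.
2. Afterwards, iterate over the dictionary {O(n) complexity} and check whether the value is greater than or equal to N -> if so, append the word to the list that will be returned (if it is not already in the list).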

def N_duplicates(string, freq):  # Gets input string and frequency
    word_count = {}  # Dictionary is used to store word frequencies.
    ret_lst = []  # returning list
    lst = string.split()
    for word in lst:
        if word not in word_count:
            word_count[word] = 1
        else:
            word_count[word] = word_count[word] + 1
    for item in word_count.keys():
        if word_count[item] >= freq:
            if item not in ret_lst:
                ret_lst.append(item)
    return ret_lst

print(N_duplicates("the kid jumped off the roof", 1))

Answer 4

Counter is your friend. Try something like this:

from collections import Counter

def f(x, n):
   words = x.split()
   c = Counter(words)
   return [word for word, v in c.items() if v >= n]

Then:

>>> print(f("the kid jumped off the roof", 1))
['the', 'kid', 'off', 'roof', 'jumped']

Answer 5

You can use the string method count and the built-in set to achieve this:

>>> def f(x, n):
...     return sorted(set(s for s in x.split() if x.count(s) >= n))
... 
>>> s1 = "the apple the banana the apple"
>>> s2 = "the kid jumped off the roof"
>>> f(s1, 2)
['apple', 'the']
>>> f(s2, 1)
['jumped', 'kid', 'off', 'roof', 'the']
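
One caveat, sketched under the assumption that only whole words should be counted: `x.count(s)` counts substrings of the full string, so a short word is also counted wherever it appears inside a longer word. Counting on the split list avoids that. `f_whole_words` is a hypothetical name used only for this illustration.

```python
def f_whole_words(x, n):
    words = x.split()
    # list.count compares whole items, so "is" does not match inside "This"
    return sorted(set(w for w in words if words.count(w) >= n))


text = "This is his list"
print(text.count("is"))          # 4 (substring matches)
print(text.split().count("is"))  # 1 (whole-word matches)
print(f_whole_words(text, 2))    # []
```

Note that `words.count(w)` rescans the list for every word, so this stays quadratic in the number of words; it is meant to show the difference, not to be fast.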

Answer 6

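I like the answers given here, but I was curious to test how the results of the Counter() and list() variants differ. So I implemented two functions that each return a sorted array of the words together with their counts, which makes the results easier to compare: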

from collections import Counter
# this is the Counter version returned sorted
def f(x,n): return sorted(["%s:%s" % (w,c) for w,c in Counter(x.split()).most_common() if c >= n])
# this is the list version returned sorted
def g(x, n): return sorted(list(set("%s:%s" % (s, x.count(s)) for s in x.split() if x.count(s) >= n)))

Then I fed both functions the words of a Lorem Ipsum text. I was surprised to find that there actually is a difference.

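What both versions have in common is that they do not take punctuation into account. So if the text contains Apple, that is different from Apple, or Apple. or Apple! and so on. You can easily replace or remove all punctuation before counting the words.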
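
A minimal sketch of that clean-up step in Python 3, assuming the ASCII characters in `string.punctuation` are all that needs to be removed; `strip_punctuation` is an illustrative helper name, not part of either function above.

```python
import string


def strip_punctuation(text):
    # With a third argument, str.maketrans builds a table that deletes those characters
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table)


print(strip_punctuation("Apple, Apple. Apple!").split())  # ['Apple', 'Apple', 'Apple']
```

This also removes apostrophes and hyphens inside words (for example, don't becomes dont), which may or may not be what you want.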

Likewise, Apple is different from apple. That may be intended, but if it is not, you also need to call .lower() on the string.
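
A small illustration of the effect of case, using a made-up sample string:

```python
from collections import Counter

text = "Apple apple APPLE pear"
# Without lowercasing, each spelling is a separate key
print(Counter(text.split())["apple"])          # 1
# After lowercasing, all three spellings are counted together
print(Counter(text.lower().split())["apple"])  # 3
```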

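But the biggest difference is in the counting itself. The list() version actually fails, because it seems to count a word even when it occurs inside another word. Function f() counted at 8 times, which is correct, but function g() reported 38 occurrences. Apparently x.count(s) returns not only whole words but also substring matches. Too bad, because that makes the list() version fail.

Trying this with an awkward sentence gives the following result: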

>>> print(f("This test so nice, is like ice! Test... likely;", 1))
['Test...:1', 'This:1', 'ice!:1', 'is:1', 'like:1', 'likely;:1', 'nice,:1', 'so:1', 'test:1']

>>> print(g("This test so nice, is like ice! Test... likely;", 1))
['Test...:1', 'This:1', 'ice!:1', 'is:2', 'like:2', 'likely;:1', 'nice,:1', 'so:1', 'test:1']

Here you can see the behavior: the list() version actually counts is and like twice, because they are contained in This and likely.

So the winner is:

from collections import Counter
# this is the Counter version but result returned sorted
def f(x, n): return sorted([w for w, c in Counter(x.split()).most_common() if c >= n])

This still does not take case and punctuation into account. If you want the result I would expect, you can add the string module to get it:

from collections import Counter
import string
# return correct result lowercase without punctuation and sorted
def f(x, n): return sorted([w for w, c in Counter(x.translate(str.maketrans('', '', string.punctuation)).lower().split()).most_common() if c >= n])

.translate(str.maketrans('', '', string.punctuation)).lower() does all the magic here, and the result is:

>>> print(f("This test so nice, is like ice! Test... likely;", 1))
['ice', 'is', 'like', 'likely', 'nice', 'so', 'test', 'this']

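Guys, I love one-liner functions :) But when a Python beginner asks here, we should not focus too much on our own preferences. We should focus on code that gives good insight into Python and into why things work the way they do. So as for which answer to choose: the one a beginner can read!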

Return the words that occur n or more times in a string in Python 3?

6 answers: