Question

I have an empty list, and any key in freq whose corresponding value is equal to m should be appended to the list words.

My input is ACGTTGCATGTCGCATGATGCATGAGAGCT (for Text) and 4 (for k).

What I need to do is check whether each frequency value is equal to the maximum value and, if it is, append its key to this list:

words = []

The output I should get is CATG GCAT.

If you are familiar with UCSD's genomic data science course, you may recognize this problem.

Here is the code:

# Input:  A string Text and an integer k
# Output: A list containing all most frequent k-mers in Text
def FrequentWords(Text, k):
    words = []
    freq = FrequencyMap(Text, k)
    m = max(freq.values())
    for key in freq:
        # add each key to words whose corresponding frequency value is equal to m (this is the part I am struggling with)
    return words

Answer 1

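If the function you are looking for takes an input string INPUT and a minimum frequency m, and returns every character in that string whose frequency is higher than m, then here you go.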

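>>> import collections
>>> def FrequentWords(INPUT, m):
...     # Count how often each character occurs in INPUT
...     counted = collections.Counter(INPUT)
...     payload = []
...     for i in counted:
...         letter_count = counted[i]
...         # Keep only characters that occur more than m times
...         if letter_count > m:
...             payload.append(i)
...     return payload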
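The loop above filters on a fixed threshold. For the condition described in the question (keep each key whose count equals the maximum), a minimal sketch could look like the following. Note that FrequencyMap is not shown in the question, so the version here is a hypothetical stand-in that assumes it returns a dict mapping each k-mer of Text to its number of occurrences.

```python
def FrequencyMap(Text, k):
    # Hypothetical helper (assumption: the course version behaves like this).
    # Maps every length-k substring of Text to how often it occurs.
    freq = {}
    for i in range(len(Text) - k + 1):
        pattern = Text[i:i + k]
        freq[pattern] = freq.get(pattern, 0) + 1
    return freq


def FrequentWords(Text, k):
    words = []
    freq = FrequencyMap(Text, k)
    if not freq:
        # No k-mers at all (for example, k is longer than Text)
        return words
    m = max(freq.values())
    for key in freq:
        # Append each key whose frequency equals the maximum m
        if freq[key] == m:
            words.append(key)
    return words


print(*sorted(FrequentWords("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4)))  # CATG GCAT
```

The result is sorted before printing only so that the output order matches the one given in the question; the list itself follows the order in which the k-mers first appear in Text.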

Answer 2

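Python provides some nice features to support these common operations. The Counter type (a special kind of dictionary) will give you the frequencies; a simple filter will then give you the list to return.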

from collections import Counter

def FrequentWords(Text, k):
    # Build a dict of frequencies in the input
    freq = Counter(Text)

    # Build a list of words whose frequencies are at least the given threshold, k
    words = [word for word in freq if freq[word] >= k]

    return words

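This works if Text is an iterable of the things you want to count (a string, a list of strings, a tuple, etc.). If it is instead one big string containing a whole paragraph (a sequence of letters, not split into words), then you will need to extract the words from it first, for example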

word_list = Text.split()

... and then operate on word_list.
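As a rough illustration of that last step, the sketch below splits a paragraph into words before counting. The function name frequent_words and the sample sentence are made up for this example, and k is treated as a minimum-count threshold as in the code above.

```python
from collections import Counter


def frequent_words(text, k):
    # Split the paragraph on whitespace so that words, not letters, are counted
    word_list = text.split()
    freq = Counter(word_list)
    # Keep the words that occur at least k times
    return [word for word in freq if freq[word] >= k]


sample = "the cat and the dog and the bird"
print(frequent_words(sample, 2))  # ['the', 'and']
```

Without the split, Counter would iterate over the string character by character and report letter frequencies instead.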

Frequent words in an empty list in Python
