Frequency counting in Haskell

Time: 2015-06-16 01:26:11

Tags: haskell dictionary count frequency

The task I'm trying to accomplish is to get, from a string s, a list of the frequencies of all possible patterns of length n.

Input (Text,Length) 
Output (Frequency)
String -> Int -> [Int]
freqCount s n = frequency list [int] in alphabetical order

['A','B','C','D']

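The characters in s are restricted to the four listed above, so I figured the first step is to generate all possible permutations of length n, with repetition allowed.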

permutationsR k = sort(replicateM k ['A','B','C','D'])

For example,

permutationsR 2 

gives the output

 ["AA","AB","AC","AD","BA","BB","BC","BD","CA","CB","CC","CD","DA","DB","DC","DD"]

Then, for each pattern, count how many times it occurs, with something like:
patternCount:: String -> String -> Int
patternCount text pattern = length (filter (\x -> x == pattern) [take (length(pattern)) (drop x text)| x <- [0..length(text)- length(pattern)]])
frequencyCount s n = map (\x -> patternCount s x) (permutationsR n)

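However, I think this will be very inefficient, because I'm essentially walking the whole list once per pattern, that is, length (permutationsR n) times, whereas it seems to me this should be doable in a single iteration.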

Is there a way to build a frequency map the way one would in an imperative language?

That is, in pseudocode:

where s = string
and n = length of pattern
//pattern is a map where key = pattern and value = frequencyCount

patterns = {"AA":0,"AB":0,"AC":0...}
for (i = 0; i <= (length s - n); i++){
    patterns[s[i:(i+n)]] += 1
}

Basically, iterate only once, slicing the string from (i:i + n) and updating the pattern map on each occurrence.

A sample input and output would look like this:

s= "AABBCA"
n = 2
frequencyList s n = [1,1,0,0,0,1,1,0,1,0,0,0,0,0,0,0]
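
One way to express the loop above in Haskell is a left fold over the length-n windows of s that updates a Data.Map. The following is only a minimal sketch under a few assumptions: the containers package is available, n is at least 1, and the name frequencyList is borrowed from the example above. Windows containing characters outside "ABCD" are silently ignored.

```haskell
import Control.Monad (replicateM)
import Data.List (foldl', tails)
import qualified Data.Map.Strict as Map

-- Count every length-n pattern over the alphabet "ABCD" in one pass over s.
-- The result is ordered by pattern, because Map.elems walks keys in ascending order.
frequencyList :: String -> Int -> [Int]
frequencyList s n = Map.elems counts
  where
    -- Every possible pattern starts at zero, like the initial map in the pseudocode.
    zeros :: Map.Map String Int
    zeros = Map.fromList [(p, 0) | p <- replicateM n "ABCD"]
    -- All overlapping windows of exactly n characters.
    windows = [w | t <- tails s, let w = take n t, length w == n]
    -- Map.adjust only touches keys that already exist, so unknown windows are skipped.
    counts = foldl' (\m w -> Map.adjust (+ 1) w m) zeros windows
```

With the sample input, frequencyList "AABBCA" 2 evaluates to the list shown above. Each window causes one map update, so the string is traversed once rather than once per pattern.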

1 Answer:

Answer 0: (score: 3)

Here is one possible solution:

import Data.List

groupsOf n str =
  unfoldr (\s ->
            if length s < n
            then Nothing
            else Just (take n s, tail s)) str

frequency :: (Ord t, Eq t) => [t] -> [(t, Int)]
frequency =
  map (\s -> (head s, length s)) . group . sort

groupsOf splits the input string into overlapping sequences of length n. For example,

groupsOf 3 "AABCABC"

gives

["AAB", "ABC", "BCA", "CAB", "ABC"].  
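
Note that groupsOf calls length on the remaining input at every step, which makes it quadratic in the length of the string. A possible alternative is sketched below under the hypothetical name slidingWindows (not part of the original answer). It pairs each suffix with the input shifted by n - 1 positions, so the list of windows stops exactly when fewer than n characters remain. It assumes n is at least 1.

```haskell
import Data.List (tails)

-- Overlapping windows of length n, without calling length on the remaining input.
-- drop (n - 1) xs has one element per full window, so zipWith const
-- truncates the list of candidate windows at the right place.
slidingWindows :: Int -> [a] -> [[a]]
slidingWindows n xs = zipWith const (map (take n) (tails xs)) (drop (n - 1) xs)
```

For the input above, slidingWindows 3 "AABCABC" produces the same five windows as groupsOf.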

frequency then counts the occurrences of each subsequence, so

frequency $ groupsOf 3 "AABCABC"

should give

[("AAB", 1), ("ABC", 2), ("BCA", 1), ("CAB", 1)].

Any subsequence that does not appear in the result occurred zero times.
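
To get from these pairs to the full list the question asks for, with zeros included and in alphabetical order, one option is to look each possible pattern up in the counts and default to 0. The sketch below restates groupsOf and frequency so that it is self-contained. frequencyList is a hypothetical helper named after the question's example, the alphabet "ABCD" is taken from the question, and Data.Map from the containers package is assumed to be available. It expects n to be at least 1.

```haskell
import Control.Monad (replicateM)
import Data.List (group, sort, unfoldr)
import qualified Data.Map as Map

-- groupsOf and frequency, as in the answer above.
groupsOf :: Int -> [a] -> [[a]]
groupsOf n = unfoldr step
  where
    step s
      | length s < n = Nothing
      | otherwise    = Just (take n s, tail s)

frequency :: Ord t => [t] -> [(t, Int)]
frequency = map (\g -> (head g, length g)) . group . sort

-- Hypothetical helper: one count per possible pattern, 0 when it never occurs.
-- replicateM over a sorted alphabet already yields the patterns in sorted order.
frequencyList :: String -> Int -> [Int]
frequencyList s n = [Map.findWithDefault 0 p counts | p <- replicateM n "ABCD"]
  where
    counts = Map.fromList (frequency (groupsOf n s))
```

For the question's example, frequencyList "AABBCA" 2 gives [1,1,0,0,0,1,1,0,1,0,0,0,0,0,0,0].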