The task I am trying to accomplish is to get, from a string s, the frequency list of all possible patterns of length n.
Input (Text,Length)
Output (Frequency)
String -> Int -> [Int]
freqCount s n = frequency list [int] in alphabetical order
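["A","B","C","D"]
The characters in s are restricted to the four listed above, so I figured the first step is to generate all possible permutations of length n with repetition allowed.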
permutationsR k = sort(replicateM k ['A','B','C','D'])
For example,
permutationsR 2
would give the output
["AA","AB","AC","AD","BA","BB","BC","BD","CA","CB","CC","CD","DA","DB","DC","DD"]
Then count the number of occurrences of each pattern, something like this:
patternCount :: String -> String -> Int
patternCount text pattern = length (filter (\x -> x == pattern) [take (length(pattern)) (drop x text)| x <- [0..length(text)- length(pattern)]])
frequencyCount s n = map (\x -> patternCount s x) (permutationsR n)
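However, I assume this would be very inefficient, since I am essentially traversing the entire string once per pattern, that is, length (permutationsR n) times, when I reason it should be doable in a single iteration.
Is there a way to build the frequency map the way you would in an imperative language?
That is, in pseudocode: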
where s = string
and n = length of pattern
//pattern is a map where key = pattern and value = frequencyCount
patterns = {"AA":0,"AB":0,"AC":0...}
for (i = 0; i <= (length s - n); i++){
patterns[s[i:(i+n)]] += 1
}
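Basically, iterate only once, slicing the string from i to i + n and updating the pattern map at each occurrence.
A sample input and output would look like this: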
s= "AABBCA"
n = 2
frequencyList s n = [1,1,0,0,0,1,1,0,1,0,0,0,0,0,0,0]
Answer 0 (score: 3)
Here is one possible solution:
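import Data.List

-- Split the input into overlapping windows of length n.
groupsOf :: Int -> [a] -> [[a]]
groupsOf n str =
  unfoldr (\s ->
    if length s < n
      then Nothing
      else Just (take n s, tail s)) str

-- Count how often each distinct element occurs, in sorted order.
frequency :: (Ord t, Eq t) => [t] -> [(t, Int)]
frequency =
  map (\s -> (head s, length s)) . group . sort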
groupsOf splits the input string into overlapping sequences of length n. For example,
groupsOf 3 "AABCABC"
would give
["AAB", "ABC", "BCA", "CAB", "ABC"].
frequency then counts the occurrences of each subsequence, so
frequency $ groupsOf 3 "AABCABC"
should give
[("AAB", 1), ("ABC", 2), ("BCA", 1), ("CAB", 1)].
Any subsequence that does not appear in the result occurred zero times.
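If the full list from the question is needed, zeros included, one option is to count the windows once into a map and then look up every possible pattern. The following is only a sketch under some assumptions: it uses Data.Map.Strict from the containers package instead of group . sort, and the helper names allPatterns and windows are made up for illustration and are not part of the code above.

```haskell
import Control.Monad (replicateM)
import Data.List (tails)
import qualified Data.Map.Strict as Map

-- All patterns of length n over the alphabet from the question.
-- replicateM already yields them in lexicographic order.
-- (allPatterns is a hypothetical helper name.)
allPatterns :: Int -> [String]
allPatterns n = replicateM n "ABCD"

-- Overlapping windows of length n, the same result groupsOf produces.
-- (windows is a hypothetical helper name.)
windows :: Int -> [a] -> [[a]]
windows n = takeWhile ((== n) . length) . map (take n) . tails

-- Count every window in one pass over the string, then look each
-- pattern up, defaulting to 0 for patterns that never occur.
frequencyList :: String -> Int -> [Int]
frequencyList s n = [Map.findWithDefault 0 p counts | p <- allPatterns n]
  where
    counts = Map.fromListWith (+) [(w, 1 :: Int) | w <- windows n s]
```

With this sketch, frequencyList "AABBCA" 2 evaluates to [1,1,0,0,0,1,1,0,1,0,0,0,0,0,0,0], the list given in the question. The string is traversed once to build the map; the remaining cost is one map lookup per possible pattern.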