我编写了下面的代码来列出列表列表的子序列频率(结果包括子序列和子序列出现的列表的索引)。有没有人有任何建议如何使其更简洁和/或更有效?
示例输出:
*主> combFreq [[1,2,3,5,7,8],[2,3,5,6,7],[3,5,7,9],[1,2,3,7,9], [3,5,7,10]]
[([3,5],[0,1,2,4]),([2,3],[0,1,3]),([3,5,7],[0,2,4 ]),([5,7],[0,2,4]),([2,3,5],[0,1]),([1,2],[0,3]),( [1,2,3],[0,3]),([7,9],[2,3])]
import Data.List
import Data.Function (on)
--[[1,2,3,5,7,8],[2,3,5,6,7],[3,5,7,9],[1,2,3,7,9],[3,5,7,10]]
tupleCat x y = (fst x, sort $ nub $ snd x ++ snd y)
isInResult x result = case lookup x result of
Just a -> [a]
Nothing -> []
sInt xs = concat $ sInt' (csubs xs) 0 (length xs) where
csubs = map (filter (not . null) . concatMap inits . tails)
sInt' [] _ _ = []
sInt' (x:xs) count origLen =
let result = (zip (zip (replicate (length xs) count) [count+1..origLen])
$ map (\y -> intersect x y) xs)
in concatMap (\x -> let a = fst x in map (\y -> (y,a)) (snd x))
result : sInt' xs (count + 1) origLen
concatResults [] result = result
concatResults (x:xs) result =
let match = isInResult (fst x) result
newX = (fst x, [fst $ snd x, snd $ snd x])
in if not (null match)
then let match' = (fst x, head match)
newResult = deleteBy (\x -> (==match')) match' result
in concatResults xs (tupleCat match' newX : newResult)
else concatResults xs (newX : result)
combFreq xs =
filter (\x -> length (fst x) > 1)
$ reverse $ sortBy (compare `on` (length . snd)) $ concatResults (sInt xs) []
答案 0 :(得分:2)
以下是我将如何去做。我没有比较性能,
这当然是天真的。它列举了所有连续的子序列
每个列表并将它们收集到Map
。它应该符合你的要求
但更简洁。
import Data.List as L
import Data.Map (Map)
import qualified Data.Map as M
nonEmptySubs :: [a] -> [[a]]
nonEmptySubs = filter (not . null)
. concatMap tails
. inits
makePairs :: (a -> [a]) -> [a] -> [(a, Int)]
makePairs f xs = concat $ zipWith app xs [0 .. ]
where app y i = zip (f y) (repeat i)
results :: (Ord a) => [[a]] -> Map [a] [Int]
results =
let ins acc (seq, ind) = M.insertWith (++) seq [ind] acc
-- Insert the index at the given sequence as a singleton list
in foldl' ins M.empty . makePairs nonEmptySubs
combFreq :: (Ord a) => [[a]] -> [([a], [Int])]
combFreq = filter (not . null . drop 1 . snd) -- Keep subseqs with more than 1 match
. filter (not . null . drop 1 . fst) -- keep subseqs longer than 1
. M.toList
. results
请注意,此版本将提供相同的定性结果,但它会 没有相同的顺序。
我最大的建议是更多地分解并利用你的东西 可以从一些标准库中做一些繁琐的工作。请注意我们 可以把很多工作分解成不同的阶段然后组成 获得最终功能的阶段。
答案 1 :(得分:0)
如果你的所有名单都在增加(就像他们在你的例子中一样),那么下面应该有效(不是美女,因为我是Haskell-newbie;非常欢迎关于如何改进的评论):
import Control.Arrow (first, second)
compFreq ls = cF [] [] ls
where cF rs cs ls | all null ls = rs
| otherwise = cF (rs++rs') (cs'' ++ c ++ cs') ls'
where m = minimum $ map head $ filter (not . null) ls
ls' = map (\l -> if null l || m < head l then l
else tail l) ls
is = map snd $ filter ((==m) . head . fst) $ filter (not . null . fst) $ zip ls [0,1..]
c = if atLeastTwo is then [([m], is)] else []
fs = filter (\(vs, is') -> atLeastTwo $ combine is is') cs
cs' = map (\(vs, is') -> (vs++[m], combine is is')) fs
cs'' = map (second (filter (not . (`elem` is)))) cs
rs' = filter ok cs'
combine _ [] = []
combine [] _ = []
combine (i:is) (i':is') | i<i' = combine is (i':is')
| i>i' = combine (i:is) is'
| i==i' = i:combine is is'
atLeastTwo = not . null . drop 1
ok (js, ts) = atLeastTwo js && atLeastTwo ts
这个想法是通过始终查看最小值m来处理列表,该值从所有列表中删除以获得ls'。索引列表告诉m被删除的位置。内部工作函数cF有两个额外的参数:到目前为止的结果列表rs和当前子序列的列表cs。如果最小值出现至少两次,则该最小值开始新的子序列c。 cs'是以m结尾的子序列,cs''是没有m的那些。新结果rs'都包含m作为最后一个元素。
您的示例的输出是
[([1,2],[0,3]),([2,3],[0,1,3]),([1,2,3],[0,3]),([3,5],[0,1,2,4]),([2,3,5],[0,1]),([5,7],[0,2,4]),([3,5,7],[0,2,4]),([7,9],[2,3])]