How do I group similar items in a list using Haskell?

Asked: 2012-09-13 02:04:04

Tags: haskell

Given a list of tuples like this:

dic = [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]

How can I group the items of dic to produce a list grp, where

grp  = [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]

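I'm actually new to Haskell... and seem to be falling in love with it.
groupBy from Data.List only groups similar items that are adjacent in the list. I wrote an inefficient function for this, but it causes memory failures because I need to process a very large list of encoded strings. I hope you can help me find a more efficient way.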

5 Answers:

Answer 0 (score: 55)

Reuse library code whenever possible.

import Data.Map (fromListWith)
-- Values sharing a key are combined with (++), most recently seen first.
sortAndGroup assocs = fromListWith (++) [(k, [v]) | (k, v) <- assocs]

Try it in ghci:

*Main> sortAndGroup [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
fromList [(1,["bb","cc","aa"]),(2,["aa"]),(3,["gg","ff"])]
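The result above is a Map, and the values under each key come out in reverse order of appearance. A minimal sketch, assuming the goal is exactly the grp list from the question (keys ascending, values sorted), could post-process the map as below. The name groupSorted and the extra Ord v constraint are assumptions made here for illustration, not part of the original answer.

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as Map

-- Hypothetical helper: collect values per key in a Map, sort each value
-- list, then flatten the Map into an association list ordered by key.
groupSorted :: (Ord k, Ord v) => [(k, v)] -> [(k, [v])]
groupSorted assocs =
  Map.toList (Map.map sort grouped)
  where
    -- Prepending a singleton list is cheap, so each insertion stays O(log n).
    grouped = Map.fromListWith (++) [(k, [v]) | (k, v) <- assocs]
```

Applied to the dic from the question, groupSorted dic evaluates to [(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])].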

Answer 1 (score: 15)

Here is my solution:

import Data.Function (on)
import Data.List (sortBy, groupBy)
import Data.Ord (comparing)

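-- Sort by key, group adjacent pairs with equal keys, then split each
-- (never empty) group into its key and the list of its values.
myGroup :: Ord a => [(a, b)] -> [(a, [b])]
myGroup = map (\l -> (fst . head $ l, map snd l)) . groupBy ((==) `on` fst)
          . sortBy (comparing fst)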

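First, sort the list by key using sortBy. The sort is stable and compares keys only, so values that share a key keep their original relative order (sorting whole tuples with sort would order the values as well):

[(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
=> [(1,"aa"),(1,"cc"),(1,"bb"),(2,"aa"),(3,"ff"),(3,"gg")]

Then group the list elements by their key using groupBy:

[(1,"aa"),(1,"cc"),(1,"bb"),(2,"aa"),(3,"ff"),(3,"gg")]
=> [[(1,"aa"),(1,"cc"),(1,"bb")],[(2,"aa")],[(3,"ff"),(3,"gg")]]

Then use map to convert each group into a tuple:

[[(1,"aa"),(1,"cc"),(1,"bb")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
=> [(1,["aa","cc","bb"]),(2,["aa"]),(3,["ff","gg"])]

Test:

> myGroup dic
[(1,["aa","cc","bb"]),(2,["aa"]),(3,["ff","gg"])]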
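If the values should come out sorted too, as in the grp from the question, one option is to sort whole tuples rather than keys only. The sketch below is a variant written for illustration, not the answer author's code: it assumes an Ord instance for the values, and it uses Data.List.NonEmpty.groupBy so that taking the first element of a group needs no partial head. The name groupSortedPairs is hypothetical.

```haskell
import Data.Function (on)
import Data.List (sort)
import qualified Data.List.NonEmpty as NE

-- Hypothetical variant: sort the pairs (by key, then by value), group
-- adjacent pairs with equal keys, and turn each non-empty group into
-- (key, values).
groupSortedPairs :: (Ord a, Ord b) => [(a, b)] -> [(a, [b])]
groupSortedPairs =
  map (\g -> (fst (NE.head g), map snd (NE.toList g)))
    . NE.groupBy ((==) `on` fst)
    . sort
```

For the dic from the question this yields [(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])].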

Answer 2 (score: 5)

You can also use the TransformListComp extension, for example:

Prelude> :set -XTransformListComp 
Prelude> import GHC.Exts (groupWith, the)
Prelude GHC.Exts> let dic = [ (1, "aa"), (1, "bb"), (1, "cc") , (2, "aa"), (3, "ff"), (3, "gg")]
Prelude GHC.Exts> [(the key, value) | (key, value) <- dic, then group by key using groupWith]
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]
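groupWith is an ordinary function exported by GHC.Exts, so a similar grouping can be sketched without enabling the extension. This is only an illustration under that assumption; groupPairs is a name made up here.

```haskell
import GHC.Exts (groupWith)

-- Hypothetical helper: groupWith sorts the pairs by key and groups equal
-- keys together; each group is then split into its key and its values.
-- The pattern ((k, v) : rest) only matches non-empty groups.
groupPairs :: Ord k => [(k, v)] -> [(k, [v])]
groupPairs xs = [ (k, v : map snd rest) | ((k, v) : rest) <- groupWith fst xs ]
```

With the dic defined in the session above, groupPairs dic gives the same list as the comprehension.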

Answer 3 (score: 4)

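  1. If the list is not sorted on the first element, I don't think you can do better than O(n log n).

    • A simple way is to sort it, then use anything from the second part of this answer.

    • You can use a Map from Data.Map, such as Map k [a], with the first element of each tuple as the key, and keep adding values to it.

    • You can write your own complex function, which will still need O(n log n) however hard you try.

  2. If the list is sorted on the first element (the dic in the question nearly is; only its last element, (1,"bb"), is out of place), the task is trivial with groupBy as in @Mikhail's answer, or with foldr; there are many other ways.

  3. An example using foldr: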

      grp :: Eq a => [(a,b)] -> [(a,[b])]
      grp = foldr f []
         where 
           f (z,s) [] = [(z,[s])] 
           f (z,s) a@((x,y):xs)  | x == z = (x,s:y):xs 
                                 | otherwise = (z,[s]):a
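The Map-based option from point 1 (a Map k [a] that keeps accumulating values) could look roughly like the sketch below. It is an illustration rather than this answer's own code, and groupIntoMap is a hypothetical name. Unlike grp above, it does not need the input to be sorted.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical helper: walk the list from the right and prepend each value
-- to the list stored under its key, so values keep their original order.
groupIntoMap :: Ord k => [(k, v)] -> Map.Map k [v]
groupIntoMap = foldr (\(k, v) -> Map.insertWith (++) k [v]) Map.empty
```

For the dic from the question, Map.toList (groupIntoMap dic) evaluates to [(1,["aa","cc","bb"]),(2,["aa"]),(3,["ff","gg"])].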
    

Answer 4 (score: 0)

{-# LANGUAGE TransformListComp #-}

import GHC.Exts
import Data.List
import Data.Function (on)

process :: [(Integer, String)] -> [(Integer, [String])]
process list =
  [ (the a, b)
  | let info = [ (x, y) | (x, y) <- list, then sortWith by y ]
  , (a, b) <- info
  , then group by a using groupWith
  ]