Haskell hash table

Time: 2011-11-01 17:34:02

Tags: haskell

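I am trying to build a small Haskell application that will translate a few key phrases from English to French.

First, I have a list of ordered pairs of strings, each an English word or phrase followed by its French translation: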

icards = [("the", "le"),("savage", "violent"),("work", "travail"),
("wild", "sauvage"),("chance", "occasion"),("than a", "qu'un")...]

Next I have a new data type:

data Entry = Entry {wrd, def :: String, len :: Int, phr :: Bool}
           deriving Show

Then I use icards to populate a list of Entrys:

entries :: [Entry]
entries = map (\(x, y) -> Entry x y (length x) (' ' `elem` x)) icards

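For simplicity, I created a new type for [Entry] called Run.

Now, I want to create a hash table keyed on the number of characters in the English word. It will be used later to speed up searches. So I want to create a function called runs: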

runs :: [Run]
runs = -- This will run through the entries and return a new [Entry] that has all of the
       -- words of the same length grouped together.

I also have:

maxl = maximum [len e | e <- entries]

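2 answers:

Answer 0: (score: 8)

Hackage has a hashmap package! I'm going to create a small data type based on this HashMap, which I'll call a MultiMap. This is a typical trick: it's just a hash map of linked lists. I'm not sure what the correct name for MultiMap actually is.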

import qualified Data.HashMap as HM
import Data.Hashable

import Prelude hiding (lookup)

type MultiMap k v = HM.Map k [v]

insert :: (Hashable k, Ord k) => k -> a -> MultiMap k a -> MultiMap k a
insert k v = HM.insertWith (++) k [v]

lookup :: (Hashable k, Ord k) => k -> MultiMap k a -> [a]
lookup k m = case HM.lookup k m of
  Nothing -> []
  Just xs -> xs

empty :: MultiMap k a
empty = HM.empty

fromList :: (Hashable k, Ord k) => [(k,v)] -> MultiMap k v
fromList = foldr (uncurry insert) empty

I have emulated only the essentials of a Map: insert, lookup, empty, and fromList. Now it is quite easy to turn entries into a MultiMap:

data Entry = Entry {wrd, def :: String, len :: Int, phr :: Bool}
           deriving (Show)

icards = [("the", "le"),("savage", "violent"),("work", "travail"),
          ("wild", "sauvage"),("chance", "occasion"),("than a", "qu'un")]

entries :: [Entry]
entries = map (\(x, y) -> Entry x y (length x) (' ' `elem` x)) icards

fromEntryList :: [Entry] -> MultiMap Int Entry
fromEntryList es = fromList $ map (\e -> (len e, e)) es

Loading this into ghci, we can now look up the list of entries with a given length:

ghci> let m = fromEntryList entries
ghci> lookup 3 m
[Entry {wrd = "the", def = "le", len = 3, phr = False}]
ghci> lookup 4 m
[Entry {wrd = "work", def = "travail", len = 4, phr = False},
 Entry {wrd = "wild", def = "sauvage", len = 4, phr = False}]

(Note that this lookup is not the one defined in Prelude.) You can similarly use the English word as the key.

-- import Data.List (find) -- up with other imports

fromEntryList' :: [Entry] -> MultiMap String Entry
fromEntryList' es = fromList $ map (\e -> (wrd e, e)) es

eLookup :: String -> MultiMap String Entry -> Maybe Entry
eLookup str m = case lookup str m of
  [] -> Nothing
  xs -> find (\e -> wrd e == str) xs

...and testing:

ghci> let m = fromEntryList' entries
ghci> eLookup "the" m
Just (Entry {wrd = "the", def = "le", len = 3, phr = False})
ghci> eLookup "foo" m
Nothing

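Notice how in eLookup we first perform the Map lookup to determine whether anything has been placed in that slot. Since we are using a hash map, we need to remember that two different Strings might have the same hash code. So when the slot is not empty, we perform a find on the linked list there to see whether any of its entries actually match the correct English word. If you are interested in performance, you should consider using Data.Text instead of String.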
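As a rough illustration of the Data.Text suggestion, here is a minimal sketch. It assumes the text and containers packages that ship with GHC, and it uses Data.Map.Strict as a stand-in for the hash map so that it compiles without extra dependencies. The names mkEntry, byWord and lookupDef are invented for this example.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Text as T
import Data.Text (Text)

-- Same record as above, but with Text fields instead of String.
data Entry = Entry { wrd, def :: Text, len :: Int, phr :: Bool }
           deriving (Show, Eq)

icards :: [(String, String)]
icards = [("the", "le"), ("savage", "violent"), ("work", "travail"),
          ("wild", "sauvage"), ("chance", "occasion"), ("than a", "qu'un")]

-- Hypothetical helper: pack both strings once, up front.
mkEntry :: (String, String) -> Entry
mkEntry (x, y) = Entry tx (T.pack y) (T.length tx) (T.any (== ' ') tx)
  where tx = T.pack x

-- Hypothetical index keyed by the English word (Data.Map stands in for the hash map).
byWord :: M.Map Text Entry
byWord = M.fromList [(wrd e, e) | e <- map mkEntry icards]

-- Look up the French definition of an English word or phrase.
lookupDef :: String -> Maybe Text
lookupDef w = def <$> M.lookup (T.pack w) byWord
```

Whether this is actually faster than String keys depends on the data, so it is worth measuring before committing to it.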

Answer 1: (score: 4)

groupBy and sortBy are both in Data.List.

import Data.List
import Data.Function -- for `on`
runs :: [Run]
runs = f 0 $ groupBy ((==) `on` len) $ sortBy (compare `on` len) entries
  where f _ []                                      = []
        f i (r@(Entry {len = l} : _) : rs) | i == l = r  : f (i + 1) rs
        f i rs                                      = [] : f (i + 1) rs
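To make the shape of the result concrete: f pads the grouped list with empty runs, so the run at index i holds exactly the entries whose English word has length i. The sketch below repeats the definitions so that it compiles on its own. The Run synonym follows the question's description, and runOfLength is a name invented here.

```haskell
import Data.List (groupBy, sortBy)
import Data.Function (on)

data Entry = Entry {wrd, def :: String, len :: Int, phr :: Bool}
           deriving (Show)

-- Assumed from the question's prose: a Run is just a list of entries.
type Run = [Entry]

icards :: [(String, String)]
icards = [("the", "le"), ("savage", "violent"), ("work", "travail"),
          ("wild", "sauvage"), ("chance", "occasion"), ("than a", "qu'un")]

entries :: [Entry]
entries = map (\(x, y) -> Entry x y (length x) (' ' `elem` x)) icards

-- Sort by length, group equal lengths, then pad so that index == length.
runs :: [Run]
runs = f 0 $ groupBy ((==) `on` len) $ sortBy (compare `on` len) entries
  where f _ []                                      = []
        f i (r@(Entry {len = l} : _) : rs) | i == l = r  : f (i + 1) rs
        f i rs                                      = [] : f (i + 1) rs

-- Hypothetical helper: fetch the run for a given length without a partial (!!).
runOfLength :: Int -> [Run] -> Run
runOfLength n rs
  | n < 0     = []
  | otherwise = case drop n rs of
      (r : _) -> r
      []      -> []
```

With the six sample pairs, map wrd (runOfLength 4 runs) gives ["work","wild"], and any length with no words gives an empty run.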

Personally, I would use a Map instead:

import qualified Data.Map as M
runs :: M.Map String Entry
runs = M.fromList $ map (\entry -> (wrd entry, entry)) entries

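and look up by the English word directly, instead of a two-step process that first looks up by the length of the English word and then by the English word itself.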
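A minimal sketch of what that direct lookup might look like, assuming the containers package that ships with GHC. The definitions are repeated so the block compiles on its own, and translate is a name invented for this example.

```haskell
import qualified Data.Map as M

data Entry = Entry {wrd, def :: String, len :: Int, phr :: Bool}
           deriving (Show)

icards :: [(String, String)]
icards = [("the", "le"), ("savage", "violent"), ("work", "travail"),
          ("wild", "sauvage"), ("chance", "occasion"), ("than a", "qu'un")]

entries :: [Entry]
entries = map (\(x, y) -> Entry x y (length x) (' ' `elem` x)) icards

-- One entry per English word; with duplicate keys, M.fromList keeps the last one.
runs :: M.Map String Entry
runs = M.fromList $ map (\entry -> (wrd entry, entry)) entries

-- Hypothetical helper: the French translation of an English word, if known.
translate :: String -> Maybe String
translate w = def <$> M.lookup w runs
```

In ghci, translate "work" gives Just "travail" and translate "foo" gives Nothing.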