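I'm trying to build a small Haskell application that translates a few key phrases from English to French.
First, I have an ordered list of string pairs, each holding an English word or phrase followed by its French translation: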
icards = [("the", "le"),("savage", "violent"),("work", "travail"),
          ("wild", "sauvage"),("chance", "occasion"),("than a", "qu'un")...]
Next, I have a new data type:
data Entry = Entry {wrd, def :: String, len :: Int, phr :: Bool}
  deriving Show
Then I use icards to populate a list of Entry values:
entries :: [Entry]
entries = map (\(x, y) -> Entry x y (length x) (' ' `elem` x)) icards
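For simplicity, I created a new type called Run that stands for [Entry].
Now I want to build a hash table keyed on the number of characters in the English word. It will be used later to speed up searches. So I want to define a value called runs: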
runs :: [Run]
runs = -- This will run through the entries and return a new [Entry] that has all of the
       -- words of the same length grouped together.
I also have:
maxl = maximum [len e | e <- entries]
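Answer 0 (score: 8)
Hackage has a hashmap package! I'll build a small data type on top of its hash map, which I'll call MultiMap. This is a typical trick: it is just a hash map whose values are linked lists. I'm not sure whether MultiMap is the proper name for it.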
import qualified Data.HashMap as HM
import Data.Hashable
import Prelude hiding (lookup)
type MultiMap k v = HM.Map k [v]
insert :: (Hashable k, Ord k) => k -> a -> MultiMap k a -> MultiMap k a
insert k v = HM.insertWith (++) k [v]
lookup :: (Hashable k, Ord k) => k -> MultiMap k a -> [a]
lookup k m = case HM.lookup k m of
  Nothing -> []
  Just xs -> xs
empty :: MultiMap k a
empty = HM.empty
fromList :: (Hashable k, Ord k) => [(k,v)] -> MultiMap k v
fromList = foldr (uncurry insert) empty
I have only mimicked the essentials of Map: insert, lookup, empty, and fromList. Now it is easy to turn entries into a MultiMap:
data Entry = Entry {wrd, def :: String, len :: Int, phr :: Bool}
  deriving (Show)
icards = [("the", "le"),("savage", "violent"),("work", "travail"),
          ("wild", "sauvage"),("chance", "occasion"),("than a", "qu'un")]
entries :: [Entry]
entries = map (\(x, y) -> Entry x y (length x) (' ' `elem` x)) icards
fromEntryList :: [Entry] -> MultiMap Int Entry
fromEntryList es = fromList $ map (\e -> (len e, e)) es
Loading this into ghci, we can now look up the list of entries with a given length:
ghci> let m = fromEntryList entries
ghci> lookup 3 m
[Entry {wrd = "the", def = "le", len = 3, phr = False}]
ghci> lookup 4 m
[Entry {wrd = "work", def = "travail", len = 4, phr = False},
Entry {wrd = "wild", def = "sauvage", len = 4, phr = False}]
(Note that this lookup is not the one defined in the Prelude.) You can similarly use the English word as the key.
-- import Data.List (find) -- up with other imports
fromEntryList' :: [Entry] -> MultiMap String Entry
fromEntryList' es = fromList $ map (\e -> (wrd e, e)) es
eLookup :: String -> MultiMap String Entry -> Maybe Entry
eLookup str m = case lookup str m of
  [] -> Nothing
  xs -> find (\e -> wrd e == str) xs
...testing:
ghci> let m = fromEntryList' entries
ghci> eLookup "the" m
Just (Entry {wrd = "the", def = "le", len = 3, phr = False})
ghci> eLookup "foo" m
Nothing
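Notice how eLookup first performs the map lookup to determine whether anything has been placed in that slot. Since we are using a hash map, keep in mind that two different strings can have the same hash code, so if the slot is not empty, we run find over the list to see whether any entry there matches the English word. (As far as I can tell, the hashmap package already tells colliding keys apart internally, which is why it asks for Ord k, so this find is a defensive check rather than a strict necessity.) If you care about performance, consider using Data.Text instead of String.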
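Here is a minimal sketch of what the Data.Text suggestion could look like, assuming the text and containers packages are available. The names TEntry, tcards, tEntries, byWord and tLookup are made up for this illustration, and it uses an ordinary Data.Map rather than the hashmap package, so treat it as one possible variant rather than the code from this answer.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Text as T
import Data.Text (Text)

-- Text-based variant of Entry (hypothetical name and field names).
data TEntry = TEntry { tWrd, tDef :: Text, tLen :: Int, tPhr :: Bool }
  deriving (Show, Eq)

-- The same sample pairs as icards, packed into Text.
tcards :: [(Text, Text)]
tcards =
  [ (T.pack e, T.pack f)
  | (e, f) <- [ ("the", "le"), ("savage", "violent"), ("work", "travail")
              , ("wild", "sauvage"), ("chance", "occasion"), ("than a", "qu'un") ]
  ]

-- Length and "is a phrase" flag are computed with the Text versions
-- of length and any.
tEntries :: [TEntry]
tEntries = [ TEntry e f (T.length e) (T.any (== ' ') e) | (e, f) <- tcards ]

-- Each English word appears once here, so a plain Map (one value per key)
-- is enough; no list of values is needed.
byWord :: M.Map Text TEntry
byWord = M.fromList [ (tWrd e, e) | e <- tEntries ]

-- Look up an entry by its English word.
tLookup :: String -> Maybe TEntry
tLookup s = M.lookup (T.pack s) byWord
```

In ghci, something like fmap tDef (tLookup "the") should then give back the French side of the matching entry, and an unknown word gives Nothing.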
Answer 1 (score: 4)
groupBy and sortBy are both in Data.List.
import Data.List
import Data.Function -- for `on`
runs :: [Run]
runs = f 0 $ groupBy ((==) `on` len) $ sortBy (compare `on` len) entries
  where f _ [] = []
        f i (r@(Entry {len = l} : _) : rs) | i == l = r : f (i + 1) rs
        f i rs = [] : f (i + 1) rs
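For comparison, the same length-indexed list can also be built with Data.IntMap from the containers package. This is only a sketch under a few assumptions: type Run = [Entry] is my guess at the definition the question mentions, and byLen and runs' are names invented here so they do not clash with runs above.

```haskell
import qualified Data.IntMap.Strict as IM

data Entry = Entry {wrd, def :: String, len :: Int, phr :: Bool}
  deriving (Show, Eq)

-- Assumed definition; the question only says that Run stands for [Entry].
type Run = [Entry]

icards :: [(String, String)]
icards = [("the", "le"),("savage", "violent"),("work", "travail"),
          ("wild", "sauvage"),("chance", "occasion"),("than a", "qu'un")]

entries :: [Entry]
entries = map (\(x, y) -> Entry x y (length x) (' ' `elem` x)) icards

-- Group entries by the length of the English word.
-- fromListWith combines values as (new ++ old), so feeding it the reversed
-- list keeps each group in the original order of entries.
byLen :: IM.IntMap Run
byLen = IM.fromListWith (++) [ (len e, [e]) | e <- reverse entries ]

-- runs' !! n is the run of entries whose English word has n characters;
-- lengths with no entries get an empty run.
runs' :: [Run]
runs' = [ IM.findWithDefault [] n byLen | n <- [0 .. maxl] ]
  where
    -- Largest length present, or -1 (giving an empty list) if there are no entries.
    maxl = if IM.null byLen then -1 else fst (IM.findMax byLen)
```

With the sample data, map wrd (runs' !! 4) evaluates to ["work","wild"]. If you only ever look up by length, you can also skip the list and query byLen directly.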
Personally, I would use a Map instead
import qualified Data.Map as M
runs :: M.Map String Entry
runs = M.fromList $ map (\entry -> (wrd entry, entry)) entries
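and look entries up directly by English word, rather than going through a two-step process of first the word's length and then the word itself.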
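To illustrate the direct lookup, here is a small self-contained sketch built on the same Map. The names dict, translate and translateWords are hypothetical helpers invented for this example (dict is the map called runs above, renamed to avoid the clash with the list version).

```haskell
import qualified Data.Map as M
import Data.Maybe (fromMaybe)

data Entry = Entry {wrd, def :: String, len :: Int, phr :: Bool}
  deriving (Show, Eq)

icards :: [(String, String)]
icards = [("the", "le"),("savage", "violent"),("work", "travail"),
          ("wild", "sauvage"),("chance", "occasion"),("than a", "qu'un")]

entries :: [Entry]
entries = map (\(x, y) -> Entry x y (length x) (' ' `elem` x)) icards

-- English word (or phrase) mapped to its entry.
dict :: M.Map String Entry
dict = M.fromList $ map (\entry -> (wrd entry, entry)) entries

-- Translate a single word or phrase, if it is in the dictionary.
translate :: String -> Maybe String
translate w = def <$> M.lookup w dict

-- Naive word-by-word translation; unknown words are left unchanged.
-- Multi-word phrases such as "than a" are not matched by this helper,
-- because the input is split on whitespace first.
translateWords :: String -> String
translateWords = unwords . map (\w -> fromMaybe w (translate w)) . words
```

For example, translate "wild" gives Just "sauvage" and translate "foo" gives Nothing. Handling phrases properly would need a smarter tokenizer than words, which is where the phr flag could come in.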