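Is there a function in Haskell that performs an SQL join or an R merge?
Basically, I have two lists of tuples and want to "zip" them according to their key. I know that each key has either one value or none.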
a = [(1, "hello"), (2, "world")]
b = [(3, "foo"), (1, "bar")]
and get something like
[ (Just (1, "hello"), Just (1, "bar"))
, (Just (2, "world"), Nothing)
, (Nothing, Just (3, "foo"))
]
Answer 0 (score: 3)
Using ordered sets (lists) and an Ord key:
key = fst
fullOuterJoin xs [] = map (\x -> (Just x, Nothing)) xs
fullOuterJoin [] ys = map (\y -> (Nothing, Just y)) ys
fullOuterJoin xss@(x:xs) yss@(y:ys) =
  if key x == key y
    then (Just x, Just y) : fullOuterJoin xs ys
    else
      if key x < key y
        then (Just x, Nothing) : fullOuterJoin xs yss
        else (Nothing, Just y) : fullOuterJoin xss ys
(The complexity is O(n+m), or O(n log n + m log m) if the inputs have to be sorted first.)
For example:
setA = [(1, "hello"), (2, "world")]
setB = [(1, "bar"), (3, "foo")]
*Main> fullOuterJoin setA setB
[(Just (1,"hello"),Just (1,"bar")),(Just (2,"world"),Nothing),(Nothing,Just (3,"foo"))]
Unsorted input is obviously supported via sort (from Data.List):
fullOuterJoin' xs ys = fullOuterJoin (sort xs) (sort ys)
As @Franky says, you can avoid the if, for example:
-- needs: import Data.Function (on)
fullOuterJoin xs [] = [(Just x, Nothing) | x <- xs]
fullOuterJoin [] ys = [(Nothing, Just y) | y <- ys]
fullOuterJoin xss@(x:xs) yss@(y:ys) =
  case (compare `on` key) x y of
    EQ -> (Just x, Just y) : fullOuterJoin xs ys
    LT -> (Just x, Nothing) : fullOuterJoin xs yss
    GT -> (Nothing, Just y) : fullOuterJoin xss ys
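The pieces of this answer can be put together into one loadable module. The following is a minimal sketch, not the answerer's exact code: the names fullOuterJoinSorted and fullOuterJoinOn are made up for illustration, the key is assumed to be the first tuple component, and sorting uses sortOn fst so that only the key (not the value) needs an Ord instance.

```haskell
import Data.List (sortOn)

-- Merge join over two lists that are already sorted by key (first component).
-- Assumes each key occurs at most once per list, as the question states.
fullOuterJoinSorted :: Ord k => [(k, a)] -> [(k, b)] -> [(Maybe (k, a), Maybe (k, b))]
fullOuterJoinSorted xs [] = [(Just x, Nothing) | x <- xs]
fullOuterJoinSorted [] ys = [(Nothing, Just y) | y <- ys]
fullOuterJoinSorted xss@(x:xs) yss@(y:ys) =
  case compare (fst x) (fst y) of
    EQ -> (Just x, Just y)  : fullOuterJoinSorted xs ys
    LT -> (Just x, Nothing) : fullOuterJoinSorted xs yss
    GT -> (Nothing, Just y) : fullOuterJoinSorted xss ys

-- Sort both inputs by key only, then merge (hypothetical wrapper name).
fullOuterJoinOn :: Ord k => [(k, a)] -> [(k, b)] -> [(Maybe (k, a), Maybe (k, b))]
fullOuterJoinOn xs ys = fullOuterJoinSorted (sortOn fst xs) (sortOn fst ys)
```

Because the type signature uses separate type variables for the two value types, the two lists do not need to carry values of the same type.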
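Answer 1 (score: 1)
I can't think of any standard function that does this. I would convert the two lists to Data.Map.Map
and write the SQL join myself. That way it looks doable with O(n log n) complexity, which is not too bad.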
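One way to read this suggestion is sketched below; it is not necessarily what the answerer had in mind. The helper name fullOuterJoinMap is hypothetical. Each list is loaded into a map whose values are already shaped like one half of the result, and M.unionWith combines the halves for keys present on both sides. It assumes each key occurs at most once per list (M.fromList would otherwise keep only the last entry for a key).

```haskell
import qualified Data.Map as M

-- Hypothetical helper: full outer join of two association lists via Data.Map.
fullOuterJoinMap :: Ord k => [(k, a)] -> [(k, b)] -> [(Maybe (k, a), Maybe (k, b))]
fullOuterJoinMap xs ys = M.elems (M.unionWith pick left right)
  where
    -- Every left entry starts out as "present on the left only".
    left  = M.fromList [(k, (Just (k, v), Nothing)) | (k, v) <- xs]
    -- Every right entry starts out as "present on the right only".
    right = M.fromList [(k, (Nothing, Just (k, v))) | (k, v) <- ys]
    -- Called only for keys found in both maps: keep the left half of the
    -- first pair and the right half of the second.
    pick (l, _) (_, r) = (l, r)
```

The result comes out in ascending key order because M.elems walks the map by key, so the inputs do not need to be sorted beforehand.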
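Answer 2 (score: 1)
If you care about performance, this is not the answer you are looking for. I can't point to a built-in function, since you didn't give the types.
It can be done with a simple list comprehension: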
joinOnFst as bs = [(a,b) | a<-as, b<-bs, fst a == fst b]
or, using pattern matching and a different return type:
joinOnFst as bs = [(a1,a2,b2) | (a1,a2)<-as, (b1,b2)<-bs, a1==b1]
More abstractly, you can define:
listJoinBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a,b)]
listJoinBy comp as bs = [(a,b) | a<-as, b<-bs, comp a b]
listJoin :: (Eq c) => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a,b)]
listJoin fa fb = listJoinBy (\a b -> fa a == fb b)
I bet the last line can be made point-free, or at least the lambda can be eliminated.
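For what it is worth, one lambda-free spelling is sketched below. It is offered as an illustration rather than a recommendation, since most readers will find the lambda clearer. Note that listJoin is an inner join (unmatched elements are dropped) and compares every pair, so it runs in O(n*m).

```haskell
listJoinBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a, b)]
listJoinBy comp as bs = [(a, b) | a <- as, b <- bs, comp a b]

-- ((. fb) . (==) . fa) a  reduces to  (fa a ==) . fb,
-- which is the same predicate as  \a b -> fa a == fb b.
listJoin :: Eq c => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a, b)]
listJoin fa fb = listJoinBy ((. fb) . (==) . fa)
```

With the lists from the question, listJoin fst fst a b keeps only the pair for key 1, because keys 2 and 3 have no partner on the other side.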
Answer 3 (score: 0)
What you are asking for is basically:
ordzip a@(x:t) b@(y:r) = case compare x y of
  LT -> (Just x, Nothing) : ordzip t b
  EQ -> (Just x, Just y) : ordzip t r
  GT -> (Nothing, Just y) : ordzip a r
ordzip a [] = [(Just x, Nothing) | x <- a]
ordzip [] b = [(Nothing, Just y) | y <- b]
With it, we can further define, for example:
import Control.Applicative ((<|>))
diff xs ys = [x | (Just x, Nothing) <- ordzip xs ys] -- set difference
meet xs ys = [y | (Just _, Just y) <- ordzip xs ys] -- intersection
union xs ys = [z | (u,v) <- ordzip xs ys, let Just z = u <|> v]
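or slight variations on it for accessing keys, handling duplicates, and so on (such as defining ordzipBy k a@(x:t) b@(y:r) = case compare (k x) (k y) of ...).
But the Data.These.These type better expresses the fact that (Nothing, Nothing) can never occur: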
import Data.These
ordzip a@(x:t) b@(y:r) = case compare x y of
  LT -> This x : ordzip t b
  GT -> That y : ordzip a r
  _ -> These x y : ordzip t r
-- base cases, needed once either list runs out
ordzip a [] = map This a
ordzip [] b = map That b
diff xs ys = catThis $ ordzip xs ys
meet xs ys = map snd . catThese $ ordzip xs ys -- or map fst, or id
union xs ys = map (mergeThese const) $ ordzip xs ys
Of course, the input lists must be sorted by key beforehand.
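For readers without the these package at hand, the same idea can be sketched with a locally defined three-constructor type. The names Joined, OnlyLeft, OnlyRight, Both, ordzipJ, diffJ, meetJ and unionJ are made up for this illustration and do not come from any library. As with the code above, both inputs are assumed to be sorted in ascending order and free of duplicates.

```haskell
-- Local stand-in for Data.These.These (hypothetical names), so the sketch
-- needs nothing beyond base. There is no way to build a "neither side" value.
data Joined a b = OnlyLeft a | OnlyRight b | Both a b
  deriving (Eq, Show)

-- Merge two lists that are already sorted in ascending order.
ordzipJ :: Ord a => [a] -> [a] -> [Joined a a]
ordzipJ a@(x:t) b@(y:r) = case compare x y of
  LT -> OnlyLeft x  : ordzipJ t b
  GT -> OnlyRight y : ordzipJ a r
  EQ -> Both x y    : ordzipJ t r
ordzipJ a [] = map OnlyLeft a
ordzipJ [] b = map OnlyRight b

diffJ, meetJ, unionJ :: Ord a => [a] -> [a] -> [a]
diffJ  xs ys = [x | OnlyLeft x <- ordzipJ xs ys]   -- set difference
meetJ  xs ys = [x | Both x _ <- ordzipJ xs ys]     -- intersection
unionJ xs ys = map pick (ordzipJ xs ys)            -- union, preferring the left copy
  where
    pick (OnlyLeft x)  = x
    pick (OnlyRight y) = y
    pick (Both x _)    = x
```

Compared with the pair-of-Maybe version, unionJ needs no partial pattern such as let Just z = ..., because every constructor carries at least one value.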
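Answer 4 (score: 0)
What you are asking is whether something like an SQL join can be done in Haskell.
Rosetta Code has a Haskell example of how to perform a hash join, the algorithm many RDBMSs use for JOINs. One of the versions there is presented as cleaner (though slower):
A cleaner and more functional solution would be to use Data.Map (based on binary trees):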
{-# LANGUAGE TupleSections #-}
import qualified Data.Map as M
import Data.List
import Data.Maybe
import Control.Applicative
-- Inner join: pairs each x with every y whose key (fy y) equals (fx x).
mapJoin xs fx ys fy = joined
  where yMap = foldl' f M.empty ys
        f m y = M.insertWith (++) (fy y) [y] m
        joined = concat .
                 mapMaybe (\x -> map (x,) <$> M.lookup (fx x) yMap) $ xs
main = mapM_ print $ mapJoin
    [(1, "Jonah"), (2, "Alan"), (3, "Glory"), (4, "Popeye")]
    snd
    [("Jonah", "Whales"), ("Jonah", "Spiders"),
     ("Alan", "Ghosts"), ("Alan", "Zombies"), ("Glory", "Buffy")]
    fst