Question

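Is there a function in Haskell that performs an SQL join or an R merge?

Basically I have two lists of tuples and would like to "zip" them by key. I know that each key has exactly one value or none.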

a = [(1, "hello"), (2, "world")]
b = [(3, "foo"), (1, "bar")]

and I would like to get something like

 [ (Just (1, "hello"), Just (1, "bar"))
 , (Just (2, "world"), Nothing)
 , (Nothing          , Just (3, "foo"))
 ]

Answer 1

Using ordered sets (lists) and an Ord key:

key = fst

fullOuterJoin xs [] = map (\x -> (Just x, Nothing)) xs
fullOuterJoin [] ys = map (\y -> (Nothing, Just y)) ys
fullOuterJoin xss@(x:xs) yss@(y:ys) =
  if key x == key y
    then (Just x, Just y): fullOuterJoin xs ys
    else
      if key x < key y
        then (Just x, Nothing): fullOuterJoin xs yss
        else (Nothing, Just y): fullOuterJoin xss ys

(The complexity is O(n+m), or O(n log n + m log m) if the inputs have to be sorted first.)

For example:

setA = [(1, "hello"), (2, "world")]
setB = [(1, "bar"), (3, "foo")]

*Main> fullOuterJoin setA setB
[(Just (1,"hello"),Just (1,"bar")),(Just (2,"world"),Nothing),(Nothing,Just (3, "foo"))]

Sorting can obviously be supported as well:

import Data.List (sort)

fullOuterJoin' xs ys = fullOuterJoin (sort xs) (sort ys)
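The merge only works on inputs sorted by key, and the question's second list is not sorted. The following is a minimal sketch of what happens in that case, and of a variant that sorts by key only, so the values need no Ord instance. The name fullOuterJoinOn and the type signatures are assumptions added for illustration; they are not part of the answer.

```haskell
import Data.List (sortOn)

-- Merge-style full outer join; both inputs must already be sorted by key.
fullOuterJoin :: Ord k => [(k, a)] -> [(k, b)] -> [(Maybe (k, a), Maybe (k, b))]
fullOuterJoin xs [] = [(Just x, Nothing) | x <- xs]
fullOuterJoin [] ys = [(Nothing, Just y) | y <- ys]
fullOuterJoin xss@(x:xs) yss@(y:ys) =
  case compare (fst x) (fst y) of
    EQ -> (Just x, Just y)  : fullOuterJoin xs  ys
    LT -> (Just x, Nothing) : fullOuterJoin xs  yss
    GT -> (Nothing, Just y) : fullOuterJoin xss ys

-- Hypothetical helper: sort both inputs by key before merging.
fullOuterJoinOn :: Ord k => [(k, a)] -> [(k, b)] -> [(Maybe (k, a), Maybe (k, b))]
fullOuterJoinOn xs ys = fullOuterJoin (sortOn fst xs) (sortOn fst ys)

main :: IO ()
main = do
  let a = [(1 :: Int, "hello"), (2, "world")]
      b = [(3 :: Int, "foo"), (1, "bar")]   -- not sorted by key
  -- Unsorted input: key 1 is never matched, so four rows come back.
  print (length (fullOuterJoin a b))
  -- Sorted by key first: the three rows the question asks for.
  mapM_ print (fullOuterJoinOn a b)
```

sortOn fst compares keys only, whereas sort on the whole tuple would also require Ord on the values.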

As @Franky says, you can avoid the if, for example:

import Data.Function (on)

fullOuterJoin xs [] = [(Just  x, Nothing) | x <- xs]
fullOuterJoin [] ys = [(Nothing, Just  y) | y <- ys]
fullOuterJoin xss@(x:xs) yss@(y:ys) =
  case (compare `on` key) x y of
    EQ -> (Just  x, Just  y): fullOuterJoin xs  ys
    LT -> (Just  x, Nothing): fullOuterJoin xs  yss
    GT -> (Nothing, Just  y): fullOuterJoin xss ys

Answer 2

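I can't think of any standard function that does this. I would convert the two lists to Data.Map.Map and write the SQL join myself. Done this way, it looks doable with O(n log n) complexity, which isn't too bad.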
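A minimal sketch of that idea, assuming the containers package and at most one value per key in each list (as the question states). The name fullOuterJoinMap and the use of unionWith are illustrative choices, not something this answer specifies.

```haskell
import qualified Data.Map.Strict as M

-- Full outer join on the first tuple component, going through Data.Map.
-- Assumes each key occurs at most once per input list.
fullOuterJoinMap :: Ord k => [(k, a)] -> [(k, b)] -> [(Maybe (k, a), Maybe (k, b))]
fullOuterJoinMap xs ys = M.elems (M.unionWith combine left right)
  where
    -- Each side starts with its own slot filled and the other one empty.
    left  = M.fromList [ (k, (Just (k, v), Nothing)) | (k, v) <- xs ]
    right = M.fromList [ (k, (Nothing, Just (k, v))) | (k, v) <- ys ]
    -- For keys present in both maps, keep the left slot of the left value
    -- and the right slot of the right value.
    combine (l, _) (_, r) = (l, r)

main :: IO ()
main = mapM_ print (fullOuterJoinMap a b)
  where
    a = [(1 :: Int, "hello"), (2, "world")]
    b = [(3 :: Int, "foo"), (1, "bar")]
```

The inputs do not have to be sorted here, and the rows come back in ascending key order.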

Answer 3

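If you care about performance, this is not the answer you are looking for. There is no answer using a built-in function, since you didn't give a type.

It can be done with a simple list comprehension: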

joinOnFst as bs = [(a,b) | a<-as, b<-bs, fst a == fst b]

or, using pattern matching and a different return type:

joinOnFst as bs = [(a1,a2,b2) | (a1,a2)<-as, (b1,b2)<-bs, a1==b1]

More abstractly, you can define:

listJoinBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a,b)]
listJoinBy comp as bs =  [(a,b) | a<-as, b<-bs, comp a b]

listJoin :: (Eq c) => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a,b)]
listJoin fa fb = listJoinBy (\a b -> fa a == fb b)

I bet the last line can be made point-free, or at least the lambda can be eliminated.
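One possible way to drop the lambda, given as a sketch rather than as the answerer's own version. The name listJoin' is hypothetical and only serves to keep both variants side by side.

```haskell
listJoinBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a, b)]
listJoinBy comp as bs = [(a, b) | a <- as, b <- bs, comp a b]

-- Original form, with an explicit lambda.
listJoin :: Eq c => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a, b)]
listJoin fa fb = listJoinBy (\a b -> fa a == fb b)

-- Derivation:
--   \a b -> fa a == fb b
--   = \a -> (fa a ==) . fb
--   = (. fb) . (==) . fa
listJoin' :: Eq c => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a, b)]
listJoin' fa fb = listJoinBy ((. fb) . (==) . fa)

main :: IO ()
main = do
  let xs = [(1 :: Int, "hello"), (2, "world")]
      ys = [(3 :: Int, "foo"), (1, "bar")]
  print (listJoin' fst fst xs ys)
  print (listJoin' fst fst xs ys == listJoin fst fst xs ys)
```

Whether the point-free form reads better than the lambda is a matter of taste. Either way this is an inner join: keys present on only one side are dropped.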

Answer 4

What you are asking for is basically:

ordzip a@(x:t) b@(y:r) = case compare x y of
    LT -> (Just  x, Nothing) : ordzip t b 
    EQ -> (Just  x, Just  y) : ordzip t r 
    GT -> (Nothing, Just  y) : ordzip a r 
ordzip a [] = [(Just x, Nothing) | x <- a]
ordzip [] b = [(Nothing, Just y) | y <- b]

With it we can further define, e.g.:

import Control.Applicative ((<|>))

diff  xs ys = [x | (Just x, Nothing) <- ordzip xs ys]    -- set difference
meet  xs ys = [y | (Just _, Just  y) <- ordzip xs ys]    -- intersection
union xs ys = [z | (u,v) <- ordzip xs ys, let Just z = u <|> v]

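or small variations thereof concerning access to the key, handling of duplicates, etc. (such as defining ordzipBy k a@(x:t) b@(y:r) = case compare (k x) (k y) of ...).

But the fact that (Nothing, Nothing) can never occur is better expressed with the Data.These.These type: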
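As a quick check of the three definitions above, here is a self-contained sketch that runs them on two small sorted lists. The type signatures and the sample lists are assumptions added for illustration.

```haskell
import Control.Applicative ((<|>))

-- Ordered zip of two sorted lists.
ordzip :: Ord a => [a] -> [a] -> [(Maybe a, Maybe a)]
ordzip a@(x:t) b@(y:r) = case compare x y of
    LT -> (Just  x, Nothing) : ordzip t b
    EQ -> (Just  x, Just  y) : ordzip t r
    GT -> (Nothing, Just  y) : ordzip a r
ordzip a [] = [(Just x, Nothing) | x <- a]
ordzip [] b = [(Nothing, Just y) | y <- b]

diff, meet, union :: Ord a => [a] -> [a] -> [a]
diff  xs ys = [x | (Just x, Nothing) <- ordzip xs ys]    -- set difference
meet  xs ys = [y | (Just _, Just  y) <- ordzip xs ys]    -- intersection
union xs ys = [z | (u, v) <- ordzip xs ys, let Just z = u <|> v]

main :: IO ()
main = do
  let xs = [1, 2, 3, 5] :: [Int]
      ys = [2, 3, 4]
  print (diff  xs ys)   -- [1,5]
  print (meet  xs ys)   -- [2,3]
  print (union xs ys)   -- [1,2,3,4,5]
```

The pattern let Just z = u <|> v is safe here only because ordzip never produces (Nothing, Nothing).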

import Data.These

ordzip a@(x:t) b@(y:r) = case compare x y of
    LT -> This  x   : ordzip t b 
    GT -> That    y : ordzip a r 
    _  -> These x y : ordzip t r 
ordzip a [] = map This a    -- base cases, mirroring the Maybe version above
ordzip [] b = map That b

diff  xs ys = catThis                $ ordzip xs ys
meet  xs ys = map snd . catThese     $ ordzip xs ys -- or map fst, or id
union xs ys = map (mergeThese const) $ ordzip xs ys

Of course, the input lists must be sorted by key beforehand.

Answer 5

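What you are asking is whether something like an SQL join can be done in Haskell.

Rosetta Code has an example in Haskell of how to perform a hash join, the algorithm many RDBMSs use for JOINs. One of its versions is presented as cleaner (though slower):

A cleaner, more functional solution is to use Data.Map (based on binary trees):

{-# LANGUAGE TupleSections #-}

import qualified Data.Map as M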
import Data.List
import Data.Maybe
import Control.Applicative

mapJoin xs fx ys fy = joined
  where yMap = foldl' f M.empty ys
        f m y = M.insertWith (++) (fy y) [y] m
        joined = concat .
                 mapMaybe (\x -> map (x,) <$> M.lookup (fx x) yMap) $ xs

main = mapM_ print $ mapJoin
    [(1, "Jonah"), (2, "Alan"), (3, "Glory"), (4, "Popeye")]
        snd
    [("Jonah", "Whales"), ("Jonah", "Spiders"), 
     ("Alan", "Ghosts"), ("Alan", "Zombies"), ("Glory", "Buffy")]
        fst

Join or merge function in Haskell