Join or merge function in Haskell

Time: 2014-06-26 07:08:35

Tags: haskell

Is there a function in Haskell that does the equivalent of an SQL join or an R merge?

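Basically, I have two lists of tuples and would like to "zip" them according to their key. I know that each key has exactly one value or none.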

a = [(1, "hello"), (2, "world")]
b = [(3, "foo"), (1, "bar")]

and get something like:
 [ (Just (1, "hello"), Just (1, "bar"))
 , (Just (2, "world"), Nothing)
 , (Nothing          , Just (3, "foo"))
 ]

5 answers:

Answer 0 (score: 3)

Using ordered sets (sorted lists) with an Ord key:

key = fst

fullOuterJoin xs [] = map (\x -> (Just x, Nothing)) xs
fullOuterJoin [] ys = map (\y -> (Nothing, Just y)) ys
fullOuterJoin xss@(x:xs) yss@(y:ys) =
  if key x == key y
    then (Just x, Just y): fullOuterJoin xs ys
    else
      if key x < key y
        then (Just x, Nothing): fullOuterJoin xs yss
        else (Nothing, Just y): fullOuterJoin xss ys

(The complexity is O(n+m), or O(n log n + m log m) if the lists have to be sorted first.)

For example:

setA = [(1, "hello"), (2, "world")]
setB = [(1, "bar"), (3, "foo")]

*Main> fullOuterJoin setA setB
[(Just (1,"hello"),Just (1,"bar")),(Just (2,"world"),Nothing),(Nothing,Just (3,"foo"))]

Unsorted input can of course be handled by sorting first:

import Data.List (sort)

fullOuterJoin' xs ys = fullOuterJoin (sort xs) (sort ys)
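sort orders whole tuples, so the values must have an Ord instance too. Below is a minimal sketch that sorts by key only. It assumes Data.List.sortOn, which has been in base since 4.8. The wrapper name fullOuterJoinOn is hypothetical. The merge step is the same as above, restated with a type signature so the block stands alone:

```haskell
import Data.List (sortOn)

-- Merge step: both inputs must be sorted by key (fst),
-- with at most one entry per key in each list.
fullOuterJoin :: Ord k => [(k, a)] -> [(k, b)] -> [(Maybe (k, a), Maybe (k, b))]
fullOuterJoin xs [] = [(Just x, Nothing) | x <- xs]
fullOuterJoin [] ys = [(Nothing, Just y) | y <- ys]
fullOuterJoin xss@(x:xs) yss@(y:ys) =
  case compare (fst x) (fst y) of
    EQ -> (Just x, Just y)  : fullOuterJoin xs ys
    LT -> (Just x, Nothing) : fullOuterJoin xs yss
    GT -> (Nothing, Just y) : fullOuterJoin xss ys

-- Hypothetical wrapper: sorts by key only, so the values need no Ord instance.
fullOuterJoinOn :: Ord k => [(k, a)] -> [(k, b)] -> [(Maybe (k, a), Maybe (k, b))]
fullOuterJoinOn xs ys = fullOuterJoin (sortOn fst xs) (sortOn fst ys)
```

With the question's a and b (b is unsorted), fullOuterJoinOn a b gives the three pairs shown in the question.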

As @Franky pointed out, you can avoid the ifs, for example:

import Data.Function (on)

fullOuterJoin xs [] = [(Just  x, Nothing) | x <- xs]
fullOuterJoin [] ys = [(Nothing, Just  y) | y <- ys]
fullOuterJoin xss@(x:xs) yss@(y:ys) =
  case (compare `on` key) x y of
    EQ -> (Just  x, Just  y): fullOuterJoin xs  ys
    LT -> (Just  x, Nothing): fullOuterJoin xs  yss
    GT -> (Nothing, Just  y): fullOuterJoin xss ys

Answer 1 (score: 1)

I can't think of any standard function that does this. I would convert both lists to Data.Map.Map and write the SQL join myself. That gives O(n log n) complexity, which isn't too bad.
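Here is a minimal sketch of that idea, assuming the containers package (Data.Map.Strict). The name fullOuterJoinMap is hypothetical. Each list becomes a map from key to a (Maybe left, Maybe right) pair. M.unionWith combines the entries whose key appears in both maps. The result is then reshaped into the question's output format, in ascending key order:

```haskell
import qualified Data.Map.Strict as M

-- Full outer join through Data.Map: O(n log n + m log m).
-- Assumes at most one value per key in each list, as in the question;
-- with duplicate keys, M.fromList keeps only the last one.
fullOuterJoinMap :: Ord k => [(k, a)] -> [(k, b)] -> [(Maybe (k, a), Maybe (k, b))]
fullOuterJoinMap xs ys =
    [ (fmap ((,) k) ma, fmap ((,) k) mb) | (k, (ma, mb)) <- M.toAscList merged ]
  where
    left   = M.fromList [ (k, (Just a, Nothing)) | (k, a) <- xs ]
    right  = M.fromList [ (k, (Nothing, Just b)) | (k, b) <- ys ]
    -- For a key present in both maps, take the left value from the
    -- first map and the right value from the second.
    merged = M.unionWith (\(ma, _) (_, mb) -> (ma, mb)) left right
```

The inputs do not need to be sorted beforehand, because the maps take care of ordering.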

Answer 2 (score: 1)

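If you care about performance, this is not the answer you are looking for. And since you didn't give a type, there is no answer in terms of a built-in function either.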

It can be done with a simple list comprehension:

joinOnFst as bs = [(a,b) | a<-as, b<-bs, fst a == fst b]

or, using pattern matching and a different return type:

joinOnFst as bs = [(a1,a2,b2) | (a1,a2)<-as, (b1,b2)<-bs, a1==b1]

More abstractly, you could define

listJoinBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a,b)]
listJoinBy comp as bs =  [(a,b) | a<-as, b<-bs, comp a b]

listJoin :: (Eq c) => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a,b)]
listJoin fa fb = listJoinBy (\a b -> fa a == fb b)

I bet the last line can be made point-free, or at least the lambda can be eliminated.
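One possible point-free form is sketched below. It is not the only one. For each a, (. fb) . (==) . fa builds the predicate \b -> fa a == fb b. The join itself is still the nested-loop inner join from above, with O(n * m) comparisons:

```haskell
-- Nested-loop inner join: keeps every pair (x, y) that satisfies comp.
listJoinBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a, b)]
listJoinBy comp xs ys = [(x, y) | x <- xs, y <- ys, comp x y]

-- Lambda-free version of listJoin:
-- ((. fb) . (==) . fa) x  ==  \y -> fa x == fb y
listJoin :: Eq c => (a -> c) -> (b -> c) -> [a] -> [b] -> [(a, b)]
listJoin fa fb = listJoinBy ((. fb) . (==) . fa)
```

With the question's data, listJoin fst fst a b keeps only the pair with key 1. Being an inner join, it drops the unmatched keys 2 and 3.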

Answer 3 (score: 0)

What you are asking for is basically:

ordzip a@(x:t) b@(y:r) = case compare x y of
    LT -> (Just  x, Nothing) : ordzip t b 
    EQ -> (Just  x, Just  y) : ordzip t r 
    GT -> (Nothing, Just  y) : ordzip a r 
ordzip a [] = [(Just x, Nothing) | x <- a]
ordzip [] b = [(Nothing, Just y) | y <- b]

With it, we can go on to define, for example:

import Control.Applicative ((<|>))

diff  xs ys = [x | (Just x, Nothing) <- ordzip xs ys]    -- set difference
meet  xs ys = [y | (Just _, Just  y) <- ordzip xs ys]    -- intersection
union xs ys = [z | (u,v) <- ordzip xs ys, let Just z = u <|> v] 

Small variations handle key access, duplicates and so on (for example, defining ordzipBy k a@(x:t) b@(y:r) = case compare (k x) (k y) of ...).
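Below is a sketch of one way to complete that keyed variant. It assumes both lists share an element type, as they do in the question, and are already sorted by the key. The remaining clauses mirror ordzip:

```haskell
-- Like ordzip, but compares elements through a key function k.
-- Both inputs must already be sorted by k, with no duplicate keys.
ordzipBy :: Ord k => (a -> k) -> [a] -> [a] -> [(Maybe a, Maybe a)]
ordzipBy k a@(x:t) b@(y:r) = case compare (k x) (k y) of
    LT -> (Just x, Nothing) : ordzipBy k t b
    EQ -> (Just x, Just y)  : ordzipBy k t r
    GT -> (Nothing, Just y) : ordzipBy k a r
ordzipBy _ a [] = [(Just x, Nothing) | x <- a]
ordzipBy _ [] b = [(Nothing, Just y) | y <- b]
```

Calling ordzipBy fst on the question's data, with b sorted first to [(1, "bar"), (3, "foo")], gives exactly the requested output.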

But the fact that (Nothing, Nothing) can never occur is better expressed with the Data.These.These type:

import Data.These

ordzip a@(x:t) b@(y:r) = case compare x y of
    LT -> This  x   : ordzip t b 
    GT -> That    y : ordzip a r 
    _  -> These x y : ordzip t r 
ordzip a [] = map This a
ordzip [] b = map That b

diff  xs ys = catThis                $ ordzip xs ys
meet  xs ys = map snd . catThese     $ ordzip xs ys -- or map fst, or id
union xs ys = map (mergeThese const) $ ordzip xs ys 

Of course, the input lists must be sorted by key beforehand.

Answer 4 (score: 0)

What you are asking is whether an SQL-join-like operation is possible in Haskell.

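Rosetta Code has an example of how to do a hash join in Haskell. Hash join is one of the many algorithms an RDBMS uses for JOIN, and the example presents one version that is more concise (though slower):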

  

A cleaner, more practical solution is to use Data.Map (which is based on binary trees):

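{-# LANGUAGE TupleSections #-}

import qualified Data.Map as M
import Data.List (foldl')
import Data.Maybe (mapMaybe)

-- Inner join: index ys by the key fy, then look up each x by fx.
mapJoin :: Ord k => [a] -> (a -> k) -> [b] -> (b -> k) -> [(a, b)]
mapJoin xs fx ys fy = joined
  where yMap = foldl' f M.empty ys
        f m y = M.insertWith (++) (fy y) [y] m
        joined = concat .
                 mapMaybe (\x -> map (x,) <$> M.lookup (fx x) yMap) $ xs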

main = mapM_ print $ mapJoin
    [(1, "Jonah"), (2, "Alan"), (3, "Glory"), (4, "Popeye")]
        snd
    [("Jonah", "Whales"), ("Jonah", "Spiders"), 
     ("Alan", "Ghosts"), ("Alan", "Zombies"), ("Glory", "Buffy")]
        fst