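How do I split a string in Haskell?

Time: 2011-02-12 14:26:26

Tags: string haskell

Is there a standard way to split a string in Haskell?

lines and words work great for splitting on a space or a newline, but surely there is a standard way to split on a comma?

I couldn't find it on Hoogle.

To be specific, I'm looking for something where split "," "my,comma,separated,list" returns ["my","comma","separated","list"].

14 answers: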

Answer 0 (score: 152)

Remember that you can look up the definitions of Prelude functions!

http://www.haskell.org/onlinereport/standard-prelude.html

There, the definition of words is:

words   :: String -> [String]
words s =  case dropWhile Char.isSpace s of
                      "" -> []
                      s' -> w : words s''
                            where (w, s'') = break Char.isSpace s'

So, change it into a function that takes a predicate:

wordsWhen     :: (Char -> Bool) -> String -> [String]
wordsWhen p s =  case dropWhile p s of
                      "" -> []
                      s' -> w : wordsWhen p s''
                            where (w, s'') = break p s'

Then call it with whatever predicate you want!

main = print $ wordsWhen (==',') "break,this,string,at,commas"
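
One thing worth checking before adopting wordsWhen: like words, it skips runs of delimiters, so empty fields never appear in the result. The sketch below repeats the definition so it stands alone; the `examples` list and its inputs are illustrative and not part of the original answer.

```haskell
-- wordsWhen, repeated from the answer so this snippet is self-contained.
wordsWhen :: (Char -> Bool) -> String -> [String]
wordsWhen p s = case dropWhile p s of
  "" -> []
  s' -> w : wordsWhen p s''
    where (w, s'') = break p s'

-- Illustrative inputs (assumptions, not from the original answer).
examples :: [[String]]
examples =
  [ wordsWhen (== ',') "a,,b"        -- ["a","b"]: the empty field is dropped
  , wordsWhen (== ',') ",a,b,"       -- ["a","b"]: leading/trailing delimiters vanish
  , wordsWhen (`elem` ",;") "a,b;c"  -- ["a","b","c"]: any predicate works
  ]
```

If empty fields matter (for example in CSV-like data), a splitter that keeps them is probably a better fit than this one.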

Answer 1 (score: 120)

There is a package for this called split.

cabal install split

Use it like this:

ghci> import Data.List.Split
ghci> splitOn "," "my,comma,separated,list"
["my","comma","separated","list"]

It comes with many other functions for splitting on matching delimiters or on several delimiters at once.

Answer 2 (score: 26)

If you use Data.Text, there is splitOn:

http://hackage.haskell.org/packages/archive/text/0.11.2.0/doc/html/Data-Text.html#v:splitOn

This is built into the Haskell Platform.

For example:

import qualified Data.Text as T
main = print $ T.splitOn (T.pack " ") (T.pack "this is a test")

Or:

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
main = print $ T.splitOn " " "this is a test"
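
If the rest of a program still works with String, a thin wrapper can hide the conversions. This is only a sketch: `splitOnStr` is a hypothetical name, and it assumes the delimiter is non-empty, because T.splitOn raises an error when given an empty delimiter.

```haskell
import qualified Data.Text as T

-- Hypothetical wrapper: String in, String out, Data.Text in the middle.
-- Assumption: delim is non-empty (T.splitOn errors on an empty delimiter).
splitOnStr :: String -> String -> [String]
splitOnStr delim = map T.unpack . T.splitOn (T.pack delim) . T.pack
```

With this, splitOnStr "," "my,comma,separated,list" gives the list asked for in the question, and empty fields are preserved: splitOnStr "," "a,,b," yields ["a","","b",""].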

Answer 3 (score: 18)

In the module Text.Regex (part of the Haskell Platform), there is a function:

splitRegex :: Regex -> String -> [String]

which splits a string based on a regular expression. The API can be found on Hackage.

Answer 4 (score: 14)

Use Data.List.Split, from the split package:

[me@localhost]$ ghci
Prelude> import Data.List.Split
Prelude Data.List.Split> let l = splitOn "," "1,2,3,4"
Prelude Data.List.Split> :t l
l :: [[Char]]
Prelude Data.List.Split> l
["1","2","3","4"]
Prelude Data.List.Split> let { convert :: [String] -> [Integer]; convert = map read }
Prelude Data.List.Split> let l2 = convert l
Prelude Data.List.Split> :t l2
l2 :: [Integer]
Prelude Data.List.Split> l2
[1,2,3,4]

Answer 5 (score: 12)

Try this one:

import Data.List (unfoldr)

separateBy :: Eq a => a -> [a] -> [[a]]
separateBy chr = unfoldr sep where
  sep [] = Nothing
  sep l  = Just . fmap (drop 1) . break (== chr) $ l

It only works for a single character, but should be easy to extend.
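
One possible way to extend the idea to a multi-element delimiter is sketched below. `separateOn` is a hypothetical name, and treating an empty delimiter as "no split" is an assumption made here. Unlike separateBy, this version keeps a trailing empty chunk and returns [[]] for empty input.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical extension: split on a whole sub-list instead of one element.
separateOn :: Eq a => [a] -> [a] -> [[a]]
separateOn [] xs = [xs]  -- assumption: an empty delimiter means "do not split"
separateOn delim xs = go xs
  where
    -- Emit one chunk, then continue after the delimiter if one was found.
    go s = case breakOn s of
      (chunk, Nothing)   -> [chunk]
      (chunk, Just rest) -> chunk : go rest
    -- Collect elements until the delimiter is a prefix of the remainder.
    breakOn s
      | delim `isPrefixOf` s = ([], Just (drop (length delim) s))
    breakOn []       = ([], Nothing)
    breakOn (c : cs) = let (chunk, rest) = breakOn cs in (c : chunk, rest)
```

For example, separateOn ", " "a, b, c" gives ["a","b","c"].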

Answer 6 (score: 9)

split :: Eq a => a -> [a] -> [[a]]
split d [] = []
split d s = x : split d (drop 1 y) where (x,y) = span (/= d) s

E.g.

split ';' "a;bb;ccc;;d"
> ["a","bb","ccc","","d"]

A single trailing delimiter is dropped:

split ';' "a;bb;ccc;;d;"
> ["a","bb","ccc","","d"]

Answer 7 (score: 6)

I started learning Haskell yesterday, so correct me if I'm wrong, but:

split :: Eq a => a -> [a] -> [[a]]
split x y = func x y [[]]
    where
        func x [] z = reverse $ map (reverse) z
        func x (y:ys) (z:zs) = if y==x then 
            func x ys ([]:(z:zs)) 
        else 
            func x ys ((y:z):zs)

Gives:

*Main> split ' ' "this is a test"
["this","is","a","test"]

Or maybe you wanted

*Main> splitWithStr  " and " "this and is and a and test"
["this","is","a","test"]

which would be:

splitWithStr :: Eq a => [a] -> [a] -> [[a]]
splitWithStr x y = func x y [[]]
    where
        func x [] z = reverse $ map (reverse) z
        func x (y:ys) (z:zs) = if (take (length x) (y:ys)) == x then
            func x (drop (length x) (y:ys)) ([]:(z:zs))
        else
            func x ys ((y:z):zs)

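Answer 8 (score: 5)

I don't know how to add a comment on Steve's answer, but I would like to recommend the GHC libraries documentation, and in there specifically the Sublist functions in Data.List.

As a reference, it is much better than just reading the plain Haskell report.

More generally, a fold with a rule for when to start a new sublist should solve it too.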
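
The fold idea could look roughly like this. It is a sketch, not code from the answer: `splitFold` is a hypothetical name, and the rule used here is simply "start a new sublist whenever the delimiter is seen".

```haskell
-- Hypothetical helper: split on a single delimiter with a right fold.
splitFold :: Eq a => a -> [a] -> [[a]]
splitFold d = foldr step [[]]
  where
    -- The accumulator always holds at least one (possibly empty) sublist.
    step c acc@(cur : rest)
      | c == d    = [] : acc          -- delimiter: open a fresh sublist
      | otherwise = (c : cur) : rest  -- otherwise: extend the current one
    step _ []     = [[]]              -- unreachable, kept for totality
```

Here splitFold ',' "my,comma,separated,list" gives ["my","comma","separated","list"], and empty fields are kept, so splitFold ',' "a,,b" gives ["a","","b"].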

Answer 9 (score: 5)

I find this one easier to understand:

split :: Char -> String -> [String]
split c xs = case break (==c) xs of 
  (ls, "") -> [ls]
  (ls, x:rs) -> ls : split c rs
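
A property of this definition that may be useful: it keeps every field, including empty ones at the end, so joining the pieces with the delimiter gives back the original string. The sketch below repeats the function under the hypothetical name `splitKeep` and adds a small check, `roundTrips`, which is also an illustrative name.

```haskell
import Data.List (intercalate)

-- Same logic as the answer's split, renamed (hypothetically) to splitKeep.
splitKeep :: Char -> String -> [String]
splitKeep c xs = case break (== c) xs of
  (ls, "")     -> [ls]
  (ls, _ : rs) -> ls : splitKeep c rs

-- Joining the pieces with the delimiter should restore the input.
roundTrips :: Char -> String -> Bool
roundTrips c s = intercalate [c] (splitKeep c s) == s
```

For instance, splitKeep ';' "a;bb;ccc;;d;" gives ["a","bb","ccc","","d",""], with the trailing empty field that the version in Answer 6 drops.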

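Answer 10 (score: 3)

Without importing anything, you can directly substitute a space for the separator character, because the target separator for words is a space. Something like: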

words [if c == ',' then ' ' else c|c <- "my,comma,separated,list"]

words (let f ',' = ' '; f c = c in map f "my,comma,separated,list")

You can turn this into a function with parameters. You can also eliminate the character-to-match parameter by matching many characters at once, as in:

 [if elem c ";,.:-+@!$#?" then ' ' else c|c <-"my,comma;separated!list"]
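
Wrapped up as a function with parameters, the substitution trick might look like the sketch below. `splitOnAny` is a hypothetical name. Because the work is delegated to words, whitespace already present in the input also splits, and empty fields are dropped.

```haskell
-- Hypothetical helper: turn every delimiter into a space, then reuse words.
-- Caveat: existing whitespace in the input also acts as a separator.
splitOnAny :: [Char] -> String -> [String]
splitOnAny delims s = words [if c `elem` delims then ' ' else c | c <- s]
```

For example, splitOnAny ";,!" "my,comma;separated!list" gives ["my","comma","separated","list"], while splitOnAny "," "a b,c" gives ["a","b","c"], which shows the whitespace caveat.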

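Answer 11 (score: 2)

In addition to the efficient, pre-built functions given in the other answers, I'll add my own, which are simply part of a collection of Haskell functions I wrote to learn the language in my own time: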

-- Correct but inefficient implementation
wordsBy :: String -> Char -> [String]
wordsBy s c = reverse (go s []) where
    go s' ws = case (dropWhile (\c' -> c' == c) s') of
        "" -> ws
        rem -> go ((dropWhile (\c' -> c' /= c) rem)) ((takeWhile (\c' -> c' /= c) rem) : ws)

-- Breaks up by predicate function to allow for more complex conditions (\c -> c == ',' || c == ';')
wordsByF :: String -> (Char -> Bool) -> [String]
wordsByF s f = reverse (go s []) where
    go s' ws = case ((dropWhile (\c' -> f c')) s') of
        "" -> ws
        rem -> go ((dropWhile (\c' -> (f c') == False)) rem) (((takeWhile (\c' -> (f c') == False)) rem) : ws)

The solutions are at least tail-recursive, so they won't cause a stack overflow.

Answer 12 (score: 1)

Example in ghci:

>  import qualified Text.Regex as R
>  R.splitRegex (R.mkRegex "x") "2x3x777"
["2","3","777"]

Answer 13 (score: 0)

I'm late, but if you are looking for a simple solution that doesn't rely on any bloated packages, I'd like to add this for those interested:
