Haskell: extracting substrings from a string

Time: 2014-01-23 02:18:16

Tags: string parsing haskell substring extract

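My goal is to count how many times a substring occurs in a string. The substring I am looking for has the form "[n]", where n can be any variable.

My attempt splits the string with the words function and then builds a new list containing every string whose head is '[' and whose last character is ']'.

The problem I am running into is that, when my input String is split with the words function, it produces a string that looks like "[2],". I still want this to count as an occurrence of the form "[n]".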

As an example, I want this string,

ASDF [1] JKL [2] ASDF [1] JKL

to return 3.

Here is my code:

-- String that will be tested on references function
txt :: String
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
  "get to their goal, and in the end the thing they want the most ends " ++
  "up destroying them.  In case of [2], this is a whale..."

-- Function that will take a list of Strings and return a list that contains
-- any String of the type [n], where n is a variable
ref :: [String] -> [String]
ref [] = []
ref xs = [x | x <- xs, head x == '[', last x == ']']

-- Function takes a text with references in the format [n] and returns
-- the total number of references.
-- Example :  ghci> references txt -- -> 3
references :: String -> Integer
references txt = toInteger (length (ref (words txt)))

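If anyone could enlighten me on how to search a string for a substring, or how to parse a string for a given substring, I would be very grateful.

3 Answers:

Answer 0: (score: 4)

I would just use a regular expression and write it like this: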

import Text.Regex.Posix

txt :: String
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
  "get to their goal, and in the end the thing they want the most ends " ++
  "up destroying them.  In case of [2], this is a whale..."


-- references counts the number of references in the input string
references :: String -> Int
references str = str =~ "\\[[0-9]*\\]"

main = putStrLn $ show $ references txt -- outputs 3
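
If the matched references themselves are wanted rather than only the count, a minimal sketch along the same lines could use getAllTextMatches, which Text.Regex.Posix re-exports. The names refPattern, extractRefs and countRefs are made up for illustration, and the pattern uses + instead of * on the assumption that an empty "[]" should not count as a reference.

```haskell
import Text.Regex.Posix ((=~), getAllTextMatches)

-- Assumption: a reference is one or more digits inside square brackets.
refPattern :: String
refPattern = "\\[[0-9]+\\]"

-- Return every substring that matches the pattern, in order of appearance.
extractRefs :: String -> [String]
extractRefs str = getAllTextMatches (str =~ refPattern)

-- Counting is then just the length of the list of matches.
countRefs :: String -> Int
countRefs = length . extractRefs
```

With this, extractRefs "In case of [2], this is a whale" should pick out "[2]" even though it is followed by a comma, since the match does not depend on word boundaries.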

Answer 1: (score: 2)

Regular expressions are massive overkill for such a simple problem.

references = length . consume

consume []       = []
consume ('[':xs) = let (v,rest) = consume' xs in v:consume rest
consume (_  :xs) = consume xs

consume' []       = ([], []) 
consume' (']':xs) = ([], xs)
consume' (x  :xs) = let (v,rest) = consume' xs in (x:v, rest)

consume waits for a '[', then calls consume', which collects everything up to the ']'.
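
The same idea can also be sketched with dropWhile and break from the Prelude, which makes it easy to return the text inside each pair of brackets as well. The names refs and countRefs are hypothetical, and unlike consume above, this version assumes that an unclosed '[' should not be counted.

```haskell
-- Collect the text between each '[' and the next ']'.
refs :: String -> [String]
refs s = case dropWhile (/= '[') s of
  []         -> []                       -- no more opening brackets
  (_ : rest) -> case break (== ']') rest of
    (body, _ : rest') -> body : refs rest'
    (_, [])           -> []              -- assumption: unclosed '[' is ignored

-- Number of bracketed references in the input.
countRefs :: String -> Int
countRefs = length . refs
```

For the example from the question, refs "ASDF [1] JKL [2] ASDF [1] JKL" gives ["1","2","1"], so countRefs returns 3.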

Answer 2: (score: 0)

Here is a solution using sepCap from Replace.Megaparsec:

import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Either
import Data.Maybe
import Data.Void  -- provides Void, used in the parser type below

txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
  "get to their goal, and in the end the thing they want the most ends " ++
  "up destroying them.  In case of [2], this is a whale..."

pattern = single '[' *> anySingle <* single ']' :: Parsec Void String Char

-- Evaluated in GHCi:
-- ghci> length $ rights $ fromJust $ parseMaybe (sepCap pattern) txt
-- 3