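Haskell, regex, TDFA: matching (and removing) quoted substrings

Time: 2017-12-26 08:55:28

Tags: regex haskell

A regular expression that matches quoted substrings: "/\"(?:[^\"\\]|\\.)*\"/" (originally /"(?:[^"\\]|\\.)*"/, see Here). Tested on regex101, it works as expected.

With TDFA, however, it fails: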

*** Exception: Explict error in module Text.Regex.TDFA.String : Text.Regex.TDFA.String died:
parseRegex for Text.Regex.TDFA.String failed:"/"(?:[^"\]|\.)*"/" (line 1, column 4):
unexpected "?"
expecting empty () or anchor ^ or $ or an atom

Is there a way to fix it?

Test string: Is big "problem", no?

Expected result: "problem"

UPD:

Here is the full context:

removeQuotedSubstrings :: String -> [String]
removeQuotedSubstrings str =
  let quoteds = concat (str =~ ("/\"(?:[^\"\\]|\\.)*\"/" :: String) :: [[String]])
  in  quoteds

1 answer:

Answer 0: (score: 0)

Not an improvement, just an acceptable solution, though it lacks elegance:

import qualified Data.Text as T
import Text.Regex.TDFA

-- | Removes all double quoted substrings, if any, from a string.
--
-- Examples:
--
-- >>> removeQuotedSubstrings "alfa"
-- "alfa"
-- >>> removeQuotedSubstrings "ngoro\"dup\"lai \"ming\""
-- "ngoro lai  "
removeQuotedSubstrings :: String -> String
removeQuotedSubstrings str =
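  -- Keep only the full matches (they start with '"'). `take 1` is safe on the
  -- empty submatch produced by an empty quoted string (""), unlike `head`.
  let quoteds  = filter (("\"" ==) . take 1)
               $ concat (str =~ ("\"(\\\\.|[^\"\\])*\"" :: String) :: [[String]])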
  in  T.unpack $ foldr (\quoted acc -> T.replace (T.pack quoted) " " acc)
                       (T.pack str) quoteds

Yes, the ultimate goal has always been to remove the quoted substrings.
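For what it's worth, the parse error in the question seems to come from two things: regex-tdfa implements POSIX extended regular expressions, which have no (?:...) non-capturing groups, and the surrounding /.../ slashes are delimiters from other languages, so TDFA would treat them as literal characters. Below is a minimal sketch under those assumptions that avoids Data.Text altogether. The names quotedPattern, quotedSubstrings and dropQuoted are made up for illustration. Note that the Haskell literal needs "\\\\." for the regex to see a backslash followed by any character; a plain "\\." reaches the regex as \. and only matches a literal dot.

```haskell
import Text.Regex.TDFA

-- POSIX-style version of the pattern: no /.../ delimiters and an ordinary
-- capturing group instead of (?:...). Inside a bracket expression the
-- backslash is literal, so [^"\] excludes the quote and the backslash.
quotedPattern :: String
quotedPattern = "\"(\\\\.|[^\"\\])*\""

-- All double-quoted substrings (full matches only, no submatches).
quotedSubstrings :: String -> [String]
quotedSubstrings str = getAllTextMatches (str =~ quotedPattern)

-- Replace every quoted substring with a single space, left to right.
-- A match is never empty (it has at least the two quotes), so an empty
-- middle component means there was no match.
dropQuoted :: String -> String
dropQuoted str =
  case str =~ quotedPattern :: (String, String, String) of
    (before, "", _)    -> before
    (before, _, after) -> before ++ " " ++ dropQuoted after
```

With the test string from the question, quotedSubstrings should return the single match "problem" (quotes included), and dropQuoted should give the same results as the doctest examples above.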