
Time: 2019-09-16 02:39:55

Tags: haskell

I am writing some code for an assignment that should tokenize a string. It takes three strings as arguments and returns a list of strings. The arguments are the string to tokenize (s), imp (characters that, when they appear in s, should each become a single string in the list) and rem (characters that should be removed from s).

tokenize "ab cd -ef" "" " "

should give:

["ab","cd","-ef"]

tokenize "ab cd -ef" "-" " "

should give:

["ab","cd","-","ef"]

tokenize :: String -> [Char] -> [Char] -> [String]
tokenize [] imp rem = []
tokenize (x:xs) imp rem =
  if null imp && null rem then [(x:xs)]
  else if null imp && rem == " " then ord (x:xs)
  else if finnes rem x then tokenize xs imp rem
  else if finnes imp x then [x]:(tokenize xs imp rem)
  else [x] : (tokenize xs imp rem)

ord :: String -> [String]
ord [] = []
ord s = takeWhile (/= ' ') s : (ord (fjernOrd s ' '))

fjernOrd :: String -> Char -> String
fjernOrd [] c = []
fjernOrd xs c = if (head xs) /= c then fjernOrd (tail xs) c else tail xs

--wrote my own version of elem to check if a character exists in a string
finnes :: String -> Char -> Bool
finnes "" c = False
finnes (s:xs) c
  | s == c = True
  | otherwise = finnes xs c

It works when I pass " " as rem and "" as imp, but not when I put a character in imp (for example "-").

For example, when I write tokenize "ab cd -ef" "" " " I get ["ab","cd","-ef"], which is correct, but when I type tokenize "ab cd -ef" "-" " " I should get ["ab","cd","-","ef"], but I get ["a","b","c","d","-","e","f"].

import Data.List
tokenize s imp del = concatMap g $ words s
--                   ^^^^^^^^^ ^   ^^^^^^^
--                   |         |   |- This is the list of tokens
--                   |         |- This is a function that applies the logic to each token. We'll think about it later
--                   |- This applies g to each token and concatenates the results
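To see what this outer layer does on its own, here is a small sketch in which g is swapped for a placeholder that hands each token back unchanged. The names tokenizeOuter and keepToken are made up for this illustration and are not part of the final solution.

```haskell
-- Hypothetical placeholder for g: wraps the token in a one-element list
-- without splitting it any further.
keepToken :: String -> [String]
keepToken token = [token]

-- Same shape as the definition above; imp and del are ignored at this stage.
tokenizeOuter :: String -> String -> String -> [String]
tokenizeOuter s _imp _del = concatMap keepToken (words s)
```

With this placeholder, tokenizeOuter "ab cd -ef" "-" " " evaluates to ["ab","cd","-ef"]: words has already split on whitespace, and splitting "-ef" further is the job left for g.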



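The logic for each character is: if the character is in del, ignore it and continue. If the character is in imp, turn it into a string of its own, add it to the accumulator and continue. Otherwise, join the character to the most recently accumulated string (provided that string is not one that came from imp).

Since you need to check twice whether a character is in imp, it makes sense to keep track of that check in the accumulator. Its type should therefore be ([String], Bool): the [String] holds your strings, and the Bool is a flag meaning "hey, the last value you added to the list is an important one". With all that said, the implementation follows almost word for word: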

import Data.List

type DelString = [Char]
type ImpString = [Char]

tokenize :: String -> ImpString -> DelString -> [String]
tokenize s imp del = concatMap (fst . foldr (g imp del) ([], False) ) $ words s
--                              ^^^   ^^^^^
--                              |     |- Remember, this foldr is applied to each token
--                              |- The final result is a tuple. Take the first element

g :: ImpString -> DelString -> Char -> ([String], Bool) -> ([String], Bool)
g imp del c acc@(l, b)
  | c `elem` del = acc                   -- If c in del, just continue without touching the acc
  | c `elem` imp = ([c] : l, True)       -- If c is important, add it as a String of its own and set the flag to True
  | otherwise    = case acc of
      ([]  ,     _)  -> ([c] : l, False) -- Probably you can get rid of this... but just in case (no pun intended)
      (_   ,  True)  -> ([c] : l, False) -- The case of previous value being important
      (s:ss, False) -> ((c:s):ss, False) -- The regular case

> tokenize "ab cd -ef" "" ""
> tokenize "ab cd -ef" "-" ""
> tokenize "ab cd -ef" "-" "c"
> tokenize "ab c#d -e*f" "-*#" "c"
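For anyone who wants to check these calls without typing them one by one, here is a self-contained copy of the code above together with the results I get when working through the definitions by hand. The names examples and traceToken are my own additions for illustration. Everything used comes from the Prelude, so the Data.List import is left out.

```haskell
-- Copy of the tokenizer above, so this block can be loaded into GHCi on its own.
tokenize :: String -> String -> String -> [String]
tokenize s imp del = concatMap (fst . foldr (g imp del) ([], False)) (words s)

g :: String -> String -> Char -> ([String], Bool) -> ([String], Bool)
g imp del c acc@(l, _)
  | c `elem` del = acc                  -- deleted character: accumulator untouched
  | c `elem` imp = ([c] : l, True)      -- important character: its own token, flag set
  | otherwise    = case acc of
      ([]  ,     _) -> ([c] : l, False)   -- nothing accumulated yet: start a token
      (_   ,  True) -> ([c] : l, False)   -- previous token was important: start a new one
      (s:ss, False) -> ((c:s):ss, False)  -- regular case: extend the current token

-- Hypothetical helper: every intermediate accumulator for a single token.
-- The first element is the final state, the last one is the initial ([], False).
traceToken :: String -> String -> String -> [([String], Bool)]
traceToken imp del = scanr (g imp del) ([], False)

-- Each call from above paired with the result worked out by hand.
examples :: [([String], [String])]
examples =
  [ (tokenize "ab cd -ef" "" "",         ["ab", "cd", "-ef"])
  , (tokenize "ab cd -ef" "-" "",        ["ab", "cd", "-", "ef"])
  , (tokenize "ab cd -ef" "-" "c",       ["ab", "d", "-", "ef"])
  , (tokenize "ab c#d -e*f" "-*#" "c",   ["ab", "#", "d", "-", "e", "*", "f"])
  ]
```

Evaluating all (uncurry (==)) examples should give True. To watch the flag at work, traceToken "-" "" "e-f" gives [(["e","-","f"],False),(["-","f"],True),(["f"],False),([],False)]: because foldr starts from the right, 'f' is seen first, and the True left behind by '-' is what stops 'e' from being glued onto it.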