Tokenize function does not split the string correctly

Date: 2019-09-16 02:39:55

Tags: haskell

I'm writing some code for an assignment that is supposed to tokenize a string. The function takes three strings as arguments and returns a list of strings. The arguments are the string to tokenize, s; a string imp of characters that should each appear as a single-character string in the list if they occur in s; and a string rem of characters that should be removed from s.

Some examples we were given:

tokenize "ab cd -ef" "" " "

should give:

["ab","cd","-ef"]

and

tokenize "ab cd -ef" "-" " "

should give:

["ab","cd","-","ef"]

I have looked for help online and in books, and my code gives the correct output for some of the examples, but not all of them. I'm still learning Haskell, so I don't understand all the help other people have already received online (and I don't want to copy and paste).

Here is my code so far:

tokenize :: String -> [Char] -> [Char] -> [String]
tokenize [] imp rem = []
tokenize (x:xs) imp rem =
  if null imp && null rem then [(x:xs)]
  else if null imp && rem == " " then ord (x:xs)
  else if finnes rem x then tokenize xs imp rem
  else if finnes imp x then [x]:(tokenize xs imp rem)
  else [x] : (tokenize xs imp rem)

ord :: String -> [String]
ord [] = []
ord s = takeWhile (/= ' ') s : (ord (fjernOrd s ' '))

fjernOrd :: String -> Char -> String
fjernOrd [] c = []
fjernOrd xs c = if (head xs) /= c then fjernOrd (tail xs) c else tail xs

--wrote my own version of elem to check if a character exists in a string
finnes :: String -> Char -> Bool
finnes "" c = False
finnes (s:xs) c
  | s == c = True
  | otherwise = finnes xs c

All the other functions work as they should when I try them, so the problem is in the tokenize function.

It works when I pass "" as imp and " " as rem, but not when I put characters in imp (for example "-").

For example, when I write tokenize "ab cd -ef" "" " " I get ["ab","cd","-ef"], which is correct, but when I type tokenize "ab cd -ef" "-" " " I should get ["ab","cd","-","ef"], and instead I get ["a","b","c","d","-","e","f"].

1 answer:

Answer 0 (score: 0)

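So, this is a rather verbose implementation of what I think you want... the requirements are not entirely clear to me. The idea is: apply a function to each token that turns it into a list of strings, then merge all the results into one list. This definition should make it clearer whether the following is what you need: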

import Data.List
tokenize s imp del = concatMap g $ words s
--                   ^^^^^^^^^ ^   ^^^^^^^
--                   |         |   |- This is the list of tokens
--                   |         |- This is a function that applies the logic to each token. We'll think about it later
--                   |- This applies g to each token and concatenates the results

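concatMap and words are provided by the Data.List module (both are also exported by the Prelude). We only need to think about g. Now, for each token, you need an accumulator that behaves as follows: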

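For each character in the token: if the character is in del, ignore it and continue. If the character is in imp, build a one-character string from it, add that string to the accumulator, and continue. Otherwise, join the character onto the last accumulated value (provided that value is not an imp character).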

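Since you need to check twice whether a character is in imp, it makes sense to keep track of that check in the accumulator. Its type should therefore be ([String], Bool): the [String] holds your strings, and the Bool is a flag that says "hey, the last value you added to the list is an important one". With all that said, the implementation follows almost word for word: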

import Data.List

type DelString = [Char]
type ImpString = [Char]

tokenize :: String -> ImpString -> DelString -> [String]
tokenize s imp del = concatMap (fst . foldr (g imp del) ([], False) ) $ words s
--                              ^^^   ^^^^^
--                              |     |- Remember, this foldr is applied to each token
--                              |- The final result is a tuple. Take the first element

g :: ImpString -> DelString -> Char -> ([String], Bool) -> ([String], Bool)
g imp del c acc@(l, b)
  | c `elem` del = acc                   -- If c in del, just continue without touching the acc
  | c `elem` imp = ([c] : l, True)       -- If c is important, append it as a String and set the flag to True
  | otherwise    = case acc of
      ([]  ,     _)  -> ([c] : l, False) -- Probably you can get rid of this... but just in case (no pun intended)
      (_   ,  True)  -> ([c] : l, False) -- The case of previous value being important
      (s:ss, False) -> ((c:s):ss, False) -- The regular case
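
To make the role of the flag easier to see, here is a minimal sketch that lists every intermediate accumulator for a single token. It assumes the same rules as g above; the names step and accumulatorStates are hypothetical and not part of the original answer, and step is only a condensed restatement of g so that the block can be loaded on its own.

```haskell
-- Same logic as g above, condensed and renamed so this block stands alone.
step :: String -> String -> Char -> ([String], Bool) -> ([String], Bool)
step imp del c acc@(l, _)
  | c `elem` del = acc                   -- deleted characters leave the accumulator untouched
  | c `elem` imp = ([c] : l, True)       -- important character: its own string, flag set
  | otherwise    = case acc of
      (s:ss, False) -> ((c:s):ss, False) -- extend the string at the front of the list
      _             -> ([c] : l, False)  -- empty list, or the front string is important

-- Hypothetical helper: all accumulator states for one token.
-- scanr puts the final state first and the initial state last.
accumulatorStates :: String -> String -> String -> [([String], Bool)]
accumulatorStates imp del = scanr (step imp del) ([], False)
```

> accumulatorStates "-*" "" "-e*f"
[(["-","e","*","f"],True),(["e","*","f"],False),(["*","f"],True),(["f"],False),([],False)]

Reading the list from right to left follows the fold: the flag becomes True right after * is added, which is why e starts a new string instead of being joined onto "*".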



> tokenize "ab cd -ef" "" ""
["ab","cd","-ef"]
> tokenize "ab cd -ef" "-" ""
["ab","cd","-","ef"]
> tokenize "ab cd -ef" "-" "c"
["ab","d","-","ef"]
> tokenize "ab c#d -e*f" "-*#" "c"
["ab","#","d","-","e","*","f"]
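
For comparison, the code in the question produces one-character strings because its final else branch emits [x] for every ordinary character instead of collecting a run of them. The following is a rough sketch of the same idea written with explicit recursion, closer in style to the question. It assumes the two examples from the question describe the intended behaviour; tokenizeRec and isSpecial are hypothetical names, and this is not the implementation given above.

```haskell
-- Hypothetical direct-recursion variant; a sketch, not the answer's code.
tokenizeRec :: String -> String -> String -> [String]
tokenizeRec s imp del = go s
  where
    -- A character ends the current word if it is important or deleted.
    isSpecial c = c `elem` imp || c `elem` del

    go [] = []
    go (c:cs)
      | c `elem` del = go cs             -- drop deleted characters
      | c `elem` imp = [c] : go cs       -- important characters stand alone
      | otherwise    =
          -- collect the whole run of ordinary characters as one token
          let (word, rest) = break isSpecial (c:cs)
          in  word : go rest
```

> tokenizeRec "ab cd -ef" "" " "
["ab","cd","-ef"]
> tokenizeRec "ab cd -ef" "-" " "
["ab","cd","-","ef"]

One difference to be aware of: this sketch only splits on characters listed in imp or del, so with an empty del the spaces are kept and tokenizeRec "ab cd -ef" "" "" returns the whole input as a single string, whereas the version above always splits on whitespace through words.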