Question

I'm writing some code for an assignment that should tokenize a string. It takes three strings as arguments and returns a list of strings. The arguments are s (the string that should be tokenized), imp (characters that should become individual strings in the list if they appear in s), and rem (characters that should be removed from s).

Some examples we were given:

tokenize "ab cd -ef" "" " " should give: ["ab","cd","-ef"]

and

tokenize "ab cd -ef" "-" " " should give: ["ab","cd","-","ef"]

I've looked for help online and in books, and my code gives the right output for some examples, but not all. I'm still learning Haskell, so I don't understand all the help other people have gotten online (and I don't want to copy and paste).

This is my code so far:

tokenize :: String -> [Char] -> [Char] -> [String]
tokenize [] imp rem = []
tokenize (x:xs) imp rem =
    if null imp && null rem then [(x:xs)]
    else if null imp && rem == " " then ord (x:xs)
    else if finnes rem x then tokenize xs imp rem
    else if finnes imp x then [x]:(tokenize xs imp rem)
    else [x] : (tokenize xs imp rem)

ord :: String -> [String]
ord [] = []
ord s = takeWhile (/= ' ') s : (ord (fjernOrd s ' '))

fjernOrd :: String -> Char -> String
fjernOrd [] c = []
fjernOrd xs c = if (head xs) /= c then fjernOrd (tail xs) c else tail xs

--wrote my own version of elem to check if a character exists in a string
finnes :: String -> Char -> Bool
finnes "" c = False
finnes (s:xs) c
    | s == c = True
    | otherwise = finnes xs c

All the other functions behave as they should when I try them, so the problem lies in the tokenize function.

It works when I pass "" as imp and " " as rem, but not when I put a character in imp (for example "-").

For example, when I write tokenize "ab cd -ef" "" " " I get ["ab","cd","-ef"], which is correct, but when I type tokenize "ab cd -ef" "-" " " I should get ["ab","cd","-","ef"], but I get ["a","b","c","d","-","e","f"].

Answer 1

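So, this is a rather verbose implementation of what I think you want... it isn't entirely clear to me. The idea is: apply a function to each token that turns it into a list of strings, then merge all the results into one list. From this definition it should be clearer whether this is what you need: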

import Data.List
tokenize s imp del = concatMap g $ words s
--                   ^^^^^^^^^ ^   ^^^^^^^
--                   |         |   |- This is the list of tokens
--                   |         |- This is a function that applies the logic to each token. We'll think about it later
--                   |- This applies g to each token and concatenates the results
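The two library functions in that outline can be tried on their own first. The following is only a small sketch: perChar is a hypothetical stand-in for g (it is not the function developed below), and the demo names are made up for illustration.

```haskell
-- Hypothetical stand-in for g: turns one token into a list of
-- one-character strings. The real g is worked out further down.
perChar :: String -> [String]
perChar = map (: [])

-- words splits on whitespace and drops empty pieces,
-- so repeated spaces do not produce empty tokens.
demoWords :: [String]
demoWords = words "ab cd  -ef"
-- ["ab","cd","-ef"]

-- concatMap applies perChar to every token and concatenates the results.
demoConcatMap :: [String]
demoConcatMap = concatMap perChar demoWords
-- ["a","b","c","d","-","e","f"]
```

With a smarter function in place of perChar, the same shape yields the tokenizer.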

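concatMap and words are provided by the Data.List module (both are also exported by the Prelude). We only need to think about g. Now, for each token you need an accumulator that behaves like this: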

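For each character in the token:

- If the character is in del, ignore it and continue.
- If the character is in imp, build a one-character string from it, add it to the accumulator, and continue.
- Otherwise, join the character onto the last accumulated string (provided that string is not one that came from imp).

Since you need to check twice whether a character is in imp, it makes sense to keep track of that check in the accumulator. Its type should therefore be ([String], Bool): the [String] holds your strings, and the Bool is a flag saying "hey, the last value you added to the list is an important one". With all that said, the implementation follows the description almost word for word: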

import Data.List

type DelString = [Char]
type ImpString = [Char]

tokenize :: String -> ImpString -> DelString -> [String]
tokenize s imp del = concatMap (fst . foldr (g imp del) ([], False) ) $ words s
--                              ^^^   ^^^^^
--                              |     |- Remember this foldr is applied to each token
--                              |- The final result is a tuple. Take the first element

g :: ImpString -> DelString -> Char -> ([String], Bool) -> ([String], Bool)
g imp del c acc@(l, b)
  | c `elem` del = acc                   -- If c in del, just continue without touching the acc
  | c `elem` imp = ([c] : l, True)       -- If c is important, append it as a String and set the flag to True
  | otherwise    = case acc of
      ([]  ,     _)  -> ([c] : l, False) -- Probably you can get rid of this... but just in case (no pun intended)
      (_   ,  True)  -> ([c] : l, False) -- The case of previous value being important
      (s:ss, False) -> ((c:s):ss, False) -- The regular case
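
Because foldr consumes each token from the right, it can help to look at the intermediate accumulators. The sketch below is only an illustration: step restates the rules of g in a self-contained form, and accSteps is a hypothetical helper that uses scanr to collect every accumulator state instead of only the final one.

```haskell
-- Restates the rules of g so this block compiles on its own.
step :: String -> String -> Char -> ([String], Bool) -> ([String], Bool)
step imp del c acc@(l, lastImportant)
  | c `elem` del = acc                    -- deleted characters leave the state alone
  | c `elem` imp = ([c] : l, True)        -- important character: its own string
  | otherwise = case l of
      (s:ss) | not lastImportant -> ((c:s):ss, False)  -- extend the current string
      _                          -> ([c] : l, False)   -- start a new string

-- Hypothetical helper: all accumulator states for one token.
-- The head is the final state; the last element is the initial ([], False).
accSteps :: String -> String -> String -> [([String], Bool)]
accSteps imp del = scanr (step imp del) ([], False)

-- accSteps "-" "" "-ef"
-- [(["-","ef"],True),(["ef"],False),(["f"],False),([],False)]
```

Reading the list from right to left shows 'f' starting a string, 'e' being joined onto it, and '-' being added as a string of its own with the flag set.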



> tokenize "ab cd -ef" "" ""
["ab","cd","-ef"]
> tokenize "ab cd -ef" "-" ""
["ab","cd","-","ef"]
> tokenize "ab cd -ef" "-" "c"
["ab","d","-","ef"]
> tokenize "ab c#d -e*f" "-*#" "c"
["ab","#","d","-","e","*","f"]
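
If the third argument is meant to act as a separator, as the question's examples with " " suggest, a version with explicit recursion may be closer to the code in the question. This is only a sketch under that assumption; tokenizeBreak is a hypothetical name, and unlike the fold above it does not call words, so whitespace only splits tokens when it is listed in del.

```haskell
-- Sketch, assuming characters in del separate tokens and are dropped,
-- and characters in imp become one-character tokens of their own.
tokenizeBreak :: String -> String -> String -> [String]
tokenizeBreak s imp del = go s
  where
    go [] = []
    go (c:cs)
      | c `elem` del = go cs          -- skip a separator
      | c `elem` imp = [c] : go cs    -- important character stands alone
      | otherwise    =
          -- take the longest run of ordinary characters as one token;
          -- the run is non-empty because c itself is ordinary
          let (word, rest) = break (`elem` (imp ++ del)) (c:cs)
          in word : go rest

-- tokenizeBreak "ab cd -ef" ""  " "   gives ["ab","cd","-ef"]
-- tokenizeBreak "ab cd -ef" "-" " "   gives ["ab","cd","-","ef"]
```

The difference from the code in the question is the last case: instead of emitting every ordinary character as its own string, it collects the whole run up to the next important or deleted character.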
