Question

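I have a text file in which strings are separated by spaces. The file contains some special characters (Latin, currency, punctuation, etc.) that must be discarded from the final output. Note that, apart from these special characters, legal characters can be any Unicode character.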

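We need to split the text on spaces and then remove only the leading and trailing special characters. If special characters sit between two legal characters, we do not remove them.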

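I can easily do this in two passes: split the text on spaces, then strip only the leading and trailing special characters from each string. However, I need to process the string only once. Is there any way to do it in a single pass? Note: we cannot use RegEx. For this question, assume these characters are special: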

[: , ! . < ; '  "  >  [ ] { }  `  ~ = + - ? / ]

Example:

:!/,.<;:.?;,BBM!/,.<;:.?;,` IS TALKING TO `B!?AM!/,.<;:.?;,

Here the output would be an array of valid strings: ["BBM", "IS", "TALKING", "TO", "B!?AM"]
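
For reference, the two-pass approach described above might look like the sketch below (assumptions: words are separated by the plain space character only, and the special set is exactly the list given in the question). The names SPECIAL, trimSpecial and parseTwoPass are hypothetical. It uses string split, not a regular expression, and is the baseline the single-pass answers below try to improve on.

```typescript
// Hypothetical special-character set, taken from the list in the question.
const SPECIAL = new Set<string>(Array.from(":,!.<;'\">[]{}`~=+-?/"));

// Strip only leading and trailing special characters from one token;
// special characters between legal characters are kept.
function trimSpecial(token: string): string {
  let start = 0;
  let end = token.length; // exclusive
  while (start < end && SPECIAL.has(token.charAt(start))) start++;
  while (end > start && SPECIAL.has(token.charAt(end - 1))) end--;
  return token.slice(start, end);
}

// Pass 1: split on spaces. Pass 2: trim each token and drop tokens that become empty.
function parseTwoPass(text: string): string[] {
  return text
    .split(" ")
    .map(trimSpecial)
    .filter((w) => w.length > 0);
}

const example = ":!/,.<;:.?;,BBM!/,.<;:.?;,` IS TALKING TO `B!?AM!/,.<;:.?;,";
console.log(JSON.stringify(parseTwoPass(example))); // ["BBM","IS","TALKING","TO","B!?AM"]
```

Each character is visited by split and then again by trimSpecial, which is why this counts as two passes.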

Answer 1

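Build a simple state machine (finite automaton)
Loop over all characters
At each step, check whether the current character is a letter, a space, or special
Perform some action (possibly none) depending on the state and the character type
Change state as needed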

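For example, you can stay in the "special" state until you meet a letter. Remember the index where the word starts and switch to the "in word" state. Continue until you meet a special character or a space (your question is still unclear on this point).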
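
A minimal sketch of that state machine, assuming (as in the question) that only a space ends a word, so a special character inside a word is kept tentatively. Besides the start index it also remembers the index of the last legal character, so slicing up to it drops trailing specials while keeping interior ones. The names SPECIAL_SET, State and parseStateMachine are hypothetical.

```typescript
// Hypothetical special-character set, taken from the list in the question.
const SPECIAL_SET = new Set<string>(Array.from(":,!.<;'\">[]{}`~=+-?/"));

type State = "Special" | "InWord";

function parseStateMachine(text: string): string[] {
  const words: string[] = [];
  let state: State = "Special";
  let start = 0;      // index of the first legal character of the current word
  let lastLegal = -1; // index of the most recent legal character of the current word

  // Run one step past the end and treat that position as a space,
  // so the final word is flushed the same way as every other word.
  for (let i = 0; i <= text.length; i++) {
    const ch = i < text.length ? text.charAt(i) : " ";
    const isSpace = ch === " ";
    const isSpecial = SPECIAL_SET.has(ch);

    if (state === "Special") {
      // Leading specials and spaces are skipped; the first legal character starts a word.
      if (!isSpace && !isSpecial) {
        start = i;
        lastLegal = i;
        state = "InWord";
      }
    } else if (isSpace) {
      // End of the word: keep text up to the last legal character,
      // which drops trailing specials but keeps interior ones.
      words.push(text.slice(start, lastLegal + 1));
      state = "Special";
    } else if (!isSpecial) {
      lastLegal = i;
    }
  }
  return words;
}

const sample = ":!/,.<;:.?;,BBM!/,.<;:.?;,` IS TALKING TO `B!?AM!/,.<;:.?;,";
console.log(JSON.stringify(parseStateMachine(sample))); // ["BBM","IS","TALKING","TO","B!?AM"]
```

Tracking indices instead of building strings character by character means each word is copied once, with a single slice.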

Answer 2

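I used TypeScript and did it in a single pass. Note that the isSpecialCharacterCode(charCode) function simply checks whether the character's Unicode code matches the code of one of the given special characters. The isWhitespaceCode(charCode) function does what its name says.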

// isSpecialCharacterCode and isWhitespaceCode are not shown in this answer;
// the two helpers below are assumed implementations based on the description above.
const SPECIAL_CODES = new Set<number>(
    Array.from(":,!.<;'\">[]{}`~=+-?/", (c) => c.charCodeAt(0))
);

function isSpecialCharacterCode(charCode: number): boolean {
    return SPECIAL_CODES.has(charCode);
}

function isWhitespaceCode(charCode: number): boolean {
    // space, tab, line feed, carriage return
    return charCode === 32 || charCode === 9 || charCode === 10 || charCode === 13;
}

function parseText(text: string): string[] {
    const words: string[] = [];
    let word = "";
    let haveSeenLegalChar = false;         // set once the current token contains a legal character
    let seenSpecialCharsToInclude = false; // set when special characters follow a legal character
    let inBetweenSpecialChars = "";        // special characters that may turn out to be inside the word

    for (let index = 0; index < text.length; index++) {
        const charCode = text.charCodeAt(index);
        const isSpecialChar = isSpecialCharacterCode(charCode);
        const isWhitespace = isWhitespaceCode(charCode);
        if (isSpecialChar && !isWhitespace) {
            // A special character is one of two cases:
            // 1. Part of the word, which is only possible after at least one legal character.
            //    We cannot tell yet, so buffer it.
            // 2. A leading special character, which is discarded.
            // Trailing special characters stay in the buffer and are dropped at the next whitespace.
            if (haveSeenLegalChar) {
                inBetweenSpecialChars += text[index];
                seenSpecialCharsToInclude = true;
            } else {
                seenSpecialCharsToInclude = false;
                inBetweenSpecialChars = "";
            }
        } else if (isWhitespace) {
            // Whitespace ends the current token; push it if it contained a legal character.
            if (haveSeenLegalChar) {
                words.push(word);
            }
            // Reset per-token state; any buffered special characters were trailing.
            word = "";
            inBetweenSpecialChars = "";
            seenSpecialCharsToInclude = false;
            haveSeenLegalChar = false;
        } else {
            // Legal character: buffered special characters were interior, so keep them.
            haveSeenLegalChar = true;
            if (seenSpecialCharsToInclude) {
                word += inBetweenSpecialChars;
                seenSpecialCharsToInclude = false;
                inBetweenSpecialChars = "";
            }
            word += text[index];
        }
    }
    // Fix: without this, the last word is lost when the text does not end with
    // whitespace (e.g. "B!?AM" in the example above).
    if (haveSeenLegalChar) {
        words.push(word);
    }
    return words;
}
