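I have a text file in which strings are separated by spaces. The file contains some special characters (Latin, currency, punctuation, and so on) that must be discarded from the final output. Apart from these special characters, the legal characters can be any Unicode characters.
We need to split the text on spaces and then remove only the leading and trailing special characters. Special characters that sit between two legal characters are kept.
I can easily do this in two passes: split the text on spaces, then strip only the leading and trailing special characters from each string. However, I need to process the string only once. Is there a way to do it in a single pass? Note: RegEx cannot be used. For this problem, assume these characters are special: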
[: , ! . < ; ' " > [ ] { } ` ~ = + - ? / ]
Example:
:!/,.<;:.?;,BBM!/,.<;:.?;,` IS TALKING TO `B!?AM!/,.<;:.?;,
Here the output would be this array of valid strings: ["BBM", "IS", "TALKING", "TO", "B!?AM"]
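For reference, the two-pass approach described in the question (split on spaces, then trim each piece) can be written without RegEx roughly as follows. This is a sketch, not the asker's code: `SPECIAL`, `trimSpecial`, and `parseTwoPass` are hypothetical names, and only the ASCII space is treated as a separator.

```typescript
// Hypothetical set of special characters, copied from the list in the question.
const SPECIAL = new Set(":,!.<;'\">[]{}`~=+-?/".split(""));

// Remove leading and trailing special characters from one token.
function trimSpecial(token: string): string {
  let start = 0;
  let end = token.length; // exclusive end index
  while (start < end && SPECIAL.has(token[start])) start++;
  while (end > start && SPECIAL.has(token[end - 1])) end--;
  return token.slice(start, end);
}

// Pass 1: split on spaces. Pass 2: trim each token and drop tokens that become empty.
function parseTwoPass(text: string): string[] {
  return text
    .split(" ")
    .map(trimSpecial)
    .filter((w) => w.length > 0);
}

// Runs the example from the question.
console.log(parseTwoPass(":!/,.<;:.?;,BBM!/,.<;:.?;,` IS TALKING TO `B!?AM!/,.<;:.?;,"));
```

`split` scans the whole string once and the trimming then revisits each token, which is the second pass the question wants to avoid.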
Answer 1 (score: 0)
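I used TypeScript and did it in a single pass. Note that the isSpecialCharacterCode(charCode) function just checks whether the character's Unicode code matches the code of one of the given special characters, and isWhitespaceCode(charCode) does what its name says.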
```typescript
// Assumed helpers: the answer describes isSpecialCharacterCode and isWhitespaceCode
// but does not show them, so these implementations are a guess.
const SPECIAL_CHARACTERS = ":,!.<;'\">[]{}`~=+-?/";
const SPECIAL_CODES = new Set<number>();
for (let i = 0; i < SPECIAL_CHARACTERS.length; i++) {
  SPECIAL_CODES.add(SPECIAL_CHARACTERS.charCodeAt(i));
}

function isSpecialCharacterCode(charCode: number): boolean {
  return SPECIAL_CODES.has(charCode);
}

// Treats space, tab, line feed, and carriage return as whitespace (assumption).
function isWhitespaceCode(charCode: number): boolean {
  return charCode === 32 || charCode === 9 || charCode === 10 || charCode === 13;
}

function parseText(text: string): string[] {
  const words: string[] = [];
  let word = "";
  let haveSeenLegalChar = false; // set once the current word contains a legal character
  let seenSpecialCharsToInclude = false; // set when special characters are buffered after a legal character
  let inBetweenSpecialChars = ""; // special characters that may end up inside the word
  for (let index = 0; index < text.length; index++) {
    const charCode = text.charCodeAt(index);
    const isSpecialChar = isSpecialCharacterCode(charCode);
    const isWhitespace = isWhitespaceCode(charCode);
    if (isSpecialChar && !isWhitespace) {
      // A special character is one of two cases:
      // 1. Part of the word. This is only possible if we have already seen at least one
      //    legal character, and we only know for sure once another legal character
      //    follows, so buffer it for now.
      // 2. A leading or trailing special character, which must not be included.
      if (haveSeenLegalChar) {
        inBetweenSpecialChars += text[index];
        seenSpecialCharsToInclude = true;
      } else {
        // No legal character yet, so this is a leading special character.
        seenSpecialCharsToInclude = false;
        inBetweenSpecialChars = "";
      }
    } else if (isWhitespace) {
      // Whitespace ends the current word. If it contained a legal character, push it;
      // any buffered special characters were trailing, so drop them.
      if (haveSeenLegalChar) {
        words.push(word);
        word = "";
      }
      inBetweenSpecialChars = "";
      seenSpecialCharsToInclude = false;
      haveSeenLegalChar = false;
    } else {
      // Legal character: buffered special characters sat between two legal
      // characters, so they belong to the word.
      haveSeenLegalChar = true;
      if (seenSpecialCharsToInclude) {
        word += inBetweenSpecialChars;
        seenSpecialCharsToInclude = false;
        inBetweenSpecialChars = "";
      }
      word += text[index];
    }
  }
  // Push the last word: text that does not end in whitespace would otherwise lose it.
  if (haveSeenLegalChar) {
    words.push(word);
  }
  return words;
}
```
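A variant of the same single-pass idea, sketched under similar assumptions, avoids building each word character by character: it remembers the indices of the current word's first and most recent legal characters and slices once when the word ends. `parseTextBySlicing`, `SPECIAL_SET`, and `WHITESPACE_SET` are hypothetical names, and the whitespace set (space, tab, CR, LF) is an assumption.

```typescript
// Hypothetical special-character set, copied from the question's list.
const SPECIAL_SET = new Set(":,!.<;'\">[]{}`~=+-?/".split(""));
// Assumed whitespace set; the question itself only mentions spaces.
const WHITESPACE_SET = new Set([" ", "\t", "\n", "\r"]);

function parseTextBySlicing(text: string): string[] {
  const words: string[] = [];
  let first = -1; // index of the first legal char of the current word, or -1 if none yet
  let last = -1; // index of the most recent legal char of the current word

  // Emit the current word if it has a legal character. Special characters between
  // `first` and `last` are kept; those before or after that range are dropped.
  const flush = (): void => {
    if (first !== -1) words.push(text.slice(first, last + 1));
    first = -1;
    last = -1;
  };

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (WHITESPACE_SET.has(ch)) {
      flush();
    } else if (!SPECIAL_SET.has(ch)) {
      if (first === -1) first = i; // leading special chars are skipped implicitly
      last = i;
    }
    // Special characters need no work here: they survive only if they fall
    // between `first` and `last` when the word is flushed.
  }
  flush(); // the text may not end with whitespace
  return words;
}

// Returns ["BBM", "IS", "TALKING", "TO", "B!?AM"] for the question's example.
console.log(parseTextBySlicing(":!/,.<;:.?;,BBM!/,.<;:.?;,` IS TALKING TO `B!?AM!/,.<;:.?;,"));
```

Each character is examined once and each word is copied once by `slice`; the trade-off is relying on index bookkeeping instead of an explicit buffer of pending special characters.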