Question

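I have a file that I'm parsing into words, but I want to treat anything that isn't a-z, A-Z, 0-9, or an apostrophe as whitespace. How can I do that, given that I was previously using the following code: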

ifstream file;
file.open(filePath);

while(file >> word){
    listOfWords.push_back(word); // I want to make sure only words with the stated 
                                 // range of characters exist in my list.
}

So, for example, the word hor.se would become two elements in my list, "hor" and "se".

Answer 1

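Create a list of "whitespace" characters. Each time you read a character, check whether it is in that list; if it is, start a new word. This example is written in Python, but the concept is the same.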

def get_words(whitespace_chars, string):
    words = []
    current_word = ""
    for ch in string:
        if ch in whitespace_chars:
            # Hit a delimiter: finish the current word, if there is one.
            # Leading or repeated delimiters are skipped, never added to a word.
            if current_word != "":
                words.append(current_word)
                current_word = ""
        else:
            # Add the current character to the current word.
            current_word += ch
    # If the last character isn't a delimiter, the last word hasn't been added yet, so add it here.
    if current_word != "":
        words.append(current_word)
    return words

