Question

Is there a way to remove all non-alphabetic characters (e.g. ,.?!) from a std::string without removing Czech characters such as ščéř? I tried using:

std::string FileHandler::removePunctuation(std::string word) {
    for (std::string::iterator i = word.begin(); i != word.end(); i++) {
        if (!isalpha(word.at(i - word.begin()))) {
            word.erase(i);
            i--;
        }
    }
    return word;    
}

But it removes the Czech characters.

Ideally, I would also like to use toLowerCase on these characters.

Answer 1

You can use std::remove_if together with erase:

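#include <cwctype>   // std::iswalnum
#include <algorithm> // std::remove_if
#include <string>
//...
std::wstring FileHandler::removePunctuation(std::wstring word) 
{
    // Erase-remove idiom: drop every character that is not a letter or digit.
    // Use std::iswalpha instead of std::iswalnum to drop digits as well.
    word.erase(std::remove_if(word.begin(), word.end(), 
                  [](wchar_t ch){ return !std::iswalnum(ch); }), word.end());
    return word;
}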
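The question also asks about lower-casing. A minimal sketch of how the erase-remove approach above might be extended, assuming wide strings and that a suitable locale has already been set at program start-up (otherwise only ASCII letters are likely to be recognized). The name normalizeWord is hypothetical and does not come from the answer:

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Hypothetical helper (not from the original answer):
// strips every non-letter, then lower-cases what is left.
std::wstring normalizeWord(std::wstring word)
{
    // Erase-remove idiom: keep only characters the current locale
    // classifies as letters.
    word.erase(std::remove_if(word.begin(), word.end(),
                   [](wchar_t ch) { return !std::iswalpha(ch); }),
               word.end());

    // Lower-case in place; std::towlower also follows the current locale.
    std::transform(word.begin(), word.end(), word.begin(),
                   [](wchar_t ch) { return static_cast<wchar_t>(std::towlower(ch)); });
    return word;
}
```

Whether š, č and similar letters survive depends on the locale in effect when the function runs, so results for non-ASCII input may differ between systems.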

Answer 2

Here is an idea:

#include <iostream>
#include <string>  // std::wstring
#include <cwctype> // std::iswalpha
// if windows, add this: #include <io.h>
// if windows, add this: #include <fcntl.h>

int main()
{
  // if windows, add this: _setmode( _fileno( stdout ), _O_U16TEXT );
  std::wstring s( L"š1č2é3ř!?" );
  for ( auto c : s )
    if ( std::iswalpha( c ) )
      std::wcout << c;
  return 0;
}

Answer 3

After calling std::setlocale(LC_ALL, "en_US.UTF-8"), you can use std::iswalpha() to determine whether a character is a letter.

The following program

#include <clocale>  // std::setlocale, LC_ALL
#include <cwctype>
#include <iostream>
#include <string>

int main()
{
    std::setlocale(LC_ALL, "en_US.UTF-8");
    std::wstring youreWelcome = L"Není zač.";

    for ( auto c : youreWelcome )
        if ( std::iswalpha(c) )
            std::wcout << c;

    std::wcout << std::endl;
}

will print

Nenízač

to the console.

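Note that std::setlocale() may not be thread-safe, either on its own or when other functions such as std::iswalpha() execute concurrently with it. It should therefore only be called from single-threaded code such as program start-up code. More specifically, do not call std::setlocale() inside FileHandler::removePunctuation(); call only std::iswalpha() there, if you need it.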
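One way this separation could look, as a rough sketch: the locale is set by a small start-up helper, and the filtering function only reads it. The helper name initLocale is an assumption made for illustration, and removePunctuation is written here as a free function rather than the FileHandler member:

```cpp
#include <algorithm>
#include <clocale>
#include <cwctype>
#include <string>

// Hypothetical start-up helper: call it once from main(), before any
// other thread exists. Returns false if the locale is not available.
bool initLocale(const char* name)
{
    return std::setlocale(LC_ALL, name) != nullptr;
}

// Never calls std::setlocale(); it only relies on std::iswalpha(),
// which consults whatever locale was set at start-up.
std::wstring removePunctuation(std::wstring word)
{
    word.erase(std::remove_if(word.begin(), word.end(),
                   [](wchar_t ch) { return !std::iswalpha(ch); }),
               word.end());
    return word;
}
```

A program would typically call initLocale("en_US.UTF-8") as the first statement in main() and check the result, since that locale name may not be installed on every system.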

Answer 4

You may have to write a custom version of isalpha. From what you describe, it appears to return true only for a-z and A-Z.
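A custom predicate could look something like the sketch below. It assumes wide strings and an explicit list of Czech letters with diacritics, so it does not depend on any locale. The names isCzechAlpha and removeNonLetters are made up for this example, and the letter list should be checked against the alphabet you actually need:

```cpp
#include <algorithm>
#include <string>

// Hypothetical custom isalpha: true for ASCII letters and for the
// Czech letters with diacritics listed below (written as \u escapes
// so the source file encoding does not matter).
bool isCzechAlpha(wchar_t ch)
{
    if ((ch >= L'a' && ch <= L'z') || (ch >= L'A' && ch <= L'Z'))
        return true;

    static const std::wstring extra =
        // lower case: á č ď é ě í ň ó ř š ť ú ů ý ž
        L"\u00E1\u010D\u010F\u00E9\u011B\u00ED\u0148\u00F3"
        L"\u0159\u0161\u0165\u00FA\u016F\u00FD\u017E"
        // upper case: Á Č Ď É Ě Í Ň Ó Ř Š Ť Ú Ů Ý Ž
        L"\u00C1\u010C\u010E\u00C9\u011A\u00CD\u0147\u00D3"
        L"\u0158\u0160\u0164\u00DA\u016E\u00DD\u017D";
    return extra.find(ch) != std::wstring::npos;
}

// Hypothetical helper: removes every character rejected by isCzechAlpha.
std::wstring removeNonLetters(std::wstring word)
{
    word.erase(std::remove_if(word.begin(), word.end(),
                   [](wchar_t ch) { return !isCzechAlpha(ch); }),
               word.end());
    return word;
}
```

The trade-off is that the set of accepted letters is hard-coded, so text in other languages would need its own list.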

Remove all non-alphabetic characters from a string
