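I need to compare data collected from various locations, some of which contains non-ASCII characters, specifically English letters with accents on them. An example is "Frédérik Gauthier : -61 : -87 : -61 : -87". When I looked at the int values of the characters, I noticed that each accented letter is always a combination of 2 "characters": a value of -61, indicating that the letter is accented, followed by the letter itself, in this case -87 for the accented 'e'. My goal is to simply drop the accent and use the plain English character. Obviously I can't rely on this behavior from system to system, so how do you handle this situation? std::string handles the characters without any problem, but once I get down to the char level, that's where the issues come up. Any guidance?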
#include <iostream>
#include <fstream>
#include <algorithm>
int main(int argc, char** argv){
    std::fstream fin;
    std::string line;
    std::string::iterator it;
    bool leave = false;
    fin.open(argv[1], std::ios::in);
    while(getline(fin, line)){
        std::for_each(line.begin(), line.end(), [](char &a){
            if(!isascii(a)) {
                if(int(a) == -68) a = 'u';
                else if(int(a) == -74) a = 'o';
                else if(int(a) == -83) a = 'i';
                else if(int(a) == -85) a = 'e';
                else if(int(a) == -87) a = 'e';
                else if(int(a) == -91) a = 'a';
                else if(int(a) == -92) a = 'a';
                else if(int(a) == -95) a = 'a';
                else if(int(a) == -120) a = 'n';
            }
        });
        it = line.begin();
        while(it != line.end()){
            it = std::find_if(line.begin(), line.end(), [](char &a){ return !isascii(a); });
            if(it != line.end()){
                line.erase(it);
                it = line.begin();
            }
        }
        std::cout << line << std::endl;
        std::for_each(line.begin(), line.end(), [&leave](char &a){
            if(!isascii(a)) {
                std::cout << a << " : " << int(a);
            }
        });
        if(leave){
            fin.close();
            return 1;
        }
    }
    fin.close();
    return 0;
}
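Answer 0 (score: 1)
In general this is a tricky task, and you will probably need to adapt the solution to your own requirements. To transliterate a string from an arbitrary encoding to ASCII, it is better to rely on a library than to try to implement it yourself. Here is an example using iconv: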
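#include <iconv.h>
#include <memory>
#include <type_traits>
#include <string>
#include <iostream>
#include <algorithm>
#include <string_view>
#include <system_error>
#include <cassert>
#include <cctype>
#include <cerrno>
using namespace std;

// Reinterpret a C++20 u8string as a plain byte string.
string from_u8string(const u8string &s) {
    return string(s.begin(), s.end());
}

// RAII handle: iconv_close is called automatically when the handle goes out of scope.
using iconv_handle = unique_ptr<remove_pointer<iconv_t>::type, decltype(&iconv_close)>;

iconv_handle make_converter(string_view to, string_view from) {
    // A string_view is not guaranteed to be null-terminated, so copy it before
    // handing it to the C API.
    auto raw_converter = iconv_open(string{ to }.c_str(), string{ from }.c_str());
    if (raw_converter != (iconv_t)-1) {
        return { raw_converter, iconv_close };
    } else {
        throw std::system_error(errno, std::system_category());
    }
}

string convert_to_ascii(string input, string_view encoding) {
    iconv_handle converter = make_converter("ASCII//TRANSLIT", encoding);
    char* input_data = input.data();
    size_t input_size = input.size();
    string output;
    // Transliteration can produce more than one ASCII character per input byte.
    output.resize(input_size * 2);
    char* converted = output.data();
    size_t converted_size = output.size();
    auto chars_converted = iconv(converter.get(), &input_data, &input_size, &converted, &converted_size);
    if (chars_converted != (size_t)(-1)) {
        // converted_size now holds the unused space; trim it so that the result
        // does not carry trailing '\0' characters.
        output.resize(output.size() - converted_size);
        return output;
    } else {
        throw std::system_error(errno, std::system_category());
    }
}

string convert_to_plain_ascii(string_view input, string_view encoding) {
    auto converted = convert_to_ascii(string{ input }, encoding);
    // Drop everything that is not a letter (this also removes spaces and digits).
    converted.erase(
        std::remove_if(converted.begin(), converted.end(),
                       [](char c) { return !isalpha(static_cast<unsigned char>(c)); }),
        converted.end()
    );
    return converted;
}

int main() {
    // Note: what //TRANSLIT produces is implementation-dependent; with glibc it
    // may also depend on the current locale.
    try {
        auto converted_utf8 = convert_to_plain_ascii(from_u8string(u8"Frédérik"), "UTF-8");
        assert(converted_utf8 == "Frederik");
        // Spell out the windows-1252 bytes explicitly (0xE9 is 'é') so that the
        // test does not depend on the encoding of this source file.
        auto converted_1252 = convert_to_plain_ascii("Fr\xE9" "d\xE9" "rik", "windows-1252");
        assert(converted_1252 == "Frederik");
    } catch (std::system_error& e) {
        cout << "Error " << e.code() << ": " << e.what() << endl;
    }
}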
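As for the -61 and -87 values in the question: they are the two bytes 0xC3 0xA9 read through a signed char, which is how UTF-8 encodes U+00E9 (é). If pulling in iconv is not an option and the input is known to be UTF-8, a hand-written fallback can decode such two-byte sequences and fold them to ASCII. The following is only a minimal sketch under that assumption; `fold_to_ascii` and `strip_accents_utf8` are hypothetical helpers, the mapping covers just a handful of Latin-1 letters, and anything else that is not ASCII is dropped.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: map a few accented Latin-1 letters to plain ASCII.
// Returns 0 for code points outside this deliberately small table.
char fold_to_ascii(char32_t cp) {
    const bool upper = cp >= 0xC0 && cp <= 0xDD;
    if (upper) cp += 0x20;  // in this range the lowercase form is 0x20 higher
    char base = 0;
    if (cp >= 0xE0 && cp <= 0xE5) base = 'a';
    else if (cp == 0xE7) base = 'c';
    else if (cp >= 0xE8 && cp <= 0xEB) base = 'e';
    else if (cp >= 0xEC && cp <= 0xEF) base = 'i';
    else if (cp == 0xF1) base = 'n';
    else if (cp >= 0xF2 && cp <= 0xF6) base = 'o';
    else if (cp >= 0xF9 && cp <= 0xFC) base = 'u';
    else if (cp == 0xFD || cp == 0xFF) base = 'y';
    if (base != 0 && upper) base = static_cast<char>(base - ('a' - 'A'));
    return base;
}

// Hypothetical helper: assumes `in` is UTF-8. ASCII bytes are copied as-is,
// two-byte sequences are decoded and folded, everything else is dropped.
std::string strip_accents_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) {  // plain ASCII
            out.push_back(in[i]);
            ++i;
            continue;
        }
        const bool two_byte =
            (b0 & 0xE0) == 0xC0 && i + 1 < in.size() &&
            (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80;
        if (two_byte) {
            const unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            // 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy
            const char32_t cp = ((b0 & 0x1F) << 6) | (b1 & 0x3F);
            if (char c = fold_to_ascii(cp)) out.push_back(c);
            i += 2;
        } else {
            // Longer or malformed sequence: skip the lead byte and any
            // continuation bytes that follow it.
            ++i;
            while (i < in.size() &&
                   (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80) ++i;
        }
    }
    return out;
}
```

For example, `strip_accents_utf8("Fr\xC3\xA9" "d\xC3\xA9" "rik Gauthier")` yields `"Frederik Gauthier"`. Unlike the iconv version, this handles neither other encodings nor letters outside the table, so treat it as a stopgap rather than a replacement for a proper transliteration library.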