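In the program below, I am trying to measure the length of a string that contains non-ASCII characters.
However, I am not sure why size() does not print the correct length when the string contains non-ASCII characters.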
#include <iostream>
#include <string>
int main()
{
    std::string s1 = "Hello";
    std::string s2 = "इंडिया"; // non-ASCII string
    std::cout << "Size of " << s1 << " is " << s1.size() << std::endl;
    std::cout << "Size of " << s2 << " is " << s2.size() << std::endl;
}
Output:
Size of Hello is 5
Size of इंडिया is 18
Live demo on Wandbox.
Answer 0 (score: 4)
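std::string::size returns the length in bytes, not the number of characters. Your second string is stored in a multibyte Unicode encoding (evidently UTF-8 here, where each of its six code points takes three bytes, hence 18), so a single character may need several bytes. Note that the same applies to std::wstring::size, since the result depends on the encoding: it returns the number of wide characters (code units), not the number of actual characters. With UTF-16 the two match only for characters in the Basic Multilingual Plane; a character outside it takes two code units (a surrogate pair), and other encodings differ again. See the linked answer for more.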
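To illustrate the UTF-16 caveat, here is a minimal sketch that counts code points in a std::u16string by skipping the second half of each surrogate pair. The function name utf16_code_points is made up for this example, and the code assumes well-formed UTF-16 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-16 string.
// Assumes well-formed input (every surrogate belongs to a valid pair).
std::size_t utf16_code_points(const std::u16string& s) {
    std::size_t count = 0;
    for (char16_t unit : s) {
        // A low (trailing) surrogate is the second half of a pair,
        // so it does not start a new code point.
        const bool is_low_surrogate = unit >= 0xDC00 && unit <= 0xDFFF;
        if (!is_low_surrogate) ++count;
    }
    return count;
}
```

For a string that holds only U+1F600 (written u"\U0001F600"), size() is 2 while this function returns 1. For text that stays inside the Basic Multilingual Plane, such as the Devanagari string in the question, both values agree.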
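To measure the actual length (the number of symbols), you need to know the encoding so that you can separate, and therefore count, the characters correctly. The linked answer may help for UTF-8 (although the method it uses is deprecated in C++17).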
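As a sketch of what "knowing the encoding" means for UTF-8: the lead byte of each sequence tells you how many bytes the character occupies, so you can step from character to character. The names utf8_sequence_length and utf8_count_code_points are invented for this example, and continuation bytes are not fully validated.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length in bytes of the UTF-8 sequence introduced by
// `lead`, or 0 if `lead` cannot start a sequence (continuation or invalid byte).
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Counts code points by jumping from lead byte to lead byte.
// Returns false if a lead byte is invalid or the input is truncated;
// the continuation bytes themselves are not checked in this sketch.
bool utf8_count_code_points(const std::string& s, std::size_t& count) {
    count = 0;
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t len =
            utf8_sequence_length(static_cast<unsigned char>(s[i]));
        if (len == 0 || len > s.size() - i) return false;
        i += len;
        ++count;
    }
    return true;
}
```

Unlike a plain byte count, this reports failure on truncated input instead of silently returning a number. It still counts code points, not user-perceived characters: a base letter plus a combining mark counts as two.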
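Another option for UTF-8 is to count the number of first bytes, that is, bytes that are not continuation bytes (credit to the linked answer):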
int utf8_length(const std::string& s) {
    int len = 0;
    for (auto c : s)
        len += (c & 0xc0) != 0x80; // count every byte that is not a continuation byte (10xxxxxx)
    return len;
}
Answer 1 (score: 1)
I used the std::wstring_convert class and got the correct string length.
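#include <string>
#include <iostream>
#include <codecvt>
#include <locale> // std::wstring_convert is declared here
int main()
{
    std::string s1 = "Hello";
    std::string s2 = "इंडिया"; // non-ASCII string
    // Decode UTF-8 into UTF-32; the result holds one element per code point.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> cn;
    auto sz = cn.from_bytes(s2).size();
    std::cout << "Size of " << s2 << " is " << sz << std::endl;
}
Live demo on Wandbox.
See the reference link for more about std::wstring_convert.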
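Since std::wstring_convert and std::codecvt_utf8 are deprecated as of C++17, a hand-written decoder is one possible alternative. The following is only a sketch: utf8_to_utf32 is a name invented here, and the function does not reject overlong forms or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-32.
// Throws std::range_error on an invalid lead byte, a truncated sequence,
// or a malformed continuation byte. Overlong forms and surrogate code
// points are NOT rejected in this sketch.
std::u32string utf8_to_utf32(const std::string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::range_error("invalid UTF-8 lead byte");
        if (len > s.size() - i)
            throw std::range_error("truncated UTF-8 sequence");
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::range_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Calling utf8_to_utf32(s2).size() then gives the number of code points, the same quantity the std::wstring_convert version reports for valid input.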