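I'm trying to read from a binary file into a char array. When I print an array entry, an arbitrary number, a newline, and then the expected number are printed. I really can't make sense of this. The first few bytes of the file are: 00 00 08 03 00 00 EA 60 00 00 00 1C 00 00 00 1C 00 00
My code: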
void MNISTreader::loadImagesAndLabelsToMemory(std::string imagesPath,
                                              std::string labelsPath) {
    std::ifstream is(imagesPath.c_str());
    char *data = new char[12];
    is.read(data, 12);
    std::cout << std::hex << (int)data[2] << std::endl;
    delete [] data;
    is.close();
}
This prints:
ffffff9b
8
The 8 is correct. The number before it changes from run to run. Where does that extra line come from?
Answer 0 (score: 1)
You asked how to read data from a binary file and store it in a char[], and you showed us the code you submitted with your question:
void MNISTreader::loadImagesAndLabelsToMemory(std::string imagesPath,
                                              std::string labelsPath) {
    std::ifstream is(imagesPath.c_str());
    char *data = new char[12];
    is.read(data, 12);
    std::cout << std::hex << (int)data[2] << std::endl;
    delete [] data;
    is.close();
}
And you wanted to know:
The number before it changes from run to run. Where does that extra line come from?
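Before that question can really be answered, you need to know the binary file, that is, how the file is structured internally. When you read data from a binary file, remember that some program wrote that data, and wrote it in a structured format. That format is specific to each family or type of binary file. Most binary files follow a common pattern: a header, possibly sub headers, then clusters, packets, or chunks, or raw data after the headers; some binary files are nothing but raw data. You have to know how the file is laid out.
On typical platforms, char = 1 byte, int = 4 bytes (on mainstream 32-bit and 64-bit systems alike; the C++ standard itself only guarantees at least 16 bits), float = 4 bytes, double = 8 bytes, and so on. In your code you have an array of char of size 12, and since a char is 1 byte, you are asking for 12 bytes. The problem is that you now hold 12 consecutive individual bytes, and without knowing the file structure, how can you tell whether the first byte was written as a char, as an unsigned char, or as part of an int?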
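As a small sketch of why the char versus unsigned char distinction matters when you inspect raw bytes: a byte above 0x7f held in a signed type becomes a negative int, and std::hex then prints it sign-extended, which is the shape of the ffffff9b output in the question. Whether plain char is signed is implementation-defined, so the sketch uses signed char explicitly. The helper names hexViaSignedChar and hexViaUnsignedChar are hypothetical, not part of the question's code.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: hex text of buffer[index], converting a signed byte
// directly to int, as (int)data[2] does when plain char is signed.
std::string hexViaSignedChar(const signed char *buffer, std::size_t size,
                             std::size_t index) {
    assert(index < size);
    std::ostringstream out;
    out << std::hex << static_cast<int>(buffer[index]);  // negative values sign-extend
    return out.str();
}

// Hypothetical helper: same output path, but the byte is unsigned, so the
// int is always in the range 0..255 and prints as at most two hex digits.
std::string hexViaUnsignedChar(const unsigned char *buffer, std::size_t size,
                               std::size_t index) {
    assert(index < size);
    std::ostringstream out;
    out << std::hex << static_cast<int>(buffer[index]);
    return out.str();
}
```

Reading the file into an unsigned char buffer, or casting each element to unsigned char before converting it to int, keeps the printed value to the byte itself.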
Consider these two different binary file structures, produced by C++ structs that hold all the required data, both written to a file in binary form.
A common header structure that both file structures would use:
struct Header {
    // Size of Header
    std::string filepath;
    std::string filename;
    unsigned int pathSize;
    unsigned int filenameSize;
    unsigned int headerSize;
    unsigned int dataSizeInBytes;
};
FileA: the structure unique to file A
struct DataA {
    float width;
    float length;
    float height;
    float dummy;
};
FileB: the structure unique to file B
struct DataB {
    double length;
    double width;
};
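In memory, the files would typically be laid out like this: the header first, followed by the data.
Let's consider the two different binary files at the point where we are past all the header information and are reading in the bytes to parse. From the header we have the size of the data in bytes: for FileA we have 4 floats = 16 bytes, and for FileB we have 2 doubles = 16 bytes. So now we know how to call a method that reads x amount of data of type y. Since y is a type and x is an amount, we can write this as y(x), as if y were a built-in type (int, float, double, char, and so on) and x were the numeric initializer passed to that type's constructor.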
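Now let's say we are reading either of these two files, but we don't know the data structure or how its information was stored in the file. The header tells us that the data is 16 bytes in size, but we don't know whether it was stored as 4 floats = 16 bytes or as 2 doubles = 16 bytes. Both structures are 16 bytes, but they hold a different number of values of different types.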
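A minimal sketch of that ambiguity: the same 16 bytes can be copied into either layout, and nothing in the bytes themselves says which reading is the intended one. The names FourFloats, TwoDoubles, and Block16 are hypothetical, and the sketch assumes 4-byte floats and 8-byte doubles, which the static_assert checks.

```cpp
#include <array>
#include <cstring>

struct FourFloats { float v[4]; };
struct TwoDoubles { double v[2]; };
static_assert(sizeof(FourFloats) == 16 && sizeof(TwoDoubles) == 16,
              "sketch assumes 4-byte float and 8-byte double");

using Block16 = std::array<unsigned char, 16>;

// Capture the raw bytes of four floats, as they would sit in a file.
Block16 bytesOf(const FourFloats &f) {
    Block16 bytes{};
    std::memcpy(bytes.data(), &f, bytes.size());
    return bytes;
}

// Interpret 16 bytes as four floats.
FourFloats asFloats(const Block16 &bytes) {
    FourFloats f;
    std::memcpy(&f, bytes.data(), bytes.size());
    return f;
}

// Interpret the very same 16 bytes as two doubles.
TwoDoubles asDoubles(const Block16 &bytes) {
    TwoDoubles d;
    std::memcpy(&d, bytes.data(), bytes.size());
    return d;
}
```

Bytes written from {3.0f, 3.0f, 3.0f, 0.0f} come back unchanged through asFloats, while asDoubles yields two unrelated values from the same block. std::memcpy is used here because it copies object representations without relying on pointer casts between unrelated types.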
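In sum: without knowing the file's data structure and how to parse the binary, this becomes an X/Y Problem.
Now let's assume that you do know the file structure. To try to answer your question from above, you can run this little program and look at some of the results: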
#include <string>
#include <iostream>
int main() {
    // Using Two Strings
    std::string imagesPath("ImagesPath\\");
    std::string labelsPath("LabelsPath\\");
    // Concat Of Two Strings
    std::string full = imagesPath + labelsPath;
    // Display Of Both
    std::cout << full << std::endl;
    // Data Type Pointers
    char* cData = nullptr;
    cData = new char[12];
    unsigned char* ucData = nullptr;
    ucData = new unsigned char[12];
    // Loop To Set Both Pointers To The First 12 Characters Of The String
    unsigned n = 0;
    for (; n < 12; ++n) {
        cData[n] = full.at(n);
        ucData[n] = full.at(n);
    }
    // Display Of Both Strings By Character And Unsigned Character
    n = 0;
    for (; n < 12; ++n) {
        std::cout << cData[n];
    }
    std::cout << std::endl;
    n = 0;
    for (; n < 12; ++n) {
        std::cout << ucData[n];
    }
    std::cout << std::endl;
    // Both Yield The Same Result
    // Okay, let's clear out the memory of these pointers and then reuse them.
    delete[] cData;
    delete[] ucData;
    cData = nullptr;
    ucData = nullptr;
    // Create Two Data Structures, One For Each Different File
    struct A {
        float length;
        float width;
        float height;
        float padding;
    };
    struct B {
        double length;
        double width;
    };
    // Constants For Our Data Structure Sizes
    const unsigned sizeOfA = sizeof(A);
    const unsigned sizeOfB = sizeof(B);
    // Create And Populate An Instance Of Each
    A a;
    a.length = 3.0f;
    a.width = 3.0f;
    a.height = 3.0f;
    a.padding = 0.0f;
    B b;
    b.length = 5.0;
    b.width = 5.0;
    // Let's first use the char[] approach for each struct and print them,
    // but we need 16 bytes instead of the 12 from your problem.
    char *aData = new char[sizeOfA]; // FileA
    char *bData = new char[sizeOfB]; // FileB
    // A has 4 floats of 4 bytes each: copy its 16 raw bytes. Assigning a
    // float to a char would store a converted value, not the float's bytes.
    const char *aBytes = reinterpret_cast<const char *>(&a);
    for (n = 0; n < sizeOfA; ++n) {
        aData[n] = aBytes[n];
    }
    // Print the result for A byte by byte, in hex, without interpreting it.
    // Don't worry about the individual values; compare the patterns shown
    // on the screen between A and B.
    for (n = 0; n < sizeOfA; ++n) {
        std::cout << std::hex << (aData[n] & 0xFF) << " ";
    }
    std::cout << std::dec << std::endl;
    // B has 2 doubles of 8 bytes each: copy its 16 raw bytes.
    const char *bBytes = reinterpret_cast<const char *>(&b);
    for (n = 0; n < sizeOfB; ++n) {
        bData[n] = bBytes[n];
    }
    // Print the result for B byte by byte, in hex.
    for (n = 0; n < sizeOfB; ++n) {
        std::cout << std::hex << (bData[n] & 0xFF) << " ";
    }
    std::cout << std::dec << std::endl;
    // Let's print both again, this time interpreting the bytes as their
    // appropriate types by copying them back into a struct of that type.
    A aFromChar;
    char *aDst = reinterpret_cast<char *>(&aFromChar);
    for (n = 0; n < sizeOfA; ++n) {
        aDst[n] = aData[n];
    }
    std::cout << aFromChar.length << " " << aFromChar.width << " "
              << aFromChar.height << " " << aFromChar.padding << std::endl;
    B bFromChar;
    char *bDst = reinterpret_cast<char *>(&bFromChar);
    for (n = 0; n < sizeOfB; ++n) {
        bDst[n] = bData[n];
    }
    std::cout << bFromChar.length << " " << bFromChar.width << std::endl;
    // Clean Up Memory
    delete[] aData;
    delete[] bData;
    aData = nullptr;
    bData = nullptr;
    // Even knowing the appropriate sizes, we can see a difference in the
    // stored data types. We can now do the same as above with unsigned char
    // and see whether it makes a difference.
    unsigned char *ucAData = new unsigned char[sizeOfA];
    unsigned char *ucBData = new unsigned char[sizeOfB];
    // A has 4 floats of 4 bytes each: copy its 16 raw bytes.
    const unsigned char *ucABytes = reinterpret_cast<const unsigned char *>(&a);
    for (n = 0; n < sizeOfA; ++n) {
        ucAData[n] = ucABytes[n];
    }
    // Print the result for A byte by byte, in hex, without interpreting it.
    // Again, compare the patterns shown on the screen between A and B.
    for (n = 0; n < sizeOfA; ++n) {
        std::cout << std::hex << static_cast<unsigned>(ucAData[n]) << " ";
    }
    std::cout << std::dec << std::endl;
    // B has 2 doubles of 8 bytes each: copy its 16 raw bytes.
    const unsigned char *ucBBytes = reinterpret_cast<const unsigned char *>(&b);
    for (n = 0; n < sizeOfB; ++n) {
        ucBData[n] = ucBBytes[n];
    }
    // Print the result for B byte by byte, in hex.
    for (n = 0; n < sizeOfB; ++n) {
        std::cout << std::hex << static_cast<unsigned>(ucBData[n]) << " ";
    }
    std::cout << std::dec << std::endl;
    // Let's print both again, this time interpreting the bytes as their
    // appropriate types by copying them back into a struct of that type.
    A aFromUChar;
    unsigned char *ucADst = reinterpret_cast<unsigned char *>(&aFromUChar);
    for (n = 0; n < sizeOfA; ++n) {
        ucADst[n] = ucAData[n];
    }
    std::cout << aFromUChar.length << " " << aFromUChar.width << " "
              << aFromUChar.height << " " << aFromUChar.padding << std::endl;
    B bFromUChar;
    unsigned char *ucBDst = reinterpret_cast<unsigned char *>(&bFromUChar);
    for (n = 0; n < sizeOfB; ++n) {
        ucBDst[n] = ucBData[n];
    }
    std::cout << bFromUChar.length << " " << bFromUChar.width << std::endl;
    // Clean Up Memory
    delete[] ucAData;
    delete[] ucBData;
    ucAData = nullptr;
    ucBData = nullptr;
    // So changing from char to unsigned char does not help by itself here,
    // even with casting, because these 2 files are different from one another.
    // Each has a unique signature. A family of files that a specific application
    // saves to a binary will all follow the same structure. Without knowing
    // the structure of the binary file, how much data to pull in, and, the key
    // point, what type of data you are reading, this becomes an X/Y Problem.
    // This is the hard part about parsing binaries: you need to know the file structure.
    char c = ' ';
    std::cin.get(c);
    return 0;
}
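After running the short program above, don't worry about what each value displayed on the screen is; just look at the patterns that appear when the two different file structures are compared. It only serves to show that a struct of floats that is 16 bytes wide is not the same as a struct of doubles that is also 16 bytes wide. So when we go back to your problem, where you are reading in 12 individual consecutive bytes, the question becomes: what do those 12 bytes represent? Are they 3 ints or 3 unsigned ints (at 4 bytes each), 3 floats, or a combination such as 1 double and 1 float? What is the actual data structure of the binary file you are reading?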
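As an illustration of answering that question for one concrete layout: judging by the class name MNISTreader and the bytes quoted in the question, the file may be an MNIST image file in the IDX layout. That is an assumption on my part, not something the question states. Under that assumption the header is four 32-bit big-endian integers (magic number, image count, rows, columns), so the first 12 bytes are three integers rather than twelve characters. The names readBigEndian32, IdxImageHeader, and parseIdxImageHeader are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: combine 4 bytes, most significant first, into one
// 32-bit value, independent of the host's own byte order.
std::uint32_t readBigEndian32(const unsigned char *p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Assumed header layout of an IDX image file: four big-endian 32-bit fields.
struct IdxImageHeader {
    std::uint32_t magic;
    std::uint32_t count;
    std::uint32_t rows;
    std::uint32_t cols;
};

// Parse the first 16 bytes of the file into the assumed header fields.
IdxImageHeader parseIdxImageHeader(const unsigned char *bytes, std::size_t size) {
    assert(size >= 16);
    IdxImageHeader header;
    header.magic = readBigEndian32(bytes);
    header.count = readBigEndian32(bytes + 4);
    header.rows = readBigEndian32(bytes + 8);
    header.cols = readBigEndian32(bytes + 12);
    return header;
}
```

Applied to the bytes from the question (00 00 08 03 00 00 EA 60 00 00 00 1C 00 00 00 1C), this gives magic 2051, count 60000, rows 28, and cols 28. The buffer would be filled from a stream opened with std::ios::binary, checking that the read actually delivered 16 bytes before parsing.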
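Edit: In the small program I wrote, I did forget to try one variation in the print statements. It could be added as well, since each print goes through the indexed pointer, but it isn't necessary: the output to the screen would show the same thing, because it only serves to express visually the difference between the two data structures in memory and their patterns.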