Question

目标

我的目标是快速从 large 二进制字符串（一个只包含1和0的字符串）创建文件。

直截了当

我需要一个可以实现我的目标的功能。如果我不够清楚，请继续阅读。

示例

Test.exe is running...
.
Inputted binary string:
        1111111110101010
Writing to: c:\users\admin\desktop\Test.txt
        Done!
File(Test.txt) In Byte(s):
        0xFF, 0xAA
.
Test.exe executed successfully!

解释

首先，Test.exe请求用户输入二进制字符串。
然后，它将输入的二进制字符串转换为十六进制。
最后，它将转换后的值写入名为Test.txt的文件。

我试过

作为实现目标的失败尝试，我创造了这个简单（可能很可怕）的功能（嘿，至少我试过）：

void BinaryStrToFile( __in const char* Destination,
                      __in std::string &BinaryStr )
{
    std::ofstream OutputFile( Destination, std::ofstream::binary );

    for( ::UINT Index1 = 0, Dec = 0;
         // 8-Bit binary.
         Index1 != BinaryStr.length( )/8;

         // Get the next set of binary value.
         // Write the decimal value as unsigned char to file.
         // Reset decimal value to 0.
         ++ Index1, OutputFile << ( ::BYTE )Dec, Dec = 0 )
    {
        // Convert the 8-bit binary to hexadecimal using the
        // positional notation method - this is how its done:
        // http://www.wikihow.com/Convert-from-Binary-to-Decimal
        for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
            if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;
    }
    OutputFile.close( );
};

使用示例

#include "Global.h"

void BinaryStrToFile( __in const char* Destination,
                      __in std::string &BinaryStr );

int main( void )
{
    std::string Bin = "";

    // Create a binary string that is a size of 9.53674 mb
    // Note: The creation of this string will take awhile.
    // However, I only start to calculate the speed of writing
    // and converting after it is done generating the string.
    // This string is just created for an example.
    std::cout << "Generating...\n";
    while( Bin.length( ) != 80000000 )
        Bin += "10101010";

    std::cout << "Writing...\n";
    BinaryStrToFile( "c:\\users\\admin\\desktop\\Test.txt", Bin );

    std::cout << "Done!\n";
#ifdef IS_DEBUGGING
    std::cout << "Paused...\n";
    ::getchar( );
#endif

    return( 0 );
};

问题

同样，这是我未能实现目标的尝试。问题是速度。太慢了。花了7分多钟。有没有方法可以从大型二进制字符串中快速创建文件？

提前致谢，

推行清洁

Answer 1

我建议删除内循环中的substr调用。您正在分配一个新字符串，然后为您处理的每个字符销毁它。替换此代码：

for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
    if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' )
        Dec += Inc;

通过以下方式：

for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
    if( BinaryStr[Index1 * 8 + Index2 ] == '1' )
        Dec += Inc;

Answer 2

你的大部分时间都花在这里：

   for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
        if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;

当我发表评论时，文件会在几秒钟内写完。我想你需要微调你的转换。

Answer 3

我想我会认为这是一个起点：

#include <bitset>
#include <fstream>
#include <algorithm>

int main() { 
    std::ifstream in("junk.txt", std::ios::binary | std::ios::in);
    std::ofstream out("junk.bin", std::ios::binary | std::ios::out);

    std::transform(std::istream_iterator<std::bitset<8> >(in),
                   std::istream_iterator<std::bitset<8> >(),
                   std::ostream_iterator<unsigned char>(out),
                   [](std::bitset<8> const &b) { return b.to_ulong();});
    return 0;
}

进行快速测试，在我的机器上处理大约6秒内的8000万字节的输入文件。除非您的文件比您在问题中提到的文件大得多，否则我的猜测是速度足够快，而且简单性很难被击败。

Answer 4

因此，不是在std::string之间来回转换，为什么不使用一堆机器字大小的整数来快速访问？

const size_t bufsz = 1000000;

uint32_t *buf = new uint32_t[bufsz];
memset(buf, 0xFA, sizeof(*buf) * bufsz);
std::ofstream ofile("foo.bin", std::ofstream::binary);

int i;
for (i = 0; i < bufsz; i++) {
    ofile << hex << setw(8) << setfill('0') << buf[i];
    // or if you want raw binary data instead of formatted hex:
    ofile.write(reinterpret_cast<char *>(&buf[i]), sizeof(buf[i]));
}

delete[] buf;

对我来说，这只需要几分之一秒。

Answer 5

与此完全不同的东西应该明显更快：

void
text_to_binary_file(const std::string& text, const char *fname)
{
    unsigned char wbuf[4096];  // 4k is a good size of "chunk to write to file"
    unsigned int i = 0, j = 0;
    std::filebuf fp;           // dropping down to filebufs may well be faster
                               // for this problem
    fp.open(fname, std::ios::out|std::ios::trunc);
    memset(wbuf, 0, 4096);

    for (std::string::iterator p = text.begin(); p != text.end(); p++) {
        wbuf[i] |= (1u << (CHAR_BIT - (j+1)));
        j++;
        if (j == CHAR_BIT) {
            j = 0;
            i++;
        }
        if (i == 4096) {
            if (fp.sputn(wbuf, 4096) != 4096)
                abort();
            memset(wbuf, 0, 4096);
            i = 0;
            j = 0;
        }
    }
    if (fp.sputn(wbuf, i+1) != i+1)
        abort();
    fp.close();
}

正确的错误处理留作练习。

Answer 6

即使很晚，我想把我的例子放在处理这样的字符串上。体系结构特定的优化可以使用未对齐的字符加载到多个寄存器中以并行地“压缩”这些位。这个未经测试的示例代码不会检查字符并避免对齐和字节序要求。它假定该二进制字符串的字符表示最高有效位的连续八位字节（字节），不字和双字等，其中它们在内存中的特定表示（以及在该字符串中）需要特殊处理才能携带。

//THIS CODE HAS NEVER BEEN TESTED! But I hope you get the idea.

//set up an ofstream with a 64KiB buffer
std::vector<char> buffer(65536);
std::ofstream ofs("out.bin", std::ofstream::binary|std::ofstream::out|std::ofstream::trunc);
ofs.rdbuf()->pubsetbuf(&buffer[0],buffer.size());

std::string::size_type bits = Bin.length();
std::string::const_iterator cIt = Bin.begin();

//You may treat cases, where (bits % 8 != 0) as error

//Initialize with the first iteration
uint8_t byte = uint8_t(*cIt++) - uint8_t('0');
byte <<= 1;
for(std::string::size_type i = 1;i < (bits & (~std::string::size_type(0x7)));++i,++cIt)
{
    if(i & 0x7) //bit 7 ... 1
    {
        byte |= uint8_t(*cIt) - uint8_t('0');
        byte <<= 1;
    }
    else //bit 0: write and advance to the the next most significant bit of an octet
    {
        byte |= uint8_t(*cIt) - uint8_t('0');
        ofs.put(byte);

        //advance
        ++i;
        ++cIt;
        byte = uint8_t(*cIt) - uint8_t('0');
        byte <<= 1;
    }
}

ofs.flush();

Answer 7

这使得一个76.2 MB（80,000,000字节）的文件1010101010101 ......

#include <stdio.h>
#include <iostream>
#include <fstream>

using namespace std;

int main( void )
{
    char Bin=0;
    ofstream myfile;
    myfile.open (".\\example.bin", ios::out | ios::app | ios::binary);
    int c=0;
    Bin = 0xAA;
    while( c!= 80000000 ){
        myfile.write(&Bin,1);
        c++;
    }
    myfile.close();
    cout << "Done!\n";
    return( 0 );
};

Here is the file first bytes

C ++编写大型二进制文件的任何更快的方法？

7 个答案: