c# - Split a large file into multiple files

Date: 2016-07-19 13:54:09

Tags: c# sql

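I looked around online and found a few things, but not quite what I need.

I have a SQL file that looks like this:

INSERT INTO `cola`(`url`, `page`, `c_id`) VALUES (`{0}`, `{1}`, 0), (`{0}`, `{1}`, 0), (`{0}`, `{1}`, 0), ... 300 times (`{0}`, `{1}`, 0); INSERT INTO...

{0} and {1} are values. I can only upload a maximum of 50MiB to phpMyAdmin, so I need to split this file. I found some tools online, but they split the file at exactly 50MiB without letting the line finish.

So what I want is: split the file into 48 - 49MiB files, where the last line of each file ends with

, 0);

and the next file starts with

INSERT INTO `cola`(`url`, `page`, `c_id`) VALUES

What I have now works, but it is slow. Any ideas for making it faster?

The file I want to split is 4GB.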

3 answers:

Answer 0: (score: 1)

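I had a similar project a few years ago. The way I handled it was to read in a chunk of size X (49MB in your case), then scan backwards (using String.LastIndexOf) for the start of the last key (in your case that would be "INSERT INTO"). Everything to the left of the String.LastIndexOf result was saved to a file, and the rest of the string was prepended to the next X-Y bytes I loaded (49MB minus the size of the leftover string).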
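The backward scan described above could be sketched roughly as follows. This is a minimal illustration, not the answerer's original code: the class name `BackwardScanSplitter`, the method `Split`, and its parameters are hypothetical, sizes are counted in characters rather than bytes, and it assumes the marker never appears inside a data value.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BackwardScanSplitter
{
    // Hypothetical helper: yields parts of roughly `chunkChars` characters,
    // cutting only right before an occurrence of `marker`.
    public static IEnumerable<string> Split(TextReader input, int chunkChars, string marker)
    {
        var buffer = new char[chunkChars];
        var carry = string.Empty; // tail of the previous chunk, not yet emitted

        while (true)
        {
            // Load only what fits next to the carried-over tail (the "X-Y" part).
            // If one statement is longer than a chunk, the chunk is allowed to grow.
            var wanted = carry.Length < chunkChars ? chunkChars - carry.Length : chunkChars;
            var read = input.ReadBlock(buffer, 0, wanted);
            if (read == 0) break;

            var chunk = carry + new string(buffer, 0, read);

            // Scan backwards for the start of the last statement in the chunk.
            var cut = chunk.LastIndexOf(marker, StringComparison.Ordinal);
            if (cut > 0)
            {
                yield return chunk.Substring(0, cut); // complete statements only
                carry = chunk.Substring(cut);         // possibly incomplete tail
            }
            else
            {
                carry = chunk; // no safe cut point yet, keep accumulating
            }
        }

        // Whatever is left at the end of the input is the final part.
        if (carry.Length > 0) yield return carry;
    }
}
```

Each yielded string would then be written to its own output file. Because the leftover tail always runs to the end of the chunk, a marker that straddles two reads is completed by the next read instead of being missed.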

Answer 1: (score: 0)

Here comes some pseudocode:

open main file
n = 1
open chunk[n]
while not eof(main file)
{
  read line from main file
  if chunk stream position + size of line < chunksize
    write line to chunk
  else
  {
    close chunk
    n = n + 1
    open new chunk[n]
    write line to new chunk
  }
}
close chunk
close main file

Now your files are neatly split on line boundaries.
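In C#, the pseudocode above might look like the sketch below. The names `LineSplitter`, `Split`, and `openChunk` are made up for illustration. It assumes every statement ends at a line break, and it writes `\n` after each line, so `\r\n` line endings are normalized.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineSplitter
{
    // Hypothetical helper: copies `input` line by line into numbered chunks and
    // starts a new chunk whenever the next line would push it past `maxBytes`.
    // `openChunk` receives the 1-based chunk number and returns its writer.
    // Returns the number of chunks written.
    public static int Split(TextReader input, long maxBytes, Func<int, TextWriter> openChunk)
    {
        var n = 0;
        long written = 0;
        TextWriter chunk = null;
        try
        {
            string line;
            while ((line = input.ReadLine()) != null)
            {
                // +1 accounts for the '\n' written after each line.
                long size = Encoding.UTF8.GetByteCount(line) + 1;

                // An oversized single line still goes into a chunk of its own.
                if (chunk == null || (written > 0 && written + size > maxBytes))
                {
                    chunk?.Dispose();
                    n++;
                    chunk = openChunk(n);
                    written = 0;
                }

                chunk.Write(line);
                chunk.Write('\n');
                written += size;
            }
        }
        finally
        {
            chunk?.Dispose();
        }
        return n;
    }
}
```

A call could look like `LineSplitter.Split(new StreamReader("dump.sql"), 48L * 1024 * 1024, n => new StreamWriter($"dump_{n}.sql"))`, where the file names are placeholders.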

Answer 2: (score: 0)

Something like this should work. If your script contains more than just insert statements, you may need to adapt it.

using System;
using System.IO;
using System.Text;

var filename = "outfile.sql";           // the script to split
var spliton = "INSERT INTO";            // every statement is assumed to start with this
const int maxBytes = 48 * 1024 * 1024;  // 48MB, safely below the 50MiB upload limit

var outcount = 0;
var filecounter = 0;
var outfileformatter = Path.GetFileNameWithoutExtension(filename) + "_{0}" +
                        Path.GetExtension(filename);

StreamWriter writer = null;
var blocksize = 32 * 1024;
var block = new char[blocksize];
var pending = new StringBuilder();      // text not yet known to be a complete statement

// Writes one complete statement; opens the next file first if this one would overflow.
void WriteStatement(string statement)
{
    // if the file is only ascii you could cheat and just use 'statement.Length'.
    var size = Encoding.UTF8.GetByteCount(statement);
    if (writer == null || (outcount > 0 && outcount + size > maxBytes))
    {
        writer?.Close();
        filecounter++;
        outcount = 0;
        var outfile = string.Format(outfileformatter, filecounter);
        Console.WriteLine(outfile);
        writer = new StreamWriter(outfile);
    }
    writer.Write(statement);
    outcount += size;
}

// by using StreamReader you won't have to load the entire file into memory
using (var reader = new StreamReader(filename))
{
    int read;
    while ((read = reader.ReadBlock(block, 0, blocksize)) > 0)
    {
        pending.Append(block, 0, read);
        var text = pending.ToString();

        // everything before the last separator consists of complete statements;
        // the tail may be cut off mid-statement, so it waits for the next block.
        var lastStart = text.LastIndexOf(spliton, StringComparison.Ordinal);
        if (lastStart <= 0) continue;

        var start = 0;
        while (start < lastStart)
        {
            var next = text.IndexOf(spliton, start + 1, StringComparison.Ordinal);
            WriteStatement(text.Substring(start, next - start));
            start = next;
        }
        pending.Clear();
        pending.Append(text, lastStart, text.Length - lastStart);
    }
}
// whatever is left over is the final statement
if (pending.Length > 0)
    WriteStatement(pending.ToString());
writer?.Close();

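...As written, this does not parse the SQL. If you have more than just insert statements, or if for some crazy reason a single insert statement is larger than 48MB, you may run into problems with this splitting code. However, you can always make sure the last statement written to each file ends with a semicolon (;), or modify the parsing/splitting logic to suit your needs.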
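One way to run the semicolon check mentioned above is to inspect only the tail of each output file, as in this rough sketch. `SqlChunkCheck`, `EndsWithSemicolon`, and the `tailBytes` parameter are hypothetical names, and the code assumes UTF-8 files with fewer than `tailBytes` bytes of trailing whitespace.

```csharp
using System;
using System.IO;
using System.Text;

public static class SqlChunkCheck
{
    // Hypothetical helper: true when the last non-whitespace character of the
    // file is ';'. Only the final `tailBytes` bytes are read, so it stays cheap
    // even for files close to 50MiB.
    public static bool EndsWithSemicolon(string path, int tailBytes = 4096)
    {
        using (var stream = File.OpenRead(path))
        {
            var start = Math.Max(0L, stream.Length - tailBytes);
            stream.Seek(start, SeekOrigin.Begin);

            var buffer = new byte[stream.Length - start];
            var total = 0;
            int read;
            // Stream.Read may return fewer bytes than requested, so loop.
            while (total < buffer.Length &&
                   (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += read;
            }

            // Starting mid-character can only garble the beginning of the tail,
            // which does not affect the check on its end.
            var tail = Encoding.UTF8.GetString(buffer, 0, total);
            return tail.TrimEnd().EndsWith(";", StringComparison.Ordinal);
        }
    }
}
```

Running it over every generated file before uploading would flag a file that was cut in the middle of a statement.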