I have looked around online; I can find some things, but not what I want.
I have an SQL file that looks like this:

INSERT INTO `cola`(`url`, `page`, `c_id`) VALUES
(`{0}`, `{1}`, 0),
(`{0}`, `{1}`, 0),
(`{0}`, `{1}`, 0), ... 300 times
(`{0}`, `{1}`, 0);
INSERT INTO ...

{0} and {1} are values. I can only upload at most 50MiB to phpMyAdmin, so I need to split this file. I found some things online, but those solutions cut the file at exactly 50MiB without keeping the line endings intact.

So what I want is to split the file into 48-49MiB files, where each file's last line ends with

, 0);

and the next file starts with

INSERT INTO `cola`(`url`, `page`, `c_id`) VALUES

What I have now works, but it is very slow. Any ideas on how to make it faster? The file I want to split is 4GB.
Answer 0 (score: 1)
I had a similar project a few years ago. The way I handled it was to read in blocks of size X (49MB in your case), then scan backwards (using String.LastIndexOf) for the start of the last key ("INSERT INTO" in your case). Everything to the left of the String.LastIndexOf result was saved to a file, and the remainder of the string was prepended to the next X (49MB minus the size of the leftover string) bytes I loaded.
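A rough C# sketch of that approach (the `Splitter` class, the `chunk_{n}.sql` file names, and the 49MB default are illustrative, not the original code):

```csharp
using System;
using System.IO;

class Splitter
{
    // Read blocks of roughly chunkSize characters, scan backwards with
    // String.LastIndexOf for the last "INSERT INTO", write everything to the
    // left of it to the current output file, and carry the remainder over
    // into the next block.
    public static void Split(string inputPath, int chunkSize = 49 * 1024 * 1024)
    {
        const string key = "INSERT INTO";
        var carry = "";
        var fileIndex = 0;
        using (var reader = new StreamReader(inputPath))
        {
            var buffer = new char[chunkSize];
            int read;
            while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
            {
                var block = carry + new string(buffer, 0, read);
                // Start position of the last statement that begins in this block.
                var cut = block.LastIndexOf(key, StringComparison.Ordinal);
                if (cut <= 0 || reader.EndOfStream)
                {
                    // Final block, or no split point found: write everything out.
                    File.WriteAllText($"chunk_{fileIndex++}.sql", block);
                    carry = "";
                }
                else
                {
                    File.WriteAllText($"chunk_{fileIndex++}.sql", block.Substring(0, cut));
                    carry = block.Substring(cut); // prepended to the next block
                }
            }
            if (carry.Length > 0)
                File.WriteAllText($"chunk_{fileIndex}.sql", carry);
        }
    }
}
```

Because the cut always lands at the start of an "INSERT INTO", every output file after the first begins with a fresh statement and the previous file ends right before it.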
Answer 1 (score: 0)
Pseudocode follows:
open main file
n = 1
open chunk[n]
while not eof(main file)
{
    read line from main file
    if chunk stream.position + size of line < chunk size
        write line to chunk
    else
    {
        close chunk
        n = n + 1
        open chunk[n]
        write line to new chunk
    }
}
close chunk
close main file
Now your files are split neatly on line boundaries.
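A minimal C# version of the pseudocode above might look like this (the `LineSplitter` class and the `chunk_{n}.sql` naming are just examples):

```csharp
using System;
using System.IO;
using System.Text;

class LineSplitter
{
    // Copy the main file line by line, starting a new chunk file whenever the
    // next line would push the current chunk past chunkSize, so every chunk
    // ends on a complete line.
    public static int Split(string mainFile, long chunkSize)
    {
        var n = 1;
        long written = 0;
        var writer = new StreamWriter($"chunk_{n}.sql");
        foreach (var line in File.ReadLines(mainFile))
        {
            var size = Encoding.UTF8.GetByteCount(line) + Environment.NewLine.Length;
            if (written > 0 && written + size > chunkSize)
            {
                writer.Close();
                n++;
                writer = new StreamWriter($"chunk_{n}.sql");
                written = 0;
            }
            writer.WriteLine(line);
            written += size;
        }
        writer.Close();
        return n; // number of chunk files produced
    }
}
```

Note that this rotates files on any line boundary; to guarantee each file starts with `INSERT INTO ... VALUES` you would additionally delay rotation until the current line ends with `;`.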
Answer 2 (score: 0)
Something like this should work. You may need to tweak it if your script contains more than just insert statements.
var filename = "outfile.sql";
var spliton = "INSERT INTO";
var outcount = 0;
var filecounter = 0;
var outfileformatter = Path.GetFileNameWithoutExtension(filename) + "_{0}" +
                       Path.GetExtension(filename);
string outfile = null;
StreamWriter writer = null;
var blocksize = 32 * 1024;
var block = new char[blocksize];

// By using StreamReader you won't have to load the entire file into memory.
using (var reader = new StreamReader(filename))
{
    while (!reader.EndOfStream)
    {
        // Read in sections of the file at a time, since you can't hold the
        // entire thing in memory.
        var outsize = reader.ReadBlock(block, 0, blocksize);
        var content = new string(block, 0, outsize);

        // Split the data on your separator. The first piece may be the tail of
        // a statement cut off by the previous block, so the separator is only
        // put back in front of the later pieces.
        var pieces = content.Split(new[] { spliton }, StringSplitOptions.None);
        for (var i = 0; i < pieces.Length; i++)
        {
            var chunk = i == 0 ? pieces[i] : spliton + pieces[i];
            if (chunk.Length == 0)
                continue;

            // Once the threshold is tripped, close the writer and open the
            // next, but only right before a chunk that begins a new statement,
            // so every file starts with "INSERT INTO" and the previous one
            // ends at the end of a statement.
            var startsStatement = i > 0;
            if (writer == null || (outcount > 48 * 1024 * 1024 && startsStatement)) // 48MB
            {
                filecounter++;
                outcount = 0;
                if (writer != null)
                    writer.Close();
                outfile = string.Format(outfileformatter, filecounter);
                Console.WriteLine(outfile);
                writer = new StreamWriter(outfile);
            }

            // Output the data.
            writer.Write(chunk);
            // Record how much data you wrote to the file.
            outcount += Encoding.UTF8.GetBytes(chunk).Length;
            // If the file is ASCII-only you could cheat and just use 'chunk.Length'.
        }
    }
}
if (writer != null)
    writer.Close();
...As written, this does not parse the SQL. If your script contains more than just insert statements, or if for some crazy reason a single insert statement exceeds 48MB, you may run into problems with this splitting code. However, you can always verify that the last statement written to each file ends with a semicolon (;), or adapt the parsing/splitting logic to your needs.