I've developed a simple data migration console tool using C#, LINQ and EF. This tool gets all the data I want to move from place A to place B. The code is something like this:
var data = dataAccess.GetData();
Parallel.ForEach(data, currentdata =>
{
//Do some business, and insert data
});
As far as I know, Parallel.ForEach handles everything needed to take advantage of parallelism, using all the cores of the processor and as many threads as possible in the most efficient way.
So I tried this tool with a huge amount of data, and the migration process takes about 5 hours.
Then I decided to try another idea.
I built 4 console executables from this project with one modification: each one now takes a quarter of the data.
E.g. the total is about 40 million records to migrate: console 1 migrates from 0 to 10M, console 2 from 10M to 20M, console 3 from 20M to 30M and console 4 from 30M to 40M. Then I ran these consoles, one on each core of my processor, and guess what: it takes less than half the time to migrate everything.
How is this possible if Parallel.ForEach is supposedly the best approach?
Any ideas on how to replicate this improvement with just one console?
Thank you.
EDIT: Now I'm trying this, having chunked the data beforehand:
Process process = Process.GetCurrentProcess();
int cpuCount = Environment.ProcessorCount;
int offset = process.Threads.Count;
Thread[] threads = new Thread[cpuCount];
for (int i = 0; i < cpuCount; ++i)
{
Thread t = new Thread(new ThreadStart( migrateChunk))
{ IsBackground = true };
t.Start();
}
process.Refresh();
for (int i = 0; i < cpuCount; ++i)
{
// ProcessorAffinity is a bitmask: bit i selects logical processor i
process.Threads[i + offset].ProcessorAffinity = (IntPtr)(1 << i);
}
Do you think this is a good approach? I ask because I don't see any improvement over Parallel.ForEach. I even tried attaching all the processes to the same core, but I don't see any change. Thanks.
Answer 0 (score: 2)
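The problem lies in
var data = dataAccess.GetData()
Suppose retrieving 40 million records takes 4 minutes and retrieving 10 million takes 1 minute. Each 10-million console app then starts moving data after 1 minute, while the 40-million app is still retrieving data from the database.
For the parallel part, you may want to look at the Parallel documentation. Basically, Parallel takes the data, splits it into small chunks and spreads those chunks across the processors for processing.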
1. Parallel
2. Parallel Partition of Work
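One way to get the four-console behaviour inside a single process might be to let each worker retrieve its own range instead of loading everything up front. The following is a minimal sketch under stated assumptions, not the asker's actual code: `RangeMigration`, `Migrate` and `GetDataRange` are hypothetical names, and `GetDataRange` only simulates a range query that a real tool would send to the database. `Partitioner.Create` and `Parallel.ForEach` are the standard .NET calls.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class RangeMigration
{
    // Hypothetical stand-in for a range query such as dataAccess.GetData(from, to).
    // Here it just yields the IDs in [fromInclusive, toExclusive).
    public static IEnumerable<long> GetDataRange(long fromInclusive, long toExclusive)
    {
        for (long id = fromInclusive; id < toExclusive; id++)
            yield return id;
    }

    // Splits [0, totalRows) into ranges of rangeSize rows. Each worker loads
    // and migrates its own range, so retrieval overlaps with migration.
    // Returns the number of rows migrated.
    public static long Migrate(long totalRows, long rangeSize, Action<long> migrateRow)
    {
        long migrated = 0;
        var ranges = Partitioner.Create(0L, totalRows, rangeSize);

        Parallel.ForEach(ranges, range =>
        {
            long count = 0;
            foreach (long row in GetDataRange(range.Item1, range.Item2))
            {
                migrateRow(row); // business logic and insert would go here
                count++;
            }
            // Add this range's count to the shared total in a thread-safe way.
            Interlocked.Add(ref migrated, count);
        });

        return migrated;
    }

    public static void Main()
    {
        long total = Migrate(1000000, 250000, row => { });
        Console.WriteLine(total); // prints 1000000
    }
}
```

With EF, each range would presumably need its own DbContext created inside the loop body, since a DbContext instance is not thread-safe. The range size is a tuning knob: the 10M-row ranges from the question mirror the four consoles, while smaller ranges let the first inserts start sooner.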