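I have to convert a large amount of text stored in a database as RTF to plain text. I am using the method described in this MSDN article, but I think I have hit a roadblock (I don't think it is in my code, but in the .NET Framework itself).
I have the following function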
//convert RTF text to plain text
public static string RtfTextToPlainText(string FormatObject)
{
System.Windows.Forms.RichTextBox rtfBox = new System.Windows.Forms.RichTextBox();
rtfBox.Rtf = FormatObject;
FormatObject = rtfBox.Text; //This is line 494 for later reference for the stack traces.
rtfBox.Dispose();
return FormatObject;
}
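It should be entirely self-contained and should not block on anything. The project I am working on has several million records to process, so I break the work into batches and use Tasks to process them in parallel. It was still fairly slow, so I broke into the code in the debugger and found this.
This is the call stack of the waiting task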
[In a sleep, wait, or join]
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.CreateHandle(System.Windows.Forms.CreateParams cp) + 0x242 bytes
System.Windows.Forms.dll!System.Windows.Forms.Control.CreateHandle() + 0x2b2 bytes
System.Windows.Forms.dll!System.Windows.Forms.TextBoxBase.CreateHandle() + 0x54 bytes
System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.Rtf.set(string value) + 0x68 bytes
>CvtCore.dll!CvtCore.StandardFunctions.Str.RtfTextToPlainText(object Expression) Line 494 C#
This is the call stack of thread 816
[Managed to Native Transition]
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DefWndProc(ref System.Windows.Forms.Message m) + 0x9e bytes
System.Windows.Forms.dll!System.Windows.Forms.Control.WmWindowPosChanged(ref System.Windows.Forms.Message m) + 0x39 bytes
System.Windows.Forms.dll!System.Windows.Forms.Control.WndProc(ref System.Windows.Forms.Message m) + 0x51b bytes
System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.WndProc(ref System.Windows.Forms.Message m) + 0x5c bytes
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DebuggableCallback(System.IntPtr hWnd, int msg, System.IntPtr wparam, System.IntPtr lparam) + 0x15e bytes
[Native to Managed Transition]
[Managed to Native Transition]
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DefWndProc(ref System.Windows.Forms.Message m) + 0x9e bytes
System.Windows.Forms.dll!System.Windows.Forms.Control.WmCreate(ref System.Windows.Forms.Message m) + 0x1c bytes
System.Windows.Forms.dll!System.Windows.Forms.Control.WndProc(ref System.Windows.Forms.Message m) + 0x50b bytes
System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.WndProc(ref System.Windows.Forms.Message m) + 0x5c bytes
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DebuggableCallback(System.IntPtr hWnd, int msg, System.IntPtr wparam, System.IntPtr lparam) + 0x15e bytes
[Native to Managed Transition]
[Managed to Native Transition]
System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.CreateHandle(System.Windows.Forms.CreateParams cp) + 0x44c bytes
System.Windows.Forms.dll!System.Windows.Forms.Control.CreateHandle() + 0x2b2 bytes
System.Windows.Forms.dll!System.Windows.Forms.TextBoxBase.CreateHandle() + 0x54 bytes
System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.Rtf.set(string value) + 0x68 bytes
>CvtCore.dll!CvtCore.StandardFunctions.Str.RtfTextToPlainText(object Expression) Line 494 C#
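Why is task 2 blocking on task 4 at line 494? Shouldn't they be completely independent of each other?
Note
I captured these stack traces and screenshots in release mode; I can't seem to hit pause at the right moment to catch the same thing in debug mode. Could this also be the cause of my slowness? The profiler says my program spends 83.2% of its time in `System.Windows.Forms.RichTextBox.set_Rtf(string)` (the sub-function called from line 494).
Any suggestions on how to speed up the process of stripping the RTF formatting would be greatly appreciated.
P.S.
I am currently rewriting it so that each thread has one text box that is never disposed, instead of creating a new text box on every call to the function. I hope that will speed things up, and I will update with details after I do that.
Update
I solved my own problem (see the answer below), but here is how I start my tasks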
//create/start consumer threads
for (int i = 0; i < ThreadsPreProducer; i++)
{
//create worker and thread
WorkerObject NewWorkerObject = new WorkerObject(colSource, FormatObjectEvent, UpdateModule);
Task WorkerTask = new Task(NewWorkerObject.DoWork);
WorkerTasks.Add(WorkerTask);
WorkerTask.Start();
}
//create/start producer thread
ProducerObject NewProducerObject = new ProducerObject(colSource, SourceQuery, ConnectionString, PreProcessor, UpdateModule, RowNameIndex);
Task ProducerTask = new Task(NewProducerObject.DoWork);
WorkerTasks.Add(ProducerTask);
ProducerTask.Start();
//block while producer runs
ProducerTask.Wait();
//create post producer threads
for (int i = 0; i < ThreadsPostProducer; i++)
{
//create worker and thread
WorkerObject NewWorkerObject = new WorkerObject(colSource, FormatObjectEvent, UpdateModule);
Task WorkerTask = new Task(NewWorkerObject.DoWork);
WorkerTasks.Add(WorkerTask);
WorkerTask.Start();
}
//block until all tasks are done
Task.WaitAll(WorkerTasks.ToArray());
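I am using a producer/consumer model, in my case 1 producer and 4 consumers (2 start at the beginning and 2 start after the producer finishes, to speed up the work once the producer has released its system resources).
Answer 0 (score: 4)
Changing the function to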
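For readers who cannot see inside WorkerObject and ProducerObject, the general shape of a one-producer, several-consumer pipeline can be sketched roughly as follows. This is not my actual code: the use of BlockingCollection as the shared queue, the Process helper, the queue capacity, and the record count are all assumptions made for illustration, and the sketch starts all four consumers up front instead of in two stages.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ProducerConsumerSketch
{
    // Hypothetical stand-in for the per-record work (e.g. stripping RTF).
    static string Process(string record) => record.Trim();

    static void Main()
    {
        // Bounded so the producer cannot run arbitrarily far ahead of the consumers.
        using (var queue = new BlockingCollection<string>(boundedCapacity: 1000))
        {
            int processed = 0;

            // Consumers: each drains the queue until the producer marks it complete.
            Task[] consumers = Enumerable.Range(0, 4)
                .Select(_ => Task.Factory.StartNew(() =>
                {
                    foreach (string record in queue.GetConsumingEnumerable())
                    {
                        Process(record);
                        Interlocked.Increment(ref processed);
                    }
                }, TaskCreationOptions.LongRunning))
                .ToArray();

            // Producer: stands in for reading rows from the database (assumed data).
            Task producer = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < 10000; i++)
                {
                    queue.Add(" record " + i + " ");
                }
                queue.CompleteAdding(); // lets the consumers' loops finish
            }, TaskCreationOptions.LongRunning);

            Task.WaitAll(consumers.Concat(new[] { producer }).ToArray());
            Console.WriteLine(processed); // prints 10000
        }
    }
}
```

The bounded queue is a design choice in the sketch, not something taken from my project: it keeps memory use flat when the consumers are slower than the producer.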
static ThreadLocal<RichTextBox> rtfBox = new ThreadLocal<RichTextBox>(() => new RichTextBox());
//convert RTF text to plain text
public static string RtfTextToPlainText(string FormatObject )
{
rtfBox.Value.Rtf = FormatObject;
FormatObject = rtfBox.Value.Text;
rtfBox.Value.Clear();
return FormatObject;
}
changed my run time from minutes to seconds.
I do not dispose of the objects because they will be used for the entire lifetime of the program.
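To illustrate why this helps, here is a minimal sketch showing that ThreadLocal runs its factory once per thread rather than once per call. It deliberately avoids Windows Forms: ExpensiveResource is a hypothetical stand-in for the RichTextBox, and the thread and iteration counts are arbitrary assumptions.

```csharp
using System;
using System.Threading;

static class ThreadLocalSketch
{
    // Counts how many times the factory actually constructs an instance.
    static int created = 0;

    // Hypothetical stand-in for an object that is expensive to create,
    // such as a RichTextBox and its window handle.
    sealed class ExpensiveResource
    {
        public ExpensiveResource() { Interlocked.Increment(ref created); }
        public string Convert(string input) => input.ToUpperInvariant();
    }

    // One lazily created instance per thread that touches .Value.
    static readonly ThreadLocal<ExpensiveResource> resource =
        new ThreadLocal<ExpensiveResource>(() => new ExpensiveResource());

    static void Main()
    {
        var threads = new Thread[3];
        for (int t = 0; t < threads.Length; t++)
        {
            threads[t] = new Thread(() =>
            {
                // 1000 calls on this thread reuse the same instance.
                for (int i = 0; i < 1000; i++)
                {
                    resource.Value.Convert("x");
                }
            });
            threads[t].Start();
        }
        foreach (Thread thread in threads)
        {
            thread.Join();
        }
        Console.WriteLine(created); // prints 3: one instance per thread, not 3000
    }
}
```

With Tasks instead of dedicated threads, the number of instances would depend on how many thread-pool threads end up running the work, so the count is not fixed in that case.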