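I want to split an RTF file (using C# or VB.Net) into two or more parts at the string [BreakPage]. For example, I have the following file, which contains [BreakPage] and needs to be split into two parts: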
{\rtf1\ansi\ansicpg1251\uc1\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1049\deflangfe1049{\fonttbl{\f0\froman\fcharset204\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f38\froman\fcharset0\fprq2 Times New Roman;}{\f36\froman\fcharset238\fprq2 Times New Roman CE;}{\f39\froman\fcharset161\fprq2 Times New Roman Greek;}{\f40\froman\fcharset162\fprq2 Times New Roman Tur;}{\f41\froman\fcharset177\fprq2 Times New Roman (Hebrew);}{\f42\froman\fcharset178\fprq2 Times New Roman (Arabic);}{\f43\froman\fcharset186\fprq2 Times New Roman Baltic;}{\f44\froman\fcharset163\fprq2 Times New Roman (Vietnamese);}}
{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}
{\stylesheet{\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1049\langfe1049\cgrid\langnp1049\langfenp1049 \snext0 Normal;}{\*\cs10 \additive \ssemihidden Default Paragraph Font;}{\*\ts11\tsrowd\trftsWidthB3\trpaddl108\trpaddr108\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\trcbpat1\trcfpat1\tscellwidthfts0\tsvertalt\tsbrdrt\tsbrdrl\tsbrdrb\tsbrdrr\tsbrdrdgl\tsbrdrdgr\tsbrdrh\tsbrdrv \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs20\lang1024\langfe1024\cgrid\langnp1024\langfenp1024 \snext11 \ssemihidden Normal Table;}}
{\*\latentstyles\lsdstimax156\lsdlockeddef0}{\*\rsidtbl \rsid2111663\rsid7154806\rsid15558346}{\*\generator Microsoft Word 11.0.5604;}{\info{\author Programmer}{\operator Programmer}{\creatim\yr2011\mo8\dy2\hr12\min45}{\revtim\yr2011\mo8\dy5\hr12\min34}{\version3}{\edmins1}{\nofpages1}{\nofwords5}{\nofchars34}{\nofcharsws38}{\vern24689}}
\margl1701\margr850\margt1134\margb1134 \widowctrl\ftnbj\aenddoc\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\hyphcaps0\horzdoc\dghspace120\dgvspace120\dghorigin1701\dgvorigin1984\dghshow0\dgvshow3\jcompress\viewkind1\viewscale100\nolnhtadjtbl\rsidroot15558346 \fet0\sectd \linex0\sectdefaultcl\sftnbj
{\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang {\pntxta )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}
\pard\plain \ql \li0\ri0\nowidctlpar\faauto\rin0\lin0\itap0 \fs24\lang1049\langfe1049\cgrid\langnp1049\langfenp1049 {\b\insrsid7154806\charrsid7154806 Line1\par }{\insrsid7154806 \par }{\i\insrsid7154806\charrsid7154806 Line3}{\lang1048\langfe1049\langnp1048\insrsid7154806 \par }{\lang1048\langfe1049\langnp1048\insrsid2111663 [BreakPage]\par }{\insrsid7154806 Line4\par \par Line5\par }}
Can anyone help me?
Thanks!
Answer 0 (score: 5)
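The problem is that RTF keeps some (but not necessarily all) of its formatting information in a global header. To split RTF text so that each piece is again validly formatted RTF, you basically need to know where the header information is and duplicate it in every part.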
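There are two ways to do this: (1) parse the RTF yourself, or (2) let a RichTextBox do it for you.
(1) is feasible but takes time. Fortunately, RTF parsers already exist, for example this one on CodeProject.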
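To give an idea of what locating the header information involves, here is a minimal sketch that pulls one balanced group (such as the font table) out of the RTF text by counting braces. It is not a full parser: the names RtfGroups and ExtractGroup are made up for this illustration, and it assumes the document contains no \bin binary data.

```csharp
using System;

static class RtfGroups
{
    // Returns the balanced {...} group that begins at the first occurrence of
    // groupStart (for example @"{\fonttbl"), or null if the group is missing
    // or its braces never balance. Hypothetical helper, not a library API.
    public static string ExtractGroup(string rtf, string groupStart)
    {
        int begin = rtf.IndexOf(groupStart, StringComparison.Ordinal);
        if (begin < 0) return null;

        int depth = 0;
        for (int i = begin; i < rtf.Length; i++)
        {
            char c = rtf[i];
            if (c == '\\')
            {
                i++;            // skip the character after a backslash, so \{ and \} are not counted
                continue;
            }
            if (c == '{')
            {
                depth++;
            }
            else if (c == '}' && --depth == 0)
            {
                return rtf.Substring(begin, i - begin + 1);
            }
        }
        return null;            // unbalanced input
    }
}
```

Calling ExtractGroup(rtf, @"{\fonttbl"), and likewise for @"{\colortbl" and @"{\stylesheet", gives you the groups that each part would need to carry. Rebuilding a valid document around them (document-level properties, formatting that is still open at the split point) is the time-consuming bit, which is why an existing parser is the better choice for this route.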
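Alternatively, you can load the RTF text into a RichTextBox, search for the split text "[BreakPage]" inside the RichTextBox, programmatically select the first and the second part in turn, and retrieve the RTF text of each through the SelectedRtf property.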
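The RichTextBox route might look roughly like the sketch below. The class and method names (RtfSplitter, SplitRtf) are invented for this example. It assumes a Windows Forms reference, that the control is never shown on screen, and that character positions in the Text property line up with what Select expects, so check the output on your own documents.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

static class RtfSplitter
{
    // Splits an RTF document at every occurrence of marker and returns each
    // part as a complete RTF string. Hypothetical helper for illustration.
    public static List<string> SplitRtf(string rtf, string marker)
    {
        var parts = new List<string>();
        using (var box = new RichTextBox())
        {
            box.Rtf = rtf;              // the control parses the RTF for us
            string text = box.Text;     // plain text, used only to locate the marker

            int start = 0;
            while (true)
            {
                int hit = text.IndexOf(marker, start, StringComparison.Ordinal);
                int end = hit < 0 ? text.Length : hit;

                box.Select(start, end - start);
                parts.Add(box.SelectedRtf);   // selection comes back as stand-alone RTF

                if (hit < 0) break;
                start = hit + marker.Length;  // continue after the marker itself
            }
        }
        return parts;
    }
}
```

For a file with a single [BreakPage], SplitRtf(File.ReadAllText(path), "[BreakPage]") should give two strings that you can write out as separate .rtf files. The paragraph mark that follows the marker ends up at the start of the second part, so trim it if it matters.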