Can I scan a document and extract certain words?

Time: 2012-12-26 22:54:52

Tags: c# ms-word

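I have a Word document whose tags are written with "[[]]", for example [[sqlscript1]]. I want to scan the document and display sqlscript1 in a text box. Can I read only the words enclosed in [[]]?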

3 answers:

Answer 0: (score: 1)

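As millimoose said, the OpenXML SDK is what you are looking for. We did something similar when generating documents for dynamic PowerPoint slides. The SDK lets you work with the document's object model programmatically and search, change, or manipulate it as needed.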
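
A minimal sketch of reading the text with the OpenXML SDK, assuming the file is a .docx (the SDK does not read the older binary .doc format) and that the DocumentFormat.OpenXml package is referenced. The class name DocxTextReader and the sample path are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class DocxTextReader
{
    // Returns the text of every paragraph in the main document body.
    // Headers, footers, and footnotes live in other parts and are not read here.
    public static string[] ReadParagraphs(string path)
    {
        // Second argument false = open read-only.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // InnerText concatenates the text of all runs in a paragraph,
            // so a tag that Word split across several runs is rejoined.
            return body.Descendants<Paragraph>()
                       .Select(p => p.InnerText)
                       .ToArray();
        }
    }
}
```

Usage could look like `foreach (string p in DocxTextReader.ReadParagraphs(@"C:\docs\template.docx")) Console.WriteLine(p);`. Each returned string can then be searched for the [[...]] tags.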

Answer 1: (score: 1)

First, load the content of the Word document into memory. Second, use regular expressions to find the tags delimited by double square brackets (required pattern: "\[\[(?<tag>[^\]]*)\]\]").
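
The second step can be sketched as follows, assuming the document text is already available as a string. The class name TagScanner and the method ExtractTags are hypothetical; the pattern is the one given above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagScanner
{
    // Matches [[ ... ]] and captures the inner text in the named group "tag".
    private static readonly Regex TagPattern =
        new Regex(@"\[\[(?<tag>[^\]]*)\]\]", RegexOptions.Compiled);

    // Returns the inner text of every [[...]] tag, in order of appearance.
    public static List<string> ExtractTags(string text)
    {
        var tags = new List<string>();
        foreach (Match m in TagPattern.Matches(text))
        {
            tags.Add(m.Groups["tag"].Value);
        }
        return tags;
    }
}
```

For example, `TagScanner.ExtractTags("Run [[sqlscript1]] then [[sqlscript2]].")` returns a list containing sqlscript1 and sqlscript2, which could be shown in a text box with `string.Join(", ", tags)`.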

Answer 2: (score: 1)

You need to use the Interop DLL to extract the text from the Word document. Take a look at this: http://msdn.microsoft.com/en-US/library/ms173188(v=vs.80).aspx

Then read the file with the following:

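// Requires a reference to Microsoft.Office.Interop.Word and these usings:
// using System.IO; using System.Reflection; using System.Windows.Forms;
// using Word = Microsoft.Office.Interop.Word;

// The document is expected next to the executable.
object file = Path.GetDirectoryName(Application.ExecutablePath) + @"\Answer.doc";

// Start Word in the background.
Word.Application wordObject = new Word.Application();
wordObject.Visible = false;

// Placeholder for the optional parameters we do not set.
object nullobject = Missing.Value;
Word.Document docs = wordObject.Documents.Open
    (ref file, ref nullobject, ref nullobject, ref nullobject,
    ref nullobject, ref nullobject, ref nullobject, ref nullobject,
    ref nullobject, ref nullobject, ref nullobject, ref nullobject,
    ref nullobject, ref nullobject, ref nullobject, ref nullobject);

String strLine;
bool bolEOF = false;

// Put the selection at the first character of the document.
docs.Characters[1].Select();

int index = 0;
do
{
    // Extend the selection by one line and read its text.
    object unit = Word.WdUnits.wdLine;
    object count = 1;
    wordObject.Selection.MoveEnd(ref unit, ref count);

    strLine = wordObject.Selection.Text;
    richTextBox1.Text += ++index + " - " + strLine + "\r\n"; // numbered output, for our understanding

    // Collapse the selection to its end so the next pass starts on the following line.
    object direction = Word.WdCollapseDirection.wdCollapseEnd;
    wordObject.Selection.Collapse(ref direction);

    // The predefined \EndOfDoc bookmark marks the end of the document.
    if (wordObject.Selection.Bookmarks.Exists(@"\EndOfDoc"))
        bolEOF = true;
} while (!bolEOF);

// Close the document and quit Word so no hidden instance keeps running.
docs.Close(ref nullobject, ref nullobject, ref nullobject);
wordObject.Quit(ref nullobject, ref nullobject, ref nullobject);
docs = null;
wordObject = null;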

Source: is there a way to read a word document line by line

Now copy each line into a variable and check it against your pattern with this regex:

Regex.Match(MYTEXT, @"\[\[([^\]]*)\]\]").Groups[1].Value