How to get text from a line number in MS Word

Time: 2012-02-07 17:42:09

Tags: c# ms-word office-interop office-2003 office-automation

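Is it possible to get the text (a line or a sentence) at a given line number in MS Word using Office automation? Either result is fine: the text on the given line, or the sentence(s) that the line is part of.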

I am not providing any code because I have no idea how to read an MS Word document using Office automation. I can open the file like this:

var wordApp = new ApplicationClass();
wordApp.Visible = false;
object file = path;
object misValue= Type.Missing; 
Word.Document doc = wordApp.Documents.Open(ref file, ref misValue, ref misValue,
                                           ref misValue, ref misValue, ref misValue,
                                           ref misValue, ref misValue, ref misValue,
                                           ref misValue, ref misValue, ref misValue);

//and rest of the code given I have a line number = 3 ?

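EDIT: To clarify @Richard Marskell - Drackir's doubt: although the text in MS Word is one long string, Office automation can still tell us the line number. In fact, I get the line number from another piece of code, like this: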

Word.Revision rev = //SomeRevision
object lineNo = rev.Range.get_Information(Word.WdInformation.wdFirstCharacterLineNumber);

For example, say the Word file looks like this:

fix grammatical or spelling errors

clarify meaning without changing it correct minor mistakes add related resources or links
always respect the original author

There are 4 lines here.

3 Answers:

Answer 0 (score: 4)

Fortunately, after some epic searching, I found a solution.

    object file = Path.GetDirectoryName(Application.ExecutablePath) + @"\Answer.doc";

    Word.Application wordObject = new Word.ApplicationClass();
    wordObject.Visible = false;

    object nullobject = Missing.Value;
    Word.Document docs = wordObject.Documents.Open
        (ref file, ref nullobject, ref nullobject, ref nullobject,
        ref nullobject, ref nullobject, ref nullobject, ref nullobject,
        ref nullobject, ref nullobject, ref nullobject, ref nullobject,
        ref nullobject, ref nullobject, ref nullobject, ref nullobject);

    String strLine;
    bool bolEOF = false;

    docs.Characters[1].Select();

    int index = 0;
    do
    {
        object unit = Word.WdUnits.wdLine;
        object count = 1;
        wordObject.Selection.MoveEnd(ref unit, ref count);

        strLine = wordObject.Selection.Text;
        richTextBox1.Text += ++index + " - " + strLine + "\r\n"; //for our understanding

        object direction = Word.WdCollapseDirection.wdCollapseEnd;
        wordObject.Selection.Collapse(ref direction);

        if (wordObject.Selection.Bookmarks.Exists(@"\EndOfDoc"))
            bolEOF = true;
    } while (!bolEOF);

    docs.Close(ref nullobject, ref nullobject, ref nullobject);
    wordObject.Quit(ref nullobject, ref nullobject, ref nullobject);
    docs = null;
    wordObject = null;

Here is the genius behind the code. Follow the link for more explanation of how it works.
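The loop above walks every line in the document. If only one line is needed, it may be possible to jump straight to it. The following is a minimal sketch, not part of the original answer: `WordLineReader` and `GetLineText` are hypothetical names, and the sketch assumes the document of interest is the active one in the `Word.Application` instance. It uses `Selection.GoTo` with `wdGoToLine` and Word's predefined `\Line` bookmark. Note that `wdGoToAbsolute` counts lines from the start of the document, whereas the value reported by `wdFirstCharacterLineNumber` is, as far as I know, relative to the page, so the two numbers may not match in multi-page documents; verify this against your own file.

```csharp
using System.Reflection;
using Word = Microsoft.Office.Interop.Word;

static class WordLineReader
{
    // Hypothetical helper: returns the text of the given 1-based line,
    // counted from the start of the active document.
    public static string GetLineText(Word.Application app, int lineNumber)
    {
        object what = Word.WdGoToItem.wdGoToLine;
        object which = Word.WdGoToDirection.wdGoToAbsolute;
        object count = lineNumber;
        object missing = Missing.Value;

        // Move the insertion point to the start of the requested line.
        app.Selection.GoTo(ref what, ref which, ref count, ref missing);

        // "\Line" is a predefined bookmark that covers the line
        // containing the current selection.
        object lineBookmark = @"\Line";
        Word.Bookmark line = app.Selection.Bookmarks.get_Item(ref lineBookmark);
        return line.Range.Text;
    }
}
```

With the variables from the code above, a call would look like `string third = WordLineReader.GetLineText(wordObject, 3);`, made after the document is opened and before it is closed.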

Answer 1 (score: 1)

Use this if you want to read a standard .txt text file. You can read the whole file with a single call:

List<string> strmsWord =
    new List<string>(File.ReadAllLines(yourFilePath + YourwordDocName));

If you want to loop through the result and see what the returned items are, use something like this:

foreach (string strLines in strmsWord)
{
    Console.WriteLine(strLines);
}

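I completely forgot that a Word document may be in a binary format. In that case, take a look at the following link and read the contents into a RichTextBox; from there you can get the line you want, or load the text into a list afterwards: Reading from a Word Doc

If you want to read the XML format of a Word document, here is another good link to check out: ReadXML Format of a Word Document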

This one is a simpler example that reads the contents into the clipboard: Load Word into ClipBoard

Answer 2 (score: 0)

var word = new Word.Application();
object miss = Missing.Value;
object path = @"D:\viewstate.docx";
object readOnly = true;
var docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, 
                               ref miss, ref miss, ref miss, ref miss, ref miss, 
                               ref miss, ref miss, ref miss, ref miss, ref miss, 
                               ref miss, ref miss);
string totaltext = "";

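// Extend the selection from its current position to the end of the line.
// Right after opening, the selection sits at the start of the document,
// so this picks up the first line only.
object unit = Word.WdUnits.wdLine;
object count = 1;
word.Selection.MoveEnd(ref unit, ref count);
totaltext = word.Selection.Text;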

TextBox1.Text = totaltext;
docs.Close(ref miss, ref miss, ref miss);
word.Quit(ref miss, ref miss, ref miss);
docs = null;
word = null;