How to find the position or location of a string in a given document

Time: 2010-02-26 15:10:23

Tags: c# vb.net ms-word position

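How can I find the position or location of a string in a given document? I have a Word document, and I want to store all of its words and their positions in a database, which is why I need to find each word's position.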

So please tell me how to find the position or location of a word or string in a given document.

I plan to use VB.NET or C#, and the document is a .doc file.

1 answer:

Answer 0: (score: 1)

Hmm... I haven't found a smarter solution :-/ but this might help you. Let's assume you have some version of MS Office installed on your system.

First, you have to add a reference in your project to the Microsoft COM component named "Microsoft Word ?* Object Library".

* The ? depends on your MS Office version.

After adding the reference, you can test the following code:

using System;
using System.Collections.Generic;
using System.Text;
using Word;

namespace ConsoleApplication1
{
    class Program
    {

        static void Main(string[] args)
        {

            // Find the full path of our document

            System.IO.FileInfo ExecutableFileInfo = new System.IO.FileInfo(System.Reflection.Assembly.GetEntryAssembly().Location);            
            object docFileName = System.IO.Path.Combine(ExecutableFileInfo.DirectoryName, "document.doc");

            // Create the needed Word.Application and Word.Document objects

            object nullObject = System.Reflection.Missing.Value;
            Word.Application application = new Word.ApplicationClass();
            Word.Document document = application.Documents.Open(ref docFileName, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject);


            string wholeTextContent = document.Content.Text; 
            wholeTextContent = wholeTextContent.Replace('\r', ' '); // Replace paragraph marks with spaces so words in adjacent paragraphs stay separated
            string[] splittedTextContent = wholeTextContent.Split(' '); // Get the separate words

            int index = 1;
            foreach (string singleWord in splittedTextContent)
            {
                if (singleWord.Trim().Length > 0) // Skip the empty entries produced by consecutive spaces
                {
                    Console.WriteLine("Word: " + singleWord + " (position: " + index.ToString() + ")");
                    index++;
                }
            }

            // Close the document and quit Word so their resources are released

            document.Close(ref nullObject, ref nullObject, ref nullObject);
            application.Quit(ref nullObject, ref nullObject, ref nullObject);
            document = null;
            application = null;

            Console.ReadLine(); 
        }
    }
}

I tested it and it seems to work =)
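
The code above reports each word's ordinal position (1st word, 2nd word, ...). If you also need the character offset of each word for your database, here is a minimal sketch of one way to do it. It works on a plain string, so it can be tried without Word installed; `WordPositions` and `Find` are hypothetical names of mine, not part of the Word object model, and it needs a compiler with C# 7 tuple support. Note that the offsets are relative to the extracted string. I have not verified that they line up with Word's own range positions.

```csharp
using System;
using System.Collections.Generic;

public static class WordPositions
{
    // Hypothetical helper: returns every whitespace-delimited word together with
    // its 1-based ordinal and its 0-based character offset in the given text.
    public static List<(string Word, int Ordinal, int Offset)> Find(string text)
    {
        var result = new List<(string Word, int Ordinal, int Offset)>();
        int ordinal = 1;
        int i = 0;
        while (i < text.Length)
        {
            // Skip whitespace (spaces, tabs, '\r' paragraph marks, ...)
            while (i < text.Length && char.IsWhiteSpace(text[i])) i++;

            // Consume one word
            int start = i;
            while (i < text.Length && !char.IsWhiteSpace(text[i])) i++;

            if (i > start)
            {
                result.Add((text.Substring(start, i - start), ordinal, start));
                ordinal++;
            }
        }
        return result;
    }

    public static void Main()
    {
        // In the program above, this string would come from document.Content.Text.
        string sample = "Hello world\rsecond  paragraph";
        foreach (var (word, ordinal, offset) in Find(sample))
        {
            Console.WriteLine("Word: " + word + " (position: " + ordinal + ", offset: " + offset + ")");
        }
    }
}
```

Because it scans the text once instead of calling Replace and Split, tabs and repeated spaces are handled without producing empty entries, and the original offsets are preserved.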