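How can I find the position of a string in a given document? I have a Word document, and I want to store every word together with its position in a database, which is why I need to find each word's position.
So please tell me how to find the position of a word or string in a given file.
I plan to use VB.NET or C#, and the files are .doc files.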
Answer 0 (score: 1)
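Hmm... I haven't found a smarter solution :-/ but this might help you. Let's assume that some version of MS Office is installed on your system.
First, you have to add a reference in your project to the Microsoft COM component named "Microsoft Word ?* Object Library".
* The "?" depends on your MS Office version.
After adding the reference, you can test the following code: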
using System;
using System.Collections.Generic;
using System.Text;
using Word; // Namespace of the generated interop assembly; newer Office PIAs use Microsoft.Office.Interop.Word

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Build the full path of the document, expected next to the executable
            System.IO.FileInfo ExecutableFileInfo = new System.IO.FileInfo(System.Reflection.Assembly.GetEntryAssembly().Location);
            object docFileName = System.IO.Path.Combine(ExecutableFileInfo.DirectoryName, "document.doc");

            // Create the needed Word.Application and Word.Document objects
            // (the number of arguments that Open takes depends on the Word version)
            object nullObject = System.Reflection.Missing.Value;
            Word.Application application = new Word.ApplicationClass();
            Word.Document document = application.Documents.Open(ref docFileName, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject);

            string wholeTextContent = document.Content.Text;
            wholeTextContent = wholeTextContent.Replace('\r', ' '); // Turn paragraph marks into spaces
            string[] splittedTextContent = wholeTextContent.Split(' '); // Get the separate words

            int index = 1; // 1-based ordinal position of the word in the document
            foreach (string singleWord in splittedTextContent)
            {
                if (singleWord.Trim().Length > 0) // We don't need to store white space
                {
                    Console.WriteLine("Word: " + singleWord + " (position: " + index.ToString() + ")");
                    index++;
                }
            }

            // Close the document and quit Word, then drop the references
            document.Close(ref nullObject, ref nullObject, ref nullObject);
            application.Quit(ref nullObject, ref nullObject, ref nullObject);
            document = null;
            application = null;

            Console.ReadLine();
        }
    }
}
I've tested it and it seems to work =)
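The code above reports each word's ordinal position (1st word, 2nd word, ...). If a character offset is also needed for the database, a minimal sketch along the following lines might help. It works on a plain string such as the one read from document.Content.Text, so it needs no Word reference. The class name WordPositions and the method Find are hypothetical names chosen for illustration, and the offsets are positions within the extracted string, which are not guaranteed to match Word's own range positions in documents containing tables, fields or other special content.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordPositions
{
    // Hypothetical helper (not part of the answer above): returns each word
    // with its 1-based ordinal index and its 0-based character offset in text.
    public static List<Tuple<string, int, int>> Find(string text)
    {
        var result = new List<Tuple<string, int, int>>();
        int index = 1;

        // \S+ matches any run of non-whitespace characters, so spaces, tabs
        // and paragraph marks (\r) all act as separators.
        foreach (Match m in Regex.Matches(text, @"\S+"))
        {
            result.Add(Tuple.Create(m.Value, index, m.Index));
            index++;
        }
        return result;
    }

    public static void Main()
    {
        // Sample text standing in for document.Content.Text
        foreach (var entry in Find("Hello big\rworld"))
        {
            Console.WriteLine("Word: " + entry.Item1 +
                " (position: " + entry.Item2 + ", offset: " + entry.Item3 + ")");
        }
    }
}
```

For the sample string this lists "Hello" at offset 0, "big" at offset 6 and "world" at offset 10. Matching with a regular expression instead of splitting on a single space also avoids empty entries when several separators follow each other.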