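I want to read data, such as strings, from a .docx file in C# code. I have looked at several questions but could not work out which approach to use.
I am trying to use
ApplicationClass Application = new ApplicationClass();
but I get this error:
The type 'Microsoft.Office.Interop.Word.ApplicationClass' has no constructors defined
I want to get the full text from my docx file, not individual words!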
foreach (FileInfo f in docFiles)
{
    Application wo = new Application();
    object nullobj = Missing.Value;
    object file = f.FullName;
    Document doc = wo.Documents.Open(ref file, .... . . ref nullobj);
    doc.Activate();
    doc. == ??
}
How can I get the full text from a docx file?
Answer 0 (score: 3)
Try using the Word.Application interface instead of ApplicationClass.
See: Understanding Office Primary Interop Assembly Classes and Interfaces
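A minimal sketch of what this answer suggests, assuming a reference to the Microsoft.Office.Interop.Word primary interop assembly, a local Word installation, and C# 4 or later (so optional COM parameters can be omitted). The names WordInteropReader and ReadAllText are made up for illustration and do not come from the question.

```csharp
using System.IO;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

public static class WordInteropReader
{
    // Hypothetical helper: returns the text of the main document body.
    public static string ReadAllText(string docxPath)
    {
        // Instantiate through the Application interface, not ApplicationClass.
        Word.Application app = new Word.Application();
        Word.Document doc = null;
        try
        {
            // Pass a full path, since Word's working directory may differ from the caller's.
            doc = app.Documents.Open(FileName: Path.GetFullPath(docxPath), ReadOnly: true);
            // Content is a Range covering the main document story.
            return doc.Content.Text;
        }
        finally
        {
            // The casts avoid the compiler's method/event ambiguity warning
            // for Close and Quit.
            if (doc != null) ((Word._Document)doc).Close(SaveChanges: false);
            ((Word._Application)app).Quit();
            Marshal.ReleaseComObject(app);
        }
    }
}
```

Inside the question's loop, this would replace the open question mark with something like string text = doc.Content.Text;. Starting one Word instance per file is slow, so creating the Application once outside the loop is probably worth it.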
Answer 1 (score: 3)
This is what I was looking for to extract the entire text of a docx file:
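// ZipFile.Read and Extract(name, stream) look like DotNetZip (Ionic.Zip) calls,
// not System.IO.Compression; that library is assumed to be referenced.
using (ZipFile zip = ZipFile.Read(filename))
{
    MemoryStream stream = new MemoryStream();
    // Extract the main document part into memory
    zip.Extract(@"word/document.xml", stream);
    stream.Seek(0, SeekOrigin.Begin);
    XmlDocument xmldoc = new XmlDocument();
    xmldoc.Load(stream);
    // InnerText concatenates every text node in the document
    string PlainTextContent = xmldoc.DocumentElement.InnerText;
}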
Answer 2 (score: 1)
First, add references to the following assemblies:
System.Xml
System.IO.Compression.FileSystem
Second, add these using directives to the file containing your class so that the functions can be called:
using System.IO;
using System.IO.Compression;
using System.Xml;
Then you can use the following code:
public string DocxToString(string docxPath)
{
    // Temporary extraction directory, placed next to the source file
    string extractDir = Path.Combine(Path.GetDirectoryName(docxPath), Path.GetFileName(docxPath) + ".tmp");
    // Delete any extraction directory left over from a previous run
    if (Directory.Exists(extractDir)) Directory.Delete(extractDir, true);
    // Extract the whole package (media and XML parts) into that directory
    ZipFile.ExtractToDirectory(docxPath, extractDir);
    try
    {
        XmlDocument xmldoc = new XmlDocument();
        // word/document.xml holds the main document text
        xmldoc.Load(Path.Combine(extractDir, "word", "document.xml"));
        // Read all of the document's text from the XML
        return xmldoc.DocumentElement.InnerText;
    }
    finally
    {
        // Remove the extraction directory even if loading fails
        Directory.Delete(extractDir, true);
    }
}
Enjoy...
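The same idea can probably be done without a temporary directory by reading word/document.xml straight out of the archive. This is a sketch under the same assumptions as the answer above (.NET Framework 4.5 with references to System.IO.Compression and System.IO.Compression.FileSystem); DocxZipReader and ReadAllText are illustrative names, not part of the answer.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml;

public static class DocxZipReader
{
    // Hypothetical helper: reads the main document part directly from the ZIP package.
    public static string ReadAllText(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            // GetEntry returns null when the entry does not exist.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in " + docxPath);

            using (Stream stream = entry.Open())
            {
                var xmldoc = new XmlDocument();
                xmldoc.Load(stream);
                // Concatenates every text node, with no separators between paragraphs.
                return xmldoc.DocumentElement.InnerText;
            }
        }
    }
}
```

Nothing is written to disk, so there is no cleanup step and no risk of colliding with an existing folder next to the input file.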
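Answer 3 (score: 0)
Like the other Microsoft Office file formats whose extensions end in "x", .docx is simply a ZIP package that can be opened, modified, and re-compressed.
So use an Office Open XML library like this one.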
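The library this answer links to is not identifiable from the text, so as an illustration only, here is a sketch using Microsoft's Open XML SDK (the DocumentFormat.OpenXml NuGet package), which is one such library. OpenXmlSdkReader and ReadAllText are hypothetical names.

```csharp
using DocumentFormat.OpenXml.Packaging;

public static class OpenXmlSdkReader
{
    // Hypothetical helper: returns the text of the document body.
    public static string ReadAllText(string docxPath)
    {
        // The second argument opens the package read-only (isEditable: false).
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            // Body.InnerText concatenates the text of all descendant elements.
            return doc.MainDocumentPart.Document.Body.InnerText;
        }
    }
}
```

Compared with unzipping by hand, the SDK resolves the package parts for you, which may matter once you need headers, footers, or footnotes as well as the body.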
Answer 4 (score: 0)
Enjoy.
Make sure you are using .NET Framework 4.5 (the ZipFile class was introduced in that version).
using NUnit.Framework;
[TestFixture]
public class GetDocxInnerTextTestFixture
{
    private string _inputFilepath = @"../../TestFixtures/TestFiles/input.docx";
    [Test]
    public void GetDocxInnerText()
    {
        string documentText = DocxInnerTextReader.GetDocxInnerText(_inputFilepath);
        Assert.IsNotNull(documentText);
        Assert.IsTrue(documentText.Length > 0);
    }
}
using System.IO;
using System.IO.Compression;
using System.Xml;
public static class DocxInnerTextReader
{
    public static string GetDocxInnerText(string docxFilepath)
    {
        // Extract into an "extraction" folder next to the input file
        string folder = Path.GetDirectoryName(docxFilepath);
        string extractionFolder = Path.Combine(folder, "extraction");
        // Remove any folder left over from a previous run
        if (Directory.Exists(extractionFolder))
            Directory.Delete(extractionFolder, true);
        ZipFile.ExtractToDirectory(docxFilepath, extractionFolder);
        // word/document.xml holds the main document text
        string xmlFilepath = Path.Combine(extractionFolder, "word", "document.xml");
        var xmldoc = new XmlDocument();
        xmldoc.Load(xmlFilepath);
        return xmldoc.DocumentElement.InnerText;
    }
}
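One limitation of InnerText is that it joins all text nodes with no separators, so paragraph boundaries are lost. A possible refinement, sketched here, selects each w:p element in the WordprocessingML namespace and returns one string per paragraph. DocxParagraphReader and ReadParagraphs are illustrative names, and the sketch reads the entry in memory instead of extracting to a folder.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml;

public static class DocxParagraphReader
{
    // Namespace URI of WordprocessingML elements such as w:p and w:t.
    private const string WordNs =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the text of each paragraph in document order.
    public static List<string> ReadParagraphs(string docxPath)
    {
        var paragraphs = new List<string>();
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in " + docxPath);

            using (Stream stream = entry.Open())
            {
                var xmldoc = new XmlDocument();
                xmldoc.Load(stream);

                // XPath needs a prefix-to-namespace mapping to match w:p.
                var ns = new XmlNamespaceManager(xmldoc.NameTable);
                ns.AddNamespace("w", WordNs);

                foreach (XmlNode p in xmldoc.SelectNodes("//w:p", ns))
                    paragraphs.Add(p.InnerText);
            }
        }
        return paragraphs;
    }
}
```

Calling string.Join(Environment.NewLine, DocxParagraphReader.ReadParagraphs(path)) then gives the full text with line breaks. Paragraphs nested inside other paragraphs (for example in text boxes) would appear twice with this XPath, so treat it as a starting point.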