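Ignoring images when converting a Word document to text using Interop in C#

Time: 2018-05-15 17:37:20

Tags: c# office-interop docx

My code currently converts all of the text in a Word document (.docx) to plain text in a .txt file, but wherever the Word document contains an image, a '/' appears in my output file instead. How can I ignore these images?

My code stub: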

Word.Application app = new Word.Application();
Word.Document doc;
object missing = Type.Missing;
object readOnly = true;
doc = app.Documents.Open(ref path, ref missing, ref readOnly, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing); 
string text = doc.Content.Text;
System.IO.File.WriteAllText(txtPath, text);
Console.WriteLine("File converted to .txt!");

3 answers:

Answer 0: (score: 1)

How about deleting all the pictures before accessing the content?

Something like this:

while (doc.InlineShapes.Count > 0)
{
    // Word collections are 1-based: the first element is 1, not 0.
    // C# indexes them with []; the () form is VBA/VB.NET syntax.
    doc.InlineShapes[1].Delete();
}
// Do the same for floating shapes
while (doc.Shapes.Count > 0)
{
    doc.Shapes[1].Delete();
}

string text = doc.Content.Text;
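
Put together, the delete-then-read idea could look roughly like the following. This is only a minimal sketch, assuming C# 4 or later (so that `ref` can be omitted and named arguments used on COM calls); the class name `DocxTextExtractor` and the method name `ExtractTextWithoutImages` are made up for illustration. It closes the document with `wdDoNotSaveChanges`, so the deletions only ever happen in memory and the original file is left as it was.

```csharp
using Word = Microsoft.Office.Interop.Word;

public static class DocxTextExtractor
{
    // Hypothetical helper: returns the document text after removing all pictures in memory.
    public static string ExtractTextWithoutImages(string docPath)
    {
        var app = new Word.Application();
        Word.Document doc = null;
        try
        {
            // Open read-only so the file on disk cannot be overwritten by accident.
            doc = app.Documents.Open(docPath, ReadOnly: true);

            // Collections are 1-based; keep deleting the first item until none remain.
            while (doc.InlineShapes.Count > 0)
            {
                doc.InlineShapes[1].Delete();
            }
            while (doc.Shapes.Count > 0)
            {
                doc.Shapes[1].Delete();
            }

            return doc.Content.Text;
        }
        finally
        {
            // Discard the in-memory deletions and always shut Word down.
            if (doc != null)
            {
                doc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
            }
            app.Quit();
        }
    }
}
```

The returned string can then be written out with `System.IO.File.WriteAllText(txtPath, text)` exactly as in the question. The `try`/`finally` is there so that a failure does not leave a hidden WINWORD process running.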

Answer 1: (score: 1)

A different approach from the one I suggested above:

Just save the document as text.

object path = txtPath; 
const int wdFormatText = 2;
object fileFormat = wdFormatText;
doc.SaveAs(ref path, ref fileFormat, ref missing, ...); // pass missing for the remaining parameters

There is also a SaveAs2 method; I think it behaves the same as long as you keep passing missing for the additional parameters.
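
With C# 4 or later the long list of `missing` arguments can usually be left out altogether. A minimal sketch of the save-as-text idea, assuming a Word version that has `SaveAs2` (Word 2010 or newer); the class `DocxToTextSaver` and the helpers `TextPathFor` and `SaveAsPlainText` are hypothetical names, not part of the Interop API:

```csharp
using System.IO;
using Word = Microsoft.Office.Interop.Word;

public static class DocxToTextSaver
{
    // Hypothetical helper: same path as the source, with the extension swapped to .txt.
    public static string TextPathFor(string docPath)
    {
        return Path.ChangeExtension(docPath, ".txt");
    }

    // Hypothetical helper: lets Word itself write the document out as plain text.
    public static void SaveAsPlainText(string docPath)
    {
        var app = new Word.Application();
        Word.Document doc = null;
        try
        {
            doc = app.Documents.Open(docPath, ReadOnly: true);

            // wdFormatText has the value 2, the same constant used above.
            doc.SaveAs2(TextPathFor(docPath), Word.WdSaveFormat.wdFormatText);
        }
        finally
        {
            // Nothing else needs saving; always shut Word down.
            if (doc != null)
            {
                doc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
            }
            app.Quit();
        }
    }
}
```

It is safer to pass an absolute path as `docPath`, since Word does not share the calling program's working directory.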

Answer 2: (score: 0)

Here is my solution. This class reads a Word document, deletes all the images, and then converts it to an RTF file.

using Microsoft.Office.Interop.Word;
using System.IO;
using OW = Microsoft.Office.Interop.Word;

namespace WordImagesCruncher
{
    public class WordImagesCruncher
    {
        public string SourceFilePath { private set; get; }

        public WordImagesCruncher(string sourceFilePath)
        {
            SourceFilePath = sourceFilePath;
        }

        public void DoWork()
        { 
            var wordApp = new OW.Application();
            OW.Document doc = wordApp.Documents.Open(SourceFilePath);

            for (int i = doc.InlineShapes.Count; i > 0; i--)
            {
                doc.InlineShapes[1].Delete();
            } 

            for (int i = doc.Shapes.Count; i >0 ; i--)
            {
                doc.Shapes[1].Delete();
            }

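            // Note: this is a bare file name without a directory, so Word resolves it
            // against its own current folder rather than the source file's folder.
            doc.SaveAs(Path.GetFileNameWithoutExtension(SourceFilePath) + "_tmp.rtf", OW.WdSaveFormat.wdFormatRTF);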
            doc.Close();
            wordApp.Quit();
        }
    }
}

Don't forget to add the Interop reference

and set its Embed Interop Types property to Yes.