将多个DOCX文件附加在一起

时间:2008-10-29 17:22:26

标签: c# openxml docx

我需要以编程方式使用C#将几个预先存在的docx文件附加到单个长docx文件中 - 包括子弹和图像等特殊标记。页眉和页脚信息将被删除,因此这些信息不会导致任何问题。

我可以找到很多关于使用.NET Framework 3操作单个docx文件的信息,但是关于如何合并文件并不容易或明显。还有一个第三方程序(Acronis.Words)可以做到这一点,但它的成本非常高。

更新

已经建议通过Word进行自动化,但是我的代码将在IIS Web服务器上的ASP.NET上运行,因此对我而言,不能选择使用Word。很抱歉没有提到这一点。

8 个答案:

答案 0 :(得分:23)

尽管提交了所有好的建议和解决方案,我还是开发了另一种选择。在我看来,你应该完全避免在服务器应用程序中使用Word。所以我使用OpenXML,但它不适用于AltChunk。我将文本添加到原始主体,我收到了一个byte []列表而不是文件列表,但您可以轻松地根据需要更改代码。

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace OfficeMergeControl
{
    public class CombineDocs
    {
        public byte[] OpenAndCombine( IList<byte[]> documents )
        {
            MemoryStream mainStream = new MemoryStream();

            mainStream.Write(documents[0], 0, documents[0].Length);
            mainStream.Position = 0;

            int pointer = 1;
            byte[] ret;
            try
            {
                using (WordprocessingDocument mainDocument = WordprocessingDocument.Open(mainStream, true))
                {

                    XElement newBody = XElement.Parse(mainDocument.MainDocumentPart.Document.Body.OuterXml);

                    for (pointer = 1; pointer < documents.Count; pointer++)
                    {
                        WordprocessingDocument tempDocument = WordprocessingDocument.Open(new MemoryStream(documents[pointer]), true);
                        XElement tempBody = XElement.Parse(tempDocument.MainDocumentPart.Document.Body.OuterXml);

                        newBody.Add(tempBody);
                        mainDocument.MainDocumentPart.Document.Body = new Body(newBody.ToString());
                        mainDocument.MainDocumentPart.Document.Save();
                        mainDocument.Package.Flush();
                    }
                }
            }
            catch (OpenXmlPackageException oxmle)
            {
                throw new OfficeMergeControlException(string.Format(CultureInfo.CurrentCulture, "Error while merging files. Document index {0}", pointer), oxmle);
            }
            catch (Exception e)
            {
                throw new OfficeMergeControlException(string.Format(CultureInfo.CurrentCulture, "Error while merging files. Document index {0}", pointer), e);
            }
            finally
            {
                ret = mainStream.ToArray();
                mainStream.Close();
                mainStream.Dispose();
            }
            return (ret);
        }
    }
}

我希望这会对你有所帮助。

答案 1 :(得分:7)

您不需要使用自动化。 DOCX文件基于OpenXML格式。它们只是zip文件,里面有一堆XML和二进制部分(思考文件)。您可以使用Packaging API(WindowsBase.dll中的System.IO.Packaging)打开它们,并使用Framework中的任何XML类对它们进行操作。

查看OpenXMLDeveloper.org了解详情。

答案 2 :(得分:5)

这是一个非常晚的原始问题,并且有很多变化,但我想我会分享我编写合并逻辑的方式。这使用了Open XML Power Tools

public byte[] CreateDocument(IList<byte[]> documentsToMerge)
{
    List<Source> documentBuilderSources = new List<Source>();
    foreach (byte[] documentByteArray in documentsToMerge)
    {
        documentBuilderSources.Add(new Source(new WmlDocument(string.Empty, documentByteArray), false));
    }

    WmlDocument mergedDocument = DocumentBuilder.BuildDocument(documentBuilderSources);
    return mergedDocument.DocumentByteArray;
}

目前,这在我们的应用程序中运行良好。我稍微更改了代码,因为我的要求是每个需要首先处理的文档。所以传入的是具有模板字节数组的DTO对象以及需要替换的各种值。这是我的代码目前的样子。这会使代码更进一步。

public byte[] CreateDocument(IList<DocumentSection> documentTemplates)
{
    List<Source> documentBuilderSources = new List<Source>();
    foreach (DocumentSection documentTemplate in documentTemplates.OrderBy(dt => dt.Rank))
    {
        // Take the template replace the items and then push it into the chunk
        using (MemoryStream templateStream = new MemoryStream())
        {
            templateStream.Write(documentTemplate.Template, 0, documentTemplate.Template.Length);

            this.ProcessOpenXMLDocument(templateStream, documentTemplate.Fields);

            documentBuilderSources.Add(new Source(new WmlDocument(string.Empty, templateStream.ToArray()), false));
        }
    }

    WmlDocument mergedDocument = DocumentBuilder.BuildDocument(documentBuilderSources);
    return mergedDocument.DocumentByteArray;
}

答案 3 :(得分:2)

我刚才写了一个小测试应用来做这件事。我的测试应用程序使用Word 2003文档(.doc)而不是.docx,但我想这个过程是一样的 - 我认为你必须改变的是使用更新版本的Primary Interop Assembly。使用新的C#4.0功能,这段代码看起来更加整洁......

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using Microsoft.Office.Interop.Word;
using Microsoft.Office.Core;
using System.Runtime.InteropServices;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            new Program().Start();
        }

        private void Start()
        {
            object fileName = Path.Combine(Environment.CurrentDirectory, @"NewDocument.doc");
            File.Delete(fileName.ToString());

            try
            {
                WordApplication = new ApplicationClass();
                var doc = WordApplication.Documents.Add(ref missing, ref missing, ref missing, ref missing);
                try
                {
                    doc.Activate();

                    AddDocument(@"D:\Projects\WordTests\ConsoleApplication1\Documents\Doc1.doc", doc, false);
                    AddDocument(@"D:\Projects\WordTests\ConsoleApplication1\Documents\Doc2.doc", doc, true);

                    doc.SaveAs(ref fileName,
                        ref missing, ref missing, ref missing, ref missing,     ref missing,
                        ref missing, ref missing, ref missing, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing, ref missing);
                }
                finally
                {
                    doc.Close(ref missing, ref missing, ref missing);
                }
            }
            finally
            {
                WordApplication.Quit(ref missing, ref missing, ref missing);
            }
        }

        private void AddDocument(string path, Document doc, bool lastDocument)
        {
            object subDocPath = path;
            var subDoc = WordApplication.Documents.Open(ref subDocPath, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing);
            try
            {

                object docStart = doc.Content.End - 1;
                object docEnd = doc.Content.End;

                object start = subDoc.Content.Start;
                object end = subDoc.Content.End;

                Range rng = doc.Range(ref docStart, ref docEnd);
                rng.FormattedText = subDoc.Range(ref start, ref end);

                if (!lastDocument)
                {
                    InsertPageBreak(doc);
                }
            }
            finally
            {
                subDoc.Close(ref missing, ref missing, ref missing);
            }
        }

        private static void InsertPageBreak(Document doc)
        {
            object docStart = doc.Content.End - 1;
            object docEnd = doc.Content.End;
            Range rng = doc.Range(ref docStart, ref docEnd);

            object pageBreak = WdBreakType.wdPageBreak;
            rng.InsertBreak(ref pageBreak);
        }

        private ApplicationClass WordApplication { get; set; }

        private object missing = Type.Missing;
    }
}

答案 4 :(得分:2)

您想使用AltChunks和OpenXml SDK 1.0(如果可以,至少为2.0)。查看Eric White的博客了解更多详情,同时也是一个很好的资源!这是一个代码示例,如果不能立即开始工作,应该可以帮助您。

public void AddAltChunkPart(Stream parentStream, Stream altStream, string altChunkId)
{
    //make sure we are at the start of the stream    
    parentStream.Position = 0;
    altStream.Position = 0;
    //push the parentStream into a WordProcessing Document
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(parentStream, true))
    {
        //get the main document part
        MainDocumentPart mainPart = wordDoc.MainDocumentPart;
        //create an altChunk part by adding a part to the main document part
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(altChunkPartType, altChunkId);
        //feed the altChunk stream into the chunk part
        chunk.FeedData(altStream);
        //create and XElement to represent the new chunk in the document
        XElement newChunk = new XElement(altChunk, new XAttribute(relId, altChunkId));
        //Add the chunk to the end of the document (search to last paragraph in body and add at the end)
        wordDoc.MainDocumentPart.GetXDocument().Root.Element(body).Elements(paragraph).Last().AddAfterSelf(newChunk);
        //Finally, save the document
        wordDoc.MainDocumentPart.PutXDocument();
    }
    //reset position of parent stream
    parentStream.Position = 0;
}

答案 5 :(得分:0)

它的退出复杂,所以代码超出了论坛帖子的范围,我会为你编写你的应用程序,但总结一下。

  • 将这两个文档打开为包
  • 遍历第二个docuemnt的部分寻找图像和嵌入的东西
  • 将这些部分添加到第一个包中,记住新的关系ID(这涉及很多流工作)
  • 打开第二个文档中的document.xml部分并将所有旧关系ID替换为新关系ID-将第二个document.xml的所有子节点(而不是根节点)附加到第一个document.xml < / LI>
  • 保存所有XmlDocuments并刷新包

答案 6 :(得分:0)

我在C#中创建了一个应用程序,将RTF文件合并到一个文档中,我希望它也适用于DOC和DOCX文件。

    Word._Application wordApp;
    Word._Document wordDoc;
    object outputFile = outputFileName;
    object missing = System.Type.Missing;
    object vk_false = false;
    object defaultTemplate = defaultWordDocumentTemplate;
    object pageBreak = Word.WdBreakType.wdPageBreak;
    string[] filesToMerge = new string[pageCounter];
    filestoDelete = new string[pageCounter];

    for (int i = 0; i < pageCounter; i++)
    {
        filesToMerge[i] = @"C:\temp\temp" + i.ToString() + ".rtf";
        filestoDelete[i] = @"C:\temp\temp" + i.ToString() + ".rtf";                
    }
    try
    {
        wordDoc = wordApp.Documents.Add(ref missing, ref missing, ref missing, ref missing);
    }
    catch(Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    Word.Selection selection= wordApp.Selection;

    foreach (string file in filesToMerge)
    {
        selection.InsertFile(file,
            ref missing,
            ref missing,
            ref missing,
            ref missing);

        selection.InsertBreak(ref pageBreak);                                     
    }
    wordDoc.SaveAs(ref outputFile, ref missing, ref missing, ref missing, ref missing, ref missing,
           ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
           ref missing, ref missing);

希望这有帮助!

答案 7 :(得分:0)

对于任何想要使用文件名列表的人:

void AppendToExistingFile(string existingFile, IList<string> filenames)
{
    using (WordprocessingDocument document = WordprocessingDocument.Open(existingFile, true))
    {
        MainDocumentPart mainPart = document.MainDocumentPart;

        for (int i = filenames.Count - 1; i >= 0; --i)
        {
            string altChunkId = "AltChunkId" + i;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);

            using (FileStream fileStream = File.Open(filenames[i], FileMode.Open))
            {
                chunk.FeedData(fileStream);
            }

            AltChunk altChunk = new AltChunk { Id = altChunkId };
            mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
        }

        mainPart.Document.Save();
    }
}