How do I send Unicode text from MATLAB to a Word document via the ActiveX interface?

Date: 2015-05-08 21:12:39

Tags: matlab unicode ms-word activex

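I use MATLAB to create Microsoft Word documents programmatically on Windows. This generally works fine, but it has trouble with non-ASCII text. For example, take the following code: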

wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;
selection = wordApplication.Selection;
umbrella = char(9730);
disp(umbrella)
selection.TypeText(umbrella)

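The Command Window displays the umbrella character correctly, but in the Word document it appears as the "question mark in a box" missing-character glyph. I can cut and paste the character from the Command Window into Word, so the font does contain that character.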
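For reference, char(9730) is code point U+2602 (UMBRELLA). MATLAB stores a char as one 16-bit UTF-16 code unit, so in memory this character is the single unit 0x2602. In UTF-8 the same character takes three bytes.

The following Python sketch shows these representations. Python is used only because it is easy to run anywhere; the sketch does not talk to Word.

```python
import unicodedata

# The character from the question: MATLAB's char(9730).
umbrella = chr(9730)

print(f"U+{ord(umbrella):04X}", unicodedata.name(umbrella))  # U+2602 UMBRELLA

# One UTF-16 code unit (what a MATLAB char, or a COM BSTR, holds)...
utf16 = umbrella.encode("utf-16-le")
print(utf16.hex())  # 0226  (little-endian bytes of 0x2602)

# ...but three bytes in UTF-8.
utf8 = umbrella.encode("utf-8")
print(utf8.hex())  # e29882
```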

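The TypeText method must be assuming ASCII. There are resources on setting a Unicode flag for similar operations in other languages, but I don't know how to translate them into syntax I can use in MATLAB.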

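Clarification: my use case is sending an arbitrary Unicode string (a char array), not just a single character. Ideally I could send the whole string at once. Here is better example code: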

% Define a string to send with a non-ASCII character.
umbrella = char(9730);
toSend = ['Have you seen my ' umbrella '?'];
disp(toSend)

% Open a new Word document.
wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;

% Send the text.
selection = wordApplication.Selection;
selection.TypeText(toSend)

I hoped I could simply set the encoding of the document itself, but that doesn't seem to help:

wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;
disp(wordApplication.ActiveDocument.TextEncoding)
wordApplication.ActiveDocument.TextEncoding = 65001;
disp(wordApplication.ActiveDocument.TextEncoding)
selection = wordApplication.Selection;
toSend = sprintf('Have you seen my \23002?');
selection.TypeText(toSend)
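This snippet relies on two numeric details, checked in the Python sketch below:

- The sprintf escape \23002 is octal. Octal 23002 is decimal 9730, so toSend contains the same umbrella character as before.
- 65001 is the Windows code-page identifier for UTF-8. In hex it is 0xFDE9, the same value as the msoEncodingUTF8 constant in the second answer.

Word's documentation describes TextEncoding as the code page used when the document is saved as an encoded text file. So, as far as I can tell, it would not be expected to change how strings are passed in through COM.

```python
# MATLAB's sprintf treats "\23002" as an octal character code.
code = int("23002", 8)
print(code, f"U+{code:04X}")  # 9730 U+2602

# Python's own octal escapes stop at three digits, so build the string with chr().
to_send = "Have you seen my " + chr(code) + "?"
print(ascii(to_send))  # 'Have you seen my \u2602?'

# 65001 (the UTF-8 code page) is 0xFDE9 in hex.
print(hex(65001))  # 0xfde9
```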

2 answers:

Answer 0 (score: 9)

Method 1. For a single character (original question)

Works

Taken from here

umbrella = 9730; %// Unicode number of the desired character
selection.InsertSymbol(umbrella, '', true); %// true means use Unicode

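The second argument specifies the font (so you could use 'Arial', etc.); '' apparently means use the current font. The third argument, true, tells Word that the first argument is a Unicode code point rather than an ANSI character code.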
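InsertSymbol handles one character per call, but it can be combined with TypeText to send a whole string. The idea is to type each run of ASCII text normally and call InsertSymbol(codepoint, '', true) for every other character.

Below is a hypothetical Python sketch of just the splitting step. `split_for_insert_symbol` is an illustrative name, not part of any API. A MATLAB loop built on the same idea would call selection.TypeText for each "text" segment and selection.InsertSymbol for each "symbol" segment. The sketch assumes every character is in the Basic Multilingual Plane, i.e. a single MATLAB char, as char(9730) is.

```python
from typing import List, Tuple, Union

Segment = Tuple[str, Union[str, int]]

def split_for_insert_symbol(text: str) -> List[Segment]:
    """Hypothetical helper: split text into ("text", run) for ASCII runs and
    ("symbol", code point) for each non-ASCII character."""
    segments: List[Segment] = []
    run = []
    for ch in text:
        if ord(ch) < 128:
            run.append(ch)  # plain ASCII: safe to send with TypeText
        else:
            if run:
                segments.append(("text", "".join(run)))
                run = []
            segments.append(("symbol", ord(ch)))  # send with InsertSymbol
    if run:
        segments.append(("text", "".join(run)))
    return segments

print(split_for_insert_symbol("Have you seen my " + chr(9730) + "?"))
# [('text', 'Have you seen my '), ('symbol', 9730), ('text', '?')]
```

This costs one COM call per non-ASCII character. The clipboard method below avoids that, at the price of overwriting the clipboard.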

Method 2. For a single character (original question)

Works

A less direct way, taken from here

umbrella = 9730; %// Unicode number of the desired character
selection.TypeText(dec2hex(umbrella));
selection.ToggleCharacterCode;

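Method 3. For a string (edited question)

Works

If you don't mind using the clipboard (this overwrites whatever it currently holds), you can send the whole string at once: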

umbrella = char(9730);
toSend = ['Have you seen my ' umbrella '?'];
clipboard('copy', toSend); %// copy the Unicode string contained in variable `toSend`
selection.Paste %// paste it onto the Word document

Answer 1 (score: 4)

I tried this too and ran into the same problem you report (tested with MATLAB R2015a and Office 2013)...

I think something in the COM layer between MATLAB and Word is mangling the text encoding.
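One way to picture this hypothesis (an illustration, not a diagnosis of what MATLAB actually does): COM passes strings as BSTRs, which are UTF-16, so U+2602 would survive the trip intact. If the string were instead converted somewhere to a single-byte ANSI code page such as Windows-1252, the umbrella would have no representation there and would be lost.

The Python sketch below contrasts the two paths:

```python
text = "Have you seen my " + chr(9730) + "?"

# UTF-16 (the BSTR encoding) round-trips losslessly.
utf16_roundtrip = text.encode("utf-16-le").decode("utf-16-le")
print(utf16_roundtrip == text)  # True

# A single-byte ANSI code page cannot represent U+2602; "replace" substitutes '?'.
ansi_roundtrip = text.encode("cp1252", errors="replace").decode("cp1252")
print(ansi_roundtrip)          # Have you seen my ??
print(ansi_roundtrip == text)  # False
```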

To confirm that this really is a MATLAB bug, I tried the same thing in Python, and it works fine:

#!/usr/bin/env python
# Requires Windows, MS Word, and the pywin32 package (which provides win32com).

import os
import win32com.client

word = win32com.client.Dispatch("Word.Application")
word.Visible = True

doc = word.Documents.Add()

text = "Have you seen my " + chr(9730) + "?"  # Python 3: chr() replaces unichr()
word.Selection.TypeText(text)

fname = os.path.join(os.getcwd(), "out.docx")
doc.SaveAs2(fname)
doc.Close()

word.Quit()

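I came up with two workarounds for MATLAB:

Method 1 (preferred):

The idea is to create a .NET assembly that uses Office Interop. It accepts any Unicode string and writes it to a specified Word document. The assembly can then be loaded into MATLAB and used as a wrapper around MS Office.

Example in C#: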

MSWord.cs

using System;
using Microsoft.Office.Interop.Word;

namespace MyOfficeInterop
{
    public class MSWord
    {
        // this is very basic, but you can expose anything you want!
        public void AppendTextToDocument(string filename, string str)
        {
            Application app = null;
            Document doc = null;
            try
            {
                app = new Application();
                doc = app.Documents.Open(filename);

                // After Open() the selection is at the start of the document;
                // move it to the end so the text is actually appended.
                app.Selection.EndKey(WdUnits.wdStory);
                app.Selection.TypeText(str);
                app.Selection.TypeParagraph();

                doc.Save();
            }
            finally
            {
                // Guard against the constructor or Open() having thrown.
                if (doc != null) doc.Close();
                if (app != null) app.Quit();
            }
        }
    }
}

First we compile it:

csc.exe /nologo /target:library /out:MyOfficeInterop.dll /reference:"C:\Program Files (x86)\Microsoft Visual Studio 12.0\Visual Studio Tools for Office\PIA\Office15\Microsoft.Office.Interop.Word.dll" MSWord.cs

Then we test it from MATLAB:

%// load assembly
NET.addAssembly('C:\path\to\MyOfficeInterop.dll')

%// AppendTextToDocument opens an existing file, so create an empty one first
fname = fullfile(pwd,'test.docx');
fclose(fopen(fname,'w'));

%// some text
str = ['Have you seen my ' char(9730) '?'];

%// add text to Word document
word = MyOfficeInterop.MSWord();
word.AppendTextToDocument(fname, str);

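Method 2:

This is more of a hack! We simply write the text from MATLAB directly to a (properly encoded) text file. Then we open that file in MS Word through the COM/ActiveX interface and re-save it as a proper .docx Word document.

Example: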

%// params
fnameTXT = fullfile(pwd,'test.txt');
fnameDOCX = fullfile(pwd,'test.docx');
str = ['Have you seen my ' char(9730) '?'];

%// create UTF-8 encoded text file
bytes = unicode2native(str, 'UTF-8');
fid = fopen(fnameTXT, 'wb');
fwrite(fid, bytes);
fclose(fid);

%// some office interop constants (extracted using IL DASM)
msoEncodingUTF8 = int32(hex2dec('0000FDE9'));         % MsoEncoding
wdOpenFormatUnicodeText = int32(hex2dec('00000005')); % WdOpenFormat
wdFormatDocumentDefault = int32(hex2dec('00000010')); % WdSaveFormat
wdDoNotSaveChanges = int32(hex2dec('00000000'));      % WdSaveOptions

%// start MS Word 
Word = actxserver('Word.Application');
%Word.Visible = true;

%// open text file in MS Word
doc = Word.Documents.Open(...
    fnameTXT, ...                % FileName
    [], ...                      % ConfirmConversions
    [], ...                      % ReadOnly
    [], ...                      % AddToRecentFiles
    [], ...                      % PasswordDocument
    [], ...                      % PasswordTemplate
    [], ...                      % Revert
    [], ...                      % WritePasswordDocument
    [], ...                      % WritePasswordTemplate
    wdOpenFormatUnicodeText, ... % Format
    msoEncodingUTF8, ...         % Encoding
    [], ...                      % Visible
    [], ...                      % OpenAndRepair
    [], ...                      % DocumentDirection
    [], ...                      % NoEncodingDialog
    []);                         % XMLTransform

%// save it as docx
doc.SaveAs2(...
    fnameDOCX, ...               % FileName
    wdFormatDocumentDefault, ... % FileFormat
    [], ...                      % LockComments
    [], ...                      % Password
    [], ...                      % AddToRecentFiles
    [], ...                      % WritePassword
    [], ...                      % ReadOnlyRecommended
    [], ...                      % EmbedTrueTypeFonts
    [], ...                      % SaveNativePictureFormat
    [], ...                      % SaveFormsData
    [], ...                      % SaveAsAOCELetter
    msoEncodingUTF8, ...         % Encoding
    [], ...                      % InsertLineBreaks
    [], ...                      % AllowSubstitutions
    [], ...                      % LineEnding
    [], ...                      % AddBiDiMarks
    []);                         % CompatibilityMode

%// close doc, quit, and cleanup
doc.Close(wdDoNotSaveChanges, [], [])
Word.Quit()
clear doc Word
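The first half of this hack does not involve Word, so it is easy to check: unicode2native(str, 'UTF-8') followed by fwrite should produce a plain UTF-8 file.

Below is a Python sketch of that same step. It uses a temporary directory instead of pwd, and it also confirms that the hex constants above equal the decimal values 65001, 5, 16 and 0.

```python
import os
import tempfile

text = "Have you seen my " + chr(9730) + "?"

# Equivalent of unicode2native(str, 'UTF-8') + fwrite: write raw UTF-8 bytes (no BOM).
with tempfile.TemporaryDirectory() as tmp:
    fname_txt = os.path.join(tmp, "test.txt")
    with open(fname_txt, "wb") as f:
        f.write(text.encode("utf-8"))
    with open(fname_txt, "rb") as f:
        raw = f.read()

print(raw[-4:].hex())               # e298823f  (umbrella + '?')
print(raw.decode("utf-8") == text)  # True

# The Office constants from the MATLAB code, as plain integers.
constants = {
    "msoEncodingUTF8": 0x0000FDE9,
    "wdOpenFormatUnicodeText": 0x00000005,
    "wdFormatDocumentDefault": 0x00000010,
    "wdDoNotSaveChanges": 0x00000000,
}
print(constants)
# {'msoEncodingUTF8': 65001, 'wdOpenFormatUnicodeText': 5, 'wdFormatDocumentDefault': 16, 'wdDoNotSaveChanges': 0}
```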