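I use MATLAB to programmatically create Microsoft Word documents on Windows. In general this works well, but it has trouble with non-ASCII text. For example, take the following code: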
wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;
selection = wordApplication.Selection;
umbrella = char(9730);
disp(umbrella)
selection.TypeText(umbrella)
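The Command Window displays the umbrella character correctly, but the character that appears in the Word document is the "question mark in a box" missing-character symbol. I can cut and paste the character from the Command Window into Word, so the character is indeed available in that font.
The TypeText method must be assuming ASCII. There are resources on how to set Unicode flags for similar operations in other languages, but I don't know how to translate them into the syntax available to me in MATLAB.
Clarification: my use case is sending an unknown Unicode string (a char array), not just a single character. Being able to send it all at once would be ideal. Here is better sample code: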
% Define a string to send with a non-ASCII character.
umbrella = char(9730);
toSend = ['Have you seen my ' umbrella '?'];
disp(toSend)
% Open a new Word document.
wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;
% Send the text.
selection = wordApplication.Selection;
selection.TypeText(toSend)
I had hoped I could simply set the encoding of the document itself, but that doesn't seem to help:
wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;
disp(wordApplication.ActiveDocument.TextEncoding)
wordApplication.ActiveDocument.TextEncoding = 65001;
disp(wordApplication.ActiveDocument.TextEncoding)
selection = wordApplication.Selection;
toSend = sprintf('Have you seen my \23002?');
selection.TypeText(toSend)
Answer 0 (score: 9)
Taken from here:
umbrella = 9730; %// Unicode number of the desired character
selection.InsertSymbol(umbrella, '', true); %// true means use Unicode
The second argument specifies the font (so you could use 'Arial' etc.), and '' apparently means use the current font. The third argument, true, means use Unicode.
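InsertSymbol inserts one character per call, so sending a whole string this way means looping over it. The following is a minimal Python sketch of that loop, not code from the answer. It assumes a `selection` object that exposes `TypeText` and `InsertSymbol` the way Word's Selection does; the helper names `split_runs` and `send_text` are hypothetical. ASCII runs go through `TypeText` and every other character goes through `InsertSymbol`. Characters outside the Basic Multilingual Plane have not been checked against Word.

```python
from itertools import groupby


def split_runs(text):
    """Split text into (is_ascii, run) pairs of consecutive characters."""
    return [(is_ascii, "".join(chars))
            for is_ascii, chars in groupby(text, key=lambda ch: ord(ch) < 128)]


def send_text(selection, text):
    """Send ASCII runs via TypeText and other characters via InsertSymbol."""
    for is_ascii, run in split_runs(text):
        if is_ascii:
            selection.TypeText(run)
        else:
            for ch in run:
                # Arguments: character number, font ('' = current font), Unicode flag
                selection.InsertSymbol(ord(ch), "", True)
```

The same logic should carry over to MATLAB by looping over the numeric codes of the char array and calling `selection.InsertSymbol` for each code above 127.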
A less direct way, taken from here:
umbrella = 9730; %// Unicode number of the desired character
selection.TypeText(dec2hex(umbrella));
selection.ToggleCharacterCode;
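The two calls above type the hexadecimal code point as ordinary text and then ask Word to convert it in place, much like pressing Alt+X in the Word UI. To illustrate the text that `dec2hex` hands to `TypeText`, here is a small Python sketch; the function name `code_point_hex` is hypothetical.

```python
def code_point_hex(ch):
    """Return the uppercase hexadecimal code point of a single character."""
    return format(ord(ch), "X")


umbrella = chr(9730)
print(code_point_hex(umbrella))  # prints 2602
```

Because the digits are typed as plain text, preceding text that itself ends in hexadecimal digits might be read as part of the code; that case has not been checked here.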
If you don't mind using the clipboard, you can send the whole string at once:
umbrella = char(9730);
toSend = ['Have you seen my ' umbrella '?'];
clipboard('copy', toSend); %// copy the Unicode string contained in variable `toSend`
selection.Paste %// paste it onto the Word document
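Answer 1 (score: 4)
I tried this too and ran into the same problem you reported (I tested with MATLAB R2015a and Office 2013)...
I think something in the COM layer between MATLAB and Word is messing up the text encoding.
To confirm that this is indeed a bug in MATLAB, I tried the same thing in Python, and it works fine: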
#!/usr/bin/env python3
import os
import win32com.client
word = win32com.client.Dispatch("Word.Application")
word.Visible = True
doc = word.Documents.Add()
# string containing U+2602 (umbrella)
text = "Have you seen my " + chr(9730) + "?"
word.Selection.TypeText(text)
fname = os.path.join(os.getcwd(), "out.docx")
doc.SaveAs2(fname)
doc.Close()
word.Quit()
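I came up with two workarounds for MATLAB:
The first idea is to create a .NET assembly that uses Office Interop. It takes any Unicode string and writes it to a specified Word document. This assembly can then be loaded into MATLAB and used as a wrapper around MS Office.
An example in C#:
MSWord.cs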
using System;
using Microsoft.Office.Interop.Word;
namespace MyOfficeInterop
{
    public class MSWord
    {
        // this is very basic, but you can expose anything you want!
        public void AppendTextToDocument(string filename, string str)
        {
            Application app = null;
            Document doc = null;
            try
            {
                app = new Application();
                doc = app.Documents.Open(filename);
                app.Selection.TypeText(str);
                app.Selection.TypeParagraph();
                doc.Save();
            }
            finally
            {
                // guard against failures before app or doc were assigned
                if (doc != null) doc.Close();
                if (app != null) app.Quit();
            }
        }
    }
}
We compile it first:
csc.exe /nologo /target:library /out:MyOfficeInterop.dll /reference:"C:\Program Files (x86)\Microsoft Visual Studio 12.0\Visual Studio Tools for Office\PIA\Office15\Microsoft.Office.Interop.Word.dll" MSWord.cs
Then we test it from MATLAB:
%// load assembly
NET.addAssembly('C:\path\to\MyOfficeInterop.dll')
%// the document file must already exist; create an empty one here
fname = fullfile(pwd,'test.docx');
fclose(fopen(fname,'w'));
%// some text
str = ['Have you seen my ' char(9730) '?'];
%// add text to Word document
word = MyOfficeInterop.MSWord();
word.AppendTextToDocument(fname, str);
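The second workaround is more of a hack! We simply write the text from MATLAB directly to a text file (correctly encoded). We then open it in MS Word through the COM/ActiveX interface and re-save it as a proper .docx Word document.
Example: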
%// params
fnameTXT = fullfile(pwd,'test.txt');
fnameDOCX = fullfile(pwd,'test.docx');
str = ['Have you seen my ' char(9730) '?'];
%// create UTF-8 encoded text file
bytes = unicode2native(str, 'UTF-8');
fid = fopen(fnameTXT, 'wb');
fwrite(fid, bytes);
fclose(fid);
%// some office interop constants (extracted using IL DASM)
msoEncodingUTF8 = int32(hex2dec('0000FDE9')); % MsoEncoding
wdOpenFormatUnicodeText = int32(hex2dec('00000005')); % WdOpenFormat
wdFormatDocumentDefault = int32(hex2dec('00000010')); % WdSaveFormat
wdDoNotSaveChanges = int32(hex2dec('00000000')); % WdSaveOptions
%// start MS Word
Word = actxserver('Word.Application');
%Word.Visible = true;
%// open text file in MS Word
doc = Word.Documents.Open(...
fnameTXT, ... % FileName
[], ... % ConfirmConversions
[], ... % ReadOnly
[], ... % AddToRecentFiles
[], ... % PasswordDocument
[], ... % PasswordTemplate
[], ... % Revert
[], ... % WritePasswordDocument
[], ... % WritePasswordTemplate
wdOpenFormatUnicodeText, ... % Format
msoEncodingUTF8, ... % Encoding
[], ... % Visible
[], ... % OpenAndRepair
[], ... % DocumentDirection
[], ... % NoEncodingDialog
[]); % XMLTransform
%// save it as docx
doc.SaveAs2(...
fnameDOCX, ... % FileName
wdFormatDocumentDefault, ... % FileFormat
[], ... % LockComments
[], ... % Password
[], ... % AddToRecentFiles
[], ... % WritePassword
[], ... % ReadOnlyRecommended
[], ... % EmbedTrueTypeFonts
[], ... % SaveNativePictureFormat
[], ... % SaveFormsData
[], ... % SaveAsAOCELetter
msoEncodingUTF8, ... % Encoding
[], ... % InsertLineBreaks
[], ... % AllowSubstitutions
[], ... % LineEnding
[], ... % AddBiDiMarks
[]); % CompatibilityMode
%// close doc, quit, and cleanup
doc.Close(wdDoNotSaveChanges, [], [])
Word.Quit()
clear doc Word
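As a cross-check on the values used above, this Python sketch (not part of the answer) reproduces the UTF-8 bytes that `unicode2native` is expected to write and decodes the hexadecimal constants. The variable names are chosen here for illustration only.

```python
text = "Have you seen my " + chr(9730) + "?"

# UTF-8 encoding of the whole string; the umbrella takes three bytes
utf8_bytes = text.encode("utf-8")
umbrella_bytes = chr(9730).encode("utf-8")

# Decimal values of the constants defined in hex in the MATLAB code
constants = {
    "msoEncodingUTF8": int("0000FDE9", 16),
    "wdOpenFormatUnicodeText": int("00000005", 16),
    "wdFormatDocumentDefault": int("00000010", 16),
    "wdDoNotSaveChanges": int("00000000", 16),
}

print(len(text), len(utf8_bytes))    # prints 19 21
print(umbrella_bytes.hex())          # prints e29882
print(constants["msoEncodingUTF8"])  # prints 65001
```

The value 65001 is the same code page number the question assigned to `ActiveDocument.TextEncoding`.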