Encoding has no effect when reading a Word document in C#

Date: 2013-01-02 21:36:49

Tags: c# office-interop

I am trying to read Unicode characters from a Word document, but I get question marks (????) instead.

Here is my code:

    Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
    object miss = System.Reflection.Missing.Value;
    object enc = Microsoft.Office.Core.MsoEncoding.msoEncodingEUCJapanese;
    object path = @"C:\Users\file.doc";
    object readOnly = true;
    Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss,
        ref miss, ref miss, ref miss, ref miss, ref miss, ref enc, ref miss, ref miss, ref miss, ref miss, ref miss);
    string totaltext = "";
    // Word collections are 1-based, hence Paragraphs[i + 1].
    for (int i = 0; i < docs.Paragraphs.Count; i++)
    {
        totaltext += " \r\n " + docs.Paragraphs[i + 1].Range.Text.ToString();

        Console.WriteLine(totaltext);
    }
    // Console.WriteLine(totaltext);
    docs.Close();
    word.Quit();

1 Answer:

Answer 0 (score: 2)

Given the comments, it sounds like the problem may only be with Console.WriteLine.
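As a rough illustration of why a console can show question marks even when the string itself is intact: when text is encoded with an encoding that cannot represent some characters, .NET's default fallback substitutes `?`. The sketch below uses `Encoding.ASCII` as a stand-in for a non-Unicode console code page; the class and method names are hypothetical, and the sample string is an assumption, not text from the asker's document.

```csharp
using System;
using System.Text;

class QuestionMarkDemo
{
    // Encodes text with the given encoding and decodes it again,
    // mimicking what happens when output passes through a lossy code page.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    static void Main()
    {
        string text = "日本語"; // assumed sample; three UTF-16 code units

        // ASCII cannot represent these characters, so each becomes '?'.
        string lossy = RoundTrip(text, Encoding.ASCII);
        Console.WriteLine(lossy == "???");   // True

        // UTF-8 can represent them, so the text survives unchanged.
        string intact = RoundTrip(text, Encoding.UTF8);
        Console.WriteLine(intact == text);   // True
    }
}
```

If this is what is happening, setting `Console.OutputEncoding = Encoding.UTF8` before writing may be enough on some setups, provided the console font has the required glyphs; that has not been verified against the asker's environment.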

Try writing to a file instead:

// This will use Encoding.UTF8 by default.
using (var writer = File.CreateText("test.txt"))
{
    for (int i = 0; i < docs.Paragraphs.Count; i++)
    {
        writer.WriteLine(docs.Paragraphs[i + 1].Range.Text.ToString());
    }
}

Then open the file in Notepad, specifying UTF-8 as the encoding. I suspect you will see everything.
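If you would rather check the result in code than in Notepad, a minimal sketch along these lines reads the file back as UTF-8 and compares it with what was written. It does not touch Word at all: the sample string stands in for `Range.Text`, and the class name, method name, and file name are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

class Utf8FileCheck
{
    // Writes text with File.CreateText (which encodes as UTF-8)
    // and reads the file back, decoding it as UTF-8.
    public static string WriteAndReadBack(string path, string text)
    {
        using (var writer = File.CreateText(path))
        {
            writer.Write(text);
        }
        return File.ReadAllText(path, Encoding.UTF8);
    }

    static void Main()
    {
        // Hypothetical temp file; any writable path would do.
        string path = Path.Combine(Path.GetTempPath(), "utf8-check.txt");
        string sample = "日本語のテキスト"; // assumed stand-in for Range.Text

        string readBack = WriteAndReadBack(path, sample);

        // True when the characters survived the round trip through the file.
        Console.WriteLine(readBack == sample);
    }
}
```

The comparison prints only `True` or `False`, so the check itself does not depend on the console being able to display the characters.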