Question

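I have an RTF file that I need to read in order to parse it. The file contains some special characters because it has an embedded image. When I read all the text from the file, I cannot read anything that comes after the special characters.

I tried reading the file with ReadAllText, and with both Encoding.UTF8 and Encoding.ASCII.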

public class ReadFile
{
    public static string GetFileContent(string path)
    {
        if (!File.Exists(path))
        {
            throw new FileNotFoundException();
        }
        else
        {
            // I also tried 
            // return File.ReadAllText(path, Encoding.ASCII);
            string text = string.Empty;
            var fileStream = new FileStream(path, FileMode.Open, FileAccess.Read);
            using (var streamReader = new StreamReader(fileStream, Encoding.UTF8))
            {
                string line;
                while ((line = streamReader.ReadLine()) != null)
                {
                    text += line;
                }
            }
            return text;
        }
    }
}

What I actually get is all the text up to the point where the special characters begin.

{\rtf1\ansi\ansicpg1252\deff0\deftab720{\fonttbl{\f0\fnil Times New Roman;}{\f1\fnil Arial;}}{\colortbl;\red000\green000\blue000;\red255\green000\blue000;\red128\green128\blue128;}\paperw11905\paperh16837\margl360\margr360\margt360\margb360\sectd\sectdefaultcl\marglsxn360\margrsxn360\margtsxn360\margbsxn360{{*\do\dobxpage\dobypage\dodhgt8192\dptxbx{\dptxbxtext\pard\plain{\pict\wmetafile8\picw19499\pichgo\bin342908

Rtf File is here

Answer 1

I solved it. To read the file I used File.ReadAllBytes(path), and in the result variable I replaced byte 0 with (nul) and byte 27 with esc.

File.ReadAllBytes(path)
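
The approach in this answer (read the raw bytes, then substitute the NUL and ESC bytes while building the string) could be sketched roughly as follows. This is only an illustration: the class name RtfReader, the helper BytesToText, and the choice to map every other byte directly to the character with the same code (effectively Latin-1) are assumptions, not details given in the answer.

```csharp
using System;
using System.IO;
using System.Text;

public static class RtfReader
{
    // Reads the file as raw bytes instead of decoded text, so the binary
    // picture data embedded in the RTF cannot cut the result short.
    // File.ReadAllBytes throws FileNotFoundException if the file is missing.
    public static string GetFileContent(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return BytesToText(bytes);
    }

    // Hypothetical helper: converts bytes to a string one byte at a time.
    public static string BytesToText(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length);
        foreach (byte b in bytes)
        {
            switch (b)
            {
                case 0:
                    sb.Append("(nul)"); // byte 0 replaced as described in the answer
                    break;
                case 27:
                    sb.Append("esc");   // byte 27 replaced as described in the answer
                    break;
                default:
                    // Assumption: each remaining byte becomes the character
                    // with the same code point (a one-to-one Latin-1 mapping).
                    sb.Append((char)b);
                    break;
            }
        }
        return sb.ToString();
    }
}
```

Calling RtfReader.GetFileContent(path) would then return the whole file as one string. Note that the replacement is lossy: the original bytes of the picture data cannot be recovered from that string, so this only suits cases where the text around the image is what needs parsing.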

I found the help in
