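How do I replace extended ASCII characters in C#?

Asked: 2016-01-29 05:31:05

Tags: c# .net vb.net

I am trying to replace non-printable characters, i.e. extended ASCII characters, in a very large string.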

foreach (string line in File.ReadLines(txtfileName.Text))
{
    MessageBox.Show(Regex.Replace(
        line,
        @"\p{Cc}",
        a => string.Format("[{0:X2}]", " ")));
}

This does not seem to work.

Example: AAAA should be converted to AA AA

2 answers:

Answer 0: (score: 1)

Assuming the encoding is UTF-8, try:

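// Re-encode the line as ASCII; every character that ASCII cannot
// represent is replaced with a space by the encoder fallback.
string strReplacedVal = Encoding.ASCII.GetString(
        Encoding.Convert(
            Encoding.UTF8,
            Encoding.GetEncoding(
                // WebName ("us-ascii") is the registered name GetEncoding expects
                Encoding.ASCII.WebName,
                new EncoderReplacementFallback(" "),
                new DecoderExceptionFallback()
                ),
            Encoding.UTF8.GetBytes(line)
        )
);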
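
For comparison with the regex attempt in the question, here is a minimal sketch of a pattern-based version. The `\p{Cc}` class only covers control characters (U+0000 to U+001F and U+007F to U+009F), so a character such as é (U+00E9) never matches it. The class and method names below are hypothetical, and unlike the encoding-based code above, this sketch also replaces ASCII control characters such as tab.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiCleaner
{
    // Matches any UTF-16 code unit outside printable ASCII (U+0020..U+007E).
    private static readonly Regex NonPrintableAscii =
        new Regex(@"[^\u0020-\u007E]");

    // Replaces each matched code unit with a single space.
    // Characters outside the BMP are two code units, so they become two spaces.
    public static string ReplaceNonAscii(string input)
    {
        return NonPrintableAscii.Replace(input, " ");
    }
}
```

With this sketch, `AsciiCleaner.ReplaceNonAscii("AA\u00E9AA")` returns "AA AA".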

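Answer 1: (score: 0)

Since you are opening the file as UTF-8, it must be UTF-8. Its code units are therefore one byte each, and UTF-8 has the very nice property that characters above ␡ (U+007F) are encoded using only bytes above 0x7f, while characters at or below ␡ are encoded using only bytes at or below 0x7f.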
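
The byte-range property can be checked directly. The following is only an illustrative sketch; the class and method names are hypothetical and not part of the answer.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8ByteDemo
{
    // Returns true when every UTF-8 byte of the given text is above 0x7F.
    public static bool AllBytesAbove7F(string text)
    {
        return Encoding.UTF8.GetBytes(text).All(b => b > 0x7F);
    }

    // Formats the UTF-8 bytes of the text as hyphen-separated hex.
    public static string HexBytes(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }
}
```

For example, `Utf8ByteDemo.HexBytes("\u00E9")` (é) gives "C3-A9" and `Utf8ByteDemo.HexBytes("A")` gives "41", so a byte-wise scan can tell ASCII from non-ASCII without decoding.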

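For efficiency, you can rewrite the file in place a few KB at a time.

Note: some characters may be replaced by more than one space, because each byte of a multi-byte character is replaced separately.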

// Operates on a UTF-8 encoded text file
using (var stream = File.Open(path, FileMode.Open, FileAccess.ReadWrite))
{
    const int size = 4096;
    var buffer = new byte[size];
    int count; 
    while ((count = stream.Read(buffer, 0, size)) > 0)
    {
        var changed = false;
        for (int i = 0; i < count; i++)
        {
            // Replace every byte outside ␠ (0x20) .. ␡ (0x7f) with a space.
            // This also blanks control bytes such as tab, CR and LF.
            if (buffer[i] < ' ' || buffer[i] > '\x7f')
            {
                buffer[i] = (byte)' ';
                changed = true;
            }
        }
        if (changed)
        {
            stream.Seek(-count, SeekOrigin.Current);
            stream.Write(buffer, 0, count);
        }
    }
}
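
To see the multiple-space effect from the note without touching a file, the same per-byte test can be run on an in-memory buffer. This is a sketch under that assumption; `ByteBlanker` and `BlankNonAsciiBytes` are hypothetical names.

```csharp
using System.Text;

public static class ByteBlanker
{
    // Applies the same per-byte test as the file loop to an in-memory UTF-8 buffer.
    public static string BlankNonAsciiBytes(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        for (int i = 0; i < bytes.Length; i++)
        {
            if (bytes[i] < 0x20 || bytes[i] > 0x7F)
            {
                bytes[i] = (byte)' ';
            }
        }
        // Every remaining byte is <= 0x7F, so decoding as ASCII loses nothing.
        return Encoding.ASCII.GetString(bytes);
    }
}
```

Here `ByteBlanker.BlankNonAsciiBytes("AA\u00E9AA")` returns "AA  AA" with two spaces, because é occupies two bytes in UTF-8; a three-byte character such as € yields three spaces.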