I am trying to replace non-printable characters, i.e. extended ASCII characters, in a huge string.
foreach (string line in File.ReadLines(txtfileName.Text))
{
    MessageBox.Show(Regex.Replace(line,
        @"\p{Cc}",
        a => string.Format("[{0:X2}]", " ")
    ));
}
This doesn't seem to work.
Example: AAAA (with a non-printable character between the two pairs) should be converted to AA AA
Answer 0 (score: 1)
Assuming the encoding is UTF-8, try:
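// Requires: using System.Text;
string strReplacedVal = Encoding.ASCII.GetString(
    Encoding.Convert(
        Encoding.UTF8,
        Encoding.GetEncoding(
            // WebName ("us-ascii") is the registered name; EncodingName is a display string
            Encoding.ASCII.WebName,
            new EncoderReplacementFallback(" "),  // characters outside ASCII become a space
            new DecoderExceptionFallback()
        ),
        Encoding.UTF8.GetBytes(line)
    )
);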
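Note that the encoding round-trip above only replaces characters outside ASCII; ASCII control characters pass through unchanged. If those should go as well, a regular expression is one alternative. The following is a minimal sketch rather than a definitive implementation: the class name `PrintableAscii` and the method `ReplaceNonPrintable` are hypothetical, and it assumes "printable" means U+0020 through U+007E.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrintableAscii
{
    // Matches any UTF-16 code unit outside the printable ASCII range U+0020..U+007E.
    // Assumption: DEL (U+007F) and all control characters count as non-printable.
    private static readonly Regex NonPrintable = new Regex(@"[^\x20-\x7E]");

    // Replaces each matching code unit with a single space.
    // Characters outside the BMP are two code units, so they become two spaces.
    public static string ReplaceNonPrintable(string input)
    {
        return NonPrintable.Replace(input, " ");
    }

    public static void Main()
    {
        // U+00BB stands in for the extended character from the question's example.
        Console.WriteLine(ReplaceNonPrintable("AA\u00BBAA")); // prints "AA AA"
    }
}
```

Because `File.ReadLines` already strips the line terminators, applying this per line should not touch the line breaks.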
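Answer 1 (score: 0)
Since you are opening the file as UTF-8, it must be UTF-8. Its code unit is therefore one byte, and UTF-8 has the very nice property that characters above ␡ (DEL, U+007F) are encoded exclusively with bytes above 0x7f, while characters at or below ␡ are encoded exclusively with bytes at or below 0x7f.
For efficiency, you can rewrite the file in place a few KB at a time.
Note: some characters may be replaced by more than one space, because each byte of a multi-byte character is replaced separately.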
// Operates on a UTF-8 encoded text file, rewriting it in place.
// Requires: using System.IO;
using (var stream = File.Open(path, FileMode.Open, FileAccess.ReadWrite))
{
    const int size = 4096;
    var buffer = new byte[size];
    int count;
    while ((count = stream.Read(buffer, 0, size)) > 0)
    {
        var changed = false;
        for (int i = 0; i < count; i++)
        {
            // obliterate all bytes that are not encoded characters between ␠ and ␡
            // (this also replaces tabs and line breaks, which are below 0x20)
            if (buffer[i] < ' ' || buffer[i] > '\x7f')
            {
                buffer[i] = (byte)' ';
                changed = true;
            }
        }
        if (changed)
        {
            // step back over the chunk just read and overwrite it
            stream.Seek(-count, SeekOrigin.Current);
            stream.Write(buffer, 0, count);
        }
    }
}
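To illustrate the note about multiple spaces: a character such as é takes two bytes in UTF-8, so the byte-level loop turns it into two spaces. If one space per character is preferred and the text fits in memory, replacing on the decoded string is one option. This is only a sketch under those assumptions; the class `Utf8ReplacementDemo` and the methods `PerByte` and `PerChar` are hypothetical names, not part of the answer.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8ReplacementDemo
{
    // Byte-level replacement, same test as the loop above: keep bytes 0x20..0x7F.
    public static string PerByte(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        for (int i = 0; i < bytes.Length; i++)
        {
            if (bytes[i] < 0x20 || bytes[i] > 0x7F)
            {
                bytes[i] = (byte)' ';
            }
        }
        // Every remaining byte is ASCII, so decoding as ASCII is lossless here.
        return Encoding.ASCII.GetString(bytes);
    }

    // Character-level replacement: one space per UTF-16 code unit outside the range.
    public static string PerChar(string text)
    {
        return new string(text.Select(c => (c < ' ' || c > '\u007F') ? ' ' : c).ToArray());
    }

    public static void Main()
    {
        Console.WriteLine("[" + PerByte("caf\u00E9") + "]"); // prints "[caf  ]" (two spaces)
        Console.WriteLine("[" + PerChar("caf\u00E9") + "]"); // prints "[caf ]" (one space)
    }
}
```

The byte-level version has the advantage that the file length never changes, which is what makes the in-place rewrite possible; the character-level version changes the encoded length and would need to write a new file.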