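I'm trying to read a file into a string. I set the encoding to UTF-8, but it still fails: the output contains some strange characters.
This is the function I use to read the file: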
import java.io.*;

private static String readFile(String path, boolean isRaw) throws UnsupportedEncodingException, FileNotFoundException{
    File fileDir = new File(path);
    try{
        // Decode the file's bytes as UTF-8 and read it line by line
        BufferedReader in = new BufferedReader(
                new InputStreamReader(
                        new FileInputStream(fileDir), "UTF-8"));
        StringBuilder content = new StringBuilder();
        String str;
        while ((str = in.readLine()) != null) {
            System.out.println(str);
            content.append(str).append('\n');
        }
        in.close();
        // Return everything that was read (str itself is null after the loop)
        return content.toString();
    }
    catch (UnsupportedEncodingException e)
    {
        System.out.println(e.getMessage());
    }
    catch (IOException e)
    {
        System.out.println(e.getMessage());
    }
    catch (Exception e)
    {
        System.out.println(e.getMessage());
    }
    return null;
}
The output of the first line is: 1
Thanks in advance.
Answer 0 (score: 3)
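This file is encoded in UTF-16LE and starts with a Byte order mark, which helps identify the encoding. Use the "UTF-16LE" charset (or StandardCharsets.UTF_16LE) and skip the first character of the file (for example, by calling str.substring(1) on the first line).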
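A minimal sketch of this approach, assuming the file really is UTF-16LE with a BOM as described above; the class name Utf16LeReader and the method readUtf16Le are hypothetical. StandardCharsets.UTF_16LE does not consume the BOM, so it arrives as the character U+FEFF at the start of the first line and is removed with substring(1).

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Utf16LeReader {
    // Reads a UTF-16LE text file line by line and returns its contents joined with '\n'.
    // The UTF_16LE charset passes the byte order mark through as the character U+FEFF,
    // so it is stripped from the first line if present.
    public static String readUtf16Le(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_16LE))) {
            String line;
            boolean first = true;
            while ((line = in.readLine()) != null) {
                if (first) {
                    // Skip the BOM character, as suggested above
                    if (line.startsWith("\uFEFF")) {
                        line = line.substring(1);
                    }
                    first = false;
                } else {
                    sb.append('\n');
                }
                sb.append(line);
            }
        }
        return sb.toString();
    }
}
```

Calling Utf16LeReader.readUtf16Le(path) returns the file's text with the BOM removed and the lines joined by '\n'.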
Answer 1 (score: 1)
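Your file looks like it has a BOM (byte order mark). If you don't need to handle the BOM character, open the file in Notepad++ and re-encode it as UTF-8 without BOM.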
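If you would rather do that conversion in code than in Notepad++, here is a minimal JDK-only sketch; BomConverter and convertUtf16ToUtf8 are hypothetical names, and it assumes the source file is UTF-16 with a BOM, like the one in the question. It decodes with StandardCharsets.UTF_16, which uses a leading BOM to pick the byte order and does not return it as a character, then writes the text back as UTF-8. Java's UTF-8 encoder never emits a BOM.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomConverter {
    // Re-encodes a UTF-16 file as UTF-8 without a BOM.
    // UTF_16 reads the leading BOM to choose little- or big-endian and drops it;
    // without a BOM it assumes big-endian, so only use this on files that have one.
    public static void convertUtf16ToUtf8(Path source, Path target) throws IOException {
        String text = new String(Files.readAllBytes(source), StandardCharsets.UTF_16);
        // String.getBytes with UTF_8 writes plain UTF-8 with no BOM
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }
}
```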
To handle files with a BOM in Java, take a look at BOMInputStream on the Apache site (Apache Commons IO).
Example:
import java.io.*;
import org.apache.commons.io.ByteOrderMark;
import org.apache.commons.io.input.BOMInputStream;

private static String readFile(String path, boolean isRaw) throws UnsupportedEncodingException, FileNotFoundException{
    File fileDir = new File(path);
    try{
        // Detects a UTF-16LE BOM at the start of the file and strips it from the stream
        BOMInputStream bomIn = new BOMInputStream(new FileInputStream(fileDir), ByteOrderMark.UTF_16LE);
        //You can also detect UTF-8, UTF-16BE, UTF-32LE and UTF-32BE by using the constructor below
        //BOMInputStream bomIn = new BOMInputStream(new FileInputStream(fileDir), ByteOrderMark.UTF_16LE,
        //        ByteOrderMark.UTF_16BE, ByteOrderMark.UTF_32LE, ByteOrderMark.UTF_32BE, ByteOrderMark.UTF_8);
        // Decode with the charset that matches the detected BOM; fall back to UTF-8 when there is none
        String charsetName = "UTF-8";
        if(bomIn.hasBOM()){
            System.out.println("Input file was encoded as a bom file, the bom character has been removed");
            charsetName = bomIn.getBOMCharsetName();
        }
        BufferedReader in = new BufferedReader(
                new InputStreamReader(
                        bomIn, charsetName));
        StringBuilder content = new StringBuilder();
        String str;
        while ((str = in.readLine()) != null) {
            System.out.println(str);
            content.append(str).append('\n');
        }
        in.close();
        return content.toString();
    }
    catch (UnsupportedEncodingException e)
    {
        System.out.println(e.getMessage());
    }
    catch (IOException e)
    {
        System.out.println(e.getMessage());
    }
    catch (Exception e)
    {
        System.out.println(e.getMessage());
    }
    return null;
}
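If adding Commons IO is not an option, a rough JDK-only equivalent can sniff the BOM itself. This is a minimal sketch, not a drop-in replacement for BOMInputStream: BomSniffer and openReader are hypothetical names, it only recognizes the UTF-8, UTF-16LE and UTF-16BE BOMs (a UTF-32LE BOM, FF FE 00 00, would be misread as UTF-16LE), and it falls back to a caller-supplied charset when no BOM is present.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    // Returns a Reader over `in` whose charset is chosen from a leading BOM
    // (UTF-8, UTF-16LE or UTF-16BE); the BOM bytes themselves are skipped.
    // Uses `fallback` when the stream does not start with one of those BOMs.
    public static Reader openReader(InputStream in, Charset fallback) throws IOException {
        BufferedInputStream bin = new BufferedInputStream(in);
        bin.mark(3);
        int b0 = bin.read(), b1 = bin.read(), b2 = bin.read();
        bin.reset(); // rewind so the bytes can be consumed again

        Charset cs = fallback;
        int bomLength = 0;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            cs = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (b0 == 0xFF && b1 == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            bomLength = 2;
        } else if (b0 == 0xFE && b1 == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            bomLength = 2;
        }
        // Consume the BOM so the decoder never sees it
        for (int i = 0; i < bomLength; i++) {
            bin.read();
        }
        return new InputStreamReader(bin, cs);
    }
}
```

In readFile this would replace the BOMInputStream setup, for example new BufferedReader(BomSniffer.openReader(new FileInputStream(fileDir), StandardCharsets.UTF_8)).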