I can't write some characters to my file in Java (encoding problem)

Time: 2014-12-20 20:43:28

Tags: java encoding char

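I can't write certain characters to my file. Any string that contains characters such as 'é' or ''' ends up with those characters replaced by special characters like '@'. Can anyone tell me what is wrong with my code? I also want the file to be created as UTF-16 little-endian from the start, without a BOM. Is this the right way to do it?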

PrintWriter writer = new PrintWriter(file, "UTF-16LE");
@SuppressWarnings("resource")
BufferedReader lire = new BufferedReader(new FileReader(file1));
do {
    String line = lire.readLine();
    if (line == null) {
        break;
    }
    //<a href="Substance/acide_folinique-3875.htm">acide folinique</a>

    Pattern p = Pattern.compile("<a href=\"Substance/.+>(.+)</a>");
    Matcher m = p.matcher(line);
    if (m.find()) {
        byte ptext[] = m.group(1)
                        .getBytes("UTF-16LE");
        String line2 = new String(ptext, "UTF-16LE");
        String line3 = line2.toLowerCase();


        writer.write(line3 + ",.N+subst");
        writer.write(System.getProperty("line.separator"));
    } else {
        p = Pattern.compile("<a href=\"Medicament/.+>(.+)\\s.+</a>");
        m = p.matcher(line);
        if (m.find()) {
            byte ptext[] = m.group(1)
                            .getBytes("UTF-16LE");
            String line2 = new String(ptext, "UTF-16LE");
            String line3 = line2.toLowerCase();


            writer.write(line3 + ",.N+medic");
            writer.write(System.getProperty("line.separator"));
        }

    }


} while (true);

writer.close();

1 answer:

Answer 0 (score: 1)

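What is the encoding of the input file? FileReader always decodes with the default encoding of the machine your program runs on, so if the file uses a different encoding, the data is corrupted as it is read. In the example below, I've assumed that the input is UTF-16LE; if that's wrong, you need to change the Charset passed to the newBufferedReader() call.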

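// Needs java.io.*, java.nio.charset.StandardCharsets, java.nio.file.* and java.util.regex.*.
// src and dst are java.nio.file.Path values for the input and output files.
try (BufferedReader lines = Files.newBufferedReader(src, StandardCharsets.UTF_16LE);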
     Writer writer = Files.newBufferedWriter(dst, StandardCharsets.UTF_16LE)) {
  Pattern substance = Pattern.compile("<a href=\"Substance/.+>(.+)</a>");
  Pattern medic = Pattern.compile("<a href=\"Medicament/.+>(.+)\\s.+</a>");
  String sep = System.getProperty("line.separator");
  while (true) {
    String line = lines.readLine();
    if (line == null)
      break;
    Matcher m = substance.matcher(line);
    if (m.find()) {
      String ptext = m.group(1).toLowerCase();
      writer.write(ptext);
      writer.write(",.N+subst");
      writer.write(sep);
    } else {
      m = medic.matcher(line);
      if (m.find()) {
        String ptext = m.group(1).toLowerCase();
        writer.write(ptext + ",.N+medic");
        writer.write(sep);
      }
    }
  }
}
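
On the BOM part of the question, a small check may help. The sketch below is not from the original post, and the class and method names (EncodingDemo, hex, misread) are made up for illustration. It prints the bytes Java produces for 'é' under UTF-16LE and under plain UTF-16, and then shows what one kind of mis-decoding looks like. The UTF-8 and ISO-8859-1 pairing is only an assumed example, since the real encoding of the input file is not known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Render bytes as space-separated hex so a BOM is easy to spot.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Encode with one charset and decode with another, which is what
    // happens when a reader assumes the wrong encoding for a file.
    public static String misread(String text, Charset actual, Charset assumed) {
        return new String(text.getBytes(actual), assumed);
    }

    public static void main(String[] args) {
        // Written as a Unicode escape so the source file's own encoding cannot matter.
        String s = "\u00e9"; // 'é'

        // UTF-16LE: little-endian code units only, no byte-order mark.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // E9 00

        // Plain UTF-16: Java's encoder prepends a big-endian BOM (FE FF).
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 E9

        // Assumed scenario: UTF-8 bytes decoded as ISO-8859-1 yield two wrong characters.
        System.out.println(misread(s, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

So passing "UTF-16LE" to the writer, as the question already does, should give little-endian output with no BOM. The getBytes("UTF-16LE") followed by new String(..., "UTF-16LE") round trip in the question does not change the string, so it cannot repair text that was already decoded wrongly on input.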