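I can't write certain characters to my file. Any string containing characters such as 'é' ends up with those characters replaced by special characters like '@'. Can anyone tell me what is wrong with my code? I also want the file to be created as UTF-16 little-endian from the start, not merely with a little-endian BOM. Is this the way to do it?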
PrintWriter writer = new PrintWriter(file, "UTF-16LE");
@SuppressWarnings("resource")
BufferedReader lire = new BufferedReader(new FileReader(file1));
do {
String line = lire.readLine();
if (line == null) {
break;
}
//<a href="Substance/acide_folinique-3875.htm">acide folinique</a>
Pattern p = Pattern.compile("<a href=\"Substance/.+>(.+)</a>");
Matcher m = p.matcher(line);
if (m.find()) {
byte ptext[] = m.group(1)
.getBytes("UTF-16LE");
String line2 = new String(ptext, "UTF-16LE");
String line3 = line2.toLowerCase();
writer.write(line3 + ",.N+subst");
writer.write(System.getProperty("line.separator"));
} else {
p = Pattern.compile("<a href=\"Medicament/.+>(.+)\\s.+</a>");
m = p.matcher(line);
if (m.find()) {
byte ptext[] = m.group(1)
.getBytes("UTF-16LE");
String line2 = new String(ptext, "UTF-16LE");
String line3 = line2.toLowerCase();
writer.write(line3 + ",.N+medic");
writer.write(System.getProperty("line.separator"));
}
}
} while (true);
writer.close();
Answer 0 (score: 1)
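What is the encoding of the input file? If it is not the default encoding of the machine your program runs on, the data will be corrupted as it is read, because FileReader decodes with the platform default. In the example below I assume the input is UTF-16LE; if that is wrong, you need to change the Charset passed to the newBufferedReader() call.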
// src and dst are java.nio.file.Path values for the input and output files.
try (BufferedReader lines = Files.newBufferedReader(src, StandardCharsets.UTF_16LE);
Writer writer = Files.newBufferedWriter(dst, StandardCharsets.UTF_16LE)) {
Pattern substance = Pattern.compile("<a href=\"Substance/.+>(.+)</a>");
Pattern medic = Pattern.compile("<a href=\"Medicament/.+>(.+)\\s.+</a>");
String sep = System.getProperty("line.separator");
while (true) {
String line = lines.readLine();
if (line == null)
break;
Matcher m = substance.matcher(line);
if (m.find()) {
String ptext = m.group(1).toLowerCase();
writer.write(ptext);
writer.write(",.N+subst");
writer.write(sep);
} else {
m = medic.matcher(line);
if (m.find()) {
String ptext = m.group(1).toLowerCase();
writer.write(ptext + ",.N+medic");
writer.write(sep);
}
}
}
}
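To make the two points above concrete, here is a minimal sketch (the class name CharsetSketch and its helper methods are made up for illustration, they are not part of the original code). It shows that decoding bytes with a charset other than the one used to encode them garbles 'é', and that Java's UTF-16LE charset does not emit a BOM by itself, so a BOM only appears if U+FEFF is written explicitly as the first character. The choice of ISO-8859-1 as the "wrong" charset is only an assumption for the demonstration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {
    // Decode raw bytes with the given charset. If it differs from the
    // charset used to encode the bytes, the text is silently garbled.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // Render bytes as upper-case hex so the encoded form can be inspected.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // the character 'é'

        // Round trip with the matching charset keeps the character intact.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));

        // Decoding the same bytes with a different charset corrupts it.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(text));

        // UTF-16LE alone writes no BOM: just the two bytes of 'é'.
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16LE)));

        // Prepending U+FEFF yields the little-endian BOM bytes FF FE.
        System.out.println(hex(("\uFEFF" + text).getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

If the consumer of the output file expects a BOM, writing the single character U+FEFF to the writer before the first line is one way to add it; if it expects none, the UTF-16LE writer can be used as is.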