I have a problem that I can't solve. Can anyone help me?
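OK, so I want a program that normalizes my text: it removes repeated spaces, writes out the other characters from the original file, and adds spaces as well as start and end symbols.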
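A rough sketch of the per-line normalization described above, written with regular expressions instead of a character loop. This is one interpretation, not the author's exact logic: it lowercases the line, drops tabs (as the posted code does), surrounds every character that is not a letter, digit, or space with spaces, collapses repeated spaces, and wraps the result in `<` and `>`. The class and method names are hypothetical.

```java
import java.util.Locale;

public class LineNormalizerSketch {

    // Hypothetical helper: an approximation of the normalization described
    // in the question, applied to a single line (no line terminators).
    public static String normalizeLine(String line) {
        String s = line.toLowerCase(Locale.ROOT)
                .replace("\t", "")                        // the posted code drops tabs
                .replaceAll("[^\\p{L}\\p{Nd} ]", " $0 ")  // pad each symbol with spaces
                .replaceAll(" +", " ")                    // collapse runs of spaces
                .trim();
        return "<" + s + ">";                             // start and end symbols
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of the compiler's source encoding.
        System.out.println(normalizeLine("Numa  situa\u00e7\u00e3o, de emerg\u00eancia!"));
    }
}
```

For the input in `main`, `normalizeLine` returns `<numa situação , de emergência !>`.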
After the conversion, when I open the .txt file I wrote, I see this content:
numaituaã§ãoceoemergãªnciamã©dica
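As you can see, there are some weird characters that I don't want. Maybe it's because of the encoding? The text is in my language, Portuguese.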
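The fragments above (`ã§`, `ãª`, `ã©`) look like UTF-8 text decoded with a single-byte charset. A minimal sketch, assuming the platform default was ISO-8859-1 or Windows-1252 (the question doesn't say which): in UTF-8, `é` is the two bytes `C3 A9`, which ISO-8859-1 reads as `Ã` and `©`; lowercasing each letter, as the posted code does, turns that into `ã©`. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class MojibakeDemo {

    // Encode text as UTF-8, then decode those bytes with the wrong charset.
    // ISO-8859-1 is an assumed stand-in for a single-byte platform default.
    public static String decodeAsLatin1Lowercase(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return misread.toLowerCase(Locale.ROOT); // the question's code lowercases letters
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of the compiler's source encoding.
        String[] words = {"m\u00e9dica", "emerg\u00eancia", "situa\u00e7\u00e3o"};
        for (String w : words) {
            System.out.println(w + " -> " + decodeAsLatin1Lowercase(w));
        }
    }
}
```

For `médica` this yields `mã©dica` and for `emergência` it yields `emergãªncia`, the same fragments that appear in the output above.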
Here is my code. How can I fix it?
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.nio.charset.Charset;

public class Normalizer { // the class name is arbitrary; only main() was posted

    public static void main(String[] args) throws IOException {
        // data.txt is decoded with the platform default charset
        Charset encoding = Charset.defaultCharset();
        InputStream in = new FileInputStream(new File("data.txt"));
        Reader reader = new InputStreamReader(in, encoding);
        Reader buffer = new BufferedReader(reader);
        StringBuilder normalizedLanguage = new StringBuilder("<");
        int r;
        while ((r = buffer.read()) != -1) {
            char ch = (char) r;
            // Note: these variables are re-initialized for every character read
            boolean newline = false;
            boolean hasLetterBefore = false;
            boolean hasLetterAfter = false;
            char symbol = '-';
            int lines = 0;
            if (newline)
            {
                normalizedLanguage.append("\n<");
            }
            if (ch == '\r' || ch == '\n')
            {
                // End of line: close it with the end symbol
                lines++;
                normalizedLanguage.append(">");
                newline = true;
                hasLetterBefore = false;
            }
            else if (Character.isLetterOrDigit(ch))
            {
                // Letters and digits are written in lowercase
                if (hasLetterBefore == true)
                {
                    normalizedLanguage.append(Character.toString(symbol) + Character.toString(Character.toLowerCase(ch)));
                } else {
                    normalizedLanguage.append(Character.toString(Character.toLowerCase(ch)));
                }
                newline = false;
                hasLetterBefore = true;
            }
            else if (ch == ' ')
            {
                normalizedLanguage.append(Character.toString(ch));
                newline = false;
                hasLetterBefore = false;
            }
            else if (ch == '\t')
            {
                // Tabs are reported on the console but not written to the output
                System.out.println("Tab detected: " + ch);
                newline = false;
                hasLetterBefore = false;
            }
            else
            {
                // Symbols, among others
                if (!hasLetterBefore)
                {
                    normalizedLanguage.append(" " + Character.toString(ch) + " ");
                }
                else
                {
                    symbol = ch;
                }
                newline = false;
            }
        }
        // Trim and collapse runs of spaces into one
        String normalizedLanguageString = normalizedLanguage.toString().trim().replaceAll(" +", " ");
        // PrintWriter(String) also encodes with the platform default charset
        PrintWriter out = new PrintWriter("data_after.txt");
        out.println(normalizedLanguageString);
        out.close();
        buffer.close();
        reader.close();
        in.close();
    }
}
Thanks a lot in advance ;)
Answer:
Reading the file with UTF-8 instead of the platform default charset solved the problem :)
Change this line:
Charset encoding = Charset.defaultCharset();
to:
Charset encoding = Charset.forName("UTF8");
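A sketch of the same fix using `StandardCharsets.UTF_8` (Java 7+), which avoids looking the charset up by name. It also sets the encoding on the output side: `new PrintWriter("data_after.txt")` in the question encodes with the platform default, so the output file is not written as UTF-8, and characters that default cannot represent are replaced with `?`. The class and method names are hypothetical, and `main` only does a round trip through a temporary file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8RoundTrip {

    // Read all text from a file, decoding it explicitly as UTF-8.
    public static String readUtf8(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int r;
            while ((r = reader.read()) != -1) {
                sb.append((char) r);
            }
        }
        return sb.toString();
    }

    // Write text (plus a line separator) encoded explicitly as UTF-8.
    public static void writeUtf8(Path file, String text) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            out.println(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep this file independent of the compiler's source encoding.
        String text = "situa\u00e7\u00e3o de emerg\u00eancia m\u00e9dica";
        Path tmp = Files.createTempFile("utf8-demo", ".txt");
        writeUtf8(tmp, text);
        System.out.println(readUtf8(tmp).trim().equals(text)); // prints true
        Files.delete(tmp);
    }
}
```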
Thank you very much