Question

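We are getting data in CSV format from an external service, and we are trying to write that data to the response so the CSV can be downloaded by the client. Unfortunately, the data arrives in the following form: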

Amount inc. VAT      Balance
Â£112.83             Â£0.0
Â£97.55              Â£0.0
Â£15.28              Â£0.0

We can't decode the content. Is there a way in Java to decode Â£ so that it displays as £?

Is there a string utility available for decoding the string?
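The "Â£" pattern is what you typically see when the UTF-8 bytes for "£" are decoded with a single-byte charset such as ISO-8859-1. The sketch below reproduces the symptom under that assumption; the class and method names are hypothetical, and the service may actually use a different wrong charset (windows-1252 behaves the same for this character).

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulates the suspected bug: UTF-8 bytes decoded with the wrong charset.
    static String misdecode(String original) {
        // "£" (U+00A3) is encoded in UTF-8 as the two bytes 0xC2 0xA3
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // Decoding those bytes as ISO-8859-1 maps 0xC2 to 'Â' and 0xA3 to '£'
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misdecode("\u00A3112.83"); // input is "£112.83"
        // The resulting string is "Â£112.83"
        System.out.println(garbled.equals("\u00C2\u00A3112.83")); // prints true
    }
}
```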

Answer 1

The file seems to be encoded in UTF-8. You should read it as UTF-8.

If you are using java.io.FileReader and friends, you should open a FileInputStream and wrap it in an InputStreamReader instead:

// Before: Reader in = new FileReader(file)
Reader in = new InputStreamReader(new FileInputStream(file), "UTF-8");
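A fuller version of the same idea, assuming Java 7 or later for try-with-resources and StandardCharsets, might look like the following. The class and method names are illustrative only; the key point is that the charset is passed explicitly to InputStreamReader rather than left to FileReader's platform default.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8FileReading {
    // Reads all lines of a file, decoding its bytes explicitly as UTF-8
    // instead of relying on FileReader's platform-default charset.
    static List<String> readUtf8Lines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the underlying stream)
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Calling `Utf8FileReading.readUtf8Lines(new File("balance.csv"))` on a UTF-8 file (hypothetical name) returns lines such as "£112.83" rather than "Â£112.83".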

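If you are reading the file some other way (perhaps with an external or in-house class library?), check whether its documentation lets you specify the text encoding used to read the file.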

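Update: If you already have a mojibake string like Â£97.55 and cannot fix the way it is read, one way to recode it is to convert the string back to bytes and reinterpret those bytes as UTF-8. This does not require any external "StringUtils" or codec library; the Java standard API is powerful enough: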

String input = ...obtain from somewhere...;
String output = new String(input.getBytes(/*use platform default*/), "UTF-8");
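The round trip above uses the platform default charset, so it only works when that default is the charset that produced the mojibake; on a JVM whose default is UTF-8 the round trip leaves the string unchanged. The sketch below names the wrong charset explicitly instead. It assumes the garbling came from ISO-8859-1, which is a guess based on the "Â£" pattern, and `repair` is a hypothetical helper name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Reverses UTF-8 bytes that were mistakenly decoded with wrongCharset.
    // This only works if the bad decode lost no information, which holds for
    // ISO-8859-1 because it maps every byte to exactly one char.
    static String repair(String garbled, Charset wrongCharset) {
        // Recover the original bytes by re-encoding with the charset that garbled them
        byte[] originalBytes = garbled.getBytes(wrongCharset);
        // Decode those bytes the way they should have been decoded in the first place
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "\u00C2\u00A397.55"; // "Â£97.55"
        String fixed = repair(garbled, StandardCharsets.ISO_8859_1);
        System.out.println(fixed.equals("\u00A397.55")); // prints true ("£97.55")
    }
}
```

Fixing the read (or asking the service for the correct charset) is still preferable; this repair is a fallback for strings that have already been garbled.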

Answer 2

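Problem: When we call getBytes() on a string, it encodes the string using the platform default charset. Once the string has been encoded that way, decoding it with the default decoder may not give the correct characters.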

Solution one: Apache Commons Codec's StringUtils helps us decode these characters when writing them back to the response. The class is in the org.apache.commons.codec.binary package.

import org.apache.commons.codec.binary.StringUtils;

String CSVContent = "/* CSV data */";
/**
 *  Decode the bytes using UTF-8.
 */
String decodedStr = StringUtils.newStringUtf8(CSVContent.getBytes("UTF-8"));
/**
 *  Convert the decoded string to a byte array (primitive byte[]) to write to the stream
 */
byte[] content = StringUtils.getBytesIso8859_1(decodedStr);

Maven 2.0 dependency:

<dependency>
     <groupId>commons-codec</groupId>
     <artifactId>commons-codec</artifactId>
     <version>1.6</version>
</dependency>

Solution two:

As @Joni pointed out, a better solution uses only the standard API:

content = CSVContent.getBytes("ISO-8859-1");
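Why writing ISO-8859-1 bytes makes the stray "Â" disappear can be seen by comparing how "£" is encoded in each charset. This is a sketch; it assumes the client interprets the downloaded CSV as ISO-8859-1 (or a similar single-byte charset), which the original thread does not confirm, and `poundBytes` is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PoundBytes {
    // Encodes "£" (U+00A3) with the given charset
    static byte[] poundBytes(Charset charset) {
        return "\u00A3".getBytes(charset);
    }

    public static void main(String[] args) {
        // UTF-8 needs two bytes for U+00A3 (0xC2 0xA3); a client reading them as
        // ISO-8859-1 shows 0xC2 as "Â" followed by 0xA3 as "£"
        System.out.println(Arrays.toString(poundBytes(StandardCharsets.UTF_8)));      // [-62, -93]
        // ISO-8859-1 stores U+00A3 as the single byte 0xA3
        System.out.println(Arrays.toString(poundBytes(StandardCharsets.ISO_8859_1))); // [-93]
    }
}
```

Java's byte is signed, so 0xC2 prints as -62 and 0xA3 as -93.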

Answer 3

If you are lucky enough to be on Java 7, you can use Paths, Files, and StandardCharsets to do the following:

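import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.List;

// Read every line, decoding as UTF-8 (Files.readAllLines throws IOException)
Path path = Paths.get("/tmp", "input.txt");
List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
for (String line : lines) {
    System.out.println(line);
}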
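The charset argument to Files.readAllLines is what decides whether you get "£" or "Â£". The sketch below writes one hypothetical CSV cell to a temporary file as UTF-8 bytes and reads it back with two different charsets; `firstLine` is an illustrative helper, not part of the answer.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadAllLinesCharset {
    // Reads the first line of a file, decoding it with the given charset
    static String firstLine(Path path, Charset charset) throws IOException {
        List<String> lines = Files.readAllLines(path, charset);
        return lines.get(0);
    }

    public static void main(String[] args) throws IOException {
        // Temporary file holding one CSV cell, written as UTF-8 bytes
        Path path = Files.createTempFile("balance", ".csv");
        Files.write(path, "\u00A30.0".getBytes(StandardCharsets.UTF_8));
        // Correct charset: prints true ("£0.0")
        System.out.println(firstLine(path, StandardCharsets.UTF_8).equals("\u00A30.0"));
        // Wrong charset: prints true ("Â£0.0"), the symptom from the question
        System.out.println(firstLine(path, StandardCharsets.ISO_8859_1).equals("\u00C2\u00A30.0"));
        Files.delete(path);
    }
}
```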

Decoding an encoded pound symbol in Java
