Question

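I'm trying to remove a hard space (from an &nbsp; entity in the HTML). I can't remove it with .trim() or .replace(" ", "") or anything similar! I don't understand why.

I even found a suggestion on Stack Overflow to try \\u00a0, but that didn't work either.

I tried this (since text() returns the actual hard space character, U+00A0):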

System.out.println( "'"+fields.get(6).text().replace("\\u00a0", "")+"'" ); //'94,00 '
System.out.println( "'"+fields.get(6).text().replace(" ", "")+"'" ); //'94,00 '
System.out.println( "'"+fields.get(6).text().trim()+"'"); //'94,00 '
System.out.println( "'"+fields.get(6).html().replace("&nbsp;", "")+"'"); //'94,00' works

But I can't figure out why I can't remove the whitespace when using .text().

Answer 1

Your first attempt was very nearly it, and you're quite right that Jsoup maps &nbsp; to U+00A0. You just don't want the double backslash in your string:

System.out.println( "'"+fields.get(6).text().replace("\u00a0", "")+"'" ); //'94,00'
// Just one ------------------------------------------^

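replace doesn't use regular expressions, so you aren't trying to pass a literal backslash through to the regex level. You just want to specify the character U+00A0 in the string.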
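The difference between the two literals can be checked without Jsoup. Here is a minimal sketch in plain Java; the class name EscapeDemo and the helper stripNbsp are made up for illustration, and the hard-coded string merely stands in for what fields.get(6).text() returns in the question.

```java
public class EscapeDemo {
    // Removes every NO-BREAK SPACE (U+00A0) from the input.
    public static String stripNbsp(String s) {
        return s.replace("\u00a0", "");
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for fields.get(6).text()
        String text = "94,00\u00a0";

        // With the doubled backslash the literal holds six characters:
        // a backslash followed by u, 0, 0, a, 0.
        System.out.println("\\u00a0".length()); // 6

        // With a single backslash the compiler turns the escape into
        // one character, the no-break space itself.
        System.out.println("\u00a0".length()); // 1

        // The six-character literal never occurs in the text, so nothing changes.
        System.out.println(text.replace("\\u00a0", "").length()); // 6

        // The one-character literal matches and is removed.
        System.out.println("'" + stripNbsp(text) + "'"); // '94,00'
    }
}
```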

Answer 2

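The question has been edited to reflect the real problem.

New answer: the hard space, i.e. the &nbsp; entity (Unicode character NO-BREAK SPACE, U+00A0), can be written in Java as the character \u00a0, so the code becomes the following, where str is the string returned by the text() method: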

str = str.replaceAll("\u00a0", ""); // strings are immutable, so keep the returned value
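Some background that may help here: trim() only strips characters up to U+0020, and Character.isWhitespace deliberately excludes non-breaking spaces, which is why U+00A0 survives trimming. Unlike replace, replaceAll takes a regular expression, and the regex engine understands the \u00a0 escape on its own, so the doubled backslash does work with replaceAll. The sketch below illustrates both points in plain Java; the class NbspTrim and its two helpers are hypothetical names, not part of Jsoup.

```java
public class NbspTrim {
    // Removes every NO-BREAK SPACE. replaceAll takes a regex, and the regex
    // engine resolves the escape itself, so the doubled backslash works here.
    public static String removeNbsp(String s) {
        return s.replaceAll("\\u00a0", "");
    }

    // Strips ordinary whitespace and U+00A0 from both ends only,
    // leaving any no-break spaces inside the text untouched.
    public static String trimWithNbsp(String s) {
        return s.replaceAll("^[\\s\\u00a0]+|[\\s\\u00a0]+$", "");
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the string returned by text()
        String str = "94,00\u00a0";

        System.out.println(Character.isWhitespace('\u00a0')); // false
        System.out.println(str.trim().length());              // 6, nothing was trimmed

        str = removeNbsp(str); // strings are immutable, so keep the returned value
        System.out.println("'" + str + "'");                  // '94,00'
    }
}
```

trimWithNbsp is the variant to reach for when no-break spaces inside the value should be kept and only the padding at the ends should go.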

Old answer: using the Jsoup library

import org.jsoup.parser.Parser;

String str1 = Parser.unescapeEntities("last week,&nbsp;Ovokerie Ogbeta", false);
String str2 = Parser.unescapeEntities("Entered&nbsp;&raquo; Here", false);
System.out.println(str1 + " " + str2);

This prints:

last week, Ovokerie Ogbeta Entered » Here

How do I remove hard spaces with Jsoup?
