How do I get text from a web page at a given URL into a TextView or a String in Android / Java?
<html>
<body>
<div class="text" id="editorText" itemprop="text">I want to get this Text</div>
</body>
</html>
I want to get "I want to get this Text" into a String:
String TextFromWebsiteHere
Edit
After trying the Jsoup answer below, I get this exception:
01-30 05:39:17.460: I/dalvikvm(1013): Could not find method org.jsoup.Jsoup.connect, referenced from method com.example.putstring.MainActivity.onClick2
01-30 05:39:17.460: W/dalvikvm(1013): VFY: unable to resolve static method 5293: Lorg/jsoup/Jsoup;.connect (Ljava/lang/String;)Lorg/jsoup/Connection;
01-30 05:39:17.460: D/dalvikvm(1013): VFY: replacing opcode 0x71 at 0x0002
01-30 05:39:18.450: D/dalvikvm(1013): GC_FOR_ALLOC freed 122K, 6% free 3260K/3452K, paused 33ms, total 36ms
01-30 05:39:18.800: D/gralloc_goldfish(1013): Emulator without GPU emulation detected.
01-30 05:39:35.150: D/AndroidRuntime(1013): Shutting down VM
01-30 05:39:35.150: W/dalvikvm(1013): threadid=1: thread exiting with uncaught exception (group=0xb1a97b90)
01-30 05:39:35.160: E/AndroidRuntime(1013): FATAL EXCEPTION: main
01-30 05:39:35.160: E/AndroidRuntime(1013): Process: com.example.putstring, PID: 1013
01-30 05:39:35.160: E/AndroidRuntime(1013): java.lang.NoClassDefFoundError: org.jsoup.Jsoup
01-30 05:39:35.160: E/AndroidRuntime(1013): at com.example.putstring.MainActivity.onClick2(MainActivity.java:133)
01-30 05:39:35.160: E/AndroidRuntime(1013): at com.example.putstring.MainActivity.access$8(MainActivity.java:131)
01-30 05:39:35.160: E/AndroidRuntime(1013): at com.example.putstring.MainActivity$1.onClick(MainActivity.java:52)
01-30 05:39:35.160: E/AndroidRuntime(1013): at android.view.View.performClick(View.java:4424)
01-30 05:39:35.160: E/AndroidRuntime(1013): at android.view.View$PerformClick.run(View.java:18383)
01-30 05:39:35.160: E/AndroidRuntime(1013): at android.os.Handler.handleCallback(Handler.java:733)
01-30 05:39:35.160: E/AndroidRuntime(1013): at android.os.Handler.dispatchMessage(Handler.java:95)
01-30 05:39:35.160: E/AndroidRuntime(1013): at android.os.Looper.loop(Looper.java:137)
01-30 05:39:35.160: E/AndroidRuntime(1013): at android.app.ActivityThread.main(ActivityThread.java:4998)
01-30 05:39:35.160: E/AndroidRuntime(1013): at java.lang.reflect.Method.invokeNative(Native Method)
01-30 05:39:35.160: E/AndroidRuntime(1013): at java.lang.reflect.Method.invoke(Method.java:515)
01-30 05:39:35.160: E/AndroidRuntime(1013): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:777)
01-30 05:39:35.160: E/AndroidRuntime(1013): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:593)
01-30 05:39:35.160: E/AndroidRuntime(1013): at dalvik.system.NativeStart.main(Native Method)
01-30 05:39:42.580: I/Process(1013): Sending signal. PID: 1013 SIG: 9
Answer 0 (score: 1)
Use this:
String simpleString = Html.fromHtml("your_html_string").toString();
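It strips all the HTML markup and returns the content as a plain string.
If you also need to tell different tags or classes apart and pull out one specific piece of text from the HTML, you will probably need a more capable tool such as jsoup.
Answer 1 (score: 1)
You can use the jsoup library for this. A simple example that reads the paragraphs of the HTML served at the following URL: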
try {
    Document doc = Jsoup.connect("http://popofibo.com/pop/swaying-views-of-our-past/").get();
    Elements paragraphs = doc.select("p");
    for (Element p : paragraphs) {
        System.out.println(p.text());
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
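Output:
It is indeed hard to argue with mainstream ideas about the evolution of human civilization...
Edit
To demonstrate with your HTML as static content, you can select the div by its id: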
public static void main(String... args) {
    Document doc = Jsoup
            .parse("<html><body><div class=\"text\" id=\"editorText\" itemprop=\"text\">I want to get this Text</div></body></html>");
    Elements divs = doc.select("div#editorText");
    for (Element d : divs) {
        System.out.println(d.text());
    }
}
Output:
I want to get this Text
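One thing to watch on Android: Jsoup.connect(...).get() does network I/O, and the stack trace in the question shows it being called from onClick, which runs on the main thread. Apps targeting Honeycomb (API 11) or later get a NetworkOnMainThreadException for that. Below is a minimal sketch of moving the fetch off the calling thread, written in plain Java so it runs anywhere. BackgroundFetch and fetchAsync are hypothetical names, and the two lambdas are stand-ins: in an Activity the first would wrap the jsoup call and the second would wrap something like runOnUiThread(() -> textView.setText(text)). It also assumes CompletableFuture is available, which on Android means API 24 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class BackgroundFetch {

    // Runs the blocking fetch on a pool thread and passes its result to the callback.
    // The callback is not guaranteed to run on any particular thread, so on Android
    // it should hop back to the UI thread before touching views.
    public static CompletableFuture<Void> fetchAsync(Supplier<String> fetch,
                                                     Consumer<String> onResult) {
        return CompletableFuture.supplyAsync(fetch).thenAccept(onResult);
    }

    public static void main(String[] args) {
        fetchAsync(
                () -> "I want to get this Text",  // stand-in for the jsoup call
                System.out::println)              // stand-in for textView.setText(text)
                .join();                          // wait here only so the demo prints before exiting
    }
}
```

In a real click handler you would not call join(), since that would block the main thread again; the callback does the work when the result arrives.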
Answer 2 (score: 1)
How about this solution (no external library needed):
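// Must be called off the main thread on Android: it performs network I/O.
public static String getContentFromHtmlPage(String page) {
    StringBuilder sb = new StringBuilder();
    try {
        URLConnection connection = new URL(page).openConnection();
        BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // Keep line breaks so words on adjacent lines do not run together.
                sb.append(line).append('\n');
            }
        } finally {
            in.close(); // close the stream even if reading fails
        }
    } catch (IOException e) {
        // handle exception; sb holds whatever was read so far (possibly nothing)
        e.printStackTrace();
    }
    // Strip the markup and return the page's plain text.
    return Html.fromHtml(sb.toString()).toString();
}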