Get text from a web page

Time: 2012-11-15 07:42:54

Tags: android

I want to get some text from a web page. I wrote code using indexOf and substring, but it doesn't work:

int index = response_str.indexOf("Remote IP Address:");
index += "Remote IP Address:".length();
index += "</div><br /><div id=\"value1\">".length();
int end = response_str.indexOf("</div><br /><br />", index);
String strIP = response_str.substring(index, end);      
Log.d("","Hello" + strIP  );

Here is the sample text. I want to extract 49.156.53.152:

<body>
<div id="title">Remote IP Address:</div><br /><div id="value1">**49.156.53.152**</div><br /><br />
<div id="title">UserAgent:</div><br /><div id="value2">Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11</div><br /><br />
<!-- Everyone of CCorp employees, Good luck ! --><br />
</body>
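The marker arithmetic in the question only works if the downloaded HTML matches the markers character for character. If the first indexOf call returns -1, the computed offset is silently wrong. If the end marker is missing, substring(index, -1) throws StringIndexOutOfBoundsException. Below is a minimal sketch of a safer helper. It assumes the value sits inside the div with id="value1", as in the sample, and the names HtmlSlice and between are hypothetical:

```java
public class HtmlSlice {

    /**
     * Returns the text between the first occurrence of startMarker and the
     * next occurrence of endMarker after it, or null if either is missing.
     */
    static String between(String text, String startMarker, String endMarker) {
        int start = text.indexOf(startMarker);
        if (start < 0) {
            return null; // start marker not found
        }
        start += startMarker.length(); // skip past the marker itself
        int end = text.indexOf(endMarker, start);
        if (end < 0) {
            return null; // no end marker after the start marker
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String html = "<div id=\"title\">Remote IP Address:</div><br />"
                + "<div id=\"value1\">49.156.53.152</div><br /><br />";
        // Anchor on the element that holds the value, not on the label text
        String ip = between(html, "<div id=\"value1\">", "</div>");
        System.out.println(ip); // prints 49.156.53.152
    }
}
```

Because it anchors on the value1 element rather than the "Remote IP Address:" label, the helper no longer depends on the exact </div><br /> sequence between the label and the value.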

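4 answers:

Answer 0 (score: 0)

You can use a Java object registered through the WebView's JavaScript interface (addJavascriptInterface) to pass the page's full HTML back to your Java code.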

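import android.app.AlertDialog;
import android.content.Context;
import android.os.Bundle;
import android.os.Handler;
import android.os.Looper;
import android.webkit.JavascriptInterface;
import android.webkit.WebView;
import android.webkit.WebViewClient;

// Inside your Activity subclass:
@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.main); // assumed layout containing a WebView with id "browser"

    final WebView webview = (WebView) findViewById(R.id.browser);
    webview.getSettings().setJavaScriptEnabled(true);
    // Expose the Java object to the page's JavaScript as window.HtmlViewer
    webview.addJavascriptInterface(new MyJavaScriptInterface(this), "HtmlViewer");

    webview.setWebViewClient(new WebViewClient() {
        @Override
        public void onPageFinished(WebView view, String url) {
            // When the page has loaded, send its complete HTML back to Java
            webview.loadUrl("javascript:window.HtmlViewer.showHTML" +
                    "('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
        }
    });

    webview.loadUrl("http://android-in-action.com/index.php?post/" +
            "Common-errors-and-bugs-and-how-to-solve-avoid-them");
}

class MyJavaScriptInterface {

    private final Context ctx;

    MyJavaScriptInterface(Context ctx) {
        this.ctx = ctx;
    }

    // Required on API 17+ for the method to be callable from JavaScript
    @JavascriptInterface
    public void showHTML(final String html) {
        // JavaScript calls arrive on a background thread, so show the dialog on the UI thread
        new Handler(Looper.getMainLooper()).post(new Runnable() {
            @Override
            public void run() {
                new AlertDialog.Builder(ctx).setTitle("HTML").setMessage(html)
                        .setPositiveButton(android.R.string.ok, null).setCancelable(false).create().show();
            }
        });
    }

}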

Answer 1 (score: 0)

You can download the HTML page as a string, then use a regular expression or string operations to extract the data you need.

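import android.widget.Toast;
import org.apache.http.client.HttpClient;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;

// On Android 3.0+, client.execute() must run off the UI thread (e.g. in an AsyncTask's
// doInBackground), with the textView updates done back on the UI thread.
try {
    String url = url_text.getText().toString().trim();
    if (!url.equals("")) {
        textView.setText("");
        HttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(url);
        // Execute the request and read the response body as a String
        ResponseHandler<String> responseHandler = new BasicResponseHandler();
        String response_str = client.execute(request, responseHandler);
        textView.setText(response_str);
    } else {
        Toast.makeText(getApplicationContext(), "URL string is empty.", Toast.LENGTH_LONG).show();
    }
} catch (Exception e) {
    System.out.println("Some error occurred.");
    textView.setText(e.getMessage());
}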

Perhaps the simplest way is to use String.split():

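// split() takes a regular expression, so the asterisks must be escaped.
// This only works if the response literally contains "**" around the IP, as in the sample text.
String[] separated = response_str.split("\\*\\*");
String before = separated[0]; // part before the first **
String ip = separated[1];     // the IP string you need
String after = separated[2];  // part after the second **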
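The download code in this answer uses Apache HttpClient (DefaultHttpClient), which was removed from the Android SDK in Android 6.0 (API 23). On current versions the same download is usually done with java.net.HttpURLConnection. Below is a minimal sketch that assumes the page is UTF-8 encoded. The names PageFetcher, readAll and fetch are hypothetical. On Android, fetch() must still be called off the UI thread:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    /** Reads the whole stream into a String, assuming the bytes are UTF-8. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Downloads the body of pageUrl as a String. On Android, call this off the UI thread. */
    static String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect(); // release the connection once the body has been read
        }
    }
}
```

The resulting String can then be passed to split(), a regular expression, or any of the other approaches in this thread.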

Answer 2 (score: 0)

You could try this:

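    String startTag = "<div id=\"value1\">";
    int start = YOUR_ORIGINAL_STRING.indexOf(startTag) + startTag.length();
    int end = YOUR_ORIGINAL_STRING.indexOf("</div>", start);
    String required = YOUR_ORIGINAL_STRING.substring(start, end);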

Answer 3 (score: 0)

You can use jsoup:

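import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
Document doc = Jsoup.connect(URL_TO_HTML_PAGE).get();
Element value = doc.getElementById("value1"); // the <div id="value1"> element
String ip = value.text().split("\\*\\*")[1]; // split() takes a regex, so escape the asterisks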

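The last line is based on Greezer's post. Personally, I would replace it with a simple regular expression that matches any valid IPv4 address.

Something like: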

\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
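In Java, the regular expression above can be compiled once with java.util.regex.Pattern and searched with Matcher.find(). Below is a minimal sketch that returns the first IPv4 address in the downloaded HTML. The names IpFinder and findFirstIPv4 are hypothetical. It assumes the remote IP is the first address that appears in the page, which holds for the sample text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpFinder {

    // One octet: 250-255, 200-249, or 0-199 (an optional leading 0 or 1 is allowed)
    private static final String OCTET = "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    // Four octets separated by literal dots, bounded by word boundaries
    private static final Pattern IPV4 = Pattern.compile(
            "\\b" + OCTET + "\\." + OCTET + "\\." + OCTET + "\\." + OCTET + "\\b");

    /** Returns the first IPv4 address found in text, or null if there is none. */
    static String findFirstIPv4(String text) {
        Matcher m = IPV4.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<div id=\"title\">Remote IP Address:</div><br />"
                + "<div id=\"value1\">49.156.53.152</div><br /><br />";
        System.out.println(findFirstIPv4(html)); // prints 49.156.53.152
    }
}
```

Because the pattern ignores the surrounding markup, it keeps working if the page layout changes. However, it cannot tell addresses apart if the page contains more than one.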