App doesn't scrape HTML

Time: 2011-06-12 17:34:42

Tags: android html android-emulator web-scraping

Edit:

Error:

06-12 19:25:55.880: ERROR/AndroidRuntime(226): Uncaught handler: thread main exiting due to uncaught exception
06-12 19:25:55.910: ERROR/AndroidRuntime(226): java.lang.NullPointerException
06-12 19:25:55.910: ERROR/AndroidRuntime(226):     at com.laytproducts.songmaster.mainAct$1.onClick(mainAct.java:124)
06-12 19:25:55.910: ERROR/AndroidRuntime(226):     at android.view.View.performClick(View.java:2364)
06-12 19:25:55.910: ERROR/AndroidRuntime(226):     at android.view.View.onTouchEvent(View.java:4179)
06-12 19:25:55.910: ERROR/AndroidRuntime(226):     at android.widget.TextView.onTouchEvent(TextView.java:6541)
06-12 19:25:55.910: ERROR/AndroidRuntime(226):     at android.view.View.dispatchTouchEvent(View.java:3709)
06-12 19:25:55.910: ERROR/AndroidRuntime(226):     at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:884)
06-12 19:25:55.910: ERROR/AndroidRuntime(226):     at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:884)
06-12 19:25:55.910: ERROR/AndroidRuntime(226):     at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:884)
06-12 19:25:55.910: ERROR/AndroidRuntime(226):     at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:884)
....

Code:

rawHtml = getHtml(baseSite + rSearched); //get the raw html of page
if(rawHtml == null || rawHtml.length() < 1){//checks if it really contains anything
    Toast.makeText(getApplicationContext(), "Error: Got No Result", Toast.LENGTH_SHORT).show();
    Log.e("RawHtml Error:1", "Nothing In rawHtml String");
} else {
    for(int i = 1; i < 7; i++){
        String html = parseHtml(rawHtml,i);
        if(html == null || html.length() < 1){
            results[i-1] = "Result not found:Please try different lyrics";
        } else {
            results[i-1] = parseHtml(rawHtml,i); //error here
        }
    }
}

parseHtml:

public String parseHtml(String html,int num){
    String parsed = "";
    String artistParse = "";
    String songParse = "";
    //String fullHtmlParse = "#NUMBER#. &nbsp;<span>This Charming Man</span> &nbsp; by Smiths</a>";//Reference
    if(num != 0 && num <= 6){
        songParse = StringUtils.substringBetween(html,num+". &nbsp;<span>","</span>");
        artistParse = StringUtils.substringBetween(html,num+". &nbsp;<span>"+songParse+"</span> &nbsp; by ","</a>");
    } else {
        Toast.makeText(getApplicationContext(),
                "Error: Number is wrong in parseHtml, Please Try Again.", Toast.LENGTH_SHORT).show();
        Log.e("ParsedHtml Error:1","Error: Number in parseHtml is invalid: " + num);
        return "";
    }
    parsed = songParse + ":" + artistParse;
    return parsed;
}

getHtml:

public String getHtml(String url){
    String html = "";
    String baseHtml = "";
    String table = "";
    try {
        baseHtml = new StringReader(url).toString();
    } catch (Exception e) {
        Toast.makeText(getApplicationContext(), "Error getting HtmlDoc, Please Try Again.", Toast.LENGTH_SHORT).show();
        e.printStackTrace();
        return "";
    }
    if(baseHtml == null || baseHtml.length() < 1){
        Toast.makeText(getApplicationContext(), "Error getting HtmlDoc, Please Try Again.", Toast.LENGTH_SHORT).show();
        Log.e("BaseHtml Error:1","Error: Nothing in baseHtml[method getHtml(String url)]");
        return "";
    } else {
        //table = StringUtils.substringBetween(baseHtml,"<!-- EyesLyrics.com search results -->","</table>");
    }
    html = baseHtml;
    return html;
}
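Note that `new StringReader(url).toString()` in getHtml never contacts the network. `StringReader` only wraps the string it is given and does not override `toString()`, so the call returns `java.io.StringReader@` followed by a hex hash code, not the page's HTML. Below is a minimal sketch of actually downloading the page with the standard `java.net.HttpURLConnection`. The class and method names (`HtmlFetcher`, `fetchHtml`, `readAll`), the 10-second timeouts, and the UTF-8 decoding are illustrative assumptions, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class HtmlFetcher {
    // Reads the whole stream into a String. Assumes UTF-8; a real page may declare another charset.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Downloads the page at an http(s) URL. Returns null on any failure so callers need only one check.
    static String fetchHtml(String url) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(10_000); // hypothetical timeout values
            conn.setReadTimeout(10_000);
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                return null;
            }
            return readAll(conn.getInputStream());
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://example.com/";
        // What the original getHtml effectively returned: the reader's class name and hash, not HTML.
        System.out.println(new StringReader(url).toString());
        String html = fetchHtml(url);
        System.out.println(html == null ? "Download failed" : html.length() + " characters downloaded");
    }
}
```

On Android the app also needs the `android.permission.INTERNET` permission in its manifest. Apps targeting Android 3.0 (API 11) or later get a `NetworkOnMainThreadException` for network calls on the main thread, so `fetchHtml` should run on a background thread rather than directly inside `onClick`.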

Hope this is what you need.

1 answer:

Answer 0 (score: 1)

To display a Toast, you need to call the show() method, like this:

Toast.makeText(getApplicationContext(), "Error getting HtmlDoc, Please Try Again.", Toast.LENGTH_SHORT).show();

For more details, see here.


if(rawHtml == "" || rawHtml == null){[...]}

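I guess the rawHtml object is a String? In that case, if you want to check whether it is empty, you would either use the equals("") method (not ==, which compares object references rather than contents) or check the length of the string:

if (rawHtml.length() < 1)

Also, if you need to check whether the string is null, you should do that first, because checking the length (for example) of a null string causes a NullPointerException.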
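A short sketch of the point above, using hypothetical helper names. `==` compares object references, so an empty string created at runtime is not necessarily `==` to `""`, while `equals` and `isEmpty()` compare contents. Because `||` short-circuits, putting the null test first means the length is never checked on `null`; reversing the order throws.

```java
class StringChecks {
    // Null test first: if s is null, || short-circuits and s.isEmpty() is never evaluated.
    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    // Wrong order: s.length() runs first and throws NullPointerException when s is null.
    static boolean isEmptyOrNullWrongOrder(String s) {
        return s.length() < 1 || s == null;
    }

    public static void main(String[] args) {
        String empty = new String("");            // a distinct String object with no characters
        System.out.println(empty == "");          // false: different objects
        System.out.println(empty.equals(""));     // true: same contents
        System.out.println(isNullOrEmpty(null));  // true
        System.out.println(isNullOrEmpty(empty)); // true

        try {
            isEmptyOrNullWrongOrder(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException when length() is checked first");
        }
    }
}
```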


About your code:

if(rawHtml == null ...

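Testing whether rawHtml is null is unnecessary, because the getHtml method initializes it as an empty string and only ever returns "" or baseHtml. It will never be null.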

return "";

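You return an empty string in the parseHtml and getHtml methods. I would rather return null and then check whether the returned value is null. That way you can drop one condition with the same effect.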
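The question's parseHtml in fact never returns an empty string on a failed match. `StringUtils.substringBetween` (Apache Commons Lang) returns `null` when a delimiter is not found, and Java string concatenation turns `null` into the text `"null"`, so `songParse + ":" + artistParse` becomes `"null:null"` and the `html.length() < 1` check in the loop never fires for it. The sketch below swaps Commons Lang for a small stdlib stand-in with the same name and no-match behavior (an assumption made only so it runs without the library) and contrasts the method as written with a variant that returns `null`, as suggested above. `parseHtmlAsWritten` and `parseHtmlOrNull` are hypothetical names, and the Toast/Log calls are omitted.

```java
class ParseHtmlSketch {
    // Stdlib stand-in for StringUtils.substringBetween: first match only, null when nothing matches.
    static String substringBetween(String str, String open, String close) {
        if (str == null || open == null || close == null) return null;
        int start = str.indexOf(open);
        if (start == -1) return null;
        start += open.length();
        int end = str.indexOf(close, start);
        if (end == -1) return null;
        return str.substring(start, end);
    }

    // Mirrors the question's parseHtml (without the Toast): a failed match yields "null:null", never "".
    static String parseHtmlAsWritten(String html, int num) {
        String song = substringBetween(html, num + ". &nbsp;<span>", "</span>");
        String artist = substringBetween(html, num + ". &nbsp;<span>" + song + "</span> &nbsp; by ", "</a>");
        return song + ":" + artist;
    }

    // Variant following the answer's advice: null means "no result", so the caller needs one check.
    static String parseHtmlOrNull(String html, int num) {
        if (html == null || num < 1 || num > 6) return null;
        String song = substringBetween(html, num + ". &nbsp;<span>", "</span>");
        if (song == null) return null;
        String artist = substringBetween(html, num + ". &nbsp;<span>" + song + "</span> &nbsp; by ", "</a>");
        if (artist == null) return null;
        return song + ":" + artist;
    }

    public static void main(String[] args) {
        // The reference line from the question's comment, with #NUMBER# replaced by 1.
        String sample = "1. &nbsp;<span>This Charming Man</span> &nbsp; by Smiths</a>";
        System.out.println(parseHtmlAsWritten(sample, 1));          // This Charming Man:Smiths
        System.out.println(parseHtmlAsWritten("<html></html>", 1)); // null:null
        System.out.println(parseHtmlOrNull("<html></html>", 1));    // null
    }
}
```

With the null-returning variant, the check inside the loop reduces to `if (html == null)`.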

results[i-1] = "...";

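If you find it clearer, you can wrap the index calculation in parentheses, like this (this is purely a matter of style: results[i-1] and results[(i-1)] refer to the same element, so it is not the cause of the crash):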

results[(i-1)] = "...";

Your error appears on this line:

results[i-1] = parseHtml(rawHtml,i);

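Since I can't see whether you initialize the results object, I guess this is your problem. An array must be initialized before you can access (write or read) its elements.

This is how you do it, assuming your array is a String array: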

String[] results = new String[NumberOfElements];
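A minimal sketch of the failure mode described above, assuming `results` is a field of the Activity that was declared but never assigned (array-typed fields default to `null`; a local variable in that state would not even compile). The class and method names are hypothetical, and a placeholder string stands in for `parseHtml(rawHtml, i)`. The length 6 matches the question's loop, which writes indices `i-1` for `i` from 1 to 6.

```java
class ResultsArrayDemo {
    // Declared like an Activity field but never assigned with "new": its value is null.
    private String[] results;

    // Stand-in for the question's loop; a placeholder string replaces parseHtml(rawHtml, i).
    void fillResults() {
        for (int i = 1; i < 7; i++) {
            results[i - 1] = "Result " + i; // NullPointerException while results is still null
        }
    }

    // The fix: create the array (6 slots, indices 0..5) before the loop writes to it.
    void initResults() {
        results = new String[6];
    }

    String[] getResults() {
        return results;
    }

    public static void main(String[] args) {
        ResultsArrayDemo demo = new ResultsArrayDemo();
        try {
            demo.fillResults();
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: results was never initialized");
        }
        demo.initResults();
        demo.fillResults();
        System.out.println(demo.getResults()[5]); // Result 6
    }
}
```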

I hope this solves your problem.