How to search YouTube with HtmlUnit

Time: 2011-07-21 00:35:20

Tags: java youtube htmlunit

I'd like to know whether it is possible to search YouTube with HtmlUnit. I've started writing the code; here it is:

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitExampleTestBase {
    private static final String YOUTUBE = "http://www.youtube.com";

    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient();
        // Don't abort when the page's JavaScript throws an error
        webClient.setThrowExceptionOnScriptError(false);

        // This is equivalent to typing youtube.com into the browser's address bar
        HtmlPage currentPage = webClient.getPage(YOUTUBE);

        // Get the form that contains the search field and the submit button
        HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");
        // Print the form's text content
        System.out.println(searchForm.asText());

        // Collect the result thumbnail links. This XPath matches links on a search
        // results page, so it only finds something once the page holds search results.
        @SuppressWarnings("unchecked")
        final List<HtmlAnchor> listLinks = (List<HtmlAnchor>) currentPage.getByXPath("//a[@class='ux-thumb-wrap result-item-thumb']");
        for (HtmlAnchor link : listLinks) {
            // hrefs are relative (e.g. "/watch?v=..."), so prefix the site root
            System.out.println(YOUTUBE + link.getAttribute("href"));
        }
    }
}
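The loop above builds each link by prefixing YOUTUBE to the href, which only works for root-relative hrefs such as "/watch?v=...". A slightly more robust option, sketched below with the JDK's java.net.URI, is to resolve each href against the page's base URL so that absolute hrefs pass through unchanged. The class and method names are made up for illustration, and URI.create will throw on hrefs containing illegal characters such as unencoded spaces.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Sketch: turn scraped hrefs into absolute URLs by resolving them against a base URL.
class LinkResolver {
    // Base URL of the page the links were scraped from (assumed to be the YouTube root).
    private static final URI BASE = URI.create("http://www.youtube.com/");

    // Resolves each href against BASE; hrefs that are already absolute come back unchanged.
    static List<String> toAbsolute(List<String> hrefs) {
        List<String> result = new ArrayList<>();
        for (String href : hrefs) {
            result.add(BASE.resolve(href).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hrefs = List.of("/watch?v=abc123", "http://example.com/x");
        for (String url : toAbsolute(hrefs)) {
            System.out.println(url);
        }
    }
}
```

In the question's loop, the hrefs returned by getAttribute("href") would be collected into a list and passed to toAbsolute.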

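Now I don't know how to type some text into the search field and then press the "Search" button.

I've looked at HtmlUnit tutorials, but I ran into a problem: they use a method called getElementByName, and the search button on YouTube has no name, only an id. Can anyone help me?

Edit: I've edited the code above, and now I get the YouTube links from the first page of results. But before grabbing the links, I need to sort the results by upload date. Can someone help me with the sorting?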
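One possible approach, sketched here as a substitute for YouTube's own "sort by upload date" option (which this sketch does not use), is to scrape each result's upload date along with its link and sort on the client. Note the difference: this only reorders the results already fetched, such as the first page, rather than asking YouTube for the newest videos overall. The VideoResult type is hypothetical, and how the date is extracted from the page is assumed, not shown.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortByUploadDate {
    // Hypothetical container for one scraped result; extracting the date is not shown.
    record VideoResult(String url, LocalDate uploaded) {}

    // Returns a new list ordered newest first; the input list is left untouched.
    static List<VideoResult> newestFirst(List<VideoResult> results) {
        List<VideoResult> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparing(VideoResult::uploaded).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<VideoResult> results = List.of(
            new VideoResult("http://www.youtube.com/watch?v=a", LocalDate.of(2011, 5, 1)),
            new VideoResult("http://www.youtube.com/watch?v=b", LocalDate.of(2011, 7, 20)),
            new VideoResult("http://www.youtube.com/watch?v=c", LocalDate.of(2010, 12, 31)));
        for (VideoResult r : newestFirst(results)) {
            System.out.println(r.uploaded() + " " + r.url());
        }
    }
}
```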

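2 answers:

Answer 0 (score: 3)

I'm not an HtmlUnit expert, but there is a workaround: you can add your own button to the form and use it to submit the form.

Here is an annotated code example: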

import java.io.IOException;
import java.net.MalformedURLException;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public class HtmlUnitExampleTestBase {
   public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
      WebClient webClient = new WebClient();
      webClient.setThrowExceptionOnScriptError(false);

      // This is equivalent to typing youtube.com into the browser's address bar
      HtmlPage currentPage = webClient.getPage("http://www.youtube.com");

      // Get form where submit button is located
      HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");

      // Get the input field.
      HtmlTextInput searchInput = (HtmlTextInput) currentPage.getElementById("masthead-search-term");
      // Insert the search term.
      searchInput.setText("Nyan Cat");

      // Workaround: create a 'fake' button and add it to the form.
      HtmlButton submitButton = (HtmlButton) currentPage.createElement("button");
      submitButton.setAttribute("type", "submit");
      searchForm.appendChild(submitButton);

      // Workaround: use the reference to the button to submit the form. 
      HtmlPage newPage = submitButton.click();

      System.out.println(newPage.asText());
   }
}
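If the search form submits with GET, the fake button can be skipped entirely: build the URL the form would have produced and load it directly with webClient.getPage(...). The sketch below assumes the form's action is "/results" and the text field's name is "search_query"; both are assumptions to verify against the form's action and the input's name attribute on the live page. The class and method names are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: build the URL a GET search form would produce, instead of clicking a button.
// ASSUMPTION: "/results" and "search_query" are taken to be the form's action and
// the search field's name; check them on the real page first.
class SearchUrlBuilder {
    static String buildSearchUrl(String base, String term) {
        // URLEncoder applies application/x-www-form-urlencoded rules, the same
        // encoding a browser uses for GET form fields (spaces become '+').
        return base + "/results?search_query=" + URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("http://www.youtube.com", "Nyan Cat"));
    }
}
```

The resulting string can be passed to webClient.getPage(...) in place of the submitButton.click() step.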

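Answer 1 (score: 1)

HtmlUnit is OK, but for web automation I much prefer Watir or Selenium.

One drawback of HtmlUnit is that it lacks jQuery-like selector methods for fetching DOM elements. Take a look at the css selector project, which is being added to HtmlUnit and should make what you want easy to do. There is an introduction to it on Gooder Code.

Once you have that working, the selector for the YouTube search form would be ".search-term" and the selector for the submit button would be ".search-button".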
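Until a CSS-selector API is available, a class selector such as ".search-button" can be written as an XPath expression and passed to getByXPath. The usual idiom matches the class as one whitespace-separated token. By contrast, an exact comparison like @class='...' fails as soon as the element carries an extra class or the classes appear in a different order. The sketch demonstrates this with the JDK's built-in XPath API on a small hypothetical XHTML fragment (not YouTube's real markup); the helper names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class ClassSelectorXPath {
    // Translates a plain CSS class selector such as ".search-button" into XPath.
    // Only the ".name" form is handled; ids, combinators, etc. are not.
    static String classToXPath(String cssClassSelector) {
        String name = cssClassSelector.substring(1); // drop the leading '.'
        return "//*[contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')]";
    }

    // Parses well-formed XML and counts the nodes the XPath expression selects.
    static int countMatches(String xml, String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(xpath, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical markup for illustration only.
        String xml = "<form><input class='search-term'/>"
            + "<button class='yt-uix-button search-button'>Search</button>"
            + "<button class='search-button-extra'>x</button></form>";
        // Token match finds the first button only
        System.out.println(countMatches(xml, classToXPath(".search-button")));
        // Exact attribute match finds nothing, because of the extra class
        System.out.println(countMatches(xml, "//*[@class='search-button']"));
    }
}
```

With HtmlUnit, the string returned by classToXPath(".search-button") could be handed to the page's getByXPath in the same way the question's code does.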