How to retrieve information from Yelp using jsoup?

Posted: 2016-11-22 20:54:56

Tags: java html css jsoup elements

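I'm very new to programming and have been teaching myself Java. What I'm currently trying to do is extract the names of all the businesses returned by a particular Yelp search and store the results in an array. Here is what I have so far: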

import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;

public class YelpScraper
{
    public static void main(String[] args) throws IOException
    {
        String url = "https://www.yelp.com/search?find_desc=&find_loc=new+jersey&ns=1";
        Document document = Jsoup.connect(url).get();

        Elements elements = document.getElementsByClass("biz-name js-analytics-click");

        for (Element element : elements)
        {
            System.out.println(elements.toString());
        }
    }
}

Now here is my problem. This is the output:



<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/darios-restaurant-newark" data-hovercard-id="resfu-JNLUKR3l82D5W7-A"><span>Dario’s Restaurant</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/sushi-house-21-newark-2" data-hovercard-id="vMpJRWxm71XSBnWL9XfYpQ"><span>Sushi House 21</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/burger-walla-newark" data-hovercard-id="JmPZ-AyewjQPIJkKbkU0dA"><span>Burger Walla</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hobbys-delicatessen-and-restaurant-newark" data-hovercard-id="-dEkFa3N6SXLahAMBAM8EA"><span>Hobby’s Delicatessen &amp; Restaurant</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/krugs-tavern-newark" data-hovercard-id="YhiUGWjAB1y7reqoKLWCow"><span>Krug’s Tavern</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/mcwhorter-barbecue-newark" data-hovercard-id="6xf4H2rOCtUIhyMgazRsnA"><span>McWhorter Barbecue</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/spanish-tavern-newark" data-hovercard-id="muXH1f3nwoSgWB3KN-rAfA"><span>Spanish Tavern</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/casa-d-paco-newark" data-hovercard-id="iIJ-dWgYcZTewVGJyP6EfQ"><span>Casa d’Paco</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hero-king-handcrafted-sandwiches-newark" data-hovercard-id="hzwE2ub1J7fTwJDjTJwksA"><span>Hero King Handcrafted Sandwiches</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/the-green-chicpea-newark-2" data-hovercard-id="bDWWtSm-8uoW9_urjMCzTA"><span>The Green Chicpea</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/darios-restaurant-newark" data-hovercard-id="resfu-JNLUKR3l82D5W7-A"><span>Dario’s Restaurant</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/sushi-house-21-newark-2" data-hovercard-id="vMpJRWxm71XSBnWL9XfYpQ"><span>Sushi House 21</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/burger-walla-newark" data-hovercard-id="JmPZ-AyewjQPIJkKbkU0dA"><span>Burger Walla</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hobbys-delicatessen-and-restaurant-newark" data-hovercard-id="-dEkFa3N6SXLahAMBAM8EA"><span>Hobby’s Delicatessen &amp; Restaurant</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/krugs-tavern-newark" data-hovercard-id="YhiUGWjAB1y7reqoKLWCow"><span>Krug’s Tavern</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/mcwhorter-barbecue-newark" data-hovercard-id="6xf4H2rOCtUIhyMgazRsnA"><span>McWhorter Barbecue</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/spanish-tavern-newark" data-hovercard-id="muXH1f3nwoSgWB3KN-rAfA"><span>Spanish Tavern</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/casa-d-paco-newark" data-hovercard-id="iIJ-dWgYcZTewVGJyP6EfQ"><span>Casa d’Paco</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hero-king-handcrafted-sandwiches-newark" data-hovercard-id="hzwE2ub1J7fTwJDjTJwksA"><span>Hero King Handcrafted Sandwiches</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/the-green-chicpea-newark-2" data-hovercard-id="bDWWtSm-8uoW9_urjMCzTA"><span>The Green Chicpea</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/darios-restaurant-newark" data-hovercard-id="resfu-JNLUKR3l82D5W7-A"><span>Dario’s Restaurant</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/sushi-house-21-newark-2" data-hovercard-id="vMpJRWxm71XSBnWL9XfYpQ"><span>Sushi House 21</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/burger-walla-newark" data-hovercard-id="JmPZ-AyewjQPIJkKbkU0dA"><span>Burger Walla</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hobbys-delicatessen-and-restaurant-newark" data-hovercard-id="-dEkFa3N6SXLahAMBAM8EA"><span>Hobby’s Delicatessen &amp; Restaurant</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/krugs-tavern-newark" data-hovercard-id="YhiUGWjAB1y7reqoKLWCow"><span>Krug’s Tavern</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/mcwhorter-barbecue-newark" data-hovercard-id="6xf4H2rOCtUIhyMgazRsnA"><span>McWhorter Barbecue</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/spanish-tavern-newark" data-hovercard-id="muXH1f3nwoSgWB3KN-rAfA"><span>Spanish Tavern</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/casa-d-paco-newark" data-hovercard-id="iIJ-dWgYcZTewVGJyP6EfQ"><span>Casa d’Paco</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hero-king-handcrafted-sandwiches-newark" data-hovercard-id="hzwE2ub1J7fTwJDjTJwksA"><span>Hero King Handcrafted Sandwiches</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/the-green-chicpea-newark-2" data-hovercard-id="bDWWtSm-8uoW9_urjMCzTA"><span>The Green Chicpea</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/darios-restaurant-newark" data-hovercard-id="resfu-JNLUKR3l82D5W7-A"><span>Dario’s Restaurant</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/sushi-house-21-newark-2" data-hovercard-id="vMpJRWxm71XSBnWL9XfYpQ"><span>Sushi House 21</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/burger-walla-newark" data-hovercard-id="JmPZ-AyewjQPIJkKbkU0dA"><span>Burger Walla</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hobbys-delicatessen-and-restaurant-newark" data-hovercard-id="-dEkFa3N6SXLahAMBAM8EA"><span>Hobby’s Delicatessen &amp; Restaurant</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/krugs-tavern-newark" data-hovercard-id="YhiUGWjAB1y7reqoKLWCow"><span>Krug’s Tavern</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/mcwhorter-barbecue-newark" data-hovercard-id="6xf4H2rOCtUIhyMgazRsnA"><span>McWhorter Barbecue</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/spanish-tavern-newark" data-hovercard-id="muXH1f3nwoSgWB3KN-rAfA"><span>Spanish Tavern</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/casa-d-paco-newark" data-hovercard-id="iIJ-dWgYcZTewVGJyP6EfQ"><span>Casa d’Paco</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hero-king-handcrafted-sandwiches-newark" data-hovercard-id="hzwE2ub1J7fTwJDjTJwksA"><span>Hero King Handcrafted Sandwiches</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/the-green-chicpea-newark-2" data-hovercard-id="bDWWtSm-8uoW9_urjMCzTA"><span>The Green Chicpea</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/darios-restaurant-newark" data-hovercard-id="resfu-JNLUKR3l82D5W7-A"><span>Dario’s Restaurant</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/sushi-house-21-newark-2" data-hovercard-id="vMpJRWxm71XSBnWL9XfYpQ"><span>Sushi House 21</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/burger-walla-newark" data-hovercard-id="JmPZ-AyewjQPIJkKbkU0dA"><span>Burger Walla</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hobbys-delicatessen-and-restaurant-newark" data-hovercard-id="-dEkFa3N6SXLahAMBAM8EA"><span>Hobby’s Delicatessen &amp; Restaurant</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/krugs-tavern-newark" data-hovercard-id="YhiUGWjAB1y7reqoKLWCow"><span>Krug’s Tavern</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/mcwhorter-barbecue-newark" data-hovercard-id="6xf4H2rOCtUIhyMgazRsnA"><span>McWhorter Barbecue</span></a>
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/spanish-tavern-newark" data-hovercard-id="muXH1f3nwoSgWB3KN-rAfA"><span>Spanish Tavern</span></a>

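As you can see, it outputs the HTML of each element with that class, but all I want is the business names. Any ideas on how to do this differently? Clearly getElementsByClass() is not the method I should be using. Thanks in advance!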

1 answer:

Answer 0 (score: 0)

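You can either iterate over each element's children or use a more fine-grained selection. I changed your selection to return the spans that contain the names, and used the text() method to get the text inside each span tag.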

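// Select every <span> nested inside an element with class "indexed-biz-name"
Elements elements = document.select(".indexed-biz-name span");
for (Element element : elements) 
{
    // text() returns only the text inside the span, without the surrounding tags
    System.out.println(element.text());
}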
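The answer also mentions iterating over each element's children as an alternative. Below is a minimal sketch of that approach, which also stores the names in an ArrayList as the question intended. It assumes the 2016-era Yelp markup shown in the question's output (an `<a class="biz-name ...">` wrapping a `<span>` with the name); Yelp's current markup is likely different. It parses a snippet copied from that output with `Jsoup.parse` instead of fetching the live page. Note that the question's loop prints `elements` (the whole collection) on every iteration rather than the current `element`, which is why the output repeats; the sketch reads only the loop variable. The class name `YelpNameExtractor` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical helper class; assumes the 2016-era Yelp markup shown in the
// question's output: <a class="biz-name ..."><span>Name</span></a>
public class YelpNameExtractor
{
    // Parses an HTML string and delegates to the Document overload.
    public static List<String> extractNames(String html)
    {
        return extractNames(Jsoup.parse(html));
    }

    // Collects the text of each <span> child of every <a class="biz-name"> link.
    public static List<String> extractNames(Document document)
    {
        List<String> names = new ArrayList<>();
        // "a.biz-name" matches <a> elements whose class list contains biz-name
        Elements links = document.select("a.biz-name");
        for (Element link : links)
        {
            // Use the loop variable (link), not the whole collection (links),
            // and walk its direct child elements looking for the <span>.
            for (Element child : link.children())
            {
                if (child.tagName().equals("span"))
                {
                    // text() returns the text only, with entities like &amp; decoded
                    names.add(child.text());
                }
            }
        }
        return names;
    }

    public static void main(String[] args)
    {
        // Two entries copied from the question's output, parsed offline
        // instead of fetching the live page.
        String html =
            "<a class=\"biz-name js-analytics-click\" href=\"/biz/sushi-house-21-newark-2\">"
            + "<span>Sushi House 21</span></a>"
            + "<a class=\"biz-name js-analytics-click\" href=\"/biz/hobbys-delicatessen-and-restaurant-newark\">"
            + "<span>Hobby\u2019s Delicatessen &amp; Restaurant</span></a>";

        for (String name : extractNames(html))
        {
            System.out.println(name);
        }
    }
}
```

With a live page, you would pass the result of `Jsoup.connect(url).get()` to `extractNames(Document)` instead of parsing a string. The child-walking loop keeps the selector simple; the answer's descendant selector achieves the same result in a single `select` call.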