Question

I am trying to implement a web crawler using Java and JSOUP. I have the following HTML content:

<div class="backgroundSection">
        <span class="lndustryProductLinks">
  <span class="backgroundDetailHeading">Industry:&nbsp;</span><a class="criterion" rel="nofollow" href="#industryList=[{value:'66826',isUsed:true}]&amp;level=0&amp;type=company">Automotive Service &amp; Collision Repair</a>
                        ,
<a class="criterion" rel="nofollow" href="#industryList=[{value:'1290',isUsed:true}]&amp;level=0&amp;type=company">Consumer Services</a>
            </span>
        </div>

<div class="backgroundSection">
<span class="backgroundDetailHeading">Products and Services:&nbsp;</span>
<span class="lndustryProductLinks"><a class="criterion" href="#industryKeywords='windshield auto glass replacement services'&amp;level=0&amp;type=company" rel="NOFOLLOW">windshield auto glass replacement services</a>
<a class="criterion" href="#industryKeywords='free mobile windshield repair services'&amp;level=0&amp;type=company" rel="NOFOLLOW">free mobile windshield repair services</a>
,


<a class="criterion" href="#industryKeywords='local automotive glass services'&amp;level=0&amp;type=company" rel="NOFOLLOW">local automotive glass services</a></span></div>

When I try to get the Industry links (for example, Consumer Services) with the following code:

public String getIndustry(String url)
    {
        String text=null;
        try
        {
            // Fetch and parse the page
            Document doc=Jsoup.connect(url).get();
            // Matches the heading spans of every section, not only Industry
            Elements e=doc.getElementsByClass("backgroundDetailHeading");
            text=e.text();
        }
        catch(IOException e)
        {
            e.printStackTrace();
        }

        return text;
    }

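I also get the links from the Products and Services div. But I only want the values of the criterion-class links inside the Industry div. How can I do that?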
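
One possible approach, sketched here as an illustration and not as a confirmed answer: restrict the selection to the backgroundSection div whose heading span contains the text "Industry", then collect the criterion links inside it. The class name IndustryLinks, the method extractIndustries, and the shortened sample HTML are hypothetical; the sketch assumes jsoup's :has() and :contains() pseudo-selectors, and that the heading text reliably contains the word "Industry".

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class IndustryLinks {
    // Only a.criterion links inside the section whose heading span mentions "Industry".
    // :contains() is case-insensitive and is applied to the heading span only.
    static final String INDUSTRY_LINKS =
        "div.backgroundSection:has(span.backgroundDetailHeading:contains(Industry)) a.criterion";

    // Returns the visible text of each matching link, in document order.
    static List<String> extractIndustries(Document doc) {
        List<String> names = new ArrayList<>();
        for (Element link : doc.select(INDUSTRY_LINKS)) {
            names.add(link.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the markup from the question.
        String html =
            "<div class=\"backgroundSection\"><span class=\"lndustryProductLinks\">"
            + "<span class=\"backgroundDetailHeading\">Industry:&nbsp;</span>"
            + "<a class=\"criterion\" href=\"#a\">Automotive Service &amp; Collision Repair</a>, "
            + "<a class=\"criterion\" href=\"#b\">Consumer Services</a></span></div>"
            + "<div class=\"backgroundSection\">"
            + "<span class=\"backgroundDetailHeading\">Products and Services:&nbsp;</span>"
            + "<span class=\"lndustryProductLinks\">"
            + "<a class=\"criterion\" href=\"#c\">local automotive glass services</a></span></div>";

        Document doc = Jsoup.parse(html);
        System.out.println(extractIndustries(doc));
    }
}
```

With a live page, the Document would come from Jsoup.connect(url).get() as in the method above, instead of Jsoup.parse(html). Matching on the heading text avoids relying on the shared class names, which both sections use.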

Extracting information from HTML with duplicate class names using JSOUP?

0 answers: