How to remove the tags from selected elements in Jsoup

Asked: 2013-12-30 09:07:46

Tags: java jquery jsoup

I have a question: how do I strip the tags from the elements I have selected?

The site is www.yellowbook.com

My code is:

for (int i = 1; i < 21; i++) {
    String shopNameTemp = "";
    String shopAddressTempA = "";
    String shopAddressTempB = "";
    String shopAddressTempC = "";
    String shopAddressTempD = "";
    String shopTelTemp = "";
    String divName = "divInAreaSummary_" + String.valueOf(i);

    Elements node = doc.select("li[id=" + divName + "]");

    shopNameTemp = node.first().select("a[class=fn]").toString();
    shopAddressTempA = node.first().select("span[class=street-address]").toString();
    shopAddressTempB = node.first().select("span[class=locality]").toString();
    shopAddressTempC = node.first().select("span[class=region]").toString();
    shopAddressTempD = node.first().select("span[class=postal-code]").toString();
    shopTelTemp = node.first().select("div[class=call phone-number]").toString();
    System.out.println("Name  " + shopNameTemp);
    System.out.println("Address" + shopAddressTempA + shopAddressTempB + shopAddressTempC + shopAddressTempD);
    System.out.println("Tel   " + shopTelTemp);

}

My output is:

Please input your category and location and Province...

auto repair,Seattle,WA


Name <a class="fn" data-classid="690" href="/profile/76-station-mlk_1861635669.html" onclick="OmAdViewLeadClick('adsource: companyname', false, '8330', ';7;;;;evar33=inArea|evar34=16', 'auto repairing');" title="View more information about 76 Station MLK">76 Station MLK</a>

Address   <span itemprop="streetAddress" class="street-address">15 Avenue Nw</span><span itemprop="addressLocality" class="locality">Seattle</span><span itemprop="addressRegion" class="region">WA</span><span itemprop="postalCode" class="postal-code">98102-9810</span>
Tel   <div class="call phone-number">
(206) 826-3263
</div>

How can I get this instead:

Name 76 Station MLK

Address 15 Avenue Nw Seattle WA 98102-9810

Tel (206) 826-3263

P.S. I tried using remove, but then the content is removed while the tags are still there.

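1 Answer:

Answer 0: (score: 1)

Instead of toString(), use the text() method, which extracts only the text and leaves out the markup. select() returns an Elements collection, and text() is available on both Elements and Element.

For example: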

shopNameTemp = node.first().select("a[class=fn]").text();
shopAddressTempA = node.first().select("span[class=street-address]").text();
shopAddressTempB = node.first().select("span[class=locality]").text();
shopAddressTempC = node.first().select("span[class=region]").text();
shopAddressTempD = node.first().select("span[class=postal-code]").text();
shopTelTemp = node.first().select("div[class=call phone-number]").text();

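This should produce the correct text when you print it to the console. Note that you may need to add spaces manually (e.g. + " " +) between shopAddressTempA, shopAddressTempB and so on, otherwise the parts will all be printed run together with no spaces between them.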
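As a rough sketch of the spacing point above, the address parts could be joined with a small helper instead of being concatenated one by one. AddressJoiner and joinParts are hypothetical names I made up for illustration (they are not part of Jsoup), and the sample values are the ones from the question's output:

```java
import java.util.ArrayList;
import java.util.List;

public class AddressJoiner {

    // Hypothetical helper: joins the non-empty parts with a single space,
    // skipping null or blank parts so no double spaces appear.
    static String joinParts(String... parts) {
        List<String> kept = new ArrayList<>();
        for (String part : parts) {
            if (part != null && !part.trim().isEmpty()) {
                kept.add(part.trim());
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Sample values copied from the question; in the real loop these
        // would be the results of the text() calls.
        String address = joinParts("15 Avenue Nw", "Seattle", "WA", "98102-9810");
        System.out.println("Address " + address);
        // prints: Address 15 Avenue Nw Seattle WA 98102-9810
    }
}
```

Skipping blank parts means a listing with, say, no postal code still prints cleanly.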

I tested the text() version, and my output was:

Name  76 Station MLK
Address 2801 Martin Luther King Jr Way S Seattle WA 98144-6003
Tel   (206) 722-4995
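
If you want to compare the two calls without hitting the live site, here is a minimal sketch. It assumes Jsoup is on the classpath. The HTML string is a hand-written fragment modeled on the markup in the question, not the real page, and the class name TextVsToString is just a placeholder:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TextVsToString {

    // Assumed sample markup, loosely based on the output shown in the question.
    static final String HTML =
        "<ul><li id=\"divInAreaSummary_1\">"
        + "<a class=\"fn\" href=\"#\">76 Station MLK</a>"
        + "<div class=\"call phone-number\"> (206) 826-3263 </div>"
        + "</li></ul>";

    // Parses the fragment and returns the first matching list item.
    static Element firstItem() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("li#divInAreaSummary_1").first();
    }

    // toString() on the selection gives the outer HTML, tags included.
    static String nameMarkup() {
        return firstItem().select("a.fn").toString();
    }

    // text() gives only the visible text, with whitespace normalized.
    static String nameText() {
        return firstItem().select("a.fn").text();
    }

    static String phoneText() {
        return firstItem().select("div.call.phone-number").text();
    }

    public static void main(String[] args) {
        System.out.println("Markup: " + nameMarkup());
        System.out.println("Name  " + nameText());   // Name  76 Station MLK
        System.out.println("Tel   " + phoneText());  // Tel   (206) 826-3263
    }
}
```

Keep in mind that first() returns null when nothing matches, so on a real page it is worth checking for null before calling select() on the result.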